Thanks to the wonders of social media, while I was out grocery shopping I received several interesting and useful responses to my previous post on the relationship between multivariate pattern analysis (MVPA) and simulation theory. Rather than try to fit my responses into 140 characters, I figured I’d take a bit more space here to hash them out. I think the idea is really enhanced by these responses, which point to several findings and features of which I was not aware. The short answer seems to be that no, MVPA does not invalidate simulation theory (ST) and may even provide evidence for it in the realm of motor intentions, but that we might be able to point towards a better standard of evidence for more exploratory applications of ST (e.g. empathy-for-pain). An important point to come out of these responses, as one might expect, is that the interpretation of these methodologies is not always straightforward.
I’ll start with Antonia Hamilton’s question, as it points to a bit of literature that speaks directly to the issue:
Antonia is referring to this paper by Oosterhof and colleagues, where they directly compare passive viewing and active performance of the same paradigm using decoding techniques. I don’t read nearly as much social cognition literature as I used to, and wasn’t previously aware of this paper. It’s really a fascinating project and I suggest anyone interested in this issue read it at once (it’s open access, yay!). In the introduction the authors point out that spatial overlap alone cannot demonstrate equivalent mechanisms for viewing and performing the same action:
“Numerous functional neuroimaging studies have identified brain regions that are active during both the observation and the execution of actions (e.g., Etzel et al. 2008; Iacoboni et al. 1999). Although these studies show spatial overlap of frontal and parietal activations elicited by action observation and execution, they do not demonstrate representational overlap between visual and motor action representations. That is, spatially overlapping activations could reflect different neural populations in the same broad brain regions (Gazzola and Keysers 2009; Morrison and Downing 2007; Peelen and Downing 2007b). Spatial overlap of activations per se cannot establish whether the patterns of neural response are similar for a given action (whether it is seen or performed) but different for different actions, an essential property of the “mirror system” hypothesis.”
They then go on to explain that while MVPA could conceivably demonstrate a simulation-like mechanism (i.e. a common neural representation for viewing/doing), several previous papers attempting to show just that failed to do so. The authors suggest that this may be due to a variety of methodological limitations, which they set out to correct for in their Journal of Neurophysiology publication. Oosterhof et al. show that clusters of voxels located primarily in the intraparietal and superior temporal sulci encode cross-modal information, that is, they code similar information both when viewing and doing:
Essentially, Oosterhof et al. trained their classifier on one modality (see or do), tested the classifier on the opposite modality in another session, and then repeated this procedure for all possible combinations of session and modality (while appropriately correcting for multiple comparisons). The map above represents the combined classification accuracy from both train-test combinations; interestingly, in the supplementary info they show that the maps do differ slightly depending on which modality was used for training:
Oosterhof and colleagues also investigate the specificity of information for particular gestures in a second experiment, but for our purposes let’s focus on just the first. My first thought is that this does actually provide some evidence for a simulation theory of understanding motor intentions. Clearly there is enough information in each modality to accurately decode the opposite modality: there are populations of neurons encoding similar information both for action execution and perception. Realistically, I think this has to be the minimal burden of proof needed to consider an imaging finding to be evidence for simulation theory. So the results of Oosterhof et al. do provide supporting evidence for simulation theory in the domain of motor intentions.
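To make the cross-modal logic concrete, here is a toy simulation of the train-on-one-modality, test-on-the-other idea. It is only a sketch under my own assumptions: the data are synthetic, the action labels and function names are made up for illustration, and a simple nearest-centroid classifier stands in for whatever classifier and searchlight procedure the authors actually used. It contrasts a region where seeing and doing share one code with a region where the two modalities drive separate populations.

```python
import random

def make_trials(pattern, n_trials, noise_sd, rng):
    """Simulate noisy single-trial voxel patterns around a 'true' pattern."""
    return [[v + rng.gauss(0.0, noise_sd) for v in pattern]
            for _ in range(n_trials)]

def centroid(trials):
    """Average pattern across trials (one value per voxel)."""
    return [sum(voxel) / len(trials) for voxel in zip(*trials)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cross_decode(train, test):
    """Fit a nearest-centroid classifier on `train`, score it on `test`.

    Both arguments map an action label to a list of trial patterns.
    """
    centroids = {label: centroid(trials) for label, trials in train.items()}
    correct = total = 0
    for label, trials in test.items():
        for trial in trials:
            guess = min(centroids,
                        key=lambda c: squared_distance(trial, centroids[c]))
            correct += int(guess == label)
            total += 1
    return correct / total

def simulate(shared_code, n_voxels=20, n_trials=200, noise_sd=1.0, seed=0):
    """Return cross-modal accuracy for both train-test directions."""
    rng = random.Random(seed)
    half = n_voxels // 2
    see, do = {}, {}
    for action in ("action_a", "action_b"):  # hypothetical action labels
        base = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
        if shared_code:
            # Same voxel pattern whether the action is seen or performed.
            see_pattern, do_pattern = base, base
        else:
            # Same region, but seeing and doing drive different voxels.
            see_pattern = base[:half] + [0.0] * (n_voxels - half)
            do_pattern = [0.0] * half + base[half:]
        see[action] = make_trials(see_pattern, n_trials, noise_sd, rng)
        do[action] = make_trials(do_pattern, n_trials, noise_sd, rng)
    return {"train_see_test_do": cross_decode(see, do),
            "train_do_test_see": cross_decode(do, see)}

shared = simulate(shared_code=True)
separate = simulate(shared_code=False)
print("shared code:", shared)
print("separate populations:", separate)
```

In this toy setup both regions would look "active" for seeing and doing, but only the shared-code region supports cross-modal decoding; the separate-populations region should hover around chance in both directions.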
Nonetheless, the results also strengthen the argument that more exploratory extensions of ST (like empathy-for-pain) must be held to a similar burden of proof before generalization in these domains is supported. Simply showing spatial overlap is not evidence of simulation, as Oosterhof et al. themselves argue. I think it is interesting to note the slight spatial divergence between the two train-test maps (see on do, do on see). While we can obviously identify voxels encoding cross-modality information, it is interesting that those voxels do not subsume the entirety of whatever neural computation relates these two modalities; each has something unique to predict in the other. I don’t think that observation invalidates simulation theory, but it might suggest an interesting mechanism not specified in the ‘vanilla’ flavor of ST. To be extra boring, it would be really nice to see an independent replication of this finding, since as Oosterhof et al. themselves point out, the evidence for cross-modal information is inconsistent across studies. Even though the classifier performs well above chance in this study, it is also worth noting that the majority of surviving voxels in their study show somewhere around 40-50% classification accuracy, not exactly gangbusters. It would be interesting to see if they could identify voxels within these regions that selectively encode only viewing or performing; this might be evidence for a hybrid-theory account of motor intentions.
Leonhard’s question is an interesting one that I don’t have a ready response for. As I understand it, the idea is that demonstrating no difference of patterns between a self and other-related condition (e.g. performing an action vs watching someone else do it) might actually be an argument for simulation, since this could be caused by that region using isomorphic computations for both conditions. This is an interesting point; I’m not sure what the status of null findings is in the decoding literature, but this merits further thought.
The next two came from James Kilner and Tal Yarkoni. I’ve put them together as I think they fall under a more methodological class of questions/comments and I don’t feel quite experienced enough to answer them, but I’d love to hear from someone with more experience in multivariate/multivoxel techniques:
James Kilner asks about the performance of MVPA in the case that the pattern might be spatially overlapping but not identical for two conditions. This is an interesting question and I’m not sure I know the correct answer; my intuition is that you could accurately discriminate both conditions using the same voxels and that this would be strong evidence against a simple simulation theory account (spatial overlap but representational heterogeneity).
Here is a more precise answer to James’ question from Sam Schwarzkopf, posted in the comments of the original post:
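My intuition can at least be illustrated with made-up numbers. The sketch below assumes two conditions (labelled "see" and "do" purely for illustration) that activate exactly the same voxels to the same average degree, but with the fine-grained pattern reversed. It then compares patterns with a simple split-half correlation; the pattern values, noise level and variable names are all my own assumptions rather than anything taken from real data.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def measure(pattern, noise_sd, rng):
    """One noisy measurement (e.g. one half of the data) of a voxel pattern."""
    return [v + rng.gauss(0.0, noise_sd) for v in pattern]

rng = random.Random(1)
n_voxels = 50

# Every voxel responds in both conditions (spatial overlap), but the
# fine-grained pattern across voxels is reversed between them.
fine = [0.5 if i % 2 == 0 else -0.5 for i in range(n_voxels)]
see_pattern = [1.0 + f for f in fine]
do_pattern = [1.0 - f for f in fine]

see_half1 = measure(see_pattern, 0.2, rng)
see_half2 = measure(see_pattern, 0.2, rng)
do_half2 = measure(do_pattern, 0.2, rng)

# Univariate view: the regional mean is about the same in both conditions.
mean_see = sum(see_half1) / n_voxels
mean_do = sum(do_half2) / n_voxels

# Multivariate view: patterns replicate within a condition but not across.
within = pearson(see_half1, see_half2)
between = pearson(see_half1, do_half2)

print(f"mean see={mean_see:.2f}, mean do={mean_do:.2f}")
print(f"within-condition r={within:.2f}, between-condition r={between:.2f}")
```

Under these assumptions a conjunction-style analysis would see complete overlap, while the pattern correlations separate the two conditions easily.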
2. The multivariate aspect obviously adds sensitivity by looking at pattern information, or generally any information of more than one variable (e.g. voxels in a region). As such it is more sensitive to the information content in a region than just looking at the average response from that region. Such an approach can reveal that region A contains some diagnostic information about an experimental variable while region B does not, even though they both show the same mean activation. This is certainly useful knowledge that can help us advance our understanding of the brain – but in the end it is still only one small piece in the puzzle. And as both Tal and James pointed out (in their own ways) and as you discussed as well, you can’t really tell what the diagnostic information actually represents.
Conversely, you can’t be sure that just because MVPA does not pick up diagnostic information from a region that it therefore doesn’t contain any information about the variable of interest. MVPA can only work as long as there is a pattern of information within the features you used.
This last point is most relevant to James’ comment. Say you are using voxels as features to decode some experimental variable. If all the neurons with different tuning characteristics in an area are completely intermingled (like orientation-preference in mouse visual cortex for instance) you should not really see any decoding – even if the neurons in that area are demonstrably selective to the experimental variable.
In general it is clear that the interpretation of decoded patterns is not straightforward: it isn’t clear precisely what information they reflect, and it seems that if a region contained a totally heterogeneous population of neurons you wouldn’t pick up any decoding at all. With respect to ST, I don’t know if this completely invalidates our ability to test predictions. I don’t think one would expect such radical heterogeneity in a region like STS, but rather a few sub-populations responding selectively to self and other, which MVPA might be able to reveal. It’s an important point to consider though.
Tal’s point is an important one regarding the different sources of information that GLM and MVPA techniques pick up. The paper he refers to by Jimura and Poldrack set out to investigate exactly this by comparing the spatial conjunction and divergent sensitivity of each method. Importantly, they subtracted the mean of each beta-coefficient from the multivariate analysis to ensure that the analysis contained only information not in the GLM:
As you can see in the above, Jimura and Poldrack show that MVPA picks up a large number of voxels not found in the GLM analysis. Their interpretation is that the GLM is designed to pick up regions responding globally or in most cases to stimulation, whereas MVPA likely picks up globally distributed responses that show variance in their response. This is a bit like the difference between functional integration and localization; both are complementary to the understanding of some cognitive function. I take Tal’s point to be that MVPA and the GLM are sensitive to different sources of information and that this blurs the ability of the technique to evaluate simulation theory: you might observe differences between the two that would resemble evidence against ST (different information in different areas) when in reality you would be modelling altogether different aspects of the cognition. Edit: after more discussion with Tal on Twitter, it’s clear that he meant to point out the ambiguity inherent in interpreting the predictive power of MVPA; by nature these analyses will pick up a lot of confounding, acausal noise (arousal, reaction time, respiration, etc.), which would be excluded in a GLM analysis. So these are not necessarily or even likely to be “direct read-outs” of representations, particularly to the extent that such confounds correlate with the task. See this helpful post by neuroskeptic for an overview of one recent paper examining this issue. See here for a study investigating the complex neurovascular origins of MVPA for fMRI.
Thanks sincerely for these responses, as it’s been really interesting and instructive for me to go through these papers and think about their implications. I’m still new to these techniques and it is exciting to gain a deeper appreciation of the subtleties involved in their interpretation. On that note, I must direct you to check out Sam Schwarzkopf’s excellent reply to my original post. Sam points out some common misunderstandings (several of which I am perhaps guilty of) regarding the interpretation of MVPA/decoding versus GLM techniques, arguing essentially that they pick up much of the same information and can both be considered ‘decoding’ in some sense, further muddying their ability to resolve debates like that surrounding simulation theory.
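Sam’s intermingling point is easy to reproduce in a toy model. The sketch below is entirely hypothetical: each voxel is just the average of 100 simulated neurons that each respond only to their preferred condition, and a nearest-centroid decoder is applied to the voxel patterns. The neuron counts, noise level and function names are my own assumptions. In the "clustered" region, neighbouring neurons tend to share a preference, so voxels are biased towards one condition; in the "intermingled" region every voxel holds an exact 50/50 mix.

```python
import random

def voxel_response(preferences, condition):
    """Average response of a voxel's neurons to `condition`.

    Each neuron responds 1.0 to its preferred condition and 0.0 otherwise.
    """
    return sum(1.0 for p in preferences if p == condition) / len(preferences)

def build_region(fractions, neurons_per_voxel=100):
    """Build one list of neuron preferences (0 or 1) per voxel.

    `fractions` gives, per voxel, the share of neurons preferring condition 0.
    """
    region = []
    for frac in fractions:
        n_pref0 = round(frac * neurons_per_voxel)
        region.append([0] * n_pref0 + [1] * (neurons_per_voxel - n_pref0))
    return region

def trial(region, condition, noise_sd, rng):
    """One noisy voxel pattern for a single trial."""
    return [voxel_response(voxel, condition) + rng.gauss(0.0, noise_sd)
            for voxel in region]

def decode_accuracy(region, n_trials=200, noise_sd=0.2, seed=0):
    """Train a nearest-centroid decoder on voxel patterns, test on new trials."""
    rng = random.Random(seed)
    centroids = {}
    for cond in (0, 1):
        train = [trial(region, cond, noise_sd, rng) for _ in range(n_trials)]
        centroids[cond] = [sum(v) / n_trials for v in zip(*train)]
    correct = 0
    for cond in (0, 1):
        for _ in range(n_trials):
            x = trial(region, cond, noise_sd, rng)
            dists = {k: sum((a - b) ** 2 for a, b in zip(x, centroids[k]))
                     for k in (0, 1)}
            correct += int(min(dists, key=dists.get) == cond)
    return correct / (2 * n_trials)

n_voxels = 20
clustered = build_region([0.7 if i % 2 == 0 else 0.3 for i in range(n_voxels)])
intermingled = build_region([0.5] * n_voxels)

acc_clustered = decode_accuracy(clustered)
acc_intermingled = decode_accuracy(intermingled)
print("clustered:", acc_clustered)
print("intermingled:", acc_intermingled)
```

Every simulated neuron here is perfectly selective in both regions, yet only the clustered region should be decodable at the voxel level, which is why a null decoding result cannot by itself show that a region lacks the information.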
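For what it’s worth, the mean-subtraction step mentioned earlier is simple to illustrate, with the caveat that I am assuming the mean is taken across voxels within each condition’s pattern; I have not checked that this matches the exact procedure in the paper. The beta values below are invented for the example.

```python
def center(pattern):
    """Subtract the across-voxel mean from one activation pattern."""
    mean = sum(pattern) / len(pattern)
    return [v - mean for v in pattern]

# Hypothetical beta estimates for four voxels in three conditions.
cond_a = [1.0, 2.0, 1.0, 2.0]
cond_b = [2.0, 3.0, 2.0, 3.0]  # same pattern as cond_a, higher overall mean
cond_c = [2.0, 1.0, 2.0, 1.0]  # same mean as cond_a, different pattern

centered = {name: center(pattern)
            for name, pattern in (("a", cond_a), ("b", cond_b), ("c", cond_c))}

for name, pattern in centered.items():
    print(name, pattern)
```

After centering, conditions a and b become indistinguishable, because their only difference was the kind of overall amplitude change a GLM contrast is built to detect, while a and c still differ in their pattern. That is the sense in which the centered multivariate analysis is left with information the mean-based analysis does not see.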