Where Does Neural Data Come From?


In disciplines such as psychology and neuroscience, data are most typically collected in experiments, in which variables are systematically manipulated by the researcher. Experiments, by definition, presuppose particular meaning in the data: the experimental design and the types of measurements are chosen with particular patterns of results in mind (hypotheses), which are in turn derived from theories. These theories provide ways of interpreting, or giving meaning to, the data.


However, lots of data doesn’t come from experiments at all. For example, surveys simply ask lots of questions, and the data analysis usually looks for relationships between answers to different questions (correlations). Even in the context of an experiment, a number of measurements may be taken that aren’t expected to be directly affected by the experimental manipulations, but might help explain differences between individuals. For example, in psycholinguistics, a number of studies have shown differences in how people interpret ambiguous sentences, depending on their working memory capacity. In these studies, participants were not selected based on their working memory capacity; rather, this capacity was measured in each individual, while the experimental manipulation was the ambiguity of the sentences.
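
To make the idea of looking for relationships between survey answers concrete, here is a minimal sketch that computes a Pearson correlation coefficient between the answers to two questions. The question names (`sleep_hours`, `stress_rating`) and all of the numbers are made up purely for illustration, and `pearson_r` is a hypothetical helper written for this example rather than a function from any particular library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Made-up answers from eight hypothetical respondents (illustration only)
sleep_hours = [6, 7, 5, 8, 7, 4, 9, 6]    # "How many hours do you sleep per night?"
stress_rating = [7, 5, 8, 3, 4, 9, 2, 6]  # "Rate your stress from 1 to 10"

r = pearson_r(sleep_hours, stress_rating)
print(f"r = {r:.2f}")
```

A value of `r` near -1 or +1 indicates a strong linear relationship between the two sets of answers, and a value near 0 indicates little or no linear relationship. Because nothing was manipulated, a correlation like this describes an association and does not by itself show that one variable causes the other.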

In some cases, data collection has a strong, or even exclusive, exploratory component: many measures are obtained without specific hypotheses concerning how they will affect other measurements, but with the more general hypothesis that some systematic and meaningful relationships can be identified amongst the measures taken. Indeed, such approaches are central when we move from a mindset of statistical analysis of experimental data to machine learning, classification, and prediction. For example, if we want to build a model that predicts whether a person will develop a particular disease or not, or how an already-diagnosed disease will progress, we will likely want to measure a wide range of variables that might be predictive, so that we can identify the optimal combination of variables that leads to the most accurate predictions.
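
As a rough sketch of what searching for a predictive combination of variables can look like, the example below tries every subset of three candidate variables and scores each one by leave-one-out accuracy with a simple nearest-centroid classifier. Everything here is an assumption made for illustration: the data are randomly generated rather than real, the variable names (`marker_a`, `marker_b`, `noise`) are hypothetical, and the nearest-centroid classifier stands in for whatever model one would actually use.

```python
import itertools
import random

random.seed(0)  # fixed seed so the synthetic data are reproducible


def make_person(has_disease):
    """Generate one synthetic person; two markers are informative, one is noise."""
    shift = 1.5 if has_disease else 0.0
    return {
        "marker_a": random.gauss(shift, 1.0),
        "marker_b": random.gauss(shift, 1.0),
        "noise": random.gauss(0.0, 1.0),
        "label": has_disease,
    }


people = [make_person(i % 2 == 1) for i in range(60)]
variables = ["marker_a", "marker_b", "noise"]


def centroid(rows, subset):
    """Mean of each variable in `subset` across `rows`."""
    return [sum(r[v] for r in rows) / len(rows) for v in subset]


def sq_dist(row, center, subset):
    """Squared Euclidean distance from a person to a centroid."""
    return sum((row[v] - c) ** 2 for v, c in zip(subset, center))


def loo_accuracy(rows, subset):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, held_out in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        pos = centroid([r for r in train if r["label"]], subset)
        neg = centroid([r for r in train if not r["label"]], subset)
        predicted = sq_dist(held_out, pos, subset) < sq_dist(held_out, neg, subset)
        correct += predicted == held_out["label"]
    return correct / len(rows)


# Score every non-empty combination of the candidate variables
results = {}
for k in range(1, len(variables) + 1):
    for subset in itertools.combinations(variables, k):
        results[subset] = loo_accuracy(people, subset)

best_subset = max(results, key=results.get)
for subset, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{', '.join(subset):30s} accuracy = {acc:.2f}")
print("Best subset:", best_subset)
```

Two caveats apply to this sketch. Exhaustive search is only feasible for a handful of variables, since the number of subsets doubles with each variable added. Also, picking the best subset using the same accuracy scores that were used to compare subsets tends to give an optimistic estimate, so in practice the chosen combination would need to be checked on data that played no part in the selection.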