What is Data?
“Data” is, ultimately, information. My favourite definition of information is “a difference that makes a difference” (Bateson, 1972). If we measure something and it is always the same, then there is no information. Difference is the heart of information. But differences can be meaningful in some way (signal), or meaningless, such as variation due to random factors (noise). Information consists of those differences that mean something, that is, those that make a difference. Meaning can be derived in many ways, such as through visualizations (graphing), statistical analysis, or making predictions about future data. These are all ways of identifying systematic patterns in data.
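To make the idea of "no difference, no information" concrete, here is a minimal sketch in Python. The readings are hypothetical values invented for illustration, and variance is used only as a simple stand-in for "how much difference is present". It does not tell us whether those differences are signal or noise.

```python
import statistics

# Hypothetical measurements, invented purely for illustration
constant_readings = [5.0, 5.0, 5.0, 5.0, 5.0]  # always the same value
varying_readings = [4.0, 5.0, 6.0, 5.0, 7.0]   # values differ from one another

# Population variance summarizes how much the values differ from their mean
constant_variance = statistics.pvariance(constant_readings)
varying_variance = statistics.pvariance(varying_readings)

print("Variance of constant readings:", constant_variance)
print("Variance of varying readings:", varying_variance)
```

The constant readings have zero variance, so there is nothing to interpret. The varying readings do contain differences, but deciding whether those differences are meaningful still requires the visualization, statistical analysis, or prediction described above.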
In neuroscience, psychology, and other fields, data comprises all the information you might collect in a study. The types of data collected in neuroscience research span such a wide range that it would be impossible to document them concisely here. To give a sense of the breadth of what we might call “data”, we can first consider that data are typically collected from individual “units”, be they humans, non-human animals, cells, etc. Each of these units typically contributes many data points, including things such as age, sex, handedness, genotype, cell type, early experience (e.g., monocular deprivation), responses to questionnaires, behavioral responses (e.g., reaction times, button or lever presses, maze running patterns or times), electrophysiological measurements, images, sound files, or movies of behavior. We call this organization of data within such units nested data; for instance, age is nested within each animal, and animals in turn may be nested within groups (e.g., different genetic lines).
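One way to picture nested data is as containers inside containers. The sketch below is only an illustration: the group names, animal IDs, field names, and values are all hypothetical, and a real study would more likely store such data in files or tables than in a hand-written dictionary.

```python
# Hypothetical study: every name and value here is invented for illustration.
# Measurements are nested within animals, and animals are nested within groups.
study = {
    "line_A": [
        {"id": "a01", "age_days": 60, "sex": "F", "reaction_times_ms": [312, 298, 305]},
        {"id": "a02", "age_days": 62, "sex": "M", "reaction_times_ms": [340, 331, 352]},
    ],
    "line_B": [
        {"id": "b01", "age_days": 59, "sex": "F", "reaction_times_ms": [401, 388, 395]},
    ],
}

# Walk the nesting from the outside in: group -> animal -> measurements
mean_rt_by_animal = {}
for group_name, animals in study.items():
    for animal in animals:
        reaction_times = animal["reaction_times_ms"]
        mean_rt_by_animal[animal["id"]] = sum(reaction_times) / len(reaction_times)

for animal_id, mean_rt in mean_rt_by_animal.items():
    print(animal_id, mean_rt)
```

Each animal contributes a single age but several reaction times, which is why summaries such as a mean are often computed within each unit before comparing across groups.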
For the purposes of this class we assume that the data are collected in some way (measured), and stored digitally in files — typically as numbers or characters (words, sentences, etc.; remembering that even digital media like movies or sounds are, ultimately, files composed of numbers). Data should be, however, more than just a bunch of numbers. Ultimately, data science is about identifying meaningful patterns in data. This informs our definition of “data” — data are sets of measurements from which we aim to extract meaning. As such, we assume that the data have been collected in such a way that meaning can be derived from them.