Basic Statistics in Python: t tests with SciPy#
implement paired, unpaired, and 1-sample t tests using the SciPy package
An introductory course in statistics is a prerequisite for this class, so we assume you remember (some of) the basics including t-tests, ANOVAs, and regression (or at least correlation).
Here we will demonstrate how to perform t tests in Python. Future lessons will cover ANOVA and regression.
The t test#
A t test is used to compare the means of two sets of data. For example, in the flanker experiment we used in the previous section, we could compare the mean RTs for the congruent and incongruent conditions. t tests consider the size of the difference between the means of the two data sets, relative to the variance in each one. The less the distributions of values in the two data sets overlap, the larger the t value will tend to be. We can then estimate the probability of observing a difference at least this large if there were no true difference between the conditions (that is, by chance alone); this is the p value. Typically, researchers use a p < .05 threshold to determine statistical significance.
t tests are implemented in the SciPy library, which “provides many user-friendly and efficient numerical routines, such as routines for numerical integration, interpolation, optimization, linear algebra, and statistics.” Each of those types of routines is in a separate sub-module of SciPy; the one we’ll want is
scipy.stats. We can import this specific module with the command:
from scipy import stats
We’ll also import some other packages we’ll need, and the flanker data from the previous lesson to work with:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import glob

# Import the data
df = pd.read_csv('data/flanker_rt_data.csv')

# Aggregate the data across trials, within each participant and condition
df_avg = pd.DataFrame(df.groupby(['participant', 'flankers'])['rt'].mean()).reset_index()
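To make the aggregation step concrete, here is a minimal sketch on made-up data (the `trials` and `toy_avg` names and all RT values are invented for illustration, not taken from the flanker data set). It uses `as_index=False`, which is an alternative way to get the same flat table as wrapping the result in `pd.DataFrame(...).reset_index()`.

```python
import pandas as pd

# Made-up trial-level data: 2 participants x 2 conditions x 2 trials each
trials = pd.DataFrame({
    'participant': ['s1', 's1', 's1', 's1', 's2', 's2', 's2', 's2'],
    'flankers': ['congruent', 'congruent', 'incongruent', 'incongruent'] * 2,
    'rt': [0.40, 0.50, 0.50, 0.60, 0.30, 0.50, 0.60, 0.60],
})

# One mean RT per participant per condition, returned as ordinary columns
toy_avg = trials.groupby(['participant', 'flankers'], as_index=False)['rt'].mean()
print(toy_avg)
```

Eight trial rows collapse to four rows, one per participant per condition, which is the shape the t tests below expect.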
Let’s start by comparing the mean RTs for the congruent and incongruent flanker conditions.
Recall that we are working with repeated-measures data: for each participant, we have 160 trials across 4 conditions. t tests are not meant for within-condition repeated measures data; we need only one measurement per participant in each condition. This is for essentially the same reason discussed at the end of the previous section on repeated measures data: if we treat the within-participant variability the same as the between-participant variability, then we will tend to grossly under-estimate the true (between-participant) variance. When running a t test, this would result in erroneously large t values that could often falsely suggest a statistically significant result. So, we need to use the aggregated data, df_avg.
The other important characteristic of our data is that, even though aggregation has reduced the data to one measurement per participant in each condition, we still have repeated measures across the two conditions. The default assumption of a t test is that each of the two data sets being compared comes from a different sample of the population (often called a between-subjects design in Psychology). This means that t tests assume there is no relationship between any particular measurement in each of the two data sets being compared. When we have measurements from the same people in both data sets (a within-subjects design), we need to account for this, or the t test will again give an incorrect value. We account for this by using a paired t test. In SciPy, this is the function
ttest_rel(). (For a between-subjects, or independent groups, design, which we will not cover here, you would use ttest_ind().)
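As a rough illustration of what "accounting for the pairing" means, a paired t statistic can be computed by hand as \(t = \bar{d} / (s_d / \sqrt{n})\), where \(d\) holds the per-participant differences. The sketch below uses five made-up participants (the `cond_a` and `cond_b` values are invented) and checks the hand calculation against `ttest_rel()`.

```python
import numpy as np
from scipy import stats

# Made-up mean RTs (seconds) for five hypothetical participants
cond_a = np.array([0.45, 0.47, 0.42, 0.43, 0.55])
cond_b = np.array([0.47, 0.50, 0.47, 0.51, 0.59])

# Paired t: mean of the per-participant differences over its standard error
d = cond_a - cond_b
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

t_scipy, p_scipy = stats.ttest_rel(cond_a, cond_b)
print(t_manual, t_scipy)
```

Only the differences enter the calculation, which is why it matters that each value in one data set is matched with the right value in the other.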
Select the data#
Using ttest_rel() is as simple as passing it the two sets of data you want to compare as arguments. We can pull these directly from our
df_avg pandas DataFrame. We’ll do this in a few lines of code below, first assigning each data set to a new variable, and then running the t test.
congr = df_avg[df_avg['flankers'] == 'congruent']['rt']
incongr = df_avg[df_avg['flankers'] == 'incongruent']['rt']
Let’s make sure you understand the code above before we go on. We’ve seen it before, but maybe not in exactly this form — and it is quite complex, but logical.
We start on the first line by testing which rows of the DataFrame are associated with congruent trials, which returns a Boolean mask:
df_avg['flankers'] == 'congruent'
0 True 1 False 2 False 3 True 4 False ... 76 False 77 False 78 True 79 False 80 False Name: flankers, Length: 81, dtype: bool
We embed this inside another selector
df_avg[df_avg['flankers'] == 'congruent'], which applies the Boolean mask to the DataFrame, essentially saying, “select from
df_avg all the rows associated with congruent trials”.
df_avg[df_avg['flankers'] == 'congruent']
Finally, we add
['rt'] to the end to indicate that, having selected the congruent rows, we actually only want the column with the RT values, because those are what we want to perform the t test on. The second line does the same thing for incongruent trials.
df_avg[df_avg['flankers'] == 'congruent']['rt']
0 0.455259 3 0.471231 6 0.417540 9 0.429758 12 0.419096 15 0.437178 18 0.548638 21 0.433748 24 0.437577 27 0.488892 30 0.539020 33 0.438167 36 0.462935 39 0.417553 42 0.410191 45 0.549622 48 0.568396 51 0.450102 54 0.528508 57 0.439243 60 0.570766 63 0.401993 66 0.462927 69 0.446840 72 0.628185 75 0.428642 78 0.431829 Name: rt, dtype: float64
This last result is what we assign to
congr (note, by the way, that this is a pandas Series, not a DataFrame).
incongr is a Series of the same length (the number of participants):
1 0.471838 4 0.499031 7 0.473012 10 0.506722 13 0.478367 16 0.453524 19 0.591644 22 0.492921 25 0.504452 28 0.527152 31 0.591181 34 0.518216 37 0.507257 40 0.507033 43 0.474612 46 0.554172 49 0.595977 52 0.513179 55 0.565531 58 0.501069 61 0.591022 64 0.428867 67 0.530722 70 0.490298 73 0.650769 76 0.494878 79 0.437926 Name: rt, dtype: float64
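The same selection can also be written in a single step with `.loc`, which takes the row mask and the column name together. This is a sketch on a small made-up table (`toy` and its values are invented); it is offered as an equivalent alternative, not as what the lesson's code does.

```python
import pandas as pd

# Made-up aggregated data for two hypothetical participants
toy = pd.DataFrame({
    'participant': ['s1', 's1', 's2', 's2'],
    'flankers': ['congruent', 'incongruent', 'congruent', 'incongruent'],
    'rt': [0.45, 0.47, 0.42, 0.50],
})

# Two-step (chained) selection, as in the lesson
chained = toy[toy['flankers'] == 'congruent']['rt']

# One-step selection: rows from the Boolean mask, then the 'rt' column
with_loc = toy.loc[toy['flankers'] == 'congruent', 'rt']
print(with_loc)
```

Both forms return the same Series, with the original row labels preserved.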
Run the t test#
Now we just pass congr and
incongr as the first (and only) two arguments to
ttest_rel(), and print the results out with some explanatory text. Note that we have to write
stats.ttest_rel(), because we imported the module with from scipy import stats.
t, p = stats.ttest_rel(congr, incongr)
print('Congruent vs. Incongruent t = ', str(t), ' p = ', str(p))
Congruent vs. Incongruent t = -10.209634805365013 p = 1.3739296579820675e-10
We can make the output nicer by rounding to a reasonable level of precision:
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t = -10.21 p = 0.0
Now that’s a result any researcher would be happy to see! The p value is not actually zero, by the way. Note that in the original output the p value was reported in scientific notation, ending in
e-10. This means that the p value is actually 0.00000000013739. We would typically report this as p < .0001, since we rounded to 4 decimal places (which is fairly typical for reporting p values).
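One way to avoid printing a misleading `0.0` is to format the numbers with f-strings and report very small p values as an inequality. The `format_p` helper below is hypothetical (it is not part of SciPy or this lesson), and the t and p values are simply copied from the output above.

```python
def format_p(p):
    """Hypothetical helper: report very small p values as an inequality."""
    if p < 0.0001:
        return 'p < .0001'
    return f'p = {p:.4f}'

# Values copied from the ttest_rel() output above
t, p = -10.209634805365013, 1.3739296579820675e-10

print(f'Congruent vs. Incongruent t = {t:.2f}, {format_p(p)}')
print(f'p in scientific notation: {p:.2e}')
```

The `.2e` format keeps the scientific notation but trims it to three significant digits.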
1-tailed vs. 2-tailed p values#
By default, SciPy’s
ttest_ functions return 2-tailed p values. This means that the p value considers both possible directions of difference between the two conditions. In the present example, that means either RTs for congruent are faster than incongruent, or they are slower for congruent than incongruent. In contrast, a 1-tailed p value should be used if we have a specific prediction of a “direction” of the difference. Using a 1-tailed p value will tend to be less conservative, i.e., more likely to find a significant effect. This is because, for a given significance threshold (e.g., \(\alpha = .05\)), a 2-tailed test effectively splits the threshold in half, allowing a 2.5% probability that a result this extreme occurred by chance in one direction (e.g., congruent slower) and a 2.5% probability of getting the reverse result (e.g., congruent faster) by chance. In contrast, a 1-tailed test allocates all of the 5% to a difference in one direction (e.g., congruent faster).
Practically speaking, 2-tailed tests should be used by default, but if you have a specific a priori hypothesis regarding the direction of the difference, you can use a 1-tailed test. For example, for the flanker experiment we’re working with here, previous research would lead us to the congruent-faster hypothesis.
In the present example, it really doesn’t matter since the two-tailed p value is wildly significant. However, if you want to convert to one-tailed p values, you just need to divide p in half, provided the observed difference is in the direction you predicted:
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p (one-tailed) = ', str(p / 2))
Congruent vs. Incongruent t = -10.21 p (one-tailed) = 6.869648289910338e-11
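Assuming SciPy 1.6 or later, the `ttest_` functions also accept an `alternative` argument, so the direction of the prediction can be stated explicitly instead of halving p by hand. The sketch below uses made-up values (`cond_a` is the hypothetically faster condition) and also shows what happens when the prediction points the wrong way.

```python
import numpy as np
from scipy import stats

# Made-up mean RTs for five hypothetical participants
cond_a = np.array([0.45, 0.47, 0.42, 0.43, 0.55])
cond_b = np.array([0.47, 0.50, 0.47, 0.51, 0.59])

# Default: two-tailed
t_two, p_two = stats.ttest_rel(cond_a, cond_b)

# One-tailed, predicting that cond_a is smaller (faster) than cond_b
t_one, p_one = stats.ttest_rel(cond_a, cond_b, alternative='less')

# One-tailed in the wrong direction: p is close to 1, not p_two / 2
p_wrong = stats.ttest_rel(cond_a, cond_b, alternative='greater').pvalue

print(p_two, p_one, p_wrong)
```

When the data go in the predicted direction, `p_one` equals `p_two / 2`; when they go the other way, halving the two-tailed p would give the wrong answer.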
Be careful about order of data values#
The above paired t test worked properly because in our pandas DataFrame, participants are listed in a consistent order. So when we create separate Series for congruent and incongruent, the same rows of the two Series belong to the same participant. However, this isn’t always guaranteed, and so it’s good practice to do things in a way that ensures proper pairing of participants between data sets.
pandas indexing allows us to do this. Recall that indexes are row labels. By default, when we read a CSV file to a DataFrame, the rows are indexed numerically starting from zero. Indeed, if you look back above at the contents of congr and
incongr, you’ll see the indexes in the left column are discontinuous and different between the two, because each data point came from a separate row. To ensure alignment of each participant’s data across the two series, we can first use the participant ID as the index of
df_avg, and then create separate Series for each condition:
df_avg = df_avg.set_index('participant')
congr = df_avg[df_avg['flankers'] == 'congruent']['rt']
incongr = df_avg[df_avg['flankers'] == 'incongruent']['rt']
Now when we look at the resulting Series, we see that the participant indexes are preserved:
participant s1 0.455259 s10 0.471231 s11 0.417540 s12 0.429758 s13 0.419096 s14 0.437178 s15 0.548638 s16 0.433748 s17 0.437577 s18 0.488892 s19 0.539020 s2 0.438167 s20 0.462935 s21 0.417553 s22 0.410191 s23 0.549622 s24 0.568396 s25 0.450102 s26 0.528508 s27 0.439243 s3 0.570766 s4 0.401993 s5 0.462927 s6 0.446840 s7 0.628185 s8 0.428642 s9 0.431829 Name: rt, dtype: float64
participant s1 0.471838 s10 0.499031 s11 0.473012 s12 0.506722 s13 0.478367 s14 0.453524 s15 0.591644 s16 0.492921 s17 0.504452 s18 0.527152 s19 0.591181 s2 0.518216 s20 0.507257 s21 0.507033 s22 0.474612 s23 0.554172 s24 0.595977 s25 0.513179 s26 0.565531 s27 0.501069 s3 0.591022 s4 0.428867 s5 0.530722 s6 0.490298 s7 0.650769 s8 0.494878 s9 0.437926 Name: rt, dtype: float64
Ensure pandas indexing is used in t tests#
What could go wrong?#
It turns out that SciPy’s
ttest functions ignore pandas indexes, so indexing on its own won’t ensure that the t test compares data points from the same individuals. We can see that by randomizing the order of the rows of the
incongr2 series, while preserving the relationship between indexes (participant IDs) and RTs (you can compare with the data above to confirm that the same RT values are associated with the same IDs as in the original incongr Series):
df_avg = df_avg.reset_index()
inc_arr = np.array(df_avg[df_avg['flankers']=='incongruent'].iloc[:, [0, 2]])
np.random.shuffle(inc_arr)
incongr2 = pd.DataFrame(inc_arr, columns=['participant', 'rt']).set_index('participant')
incongr2 = pd.Series(incongr2['rt'])
incongr2
participant s5 0.530722 s6 0.490298 s9 0.437926 s19 0.591181 s2 0.518216 s11 0.473012 s21 0.507033 s10 0.499031 s4 0.428867 s14 0.453524 s23 0.554172 s20 0.507257 s25 0.513179 s27 0.501069 s24 0.595977 s3 0.591022 s22 0.474612 s8 0.494878 s12 0.506722 s17 0.504452 s15 0.591644 s18 0.527152 s26 0.565531 s7 0.650769 s1 0.471838 s13 0.478367 s16 0.492921 Name: rt, dtype: object
Now when we run the t test, the t value doesn’t match the one we got above with the properly paired data. In fact, if you re-run the shuffling code and the t test below multiple times, you will get different t and p values each time because of the random shuffling.
t, p = stats.ttest_rel(congr, incongr2)
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t = -3.04 p = 0.0053
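Because SciPy will not warn about this, one defensive habit is to check the pairing yourself before running the test. The sketch below defines a hypothetical `same_order()` helper (not part of pandas or SciPy) and applies it to two small made-up Series that contain the same participants in different orders.

```python
import pandas as pd

def same_order(x, y):
    """Hypothetical helper: True if both Series list the same IDs in the same order."""
    return x.index.equals(y.index)

# Made-up data: same three participants, listed in different orders
a = pd.Series([0.45, 0.47, 0.42], index=['s1', 's2', 's3'], name='rt')
b = pd.Series([0.50, 0.47, 0.49], index=['s3', 's1', 's2'], name='rt')

print(same_order(a, a))   # identical index
print(same_order(a, b))   # same IDs, different order

if not same_order(a, b):
    print('Index order differs: a paired t test on these would mis-pair participants')
```

A check like this could be turned into an `assert` placed just before the call to `ttest_rel()`.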
Solution 1: Use DataFrame columns rather than extracting Series#
Above, we extracted the two data sets we wanted to compare with a t test from a DataFrame (df_avg) to two pandas Series, congr and
incongr. On the one hand, this simplifies the syntax of the t test command, but on the other hand we lose the structure of the pandas DataFrame. That is, in the DataFrame, both conditions are selected from the same consistently ordered rows, and so we don’t have to worry about the order of the data values. We can run the t test directly on selections from the pandas DataFrame; the code is just a little more complex to look at:
t, p = stats.ttest_rel(df_avg[df_avg['flankers'] == 'congruent']['rt'],
                       df_avg[df_avg['flankers'] == 'incongruent']['rt'])
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t = -10.21 p = 0.0
This is probably the best approach to use in most cases, because:
It ensures that the repeated-measures structure of the data is preserved
It uses less memory, because we aren’t copying columns of our DataFrame to new Series/variables.
It may in fact seem overly convoluted to have first demonstrated the extract-to-Series approach, then explain that it’s not the ideal way to do things! However, for many people, it’s intuitive to extract subsets of data to perform further processing on. One point of this lesson was to illustrate how that can create problems, even though it might seem like a logical approach.
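A related option, not used in this lesson, is to reshape the aggregated data to wide format so that each participant occupies exactly one row, with one column per condition. The sketch below does this with `DataFrame.pivot()` on made-up data (`toy_avg` and its values are invented) and shows that shuffling the input rows does not change the result.

```python
import pandas as pd
from scipy import stats

# Made-up aggregated data in long format: one row per participant per condition
toy_avg = pd.DataFrame({
    'participant': ['s1', 's1', 's2', 's2', 's3', 's3', 's4', 's4'],
    'flankers': ['congruent', 'incongruent'] * 4,
    'rt': [0.45, 0.47, 0.42, 0.50, 0.55, 0.59, 0.43, 0.51],
})

# Wide format: one row per participant, one column per condition
wide = toy_avg.pivot(index='participant', columns='flankers', values='rt')
t, p = stats.ttest_rel(wide['congruent'], wide['incongruent'])

# Shuffle the long-format rows first: the pivot still pairs values by participant
shuffled = toy_avg.sample(frac=1, random_state=0)
wide_shuffled = shuffled.pivot(index='participant', columns='flankers', values='rt')
t_shuffled, p_shuffled = stats.ttest_rel(wide_shuffled['congruent'],
                                         wide_shuffled['incongruent'])
print(t, t_shuffled)
```

Because the two columns share rows, the pairing is fixed by the table itself rather than by the order in which rows happen to appear.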
Solution 2: Use
.sort_index() to ensure paired data are aligned#
If you do choose to work with a pair of Series, we can ensure that the indexes of the two data sets align by re-ordering the data in both Series that we’re comparing (congr and
incongr2 in this case), using the pandas .sort_index() method:
t, p = stats.ttest_rel(congr.sort_index(), incongr2.sort_index())
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t = -10.21 p = 0.0
Long story short, it is good practice to index by participant ID, and use the
.sort_index() method when applying t tests to pandas Series or DataFrames, to ensure that values are appropriately paired.
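Sorting both Series is one way to line them up; another, sketched here as an alternative rather than as the lesson's method, is to reorder one Series to follow the other with `.reindex()`. The data are made up, and the sketch assumes both Series contain exactly the same participant IDs (a missing ID would produce a `NaN` after reindexing).

```python
import pandas as pd
from scipy import stats

# Made-up data: same four participants, listed in different orders
a = pd.Series([0.45, 0.47, 0.42, 0.43], index=['s1', 's2', 's3', 's4'])
b = pd.Series([0.47, 0.51, 0.47, 0.50], index=['s3', 's4', 's1', 's2'])

# Reorder b so its rows follow the participant order of a
b_aligned = b.reindex(a.index)
t_reindex, p_reindex = stats.ttest_rel(a, b_aligned)

# Same result as sorting both Series by their index
t_sorted, p_sorted = stats.ttest_rel(a.sort_index(), b.sort_index())
print(t_reindex, t_sorted)
```

Either approach works as long as both Series end up in the same participant order before they reach SciPy.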
Testing differences: one-sample t tests#
An alternative way to compare the congruent and incongruent conditions is to compute the difference in mean RTs between the two conditions for each participant (since it is a paired design), and then run a t test on the differences. In this case, we use a one sample t test, in which we compare the data set to zero. In other words, is the difference between the conditions basically zero, or is it significantly different from zero (i.e., a believable difference)?
We can compute the difference between two pandas Series easily just using the
- (minus) operator, so in this case we could use
congr - incongr
Note that this subtraction only works if the two Series are indexed by participant ID (or in some way that preserves the alignment of values between the two data sets). However, because we are subtracting two pandas objects, pandas recognizes the indexes in each and aligns them, even if the indexes aren’t in the same order in the two input Series. So we don’t have to worry about using
.sort_index() as we did above for paired t tests.
congr_vs_incongr = congr - incongr
t, p = stats.ttest_1samp(congr_vs_incongr, 0)
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t = -10.21 p = 0.0
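The index alignment is easy to see on a small made-up example. In the sketch below (values invented), subtracting the two Series matches values by participant ID, whereas subtracting the underlying NumPy arrays matches them by position and so pairs the wrong participants.

```python
import pandas as pd

# Made-up data: same three participants, listed in different orders
a = pd.Series([0.45, 0.47, 0.42], index=['s1', 's2', 's3'])
b = pd.Series([0.50, 0.47, 0.49], index=['s3', 's1', 's2'])

# pandas subtraction aligns on the index labels (participant IDs)
by_label = a - b

# NumPy subtraction ignores the labels and pairs values by position
by_position = a.to_numpy() - b.to_numpy()

print(by_label)
print(by_position)
```

For `s3`, the label-based result is 0.42 - 0.50, while the position-based result in that slot is 0.42 - 0.49, which mixes two different participants.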
Note that we get the same result from the 1 sample t test if we perform the subtraction on the two Series that have the same order of indexes, as when we perform the subtraction using
incongr2, which has a randomly shuffled order of indexes. We don’t need to explicitly
.sort_index() in this case:
congr_vs_incongr = congr - incongr2.astype(float)
t, p = stats.ttest_1samp(congr_vs_incongr, 0)
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t = -10.21 p = 0.0
Note the .astype(float) in the first line. Because incongr2 was built by passing a mix of text and numeric columns through a NumPy array, its values are stored as generic Python objects (dtype: object in the output above) rather than as floats. Without the conversion, ttest_1samp() fails with ValueError: data type <class 'numpy.object_'> not inexact.
Paired vs. 1-sample t tests?#
You’ll note the result of the 1-sample t test is the same as the paired t test above. This is expected, because in both cases we ran a t test to compare the difference between the same two sets of data. From a coding perspective, the paired t test is a bit simpler, because you don’t have to perform a subtraction on the data prior to running the t test.
The reasons we might want to run a 1-sample t test include cases where our data are already represented as a subtraction, or cases where we’re working with multiple variables and performing subtractions simplifies the presentation of the results. As well, since pandas subtraction respects the indexes, computing differences and then running 1-sample t tests can be a bit safer in ensuring that the proper within-participants nesting structure of your data is preserved.
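The equivalence of the two approaches can be checked directly. This sketch uses made-up values for five hypothetical participants and compares `ttest_rel()` on the two conditions with `ttest_1samp()` on their differences.

```python
import numpy as np
from scipy import stats

# Made-up mean RTs for five hypothetical participants
cond_a = np.array([0.45, 0.47, 0.42, 0.43, 0.55])
cond_b = np.array([0.47, 0.50, 0.47, 0.51, 0.59])

# Paired t test on the two conditions
t_rel, p_rel = stats.ttest_rel(cond_a, cond_b)

# 1-sample t test on the differences, compared against zero
t_one, p_one = stats.ttest_1samp(cond_a - cond_b, 0)

print(t_rel, t_one)
print(p_rel, p_one)
```

The t and p values agree to within floating-point precision, so the choice between the two is a matter of convenience and of how the data are stored.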
t tests are used to compare the means of two sets of data to each other, or the mean of one set of data against a particular value (such as zero)
An unpaired t test is used to compare two independent sets of data (e.g., from two different samples of a population, two groups, etc.)
A paired t test must be used when the two sets of data come from the same samples (e.g., the same individual participants)
A 1-sample t test is used to compare the mean of one set of data against a specific value. This is often used to compare a data set to zero
Paired t tests and 1-sample t tests can both be used to determine whether differences between two samples are significantly different from zero (no difference).
In the 1-sample case, you must first compute the difference between the pairs of data in two conditions.
When working with pandas data objects, it is important to remember that SciPy’s functions (including
ttests) do not use pandas indexes. So when doing paired t tests, you must ensure that the data are listed in the same order in the two Series being compared.
The best way to ensure that the within-participant/repeated measures structure of the data is preserved when doing a t test, is to use two columns from a DataFrame that is indexed by participant ID.
One alternative is to use the
.sort_index() method on two Series that are indexed by participant ID
Another alternative is to use the fact that pandas does respect its indexing when you subtract two Series, so if your data are indexed by participant ID, doing the subtraction followed by a 1-sample t test is a way of ensuring that the within-participants relationships between data sets are preserved.
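The unpaired case was only mentioned in passing above, so here is a minimal sketch of what it might look like with `ttest_ind()`. The two groups are entirely made up and deliberately of different sizes, which is allowed because the values are not paired. The `equal_var=False` option requests Welch's t test, which does not assume equal variances in the two groups.

```python
import numpy as np
from scipy import stats

# Made-up mean RTs from two hypothetical, independent groups of participants
group_1 = np.array([0.45, 0.47, 0.42, 0.43, 0.55])
group_2 = np.array([0.47, 0.50, 0.47, 0.51, 0.59, 0.52])

# Standard independent-samples t test (assumes equal variances)
t_student, p_student = stats.ttest_ind(group_1, group_2)

# Welch's t test (does not assume equal variances)
t_welch, p_welch = stats.ttest_ind(group_1, group_2, equal_var=False)

print(t_student, p_student)
print(t_welch, p_welch)
```

Because there is no pairing, the order of values within each group does not matter here.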