Basic Statistics in Python: t tests with SciPy

Learning Objectives

  • implement paired, unpaired, and 1-sample t tests using the SciPy package


An introductory course in statistics is a prerequisite for this class, so we assume you remember (some of) the basics including t-tests, ANOVAs, and regression (or at least correlation).

Here we will demonstrate how to perform t tests in Python. Future lessons will cover ANOVA and regression.

The t test

A t test is used to compare the means of two sets of data. For example, in the flanker experiment we used in the previous section, we could compare the mean RTs for the congruent and incongruent conditions. t tests consider the size of the difference between the means of the two data sets, relative to the variability within each one. The less the distributions of values in the two data sets overlap, the larger the t value will tend to be. From the t value and the degrees of freedom (which depend on the sample size), we can then compute the probability of observing a difference at least this large if there were actually no true difference between the conditions. This probability is the p value. Typically, researchers use a p < .05 threshold to determine statistical significance.
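To illustrate "difference relative to variability", the sketch below computes the classic independent-samples t statistic (pooled variance) by hand for two small arrays and checks it against SciPy's stats.ttest_ind(). The arrays are made up for illustration and are not from the flanker data.

```python
import numpy as np
from scipy import stats

# Made-up RTs (seconds) for two independent groups; illustration only
group_a = np.array([0.45, 0.47, 0.42, 0.43, 0.42, 0.50, 0.39, 0.48])
group_b = np.array([0.50, 0.53, 0.46, 0.48, 0.49, 0.55, 0.45, 0.52])

n_a, n_b = len(group_a), len(group_b)
mean_diff = group_a.mean() - group_b.mean()

# Pooled variance: sample variances (ddof=1) weighted by their degrees of freedom
pooled_var = ((n_a - 1) * group_a.var(ddof=1) +
              (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)

# t = difference between the means divided by the standard error of that difference
t_by_hand = mean_diff / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# SciPy's version (equal_var=True, i.e. pooled variance, is the default)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)
print(round(t_by_hand, 4), round(t_scipy, 4))
```

The two values should agree. A larger spread within the groups would increase the denominator and shrink t, even with the same mean difference.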

t tests are implemented in the SciPy library, which “provides many user-friendly and efficient numerical routines, such as routines for numerical integration, interpolation, optimization, linear algebra, and statistics.” Each of those types of routines is in a separate sub-module of SciPy; the one we’ll want is scipy.stats. We can import this specific module with the command:

from scipy import stats

We’ll also import some other packages we’ll need, and the flanker data from the previous lesson to work with:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import glob

# Import the data
df = pd.read_csv('data/flanker_rt_data.csv')

# Aggregate across trials: one mean RT per participant per flanker condition
df_avg = df.groupby(['participant', 'flankers'])['rt'].mean().reset_index()

Paired t-test

Let’s start by comparing the mean RTs for the congruent and incongruent flanker conditions.

Recall that we are working with repeated-measures data – for each participant, we have 160 trials across 4 conditions. t tests are not meant for within-condition repeated measures data — we need only one measurement per participant in each condition. This is for essentially the same reason discussed at the end of the previous section on repeated measures data: if we treat the within-participant variability the same as the between-participant variability, then we will tend to grossly under-estimate the true (between-participant) variance. When running a t test, this would result in erroneously large t values that could often falsely suggest a statistically significant result. So, we need to use the aggregated data, df_avg.
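The aggregation step can be seen on a tiny trial-level DataFrame. This one is made up, with two hypothetical participants and two trials per condition. Calling groupby() and then .mean() collapses the trials, so each participant contributes exactly one mean RT per condition. That is the form a t test expects. The code mirrors the df_avg line above, but the data are invented.

```python
import pandas as pd

# Hypothetical trial-level data: 2 participants x 2 conditions x 2 trials each
trials = pd.DataFrame({
    'participant': ['s1'] * 4 + ['s2'] * 4,
    'flankers': ['congruent', 'incongruent'] * 4,
    'rt': [0.40, 0.50, 0.44, 0.52, 0.45, 0.55, 0.47, 0.57],
})

# Collapse the trials: one mean RT per participant per condition
toy_avg = trials.groupby(['participant', 'flankers'])['rt'].mean().reset_index()
print(toy_avg)
```

The 8 trial rows become 4 rows, one per participant and condition.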

The other important characteristic of our data is that we still have repeated measures across the two conditions. Aggregation has reduced the data to one measurement per participant in each condition, but the same participants appear in both. The default assumption of a t test is that each of the two data sets being compared comes from a different sample of the population (often called a between-subjects design in Psychology). In other words, t tests assume there is no relationship between any particular measurement in one data set and any particular measurement in the other. When we have measurements from the same people in both data sets (a within-subjects design), we need to account for this, or the t test will give an incorrect result. Typically the error goes in the direction of reduced sensitivity. When each participant's scores in the two conditions are positively correlated, as RTs usually are (some people are simply faster than others), ignoring the pairing counts those stable between-participant differences as noise, which shrinks the t value. We account for the pairing by using a paired t test. In SciPy, this is the function ttest_rel(). For a between-subjects (independent-groups) design, which we will not cover here, you would use ttest_ind().
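To see why the pairing matters, the sketch below runs both ttest_rel() and ttest_ind() on the same data. The data are made up: eight hypothetical participants who differ a lot from each other, but who are all consistently slower in the incongruent condition. The paired test works on each participant's difference, so the large between-participant differences are not counted as error.

```python
import numpy as np
from scipy import stats

# Made-up mean RTs (seconds) for 8 hypothetical participants
congruent = np.array([0.45, 0.47, 0.42, 0.43, 0.42, 0.50, 0.39, 0.48])
# Every participant is 40-70 ms slower on incongruent trials
incongruent = congruent + np.array([0.05, 0.06, 0.04, 0.05, 0.07, 0.05, 0.06, 0.04])

t_rel, p_rel = stats.ttest_rel(congruent, incongruent)  # respects the pairing
t_ind, p_ind = stats.ttest_ind(congruent, incongruent)  # ignores the pairing

print('paired t =', round(t_rel, 2), ' independent t =', round(t_ind, 2))
```

With these numbers the paired t value is several times larger in magnitude than the independent-samples one, and its p value is correspondingly smaller.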

Select the data

Running ttest_rel() is as simple as giving it the two sets of data you want to compare, as arguments. We can pull these directly from our df_avg pandas DataFrame. We’ll do this in a few lines of code below, first assigning each data set to a new variable, and then running the t test.

congr = df_avg[df_avg['flankers'] == 'congruent']['rt']
incongr = df_avg[df_avg['flankers'] == 'incongruent']['rt']

Let’s make sure you understand the code above before we go on. We’ve seen it before, but maybe not in exactly this form — and it is quite complex, but logical.

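We start on the first line by selecting only the rows of the DataFrame associated with congruent trials. The innermost expression compares the flankers value in every row to 'congruent', and returns a Boolean mask with one True or False value per row: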

df_avg['flankers'] == 'congruent'
0      True
1     False
2     False
3      True
4     False
      ...
76    False
77    False
78     True
79    False
80    False
Name: flankers, Length: 81, dtype: bool

We embed this inside another selector, df_avg[df_avg['flankers'] == 'congruent'], which applies the Boolean mask to the DataFrame, essentially saying, “select from df_avg all the rows associated with congruent trials”.

df_avg[df_avg['flankers'] == 'congruent'].head()
   participant   flankers        rt
0           s1  congruent  0.455259
3          s10  congruent  0.471231
6          s11  congruent  0.417540
9          s12  congruent  0.429758
12         s13  congruent  0.419096

Finally, we add ['rt'] to the end to indicate that, having selected the congruent rows, we actually only want the column with the RT values, because those are what we want to perform the t test on. The second line does the same thing for incongruent trials.

df_avg[df_avg['flankers'] == 'congruent']['rt'].head()
0     0.455259
3     0.471231
6     0.417540
9     0.429758
12    0.419096
Name: rt, dtype: float64

This last result is what we assign to congr (note, by the way, that this is a pandas Series, not a DataFrame).
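As an aside, pandas also lets you select the rows and the column in a single step with .loc[row_mask, column]. The sketch below uses a tiny made-up DataFrame in the same long format as df_avg. It checks that .loc gives the same Series as the chained-bracket version above.

```python
import pandas as pd

# Hypothetical long-format data: one row per participant per condition
toy = pd.DataFrame({
    'participant': ['s1', 's1', 's2', 's2'],
    'flankers': ['congruent', 'incongruent', 'congruent', 'incongruent'],
    'rt': [0.42, 0.51, 0.46, 0.56],
})

mask = toy['flankers'] == 'congruent'  # Boolean mask, one value per row
chained = toy[mask]['rt']              # select rows, then the column
with_loc = toy.loc[mask, 'rt']         # same selection in one step

print(with_loc)
```

Both forms keep the original row labels (here 0 and 2) as the index of the resulting Series.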

Run the t test

Now we just pass congr and incongr as the first (and only) two arguments to ttest_rel(), and print the results out with some explanatory text. Note that we have to write stats.ttest_rel(), because we imported SciPy’s stats module under the name stats, and ttest_rel() is a function within that module.

If you were to look at the SciPy documentation for ttest_rel(), you’d see that the function returns the t value and the p value, in that order, packaged in a result object that can be unpacked like a tuple. Therefore, on the left side of the assignment operator, we list two variables to receive the outputs, which we call t and p:

t, p = stats.ttest_rel(congr, incongr)
# Print the results with explanatory text
print('Congruent vs. Incongruent t = ', str(t), ' p = ', str(p))
Congruent vs. Incongruent t =  -10.209634805365013  p =  1.3739296579820675e-10
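The object returned by ttest_rel() can also be kept whole rather than unpacked. Its statistic and pvalue attributes hold the same two numbers. The sketch below uses made-up data for five hypothetical participants rather than the flanker data.

```python
import numpy as np
from scipy import stats

# Made-up paired RTs for 5 hypothetical participants
a = np.array([0.42, 0.46, 0.40, 0.44, 0.43])
b = np.array([0.51, 0.56, 0.47, 0.50, 0.52])

result = stats.ttest_rel(a, b)
t, p = result  # unpacking works because the result behaves like a (t, p) tuple

print('t =', result.statistic, ' p =', result.pvalue)
```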

We can make the output nicer by rounding to a reasonable level of precision:

print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t =  -10.21  p =  0.0

Now that’s a result any researcher would be happy to see! The p value is not actually zero, by the way. In the original output, the p value was reported in scientific notation ending in e-10, which means it is actually 0.00000000013739. Python simply rounds the value to the 4 decimal places we asked for, which gives 0.0. When reporting results, we would typically write this as p < .0001, since 4 decimal places is a fairly typical precision for reporting p values.
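Because round() can produce a misleading 0.0, a small formatting helper can report very small p values as an inequality instead. format_p below is a hypothetical helper written for this illustration, not part of SciPy or pandas. It follows the convention described above: anything below .0001 is reported as p < .0001, and the leading zero is dropped.

```python
def format_p(p, decimals=4):
    """Format a p value for reporting (hypothetical helper, not a library function)."""
    threshold = 10 ** -decimals  # e.g. 0.0001 when decimals=4
    if p < threshold:
        # Too small to show at this precision, so report it as an inequality
        return 'p < ' + f'{threshold:.{decimals}f}'.lstrip('0')
    # Otherwise round to the requested precision and drop the leading zero
    return 'p = ' + f'{p:.{decimals}f}'.lstrip('0')


print(format_p(1.3739296579820675e-10))
print(format_p(0.0312))
```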

1-tailed vs. 2-tailed p values

By default, SciPy’s ttest_ functions return 2-tailed p values. This means that the p value considers both possible directions of difference between the two conditions. In the present example, RTs could be faster for congruent than incongruent, or slower for congruent than incongruent. In contrast, a 1-tailed p value should be used if we have a specific prediction of a “direction” of the difference. A 1-tailed test is less conservative, i.e., more likely to find a significant effect in the predicted direction. Consider a given significance threshold (e.g., \(\alpha = .05\)). A 2-tailed test effectively splits \(\alpha\) in half. It allows a 2.5% probability of a significant result arising by chance in one direction (e.g., congruent slower), and a 2.5% probability of a chance result in the reverse direction (e.g., congruent faster). A 1-tailed test instead allocates the entire 5% to a difference in the predicted direction (e.g., congruent faster).

If you have a specific a priori hypothesis regarding the direction of the difference, you should use a 1-tailed test, because you increase your sensitivity to the predicted effect. For example, for the flanker experiment we’re working with here, previous research would lead us to the congruent-faster hypothesis. A 2-tailed test is more appropriate in exploratory analyses, if you don’t have a clear prediction as to which direction the difference might be in.

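In the present example, it really doesn’t matter, since the two-tailed p value is wildly significant. However, if you want to convert to a one-tailed p value, you can divide p in half, provided that the difference is in the predicted direction. If the difference went the opposite way, the one-tailed p value would instead be \(1 - p/2\). Here, t is negative because congruent RTs were faster than incongruent ones, as predicted, so we can simply halve p: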

print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p (one-tailed) = ', str(p / 2))
Congruent vs. Incongruent t =  -10.21  p (one-tailed) =  6.869648289910338e-11
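SciPy (version 1.6 and later) can also compute a 1-tailed p value directly, through the alternative argument of ttest_rel() and the other ttest_ functions. With alternative='less', the prediction is that the first sample’s mean is lower than the second’s (here, faster congruent RTs). The sketch below uses made-up paired data for five hypothetical participants. It checks that, when the difference is in the predicted direction, this matches halving the 2-tailed p value.

```python
import numpy as np
from scipy import stats

# Made-up paired RTs for 5 hypothetical participants (congruent faster)
congruent = np.array([0.42, 0.46, 0.40, 0.44, 0.43])
incongruent = np.array([0.51, 0.56, 0.47, 0.50, 0.52])

t, p_two = stats.ttest_rel(congruent, incongruent)  # default: two-sided
t_one, p_one = stats.ttest_rel(congruent, incongruent, alternative='less')

print('2-tailed p =', p_two, ' 1-tailed p =', p_one)
```

Specifying the direction in the function call, rather than halving by hand, avoids accidentally halving p when the effect actually went the "wrong" way.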

Be careful about order of data values

Paired t tests are meant to compare two sets of data that are paired in some way. In our case, the data are paired because they come from the same participants. When passing the two sets of data to the t test function, it’s important that the data points are in the same order in both sets, because ttest_rel() pairs values by their position, not by any label. The above paired t test worked properly because congr and incongr were both extracted from df_avg. Its rows are sorted by participant (groupby() sorts by its grouping columns by default), so the participants appear in the same order in both Series. This is an important consideration to keep in mind, though. It’s always a good idea to store your data with indexes, such as participant IDs, that make the pairing of data values explicit.
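The sketch below shows what goes wrong when the pairing is broken. It uses made-up data for four hypothetical participants, stored as two Series indexed by participant ID, and then reverses the order of one Series. ttest_rel() pairs values by position, not by index label. The reversed version therefore pairs each congruent RT with a different participant’s incongruent RT, and gives a very different (wrong) t value.

```python
import pandas as pd
from scipy import stats

ids = ['s1', 's2', 's3', 's4']
# Made-up mean RTs; every participant is 40-60 ms slower when incongruent
congr_toy = pd.Series([0.42, 0.50, 0.39, 0.46], index=ids)
incongr_toy = pd.Series([0.47, 0.54, 0.45, 0.50], index=ids)

t_right, _ = stats.ttest_rel(congr_toy, incongr_toy)

# Same values and same labels, but stored in reverse order
incongr_reversed = incongr_toy.iloc[::-1]
t_wrong, _ = stats.ttest_rel(congr_toy, incongr_reversed)

print('correctly paired t =', round(t_right, 2), ' mis-paired t =', round(t_wrong, 2))
```

The mean difference is identical in both cases. Only the pairing changes, and with it the variability of the differences and therefore t.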

Solution 1: Use DataFrame columns rather than extracting Series

Above, we extracted the two data sets we wanted to compare with a t test from a DataFrame (df_avg) into two pandas Series, congr and incongr. On the one hand, this simplifies the syntax of the t test command, but on the other hand we lose the link between each value and the participant it came from. An alternative is to reshape df_avg into “wide” format with the pandas .pivot() method, giving one row per participant and one column per condition. Each participant’s values for the two conditions are then in the same row, so we don’t have to worry about the order of the data values. We can run the t test on two columns of this DataFrame:

df_wide = df_avg.pivot(index='participant', columns='flankers', values='rt')
t, p = stats.ttest_rel(df_wide['congruent'], df_wide['incongruent'])
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t =  -10.21  p =  0.0

This is probably the best approach to use in most cases, because:

  1. It ensures that the repeated-measures structure of the data is preserved

  2. It keeps both conditions for each participant together in one object, so there are no separate Series/variables whose order must be kept in sync.

It may in fact seem overly convoluted to have first demonstrated the extract-to-Series approach, then explain that it’s not the ideal way to do things! However, for many people, it’s intuitive to extract subsets of data to perform further processing on. One point of this lesson was to illustrate how that can create problems, even though it might seem like a logical approach.
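To illustrate why the wide-format approach is robust, the sketch below builds a tiny made-up long-format DataFrame whose rows are deliberately out of order. It reshapes the data with .pivot() and runs ttest_rel() on the two resulting columns. Because pivot() places each participant’s values in the same row, the result matches a t test on correctly paired arrays typed in by hand.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical long-format data, rows deliberately shuffled
long_df = pd.DataFrame({
    'participant': ['s3', 's1', 's2', 's1', 's3', 's2'],
    'flankers': ['incongruent', 'congruent', 'incongruent',
                 'incongruent', 'congruent', 'congruent'],
    'rt': [0.45, 0.42, 0.54, 0.47, 0.39, 0.50],
})

# One row per participant, one column per condition
wide = long_df.pivot(index='participant', columns='flankers', values='rt')
t_wide, p_wide = stats.ttest_rel(wide['congruent'], wide['incongruent'])

# Reference: the same values paired correctly by hand (s1, s2, s3)
t_ref, p_ref = stats.ttest_rel(np.array([0.42, 0.50, 0.39]),
                               np.array([0.47, 0.54, 0.45]))
print(wide)
```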

Solution 2: Use .sort_index() to ensure paired data are aligned

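If you do choose to work with a pair of Series, you can make sure their values line up by re-ordering both Series that we’re comparing (congr and incongr in this case) with the pandas .sort_index() method. Sorting puts each Series in the order of its own index. Here, the index values are df_avg’s row numbers, which follow participant order, so sorting restores the same participant order in both Series. If the Series were indexed by participant ID, the pairing would not depend on row order at all: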

t, p = stats.ttest_rel(congr.sort_index(), incongr.sort_index())
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t =  -10.21  p =  0.0

Long story short, it is good practice to index by participant ID, and use the .sort_index() method when applying t tests to pandas Series or DataFrames, to ensure that values are appropriately paired.
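Here is the same idea with .sort_index(), again on made-up data for four hypothetical participants. One Series has been stored in a scrambled order. Sorting both Series by their participant-ID index restores the pairing before the values are passed to ttest_rel().

```python
import pandas as pd
from scipy import stats

# Made-up mean RTs indexed by participant ID
congr_toy = pd.Series([0.42, 0.50, 0.39, 0.46], index=['s1', 's2', 's3', 's4'])
# Same participants, but stored in a scrambled order
incongr_scrambled = pd.Series([0.50, 0.47, 0.45, 0.54], index=['s4', 's1', 's3', 's2'])

# Sort both by participant ID so that position i refers to the same person in each
t_sorted, p_sorted = stats.ttest_rel(congr_toy.sort_index(), incongr_scrambled.sort_index())
print(incongr_scrambled.sort_index())
```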

One-sample t tests

An alternative way to compare the congruent and incongruent conditions is to first compute the difference in mean RTs between the two conditions for each participant (since it is a paired design), and then run a t test on the differences. In this case, we use a one-sample t test, in which we compare the mean of the differences to zero. In other words, is the difference between the conditions basically zero, or is it significantly different from zero (i.e., a believable difference)?

Mathematically, this is identical to a paired t test. However, there may be situations where your data are already stored as differences, and you want to know if the mean of those differences is significantly different from zero. In that case, you would use a one-sample t test.
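The equivalence is easy to check numerically. In the sketch below, which uses made-up data for five hypothetical participants, a one-sample t test of the within-participant differences against zero gives the same t and p as a paired t test on the two original arrays.

```python
import numpy as np
from scipy import stats

# Made-up mean RTs for 5 hypothetical participants
congruent = np.array([0.42, 0.46, 0.40, 0.44, 0.43])
incongruent = np.array([0.51, 0.56, 0.47, 0.50, 0.52])

t_paired, p_paired = stats.ttest_rel(congruent, incongruent)
# One-sample test: is the mean of the differences different from 0?
t_1samp, p_1samp = stats.ttest_1samp(congruent - incongruent, 0)

print(t_paired, t_1samp)
```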

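We can compute the difference between two pandas Series using the - (minus) operator. Because we are subtracting two pandas objects, pandas aligns their values by index label, even if the labels aren’t in the same order in the two Series. So we don’t have to worry about using .sort_index() as we did above for paired t tests.

This alignment only helps, however, if the two Series are indexed by participant ID (or in some other way that identifies which values belong together). congr and incongr are not. They carry df_avg’s row numbers as their index (0, 3, 6, ... for congruent, and different row numbers for incongruent), so no labels match. As a result, congr - incongr would contain only NaN (missing) values, and the t test would return t = nan and p = nan. So we first index each condition by participant:

# congr and incongr are indexed by df_avg row numbers, so re-index both by participant ID
congr_by_id = df_avg[df_avg['flankers'] == 'congruent'].set_index('participant')['rt']
incongr_by_id = df_avg[df_avg['flankers'] == 'incongruent'].set_index('participant')['rt']
congr_vs_incongr = congr_by_id - incongr_by_id
t, p = stats.ttest_1samp(congr_vs_incongr, 0)
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t =  -10.21  p =  0.0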
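A small made-up example shows how this index alignment works, and why it can also produce NaN values. When two Series share index labels, pandas matches values by label regardless of order. When the labels do not match (for example, row numbers like 0 and 3 versus 1 and 4), there is nothing to pair, and every result is NaN.

```python
import pandas as pd

# Same participant labels in a different order: pandas aligns by label
a = pd.Series([0.42, 0.50], index=['s1', 's2'])
b = pd.Series([0.54, 0.47], index=['s2', 's1'])
aligned = a - b  # s1: 0.42 - 0.47, s2: 0.50 - 0.54
print(aligned)

# Non-matching labels, like row numbers taken from a long-format DataFrame
c = pd.Series([0.42, 0.50], index=[0, 3])
d = pd.Series([0.47, 0.54], index=[1, 4])
print(c - d)  # no shared labels, so every value is NaN
```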

Note that we get the same result from the 1-sample t test if one of the two participant-indexed Series is stored in a different order. Below, we create incongr_rand, a copy of the incongruent data whose rows have been randomly shuffled (the participant labels move with their values), and subtract it from the congruent data. We don’t need to explicitly .sort_index() in this case:

congr_by_id = df_avg[df_avg['flankers'] == 'congruent'].set_index('participant')['rt']
# .sample(frac=1) returns all rows in a random order; random_state makes it repeatable
incongr_rand = df_avg[df_avg['flankers'] == 'incongruent'].set_index('participant')['rt'].sample(frac=1, random_state=1)
congr_vs_incongr = congr_by_id - incongr_rand
t, p = stats.ttest_1samp(congr_vs_incongr, 0)
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
Congruent vs. Incongruent t =  -10.21  p =  0.0

Paired vs. 1-sample t tests?

You’ll note the result of the 1-sample t test is the same as the paired t test above. This is expected, because in both cases we ran a t test to compare the difference between the same two sets of data. From a coding perspective, the paired t test is a bit simpler, because you don’t have to perform a subtraction on the data prior to running the t test.

The reasons we might want to run a 1-sample t test include cases where our data are already represented as differences, or cases where we’re working with multiple variables and computing differences simplifies our presentation of the results. As well, since pandas subtraction respects the indexes, computing differences and then running a 1-sample t test can be a bit safer in ensuring that the proper within-participants nesting structure of your data is preserved.


  • t tests are used to compare the means of two sets of data to each other, or the mean of one set of data against a particular value (such as zero)

  • An unpaired t test is used to compare two independent sets of data (e.g., from two different samples of a population, two groups, etc.)

  • A paired t test must be used when the two sets of data come from the same samples (e.g., the same individual participants)

  • A 1-sample t test is used to compare the mean of one set of data against a specific value. This is often used to compare a data set to zero

  • Paired t tests and 1-sample t tests can both be used to determine whether differences between two samples are significantly different from zero (no difference).

    • In the 1-sample case, you must first compute the difference between the pairs of data in two conditions.

  • When working with pandas data objects, it is important to remember that SciPy’s functions (including its t tests) do not use pandas indexes. So when doing paired t tests, you must ensure that the data are listed in the same order in the two Series being compared.

    • The best way to ensure that the within-participant/repeated measures structure of the data is preserved when doing a t test is to use two columns from a DataFrame that is indexed by participant ID.

    • One alternative is to use the .sort_index() method on two series that are indexed by participant ID

    • Another alternative is to use the fact that pandas does respect its indexing when you subtract two Series, so if your data are indexed by participant ID, doing the subtraction followed by a 1-sample t test is a way of ensuring that the within-participants relationships between data sets are preserved.