# Basic Statistics in Python: t tests with SciPy

## Learning Objectives

- Implement paired, unpaired, and 1-sample *t* tests using the SciPy package

## Introduction

An introductory course in statistics is a prerequisite for this class, so we assume you remember (some of) the basics including *t*-tests, ANOVAs, and regression (or at least correlation).

Here we will demonstrate how to perform *t* tests in Python. Future lessons will cover ANOVA and regression.

## The *t* test

A *t* test is used to compare the means of two sets of data. For example, in the flanker experiment we used in the previous section, we could compare the mean RTs for the congruent and incongruent conditions. *t* tests consider the size of the difference between the means of the two data sets, relative to the variance in each one. The less the distributions of values in the two data sets overlap, the larger the *t* value will tend to be. We can then estimate the probability of observing a difference at least this large simply by chance, if there were no true difference; this is the *p* value. Typically, researchers use a *p* < .05 threshold to determine statistical significance.
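To make the idea of "difference between means, relative to variance" concrete, here is a minimal sketch that computes a *t* value by hand and compares it to SciPy's `ttest_ind()`. The two small arrays are made-up values for illustration only (not the flanker data), and the hand calculation assumes equal variances in the two groups, which is the default for `ttest_ind()`.

```python
import numpy as np
from scipy import stats

# Made-up values for illustration only (not the flanker data)
group_a = np.array([0.40, 0.45, 0.50, 0.55, 0.60])
group_b = np.array([0.43, 0.50, 0.54, 0.61, 0.62])

n_a, n_b = len(group_a), len(group_b)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n_a - 1) * group_a.var(ddof=1) +
              (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)

# t = difference between the means, divided by its standard error
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_manual = (group_a.mean() - group_b.mean()) / std_err

# SciPy's independent-samples t test (equal variances assumed by default)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b)

print('manual t =', round(t_manual, 3), ' SciPy t =', round(t_scipy, 3))
```

The two values should agree, which shows that the *t* value is simply the mean difference scaled by how variable the data are.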

*t* tests are implemented in the SciPy library, which “provides many user-friendly and efficient numerical routines, such as routines for numerical integration, interpolation, optimization, linear algebra, and statistics.” Each of those types of routines is in a separate sub-module of SciPy; the one we’ll want is `scipy.stats`. We can import this specific module with the command:

```
from scipy import stats
```

We’ll also import some other packages we’ll need, and the flanker data from the previous lesson to work with:

```
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import glob
# Import the data
df = pd.read_csv('data/flanker_rt_data.csv')
# Average RT across trials, within each participant and condition
df_avg = pd.DataFrame(df.groupby(['participant', 'flankers'])['rt'].mean()).reset_index()
```

## Paired *t* test

Let’s start by comparing the mean RTs for the congruent and incongruent flanker conditions.

Recall that we are working with repeated-measures data: for each participant, we have 160 trials across 4 conditions. *t* tests are not meant for *within-condition* repeated measures data; we need only one measurement per participant in each condition. This is for essentially the same reason discussed at the end of the previous section on repeated measures data: if we treat the within-participant variability the same as the between-participant variability, then we will tend to grossly under-estimate the true (between-participant) variance. When running a *t* test, this would result in erroneously large *t* values that could often falsely suggest a statistically significant result. So, we need to use the aggregated data, `df_avg`.
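Before running a *t* test, it can be worth confirming that the aggregated data really do contain one row per participant per condition. The sketch below shows one way to check this. It uses a tiny made-up trial-level DataFrame (`trials`) rather than the flanker data, and the names `avg` and `rows_per_cell` are just illustrative.

```python
import pandas as pd

# Made-up trial-level data: 2 participants x 2 conditions x 3 trials
trials = pd.DataFrame({
    'participant': ['s1'] * 6 + ['s2'] * 6,
    'flankers': (['congruent'] * 3 + ['incongruent'] * 3) * 2,
    'rt': [0.41, 0.43, 0.42, 0.50, 0.52, 0.51,
           0.45, 0.47, 0.46, 0.49, 0.50, 0.51],
})

# Average across trials, within each participant and condition
avg = trials.groupby(['participant', 'flankers'], as_index=False)['rt'].mean()

# Count rows for each participant/condition pair; every count should be 1
rows_per_cell = avg.groupby(['participant', 'flankers']).size()

print(avg)
print('One row per participant per condition:', bool((rows_per_cell == 1).all()))
```

Using `as_index=False` is a design choice here: it keeps `participant` and `flankers` as ordinary columns, so no separate `.reset_index()` step is needed.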

The other important characteristic of our data is that, even though aggregation has reduced the data to one measurement per participant in each condition, we still have repeated measures across the two conditions. The default assumption of a *t* test is that the two data sets being compared come from different samples of the population (often called a *between-subjects design* in Psychology). This means that *t* tests assume there is no relationship between any particular measurement in each of the two data sets being compared. When we have measurements from the same people in both data sets (a *within-subjects design*), we need to account for this, or the *t* test will again give an incorrect value. We account for this by using a **paired *t* test**. In SciPy, this is the function `ttest_rel()`. (For a between-subjects, or *independent groups*, design, which we will not cover here, you would use `ttest_ind()`.)

### Select the data

Running `ttest_rel()` is as simple as giving it the two sets of data you want to compare, as arguments. We can pull these directly from our `df_avg` pandas DataFrame. We’ll do this in a few lines of code below, first assigning each data set to a new variable, and then running the *t* test.

```
congr = df_avg[df_avg['flankers'] == 'congruent']['rt']
incongr = df_avg[df_avg['flankers'] == 'incongruent']['rt']
```

Let’s make sure you understand the code above before we go on. We’ve seen it before, but maybe not in exactly this form — and it is quite complex, but logical.

We start on the first line by testing which rows of the DataFrame are associated with congruent trials, which returns a Boolean mask:

```
df_avg['flankers'] == 'congruent'
```

```
0 True
1 False
2 False
3 True
4 False
...
76 False
77 False
78 True
79 False
80 False
Name: flankers, Length: 81, dtype: bool
```

We embed this inside another selector, `df_avg[df_avg['flankers'] == 'congruent']`, which applies the Boolean mask to the DataFrame, essentially saying, “select from `df_avg` all the rows associated with congruent trials”.

```
df_avg[df_avg['flankers'] == 'congruent']
```

| | participant | flankers | rt |
|---|---|---|---|
| 0 | s1 | congruent | 0.455259 |
| 3 | s10 | congruent | 0.471231 |
| 6 | s11 | congruent | 0.417540 |
| 9 | s12 | congruent | 0.429758 |
| 12 | s13 | congruent | 0.419096 |
| 15 | s14 | congruent | 0.437178 |
| 18 | s15 | congruent | 0.548638 |
| 21 | s16 | congruent | 0.433748 |
| 24 | s17 | congruent | 0.437577 |
| 27 | s18 | congruent | 0.488892 |
| 30 | s19 | congruent | 0.539020 |
| 33 | s2 | congruent | 0.438167 |
| 36 | s20 | congruent | 0.462935 |
| 39 | s21 | congruent | 0.417553 |
| 42 | s22 | congruent | 0.410191 |
| 45 | s23 | congruent | 0.549622 |
| 48 | s24 | congruent | 0.568396 |
| 51 | s25 | congruent | 0.450102 |
| 54 | s26 | congruent | 0.528508 |
| 57 | s27 | congruent | 0.439243 |
| 60 | s3 | congruent | 0.570766 |
| 63 | s4 | congruent | 0.401993 |
| 66 | s5 | congruent | 0.462927 |
| 69 | s6 | congruent | 0.446840 |
| 72 | s7 | congruent | 0.628185 |
| 75 | s8 | congruent | 0.428642 |
| 78 | s9 | congruent | 0.431829 |

Finally, we add `['rt']` to the end to indicate that, having selected the congruent rows, we actually only want the column with the RT values, because those are what we want to perform the *t* test on. The second line does the same thing for incongruent trials.

```
df_avg[df_avg['flankers'] == 'congruent']['rt']
```

```
0 0.455259
3 0.471231
6 0.417540
9 0.429758
12 0.419096
15 0.437178
18 0.548638
21 0.433748
24 0.437577
27 0.488892
30 0.539020
33 0.438167
36 0.462935
39 0.417553
42 0.410191
45 0.549622
48 0.568396
51 0.450102
54 0.528508
57 0.439243
60 0.570766
63 0.401993
66 0.462927
69 0.446840
72 0.628185
75 0.428642
78 0.431829
Name: rt, dtype: float64
```

This last result is what we assign to `congr` (note, by the way, that this is a pandas Series, not a DataFrame).

Likewise, `incongr` is a Series of the same length (the number of participants):

```
incongr
```

```
1 0.471838
4 0.499031
7 0.473012
10 0.506722
13 0.478367
16 0.453524
19 0.591644
22 0.492921
25 0.504452
28 0.527152
31 0.591181
34 0.518216
37 0.507257
40 0.507033
43 0.474612
46 0.554172
49 0.595977
52 0.513179
55 0.565531
58 0.501069
61 0.591022
64 0.428867
67 0.530722
70 0.490298
73 0.650769
76 0.494878
79 0.437926
Name: rt, dtype: float64
```
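As an aside, the same selection can be written in a single step with `.loc`, which takes the row mask and the column name together. The sketch below compares the two forms on a tiny made-up DataFrame (`demo`) laid out like `df_avg`. The values are invented for illustration.

```python
import pandas as pd

# Made-up aggregated data in the same layout as df_avg
demo = pd.DataFrame({
    'participant': ['s1', 's1', 's2', 's2'],
    'flankers': ['congruent', 'incongruent', 'congruent', 'incongruent'],
    'rt': [0.42, 0.51, 0.46, 0.50],
})

# Boolean mask: True for the congruent rows
mask = demo['flankers'] == 'congruent'

# Two-step selection, as used in this lesson
chained = demo[mask]['rt']

# Rows and column selected in one step
with_loc = demo.loc[mask, 'rt']

print(with_loc)
print('Same result:', chained.equals(with_loc))
```

Both forms return the same Series here; the `.loc` form is often preferred in pandas code because it selects rows and columns in one operation.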

### Run the *t* test

Now we just pass `congr` and `incongr` as the first (and only) two arguments to `ttest_rel()`, and print the results out with some explanatory text. Note that we have to write `stats.ttest_rel()`, because we imported the `stats` module from SciPy.

```
t, p = stats.ttest_rel(congr, incongr)
print('Congruent vs. Incongruent t = ', str(t), ' p = ', str(p))
```

```
Congruent vs. Incongruent t = -10.209634805365013 p = 1.3739296579820675e-10
```

We can make the output nicer by rounding to a reasonable level of precision:

```
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
```

```
Congruent vs. Incongruent t = -10.21 p = 0.0
```

Now that’s a result any researcher would be happy to see! The *p* value is not actually zero, by the way. Note that in the original output the *p* value was reported in scientific notation, ending in `e-10`. This means that the *p* value is actually 0.00000000013739. We would typically report this as *p* < .0001, since we rounded to 4 decimal places (which is fairly typical for reporting *p* values).
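If you report *p* values often, a small helper can apply this convention automatically. The function below, `format_p`, is a hypothetical helper written for this illustration (it is not part of SciPy or pandas): it prints a rounded value, or a "less than" statement when the value would round to zero.

```python
def format_p(p, decimals=4):
    """Return a p value as a string, using '<' when it would round to zero.

    Hypothetical helper for illustration; not a SciPy or pandas function.
    """
    threshold = 10 ** -decimals
    if p < threshold:
        # e.g. 'p < .0001'; drop the leading zero to match the style used above
        return 'p < ' + f'{threshold:.{decimals}f}'.lstrip('0')
    return 'p = ' + str(round(p, decimals))

print(format_p(1.3739296579820675e-10))
print(format_p(0.0078))
```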

### 1-tailed vs. 2-tailed *p* values

By default, SciPy’s `ttest_` functions return **2-tailed *p* values**. This means that the *p* value considers both possible directions of difference between the two conditions. In the present example, that means *either* RTs for congruent are faster than incongruent, *or* they are slower for congruent than incongruent. In contrast, a **1-tailed** *p* value should be used if we have a specific prediction of a “direction” of the difference. Using a 1-tailed *p* value will tend to be less conservative, i.e., more likely to find a significant effect. This is because, for a given *p* threshold (e.g., \(\alpha = .05\)), a 2-tailed test effectively splits the *p* threshold in half, and reflects a probability of 2.5% that the result occurred by chance in one direction (e.g., congruent slower) and a 2.5% probability of getting the reverse result (e.g., congruent faster) by chance. In contrast, a 1-tailed test allocates all of the 5% chance probability to the likelihood of a difference in one direction (e.g., congruent faster).

Practically speaking, 2-tailed tests should be used by default, but if you have a specific a priori hypothesis regarding the direction of the difference, you can use a 1-tailed test. For example, for the flanker experiment we’re working with here, previous research would lead us to the congruent-faster hypothesis.

In the present example, it really doesn’t matter since the two-tailed *p* value is wildly significant. However, if you want to convert to a one-tailed *p* value, and the observed difference is in the predicted direction, you just need to divide *p* in half:

```
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p (one-tailed) = ', str(p / 2))
```

```
Congruent vs. Incongruent t = -10.21 p (one-tailed) = 6.869648289910338e-11
```
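As an alternative to halving *p* by hand, SciPy's *t* test functions accept an `alternative` argument that requests a one-tailed test directly. The sketch below assumes a SciPy version recent enough to support this argument, and uses made-up paired values (`congr_demo`, `incongr_demo`) rather than the flanker data. With `alternative='less'`, we test the prediction that the first data set has the smaller mean.

```python
import numpy as np
from scipy import stats

# Made-up paired values for illustration only (not the flanker data)
congr_demo = np.array([0.40, 0.45, 0.50, 0.55, 0.60])
incongr_demo = np.array([0.43, 0.50, 0.54, 0.61, 0.62])

# Default: two-tailed test
t_two, p_two = stats.ttest_rel(congr_demo, incongr_demo)

# One-tailed test of the prediction that congruent RTs are smaller (faster)
t_less, p_less = stats.ttest_rel(congr_demo, incongr_demo, alternative='less')

# One-tailed test in the opposite direction, for comparison
t_greater, p_greater = stats.ttest_rel(congr_demo, incongr_demo, alternative='greater')

print('two-tailed p =', p_two)
print("one-tailed p ('less') =", p_less)
print("one-tailed p ('greater') =", p_greater)
```

When the observed difference is in the predicted direction, the one-tailed *p* equals half the two-tailed *p*. In the wrong direction it is instead close to 1, which is why simply halving *p* is only valid when the data go the way you predicted.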

## Be careful about order of data values

The above paired *t* test worked properly because in our pandas DataFrame, participants are listed in a consistent order. So when we create separate Series for congruent and incongruent, the same rows of the two Series belong to the same participant. However, this isn’t always guaranteed, and so it’s good practice to do things in a way that ensures proper pairing of participants between data sets.

pandas indexing allows us to do this. Recall that indexes are row labels. By default, when we read a CSV file to a DataFrame, the rows are indexed numerically starting from zero. Indeed, if you look back above at the contents of `congr` and `incongr`, you’ll see the indexes in the left column are discontinuous and different between the two, because each data point came from a separate row. To ensure alignment of each participant’s data across the two Series, we can first use the participant ID as the index of `df_avg`, and then create separate Series for each condition:

```
df_avg = df_avg.set_index('participant')
congr = df_avg[df_avg['flankers'] == 'congruent']['rt']
incongr = df_avg[df_avg['flankers'] == 'incongruent']['rt']
```

Now when we look at the resulting Series, we see that the participant indexes are preserved:

```
congr
```

```
participant
s1 0.455259
s10 0.471231
s11 0.417540
s12 0.429758
s13 0.419096
s14 0.437178
s15 0.548638
s16 0.433748
s17 0.437577
s18 0.488892
s19 0.539020
s2 0.438167
s20 0.462935
s21 0.417553
s22 0.410191
s23 0.549622
s24 0.568396
s25 0.450102
s26 0.528508
s27 0.439243
s3 0.570766
s4 0.401993
s5 0.462927
s6 0.446840
s7 0.628185
s8 0.428642
s9 0.431829
Name: rt, dtype: float64
```

```
incongr
```

```
participant
s1 0.471838
s10 0.499031
s11 0.473012
s12 0.506722
s13 0.478367
s14 0.453524
s15 0.591644
s16 0.492921
s17 0.504452
s18 0.527152
s19 0.591181
s2 0.518216
s20 0.507257
s21 0.507033
s22 0.474612
s23 0.554172
s24 0.595977
s25 0.513179
s26 0.565531
s27 0.501069
s3 0.591022
s4 0.428867
s5 0.530722
s6 0.490298
s7 0.650769
s8 0.494878
s9 0.437926
Name: rt, dtype: float64
```
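Rather than comparing the two printouts by eye, we can ask pandas whether the two indexes match. A minimal sketch, using small made-up Series (`a`, `b`) indexed by participant ID rather than the flanker data:

```python
import pandas as pd

# Made-up Series indexed by participant ID (not the flanker data)
a = pd.Series([0.40, 0.45, 0.50], index=['s1', 's2', 's3'], name='rt')
b = pd.Series([0.43, 0.50, 0.54], index=['s1', 's2', 's3'], name='rt')

# Same data as b, but with the rows in reverse order
b_reordered = b.iloc[::-1]

# Index.equals() is True only if the labels match, in the same order
print(a.index.equals(b.index))
print(a.index.equals(b_reordered.index))
```

A check like `congr.index.equals(incongr.index)` could be run just before a paired *t* test to confirm that the rows line up.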

### Ensure pandas indexing is used in *t* tests

#### What could go wrong?

It turns out that SciPy’s `ttest` functions ignore pandas indexes, so indexing on its own won’t ensure that the *t* test compares data points from the same individuals. We can see that by creating a new Series, `incongr2`, that contains the incongruent data with the order of its rows randomized, while preserving the relationship between indexes (participant IDs) and RTs (you can compare with the data above to confirm that the same RT values are associated with the same IDs as in the original `incongr` Series):

```
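# Restore the default numeric index, so 'participant' is a column again
df_avg = df_avg.reset_index()
# Copy the participant and rt columns of the incongruent rows to a NumPy array
inc_arr = np.array(df_avg[df_avg['flankers']=='incongruent'].iloc[:, [0, 2]])
# Shuffle the rows in place; each participant ID stays with its own RT
np.random.shuffle(inc_arr)
# Rebuild a Series indexed by participant (dtype is object, because the array mixed strings and floats)
incongr2 = pd.DataFrame(inc_arr, columns=['participant', 'rt']).set_index('participant')
incongr2 = pd.Series(incongr2['rt'])
incongr2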
```

```
participant
s25 0.513179
s6 0.490298
s24 0.595977
s8 0.494878
s12 0.506722
s16 0.492921
s11 0.473012
s15 0.591644
s26 0.565531
s4 0.428867
s22 0.474612
s13 0.478367
s23 0.554172
s27 0.501069
s9 0.437926
s20 0.507257
s14 0.453524
s1 0.471838
s18 0.527152
s2 0.518216
s3 0.591022
s17 0.504452
s19 0.591181
s7 0.650769
s5 0.530722
s21 0.507033
s10 0.499031
Name: rt, dtype: object
```

Now when we run the *t* test, the *t* value doesn’t match the *t* value that we got above with the properly-paired data, and in fact if you re-run the shuffling code above and then the code below multiple times, you will get different *t* and *p* values each time due to the random shuffling.

```
t, p = stats.ttest_rel(congr, incongr2)
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
```

```
Congruent vs. Incongruent t = -2.89 p = 0.0078
```

### Solution 1: Use DataFrame columns rather than extracting Series

Above, we extracted the two data sets we wanted to compare with a *t* test from a DataFrame (`df_avg`) to two pandas Series, `congr` and `incongr`. On the one hand, this simplifies the syntax of the *t* test command, but on the other hand we lose the structure of the pandas DataFrame. That is, in the DataFrame, the rows for each condition are listed in the same participant order, so as long as we select both conditions directly from it, we don’t have to worry about the order of the data values. We can run the *t* test on two selections taken directly from the pandas DataFrame; the code is just a little more complex to look at:

```
t, p = stats.ttest_rel(df_avg[df_avg['flankers'] == 'congruent']['rt'],
                       df_avg[df_avg['flankers'] == 'incongruent']['rt'])
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
```

```
Congruent vs. Incongruent t = -10.21 p = 0.0
```

This is probably the best approach to use in most cases, because:

- It ensures that the repeated-measures structure of the data is preserved
- It uses less memory, because we aren’t copying columns of our DataFrame to new Series/variables.

It may in fact seem overly convoluted to have first demonstrated the extract-to-Series approach, then explain that it’s not the ideal way to do things! However, for many people, it’s intuitive to extract subsets of data to perform further processing on. One point of this lesson was to illustrate how that can create problems, even though it might seem like a logical approach.
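A related variation, which is not part of the approach shown above, is to reshape the data to *wide* format so that each participant's two values literally sit in the same row. The sketch below assumes a long-format DataFrame with the same column names as `df_avg`; the data in `long_df` are made up, with rows deliberately out of order, and `wide` is just an illustrative name.

```python
import pandas as pd
from scipy import stats

# Made-up long-format data, with rows deliberately out of order
long_df = pd.DataFrame({
    'participant': ['s3', 's1', 's2', 's1', 's3', 's2', 's4', 's5', 's5', 's4'],
    'flankers': ['congruent', 'incongruent', 'congruent', 'congruent',
                 'incongruent', 'incongruent', 'congruent', 'incongruent',
                 'congruent', 'incongruent'],
    'rt': [0.50, 0.43, 0.45, 0.40, 0.54, 0.50, 0.55, 0.62, 0.60, 0.61],
})

# Reshape: one row per participant, one column per condition
wide = long_df.pivot(index='participant', columns='flankers', values='rt')

# Each row now holds one participant's pair of values
t_wide, p_wide = stats.ttest_rel(wide['congruent'], wide['incongruent'])

# For comparison: selecting by condition from the scrambled long-format data
t_naive, p_naive = stats.ttest_rel(
    long_df[long_df['flankers'] == 'congruent']['rt'],
    long_df[long_df['flankers'] == 'incongruent']['rt'])

print(wide)
print('wide t =', round(t_wide, 2), ' naive t =', round(t_naive, 2))
```

Because the pairing is built into the rows of `wide`, the result does not depend on the row order of the original data, whereas the selection from the scrambled long-format data pairs values from different participants.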

### Solution 2: Use `.sort_index()` to ensure paired data are aligned

If you do choose to work with a pair of Series, the way we can ensure that the indexes of the two data sets align is by *re-ordering* the data in both Series that we’re comparing (`congr` and `incongr2` in this case), using the pandas `.sort_index()` method:

```
t, p = stats.ttest_rel(congr.sort_index(), incongr2.sort_index())
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
```

```
Congruent vs. Incongruent t = -10.21 p = 0.0
```

Long story short, it is good practice to index by participant ID, and use the `.sort_index()` method when applying *t* tests to pandas Series or DataFrames, to ensure that values are appropriately paired.
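Sorting both Series works when they contain exactly the same participants. A slightly more defensive variation, sketched below under the assumption that both Series are indexed by participant ID, is to re-order one Series to match the other with `.reindex()` and then check for missing values, which would indicate a participant present in one Series but not the other. The data here are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up Series indexed by participant ID (not the flanker data)
ids = ['s1', 's2', 's3', 's4', 's5']
a = pd.Series([0.40, 0.45, 0.50, 0.55, 0.60], index=ids)
b = pd.Series([0.43, 0.50, 0.54, 0.61, 0.62], index=ids)

# Same data as b, but with the rows in reverse order
b_rev = b.iloc[::-1]

# Correctly paired reference result
t_true, _ = stats.ttest_rel(a, b)

# SciPy pairs values by position, so the reversed Series gives a different t
t_misaligned, _ = stats.ttest_rel(a, b_rev)

# Re-order b_rev to follow a's index; unmatched IDs would become NaN
b_aligned = b_rev.reindex(a.index)
if b_aligned.isna().any():
    raise ValueError('Some participants are missing from the second Series')

t_aligned, _ = stats.ttest_rel(a, b_aligned)

print('true t =', round(t_true, 2),
      ' misaligned t =', round(t_misaligned, 2),
      ' aligned t =', round(t_aligned, 2))
```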

## Testing differences: one-sample *t* tests

An alternative way to compare the congruent and incongruent conditions is to compute the difference in mean RTs between the two conditions for each participant (since it is a paired design), and then run a *t* test on the differences. In this case, we use a **one sample** *t* test, in which we compare the data set to zero. In other words, is the difference between the conditions basically zero, or is it significantly different from zero (i.e., a believable difference)?

We can compute the difference between two pandas Series easily just using the `-` (minus) operator, so in this case we could use `congr - incongr`.

Note that this subtraction *only* works if the two Series are indexed by participant ID (or in some way that preserves the alignment of values between the two data sets). However, because we are subtracting two pandas objects, pandas recognizes the indexes in each and aligns them, even if the indexes aren’t in the same order in the two input Series. So we don’t have to worry about using `.sort_index()` as we did above for paired *t* tests.
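The difference between label-based and position-based subtraction can be seen in a small sketch. The Series below are made up for illustration; `b_rev` holds the same data as `b` with the rows in reverse order.

```python
import pandas as pd

# Made-up Series indexed by participant ID (not the flanker data)
ids = ['s1', 's2', 's3']
a = pd.Series([0.40, 0.45, 0.50], index=ids)
b = pd.Series([0.43, 0.50, 0.54], index=ids)

# Same data as b, but with the rows in reverse order
b_rev = b.iloc[::-1]

# pandas subtraction matches values by index label
diff_by_label = a - b_rev

# NumPy subtraction matches values by position, ignoring the labels
diff_by_position = a.to_numpy() - b_rev.to_numpy()

print(diff_by_label)
print(diff_by_position)
```

The pandas result pairs each participant with their own value (s1 with s1, and so on), while the NumPy result subtracts s3's value from s1's, which is the kind of mispairing we want to avoid.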

```
congr_vs_incongr = congr - incongr
t, p = stats.ttest_1samp(congr_vs_incongr, 0)
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
```

```
Congruent vs. Incongruent t = -10.21 p = 0.0
```

Note that we get the same result from the 1-sample *t* test when we perform the subtraction using `incongr2`, which has a randomly shuffled order of indexes, as when we subtract two Series that have the same order of indexes. We don’t need to explicitly `.sort_index()` in this case. One wrinkle is that `incongr2` has the `object` dtype (visible in its output above), and so does the result of the subtraction. Passing that directly to `ttest_1samp()` can raise `ValueError: data type <class 'numpy.object_'> not inexact`, so we convert the differences to floats with `.astype(float)` first:

```
congr_vs_incongr = (congr - incongr2).astype(float)
t, p = stats.ttest_1samp(congr_vs_incongr, 0)
print('Congruent vs. Incongruent t = ', str(round(t, 2)), ' p = ', str(round(p, 4)))
```

```
Congruent vs. Incongruent t = -10.21 p = 0.0
```

### Paired vs. 1-sample *t* tests?

You’ll note the result of the 1-sample *t* test is the same as the paired *t* test above. This is expected, because in both cases we ran a *t* test to compare the difference between the same two sets of data. From a coding perspective, the paired *t* test is a bit simpler, because you don’t have to perform a subtraction on the data prior to running the *t* test.

The reasons we might want to run a 1-sample *t* test include cases where our data are already represented as a subtraction, or cases where we’re working with multiple variables and performing subtractions can be a way of simplifying our presentation of the results. As well, since pandas subtraction respects the indexes, computing differences and then 1-sample *t* tests can be a bit safer in ensuring that the proper within-participants nesting structure of your data is preserved.
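The equivalence of the two tests can be checked directly. The sketch below uses made-up paired values (`x`, `y`) rather than the flanker data, and compares `ttest_rel()` on the two arrays with `ttest_1samp()` on their differences.

```python
import numpy as np
from scipy import stats

# Made-up paired values for illustration only (not the flanker data)
x = np.array([0.40, 0.45, 0.50, 0.55, 0.60])
y = np.array([0.43, 0.50, 0.54, 0.61, 0.62])

# Paired t test on the two sets of values
t_paired, p_paired = stats.ttest_rel(x, y)

# 1-sample t test on the pairwise differences, compared against zero
t_one, p_one = stats.ttest_1samp(x - y, 0)

print('paired:   t =', round(t_paired, 4), ' p =', round(p_paired, 4))
print('1-sample: t =', round(t_one, 4), ' p =', round(p_one, 4))
```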

## Summary

- *t* tests are used to compare the means of two sets of data to each other, or the mean of one set of data against a particular value (such as zero)
- An **unpaired** *t* test is used to compare two independent sets of data (e.g., from two different samples of a population, two groups, etc.)
- A **paired** *t* test must be used when the two sets of data come from the same samples (e.g., the same individual participants)
- A **1-sample** *t* test is used to compare the mean of one set of data against a specific value. This is often used to compare a data set to zero
- Paired *t* tests and 1-sample *t* tests can both be used to determine whether differences between two samples are significantly different from zero (no difference). In the 1-sample case, you must first compute the difference between the pairs of data in two conditions.
- When working with pandas data objects, it is important to remember that SciPy’s functions (including `ttest`s) do not use pandas indexes. So when doing paired *t* tests, you must ensure that the data are listed in the same order in the two Series being compared.
- The best way to ensure that the within-participant/repeated measures structure of the data is preserved when doing a *t* test is to use two columns from a DataFrame that is indexed by participant ID.
  - One alternative is to use the `.sort_index()` method on two Series that are indexed by participant ID
  - Another alternative is to use the fact that pandas does respect its indexing when you subtract two Series, so if your data are indexed by participant ID, doing the subtraction followed by a 1-sample *t* test is a way of ensuring that the within-participants relationships between data sets are preserved.