Working With Multiple Data Files Using Copilot
In this lesson we will revisit some of the material covered at the end of the introductory chapter on Python. Specifically, we will read multiple data files, manipulate the data using pandas, and derive some basic information from the data.
The instructions are very high-level, because we want you to practice writing Copilot prompts yourself, rather than following step-by-step guidance from us. The version of this lesson in the online textbook, however, shows the worked example with prompts and Copilot-generated code.
This lesson is broken into three parts, as separate notebook files, due to its length. This first section provides an overview of the task. The subsequent sections walk through generating and debugging the code to perform these steps, using GitHub Copilot.
Your task is to perform the operations below, using Python code generated by a combination of your own writing and GitHub Copilot’s suggestions. For your own learning, you are best off starting each step by trying to write the code yourself. If it works, great: you’ve learned something in Python! Congratulations. But the tasks below intentionally ask you to do things that we haven’t taught in this course yet. That’s because we want you to learn how to use Copilot to help you out when you don’t know how to do something.
If you try writing your own code and it doesn’t work, then you can try again, or use Copilot to help you out. Copilot is meant to speed up your coding, so in practice it’s preferable to use it when you get stuck rather than spend too much time trying to figure out how to do something yourself. On the other hand, once you have the solution from Copilot, it’s important to try to understand what it is doing. You may have to look up some of the functions it’s using, if you’re not familiar with them.
If you don’t understand what the code you generated is doing, you risk errors. Indeed, in the tutorial below we will see an example of Copilot generating erroneous code. Even if the code works as desired, it is important for you to be able to explain correctly what you did. Also, look for ways to check (using code) that the results you get are correct.
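One simple way to check a result with code is to compute the same quantity a second way and confirm that the two answers agree. The sketch below is only an illustration: the RT values are made up, and the variable names are hypothetical rather than taken from the lesson's data files.

```python
import pandas as pd

# Made-up reaction times (in seconds), purely for illustration
rts = pd.Series([0.42, 0.51, 0.39, 0.47])

# Compute the mean two independent ways
mean_from_pandas = rts.mean()
mean_by_hand = sum(rts) / len(rts)

# Raise an AssertionError if the two results disagree
assert abs(mean_from_pandas - mean_by_hand) < 1e-9, "Means disagree; inspect the code"
print(f"Mean RT: {mean_from_pandas:.4f} s")
```

The same idea applies to generated code: if Copilot produces a calculation you can't verify by reading it, recompute it a simpler way and compare.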
The Data and What to Do With It
There are three data files in the
s3.csv. Each file contains the reaction times (RTs) from 10 trials of a relatively simple task in which participants had to indicate which direction a briefly presented arrow was pointing. The RTs are in seconds (s). Each file contains the RTs from a different participant. In each file there are three columns. You can determine what the columns are by looking at the first row (header) of each file.
Generate code that reads in the three files and combines them in a pandas DataFrame. Include error-checking code that confirms you loaded the correct number of trials (30) and the correct number of columns (3). If the number of trials or columns is incorrect, print an error message.
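As a rough sketch of what this step might look like (not the worked solution from the textbook), the code below reads three CSV files, stacks them with `pd.concat`, and checks the shape of the result. So that it runs anywhere, it first writes three small stand-in files to a temporary folder. The file names `s1.csv` and `s2.csv`, the column names `participant`, `trial`, and `RT`, and all of the RT values are assumptions made for illustration; check the real headers in your own files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical setup: create three stand-in files in a temporary folder.
# File names, column names, and RT values are made up for this sketch.
data_dir = Path(tempfile.mkdtemp())
for i, pid in enumerate(["s1", "s2", "s3"]):
    demo = pd.DataFrame({
        "participant": pid,
        "trial": range(1, 11),
        "RT": [0.40 + 0.05 * i + 0.01 * t for t in range(10)],
    })
    demo.to_csv(data_dir / f"{pid}.csv", index=False)

# Read each file and stack the rows into a single DataFrame
files = sorted(data_dir.glob("s*.csv"))
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Error checking: confirm the expected number of trials and columns
expected_rows, expected_cols = 30, 3
n_rows, n_cols = df.shape
if n_rows != expected_rows:
    print(f"Error: expected {expected_rows} trials, found {n_rows}")
if n_cols != expected_cols:
    print(f"Error: expected {expected_cols} columns, found {n_cols}")
```

Passing `ignore_index=True` gives the combined DataFrame a fresh index running from 0 to 29, rather than repeating 0 to 9 once per file.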
Save the dataframe to a file called
all_data.csv, in the
Once you have the DataFrame, calculate:
the mean RT for each participant
the mean RT across all participants
the 95% confidence intervals (CIs) for the mean RT for each participant
the 95% CIs for the mean RT across all participants
Finally, print a table that includes all of the results, with columns for mean RT, lower 95% CI, and upper 95% CI. The table should have one row for each participant, and a bottom row showing the mean and CIs across participants.
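The calculations and table described above could be sketched as follows. This is one possible approach under stated assumptions, not the textbook's solution: it uses a t-based confidence interval (mean ± t × SEM), treats "across all participants" as pooling all 30 trials, and runs on randomly generated stand-in data. The column names `participant` and `RT`, the row label `all`, and the helper `mean_ci` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up stand-in data: 3 participants x 10 trials (not the lesson's data)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "participant": np.repeat(["s1", "s2", "s3"], 10),
    "RT": rng.normal(loc=0.5, scale=0.05, size=30),
})

def mean_ci(rt, confidence=0.95):
    """Return the mean and a t-based confidence interval for a Series of RTs."""
    n = rt.count()
    mean = rt.mean()
    sem = rt.std(ddof=1) / np.sqrt(n)                   # standard error of the mean
    t_crit = stats.t.ppf((1 + confidence) / 2, n - 1)   # two-tailed critical t value
    return pd.Series({
        "mean_RT": mean,
        "CI_lower": mean - t_crit * sem,
        "CI_upper": mean + t_crit * sem,
    })

# One row per participant, then a bottom row pooling all trials
rows = {pid: mean_ci(rt) for pid, rt in df.groupby("participant")["RT"]}
rows["all"] = mean_ci(df["RT"])
table = pd.DataFrame(rows).T

print(table.round(3))
```

A different reading of "across all participants" would compute the CI from the three participant means instead of the 30 pooled trials. That gives the same mean here, because each participant has the same number of trials, but a different interval, so it is worth stating which one you chose.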
How to Approach This Exercise
A critical skill to develop in coding is problem decomposition, or programmatic thinking — in other words, breaking down a task into smaller and smaller components, so that you can write the code to perform each step in the logical sequence. The instructions above are written in a sequential way, so you should be able to identify each individual step that you need to take.
It is good practice, when working in Jupyter notebooks, to write the code for each step in a separate cell. This makes it easier to test each step, and to go back and change things if you need to. It also makes it easier to see what you’ve done, and understand what the code is doing. It’s also sometimes helpful to put Markdown cells in between code cells, to provide longer explanations than might be appropriate for a comment in the code itself. You can also make notes about things you might want to change later, or interpretations of the output of the code.
Using the Copilot Assistant
As noted above, we encourage you to try writing the code for each step of the instructions above. Coding is a procedural skill that you only learn by doing, and the more you learn, the better you will be at solving the bugs that Copilot-generated code will inevitably have. But if you get stuck, or encounter an instruction that you haven’t yet learned how to perform in Python, then by all means use Copilot. Just as when writing your own code, it is best to use a separate notebook cell for each Copilot prompt. That way, if you need to edit the code, you can do so without having to re-run all of the code that Copilot generated. It’s also easier to debug, because you can see the output of each cell and know exactly which line generated the error.
As you will see, it takes some trial and error – and critical thinking – to generate prompts that produce the code you want. But, it’s a good way to learn how to use Copilot effectively. And if you do it right, you will learn a lot about Python along the way.