Artificial Intelligence-Assisted Coding
The world of coding and data science has changed dramatically since 2021, with the public release of large language models (LLMs) specifically trained to help write code. These models, which are trained on massive amounts of text data, can be used to generate code that performs a variety of tasks. For example, the GPT-3 model can generate code based on written directions from a user. In the past, even experienced coders and data scientists relied heavily on web searches, online forums, and documentation to find code that performs a desired task; now they can simply write a description of what they want, and the model will generate code for them. This can be a huge time-saver, and allows us as data scientists to focus on what we want to do with our data, rather than how to do it.
In this course, you will learn how to use AI-assisted coding tools, most notably GitHub Copilot, to help write code that performs data science tasks. This has rapidly become the way coders and data scientists work, so it’s not “cheating”; it’s a new tool that can empower you to be more productive. Most importantly, AI assistants will allow us to focus on what we want to do with data, and get us to insights faster. Deciding what to do with data and drawing insights from it are the most important aspects of the course, but prior to AI tools it took a long time to get there, due to the slow and often tedious process of learning how to code.
At the same time, it’s important to recognize that AI-generated code is not perfect, and requires careful evaluation and debugging. LLMs essentially predict the next word in a sentence, based on the previous words, and the same is true when they generate code. They are trained on large samples of existing, publicly available code (most of which, hopefully, actually works), and do their best to provide code that is appropriate to your prompt. Sometimes, though, they will generate code that doesn’t work, or that doesn’t actually perform the operation(s) you intended. The less common a task is, the fewer examples of it the model will have been trained on, and the less likely it is to generate good code for that task. So it’s critical to learn how to evaluate the code that AI tools generate, and make sure it does what you want it to do. This will also be a focus of this course.
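To make the idea of evaluating generated code concrete, here is a minimal sketch in Python. The prompt, the function `median_generated`, and its bug are all invented for illustration; they are not actual output from GitHub Copilot or any other tool. The sketch shows one simple way to check code: run it on small inputs whose answers you can verify by hand, and compare the results with a trusted reference (here, `statistics.median` from Python's standard library).

```python
import statistics

# Hypothetical prompt given to an AI assistant:
#   "Write a function that returns the median of a list of numbers."
# Hypothetical output (invented for this example). It looks plausible,
# but it never sorts the values before picking the middle one.
def median_generated(values):
    middle = len(values) // 2
    return values[middle]


# A corrected version, written after the check below exposed the problem.
def median_fixed(values):
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def matches_reference(candidate, test_inputs):
    """Return True if candidate agrees with statistics.median on every input."""
    return all(candidate(data) == statistics.median(data) for data in test_inputs)


# Small inputs whose answers are easy to verify by hand.
test_inputs = [[3, 1, 2], [5, 1, 4, 2], [10]]

print(matches_reference(median_generated, test_inputs))  # False
print(matches_reference(median_fixed, test_inputs))      # True
```

Note that the flawed version runs without any error message, so a quick glance would not reveal the problem. Only comparing its output against known answers shows that it does not do what was asked.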