Working with Jupyter Notebooks#


Questions#

  • How do I work with and navigate Jupyter notebooks in Codespaces or VS Code?

Learning Objectives#

  • be able to open a Jupyter notebook

  • be able to navigate a Jupyter notebook

  • be able to enter and run code in a Jupyter notebook

  • be able to enter and format rich text in a Jupyter notebook, using Markdown


You will not be able to perform the tasks described here in the online textbook itself. This, and all subsequent sections that demonstrate coding, will require you to use the Jupyter notebooks through Codespaces, VS Code, or some other software that allows you to run Jupyter notebooks. The instructions for how to do this are in the previous section, Set Up Your Computer for Data Science.

Jupyter notebooks#

You are currently viewing a Jupyter notebook file. Jupyter notebooks are composed of cells. Cells can be of three types: code, Markdown, or raw. This cell you’re reading is Markdown, a simple language for formatting rich text. The cell below is a code cell, where you can write and run Python commands. Raw cells are “raw” text: they aren’t fancy-formatted Markdown, and they aren’t runnable as code. They also aren’t terribly useful.

Cells have two modes: Edit and Command. In Edit mode, you will see a flashing cursor, and you can type into the cell and edit it. In Command mode, you can manipulate the cell as a whole in certain ways (e.g., deleting it, or moving it). Only one cell is ever in Edit mode at any given time.

This text you are currently reading is a Markdown cell. Assuming you are viewing it in VS Code, if you click once on this text, the cell should become active in Command mode, and you should see some color cue that it is now selected (the specifics depend on the color theme you are using). If you double-click on this (or any other Markdown) cell, the text will change to a fixed-width font, you’ll see the Markdown formatting tags (like # for headings), and a flashing cursor will appear where you clicked. Try it! To return to Command mode, press the Shift & Enter keys on your keyboard simultaneously, or hit the Escape key. You can also just click on another cell; in that case, however, this cell will remain in Edit mode (i.e., with a fixed-width font and the Markdown tags visible) until you press Shift & Enter or Escape.

So in summary:

  • Single-clicking a cell will select it in Command mode

  • Double-clicking a cell will put it in Edit mode

  • Pressing the Shift & Enter keys on your keyboard simultaneously will execute a cell. Execution will format a Markdown cell, or run the code in a code cell.

Notebook files can get really long. You can see an outline of the file (a list of all the Markdown headings in your notebook) by clicking the Outline button in the toolbar at the top of your Jupyter Notebook window. This will open the Outline viewer in the lower left corner of VS Code. The outline is only visible when your Activity Bar is in Explorer mode (i.e., when your files are listed in the top left region).

Run your First Code Cell#

Below is a code cell with some very simple Python code. You haven’t started learning Python yet, but as you can see, at its simplest Python can act like a calculator. Before you run this cell, though, you need to tell VS Code what language you want the notebook to run code cells in. Recall that Jupyter notebooks can run code in many different languages, including Python, R, and Julia.

To tell VS Code that you want to run Python code, click on the words Select Kernel in the top right corner of the notebook. At this point, the options you see will depend on whether you’re running in a Codespace, VS Code, or something else. If you see base in the drop-down menu, choose that one. Otherwise, click on Python environments and select the Python environment you want to use. It may simply be called Python followed by a version number (like 3.12.1), or it may have a name like neural_data_science, if you installed Python on your own computer.

Once you have selected your kernel, try executing the cell (running it) and see what happens.

Whereas for Markdown cells the Escape key "executes" the cell and renders it formatted, pressing the Escape key will not execute a code cell. Shift-Enter works for both Markdown cells (formatting the cell) and code cells (running the code in the cell). The currently-selected code cell also has a "Run Cell" button in the toolbar (which looks like a "Play" icon, or triangle pointing right); clicking this will also execute the cell.
1 + 1
2

Take note of a couple of things here. Firstly, the output of a code cell appears in the notebook directly underneath the cell. The output is updated each time you run the cell. If you want to clear the output of a cell, you can click on the ellipsis (…) in the lower right corner of the cell (under the []), and select Clear Outputs from the menu that appears.

Secondly, after running the cell, the text to the left of the cell changed from [] to [1]. The number indicates that the cell was run, and the numbers increase sequentially as you run cells. So if you run the above cell again, the [] number will go up by one. Likewise, when you run the cell below, the number will be one larger than the number in the previously-executed cell.

In general, it’s a good idea to run Jupyter notebook cells sequentially, from top to bottom. But sometimes, when you’re writing and debugging your code, you’ll run a cell many times before you get it right, and sometimes you’ll run a few cells and then go back and run them again. So the numbers are useful in keeping track of what you’ve done, in what order. If a process takes a while to run, then the cell will show [*] while the cell is running.

Run the cell below and look at the output.

1 + 1
2 + 2
3 + 3
6

The cell contained three commands, or operations, one on each line, but you only see the output of the last line! This is how Jupyter operates: the output below the cell shows only the result of the last command in the cell that generated output. This is not normally what you want. Thus in Jupyter, if you want to see the output of every command you run in a cell, you should wrap each command in a print() call, as shown below:

print(1 + 1)
print(2 + 2)
print(3 + 3)
2
4
6
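A related detail, sketched here under the assumption that you are running the code as a single notebook cell: an assignment displays nothing on its own, even when it is the last line of a cell, whereas print() displays its value wherever it appears. The variable name total is only an illustration.

```python
# An assignment stores a value but displays nothing, even as the last line of a cell.
total = 1 + 1

# print() shows the value no matter where it appears in the cell.
print(total)

# A bare expression on the final line is what a notebook echoes below the cell.
total
```

If you run this as one cell, the value appears once from print() and once more as the cell's own output for the last line.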

Soon, you’ll learn about data types. For now, it’s worth noting that if you want to output text in Python, you need to put it inside quotation marks (Python doesn’t care if you use single or double quotes, as long as you start and finish with the same one). So, running the cell below produces an error (do it!):

print(Hello World)
  Cell In[4], line 1
    print(Hello World)
          ^
SyntaxError: invalid syntax. Perhaps you forgot a comma?

The output above is a Python Error message. We won’t worry about the details of it now — we’ll get into understanding error messages later. For now, just note that the error message is telling us that there is a problem with the code we wrote.

Putting the text we want to print inside quotation marks will fix the problem. Make that change in the cell above, so it reads print("Hello World"), and run the code again.
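As a small sketch of the quoting rule described above, the cell below prints the same text using single and then double quotes, and checks that Python treats the two as identical. The variable names single, double, and quote are made up for this illustration.

```python
# Single and double quotes produce exactly the same text.
single = 'Hello World'
double = "Hello World"
print(single)
print(double)

# Comparing the two confirms they are equal.
print(single == double)

# Having both kinds is handy when the text itself contains a quote mark:
# enclose the text in the other kind.
quote = "It's working"
print(quote)
```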

# Even in code cells, you can add random comments that are not Python code 
# as long as you start each line with a hash mark, like these lines are.

# These lines are called "comments" and it is good coding practice to use them.

# Comments allow you to make notes about your code. When you're starting out, you may want to add comments so that the next time you look at your code, you remember what a particular line or section does. Or, you may want to leave a "note to self" if you want to come back and add things to a particular section later.

# Note the above line extends off the right side of the window in VS Code. It's good programming style to put in line breaks
# manually for your comments, so that they are more easily readable.
# Run this cell. We'll use it later.
x = 1
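Comments do not need a line of their own. The sketch below, which uses the made-up names total and message, shows a comment following code on the same line, and shows that a hash mark inside quotation marks is treated as ordinary text rather than as the start of a comment.

```python
total = 1 + 1  # a comment can follow code on the same line

# Inside quotation marks, a hash mark is just part of the text.
message = "# this is text, not a comment"

print(total)
print(message)
```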

The Jupyter Notebook toolbar#

VS Code has a very detailed user interface, with lots of buttons. When you open a Jupyter notebook, you’ll see even more! You’ll gradually get familiar with many of these functions, although some you may never use in this course. For now, we’ll go through the most important functions and concepts.

Inserting New Cells#

At the top of this notebook window, you should see a toolbar with a bunch of buttons. The first two buttons are for inserting new cells: Code and Markdown cells, respectively. You can also create a new cell above the current one, when in Command mode, if you press the a key, or a new cell below the current one if you press b. Try inserting a new code cell below this one.

Deleting Cells#

You can delete a cell by clicking on the cell, and then clicking the trash can icon that appears over the top left corner of the cell. You can also delete a cell in Command mode by pressing the d key twice (i.e., press d, then d again). If you accidentally delete a cell, you can undo it by selecting Undo from the Edit menu. Try deleting the empty cell above this one, that you just created.

Running Cells#

We’ve already described how to run a cell. The Run All button in the toolbar will run all of the cells in the notebook, in order. This is useful if you want to re-run the entire notebook, for example if you’ve made changes to some of the cells and want to re-run everything to make sure it all works. But most of the time, you will be writing and testing code cell-by-cell, so you’ll want to run the cells individually, check their output, and then move on to the next cell.

One handy feature in VS Code, though, is that if you right-click in the area to the right of a code cell, you’ll get a menu that includes the option Execute Above Cells. This will run all the cells above that cell, but not the cell itself. This can be handy if you’ve made changes earlier in the notebook and want to re-run everything that a cell depends on before running the cell itself.

Try it out: right-click in the area to the right of the cell below, and select Execute Above Cells. Note that this will not produce output for the cell you clicked on, just for all the cells above it. This can be useful if you are trying to get a particular cell to work, which depends on the code that came before it.

print('x =', x)
x = 1

Now go ahead and run the cell above, to print out the value of x.

Clearing Cell Outputs vs. Clearing Memory#

The Clear All Outputs button will clear all output from every cell in your notebook. This is handy if you want to start fresh at the top of the notebook, but be careful! You’ll need to run all the cells, in order, to get the output back. On its own this can cause some confusion, because clearing outputs doesn’t clear things from Python’s memory. Prove this to yourself by selecting Clear All Outputs and then running the cell below.

print('x =', x)
x = 1

The Python kernel is a live, running instance of Python that will interpret your Python commands and produce output. Each notebook you open will start a new Python kernel (sometimes, as with this one, you’ll need to select the kernel the first time you open the file). As long as the kernel is running, the things you do in Python will be stored in memory (RAM). This will make more sense as you start to use Python, but when you’re running Python, you generally don’t just run commands and get output. You also read data files into memory, and store the results of one command in memory for use in the next command. For example, a bit earlier in this notebook you were instructed to run a cell with the code x = 1, so now your kernel has a variable x in memory, with the value 1. Clearing cell outputs doesn’t affect what the kernel has in memory. If you want to clear the memory, you need to restart the kernel.
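To make the idea of a live kernel a little more concrete, here is a minimal sketch. It uses Python's standard sys module to report which Python the kernel is running, and then shows one command using a value that an earlier command left in memory. The names a and b are hypothetical and are chosen so that they do not disturb the variable x used elsewhere in this notebook.

```python
import sys

# Report which Python interpreter the kernel is running.
print(sys.version)
print(sys.executable)

# A value stored by one command stays in memory for later commands.
a = 2
b = a * 10  # uses the value of a stored by the previous line
print('b =', b)
```

The version and path that are printed will depend on the kernel you selected.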

The Restart button will restart the Python kernel running in the notebook, and clear all variables from memory — but not the cell outputs. This is useful if you’ve run a bunch of cells, and you want to start fresh. For example, if you’ve defined a bunch of variables, and you want to start over with new values for those variables, you can restart the kernel and then re-run the cells that define the variables.

Long story short: if you want to clear the output from a single cell, click on the ellipsis (...) in the upper right corner of the cell, and select Clear Cell Outputs. If you want to clear the output from all cells, you should probably also restart the kernel, to avoid any confusion or unexpected behaviors.

Let’s try that now: click the Restart button, then run the cell below (which is the same as the one you ran above).

print('x =', x)
x = 1

You get an error (a NameError), because you restarted the kernel and thus the variable x is no longer defined. However, if you now click Run All, all the above cells will be run, including the one that defined x as having a value of 1. The code cell immediately above this one will then generate correct output rather than an error.
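If you would like to see this kind of error without restarting the kernel, the sketch below imitates it. As an assumption made purely for illustration, it uses Python's del statement to remove x from memory, which is only a stand-in for what a restart does to every variable at once. The flag x_was_found is a made-up name.

```python
x = 1
print('x =', x)

# Assumption: del is used here only to imitate what a restart does to x.
del x

try:
    print('x =', x)
    x_was_found = True
except NameError:
    # Python raises NameError when a name is not defined in memory.
    x_was_found = False
    print('x is not defined')
```

Note that this leaves x undefined, so re-run the earlier cell containing x = 1 before continuing.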

Undo Operates on Text, Not Python#

When you’re working in a Jupyter notebook, you’re typing in a text editor. The text editor is connected to a Python kernel, which is running in the background. When you type in a cell, you’re typing in the text editor. When you run a cell, the text editor sends the contents of the cell to the Python kernel, which runs the code. The kernel then sends the results of the code back to the text editor, which displays the results.

When editing the cells of a Jupyter notebook, there is an undo function (in the Edit menu), which works similarly to other software you’ll have used, sequentially undoing things you’ve typed in cells.

Importantly, the undo function does not undo the results of commands you’ve run in Python. Continuing with our previous example, this means that if you ran the Python command x = 1, then ran “undo”, Jupyter would delete your typing, x = 1. However, in the kernel’s memory, x would still equal 1. So if you then ran print(x), the output would be 1.
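Since undo cannot reach into the kernel, the way to change a value in memory is to run more code. A minimal sketch, assuming x was set to 1 as in the example above:

```python
x = 1  # once this has run, the kernel remembers x

# Erasing the line above from the cell (e.g., with undo) would not change memory.
print(x)

# To change what is in memory, run a new assignment.
x = 2
print(x)
```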

Summary#

  • You can edit and run code in Jupyter notebooks. Results will appear in the notebook

  • Jupyter notebooks can contain a mixture of code, Markdown, and raw cells

  • Markdown is useful for annotating your code with rich text

  • Comments can be put in code cells as well; Python ignores anything on a line after a #

  • Each Jupyter notebook runs a kernel (e.g., Python) that maintains information in RAM as long as it is running

  • Operations occur in the order that you run cells in a notebook - not necessarily the order the cells appear in the notebook. It’s good practice to run cells in the order they appear in the notebook, and restart the kernel if you need to make changes and re-run.

  • Restarting the kernel will clear everything from RAM for that notebook

  • Undo modifies text (the code in a cell), but not what is in Python’s memory. You cannot undo changes to the kernel; you need to restart the kernel and start over if you want to erase all variables from memory.