Introduction to Data Visualization

Visualization is a powerful tool in data science. Data sets are often large, and making inferences from them requires understanding the relationships between different data points. This is difficult to do by looking at a table of numbers, especially if the table is large or has many dimensions (e.g., several experimental conditions, possibly in crossed or nested designs). The human brain, however, readily identifies patterns and draws inferences from visual information. Visualizations can thus provide concise, clear representations of data that facilitate interpretation.

On the other hand, poor visualization can cause confusion or create misunderstanding. It’s important to become comfortable visualizing data in different ways and to understand best practices for representing data. This includes choosing visualizations that are appropriate to both the data and the questions you’re asking of it. It also involves designing for readability and interpretability: ideally, a viewer should not need to refer to a text description (e.g., a figure’s caption) to understand what they’re seeing. Visualizations should also be designed with accessibility in mind, considering factors such as color blindness so that no one is prevented from understanding them.
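
As a rough illustration of these design principles, the sketch below draws a small line plot that tries to be readable without a caption (labeled axes with units, a title, a legend) and that does not rely on color alone to tell the conditions apart. The data values and condition names are invented purely for illustration; `tableau-colorblind10` is one of Matplotlib's built-in style sheets and is only one of several reasonable choices.

```python
import matplotlib.pyplot as plt

# Hypothetical data, invented for illustration only:
# mean reaction time (ms) over four sessions in two conditions.
sessions = [1, 2, 3, 4]
mean_rt = {
    "Control": [520, 505, 498, 490],
    "Treatment": [515, 480, 455, 440],
}

# Built-in Matplotlib style sheet with a color-blind-friendly color cycle.
plt.style.use("tableau-colorblind10")

# Vary marker and line style as well as color, so the two conditions
# remain distinguishable even if the colors are hard to tell apart.
markers = {"Control": "o", "Treatment": "s"}
linestyles = {"Control": "-", "Treatment": "--"}

fig, ax = plt.subplots()
for condition, values in mean_rt.items():
    ax.plot(
        sessions,
        values,
        marker=markers[condition],
        linestyle=linestyles[condition],
        label=condition,
    )

# Labels, units, and a legend let the figure stand on its own.
ax.set_xlabel("Session")
ax.set_ylabel("Mean reaction time (ms)")
ax.set_title("Reaction time by session and condition")
ax.set_xticks(sessions)
ax.legend(title="Condition")
```

In a notebook the figure is typically displayed automatically; in a script, calling `plt.show()` or `fig.savefig(...)` would be needed to view or save it.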

In this chapter, we will introduce two core Python visualization packages, Matplotlib and Seaborn, and discuss topics such as accessibility and the choice of color maps.