Short tutorial: visualizing data in Python with Altair (2024)

A short step-by-step tutorial on Altair to visualize data in Python.
Author: Hermann Hesse

Affiliation: TensorScience

Published: December 23, 2023

Introduction

I want to share something that made my work with data in Python a whole lot better: Altair. It’s a great library for making graphs and charts, especially if you like keeping your code clean and straightforward. The way it integrates with Pandas and lets you build up your charts step by step is what makes it stand out for me.

Introduction to Altair and Data Visualization

Built on a foundation of Vega and Vega-Lite, Altair offers a clear syntax for crafting stunning visual narratives out of raw data.

Here’s why I use Altair when presenting data-driven insights. First, Altair’s design prioritizes tight integration with Pandas, the go-to library for data manipulation in Python, so your DataFrames feed directly into visualizations with minimal friction. Second, its declarative syntax lets you describe what you want the chart to represent rather than dictate how to render it, which for me is a real timesaver.

Let’s begin with a simple example to get our feet wet. Importing Altair is just a matter of a standard Python import statement. Assuming you’ve got the Altair package installed (pip install altair or conda install -c conda-forge altair), here’s how you start.

import altair as alt
from vega_datasets import data

The dataset we’ll play with is a built-in example dataset, ideal for getting a feel for Altair’s capabilities without overwhelming newcomers. The vega_datasets package is a goldmine for learning and experimenting with sample data.

cars = data.cars()

We now hold in our hands a dataset about various cars and their attributes, like miles per gallon (mpg), number of cylinders, and so on. Let’s visualize the relationship between miles per gallon and horsepower using a scatter plot.

chart = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
)
chart

We create a chart object, specify the type of mark (mark_point() for scatter plot points), and encode the axes. ‘Q’ stands for Quantitative, which means the data is numerical. ‘N’ is for Nominal, representing categorical data. We’ve also added color encoding based on the car’s origin to inject some added insight.

Run the cell and Altair renders a scatter plot in which each point represents a car, colored by its origin. As written, the chart is static: hovering over a point reveals more about it only once you add a tooltip encoding, which we’ll do in the next section.

The expressiveness of Altair is in its simplicity. With only a few lines of code, data jumps off the static spreadsheet and transforms into a visual story. And this is just the tip of the iceberg. Altair empowers you to build complex and layered visualizations by stacking simple building blocks.

I find that reading through the Altair documentation helps solidify understanding, and when coupled with practice, turns fledgling data storytellers into seasoned visualizers.

Admittedly, and importantly for you to remember, learning to visualize data with Altair or any other library is iterative. Don’t be discouraged if your first chart doesn’t live up to your expectations. Experimentation begets mastery, and Altair is an astute mentor.

This introduction only scratches the surface of Altair. Coming up, you’ll learn how to set up the environment for using Altair, create your first chart, and eventually delve into more advanced customizations and interactivity. And as you progress, remember the core principle that guides Altair: concise, declarative, and analytical. These are the hallmarks of proficient visual storytelling with data.

Setting Up Your Environment for Altair

Before we jump straight into designing stunning visualizations with Altair, let’s set up our environment so that everything runs smoothly. I’ll walk you through the process step-by-step, just as I did when I first explored Altair.

First things first: make sure you have Python installed. Altair is a Python library, so it’s a must-have. Download Python from the official site (https://www.python.org/) if you haven’t got it on your machine already. I recommend a currently supported Python 3 release: Altair 5 no longer supports Python 3.6, and the exact minimum version depends on the Altair release, so check its release notes if you are on an older interpreter.

With Python ready to go, open your terminal and install Altair using pip – Python’s package installer. Here’s the command that I used:

pip install altair vega_datasets

Altair has a few Python dependencies, and vega_datasets relies on Pandas, but pip handles that for you, installing everything you’ll need to start plotting. In case you’re curious, Vega and Vega-Lite are the visualization grammars that underlie Altair, providing a declarative language for creating, saving, and sharing interactive visualization designs. They are JavaScript libraries that do the actual rendering in the browser or notebook front end.

Next up, you’ll want an environment to code in, such as an Integrated Development Environment (IDE) or a notebook. I prefer working with Jupyter Notebooks, especially for data visualization, because they’re super user-friendly, and you can see your charts right beneath your code blocks. To set that up, run:

pip install notebook

Then launch it with:

jupyter notebook

Your default web browser will pop open with Jupyter’s interface. Create a new notebook and you’re ready to go. (Side note: If you prefer, JupyterLab is an alternative with more features.)

Now, let’s test if Altair was installed properly. In your new notebook, import Altair with the following code:

import altair as alt

If there are no errors, congratulations – Altair is ready to be used.

Since Altair is predicated on directly working with data, it’s important to know how to import it. The vega_datasets package, installed earlier, gives access to a range of datasets for practicing, which is useful for learning the ropes. Here’s how to load a simple dataset:

from vega_datasets import data

cars = data.cars()

The above code imports a classic dataset about various cars and their characteristics. To make sure you’ve got it, run print(cars.head()), and you should see the first five entries of the dataset.

Finally, let’s create a basic chart to ensure everything is fully operational. Altair’s API is incredibly intuitive. With just a few lines, I put together a scatter plot:

chart = alt.Chart(cars).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color='Origin',
    tooltip=['Name', 'Year']
)

chart

That block of code right there is all it takes to create a simple scatter plot with tooltips. When you run it, you should see circular data points representing different cars in your notebook, colored by their origin, and when you hover over them, the car’s name and year are displayed. Pretty neat!

With these steps complete, you’ve successfully set up your environment for diving into Altair. The rest of this tutorial will get into the real fun: crafting your first chart, customizing it, and building interactive features that’ll surely impress anyone who sees your work.

Creating Your First Chart with Altair

Creating your first chart using Altair is a rewarding step into the world of Python data visualization. I’ll guide you through the process of constructing a basic bar chart, which is not only useful but also a solid foundation for crafting more complex visualizations later on.

Firstly, make sure you’ve got your data ready. I’ll use a simple example: a dataset of fruits and their corresponding counts. Imagine you’re a fruit seller tracking inventory. Here’s what the data might look like:

import pandas as pd

# Creating a simple DataFrame
data = pd.DataFrame({
    'Fruit': ['Apples', 'Oranges', 'Bananas', 'Grapes', 'Peaches'],
    'Count': [23, 17, 35, 29, 12]
})

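With our data in a tidy DataFrame, we can start crafting a chart. Import Altair first; installation (pip install altair) was covered in the previous section. One thing to watch: this DataFrame is named data, which shadows the data object imported from vega_datasets earlier, so re-import that if you need the sample datasets again.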

import altair as alt

Altair’s API works wonders with its succinct and intuitive structure. Start by defining a Chart object and passing it your data. The mark_bar() method, which indicates that we’re creating a bar chart, comes in the next step.

# Initializing the chart object
chart = alt.Chart(data)

Note the chart variable; this will be your handle for adding various components to the chart.

Next, specify your axes by binding them to columns in your dataset. In Altair, this is done using the encode() method, where you assign the x and y channels to the data fields.

# Binding the axes to the dataset columns
chart = chart.mark_bar().encode(
    x='Fruit',
    y='Count'
)

Once you encode the axes, you’ve essentially instructed Altair on what to plot: ‘Fruit’ names on the x-axis and their ‘Count’ on the y-axis. Marvel at how easy that was.

Now, to actually see the fruit of your labor (pun intended), you need to display the chart. In a Jupyter notebook, putting chart on the last line of a cell suffices. In a script, you might want to save it as an HTML file, or use an appropriate renderer for your environment.

# Display the chart
chart

And there you have it – your first chart with Altair. The result should be a clear, straightforward bar chart displaying our hypothetical fruit inventory. If running this in a Jupyter notebook or an IDE that supports chart rendering, you’ll see a neat visualization pop up.

Getting into the habit of examining your chart after creation is good practice. Do the bars represent what you expected? Is the data correctly ordered? Such self-audits help you spot errors or reveal insights into the dataset you might have missed.

What you’ve just done is the first step in data storytelling. By visualizing the fruit counts, you’ve provided a snapshot of the inventory in a form that’s quicker to digest than a raw table of numbers.

Now that wasn’t so hard, was it? You’ve laid the groundwork for more intricate and informative visualizations while reinforcing the basics of using Altair. Stay curious, experiment with other mark types like mark_line() or mark_point(), and remember that each chart you build enhances your fluency in the language of data.

Advanced Chart Customizations and Interactivity

In advancing our journey into data visualization with Altair, I want to show you how to turn your static charts into interactive visual masterpieces. You’ll see that with a bit of extra code, you can enable users to filter, zoom, and modify the data presented in real-time, which makes for a powerful tool when exploring datasets and presenting findings.

Let’s begin by customizing our charts with some more advanced features. Suppose we’ve got a scatter plot. Here’s how we can add tooltips that display additional data when you hover over a point:

import altair as alt
from vega_datasets import data

cars = data.cars()

chart = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N',
    tooltip=['Name', 'Year', 'Horsepower', 'Miles_per_Gallon']
)

chart

Now, this is functional, but to bring our data to life, interactivity is key. Here’s how we can add zoom and pan options easily. The code uses the Altair 5 API, where selections are attached with add_params():

chart = chart.encode(
    x=alt.X('Horsepower:Q', scale=alt.Scale(zero=False)),
    y=alt.Y('Miles_per_Gallon:Q', scale=alt.Scale(zero=False))
).add_params(
    alt.selection_interval(bind='scales')
)

chart

With selection_interval(bind='scales'), users can now scroll to zoom into specific areas of the chart, and click and drag to pan around and explore different sections of the data. Calling chart.interactive() is a shorthand for the same behavior.

For a bit of dynamism, let’s introduce a dropdown menu that filters our scatter plot based on the origin of the cars:

input_dropdown = alt.binding_select(
    options=sorted(cars['Origin'].unique()),
    name='Country of Origin: '
)
selection = alt.selection_point(fields=['Origin'], bind=input_dropdown)
chart = chart.add_params(
    selection
).transform_filter(
    selection
)

chart

This creates a dropdown below the chart, and the chart updates to show only the origin you select, which is handy for users who want to compare subsets of the data. (In Altair 5, selection_point() and add_params() replace the older selection_single() and add_selection().)

One of the strengths of Altair is its declarative nature, which conveniently translates our intentions into stunning visuals. To demonstrate, let’s highlight specific data points with conditions. Imagine you want to pick out the cars that share the fuel efficiency of the one under your cursor:

highlight = alt.selection_point(on='mouseover',
                                fields=['Miles_per_Gallon'], nearest=True)

base = alt.Chart(cars).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color=alt.condition(highlight, 'Origin:N', alt.value('lightgray'))
).add_params(
    highlight
)

base

Now moving the mouse over the chart keeps the nearest point, and every other car with the same Miles_per_Gallon value, in its origin color, while the remaining points turn light gray. Note that the chart needs a mark (mark_point() here); without one, Altair cannot render it.

To wrap up our tutorial, I suggest experimenting with these customizations. Try combining filters, selections, and other chart types like bar, line, or area charts. The possibilities are nearly endless and incredibly engaging.

For further exploration, check out the Altair documentation or dive deeper with examples from the Altair GitHub repository.

Remember, the beauty of Altair lies in your creativity with the data. The charts are as eloquent as the story you’re trying to tell. So, go ahead, take these building blocks, and build something that not only you but also others will find insightful.