Vis — Scientific Data Visualization, in Python and R

Learn why a good chart is good — then build it yourself, in matplotlib and ggplot2.

Welcome

This is an interactive lesson on scientific data visualization. Every idea is taught twice — once in Python (matplotlib) and once in R (ggplot2) — so you can follow it in whichever language you already know, and compare.

A chart is communication, or formidling (the Norwegian and Danish word for conveying knowledge to an audience). You have found something in your data, and the chart’s only job is to carry that finding to a reader as clearly and honestly as possible. That framing drives every choice in this lesson: pick the form that answers the question, show the data honestly, state the takeaway, and make it readable by everyone. Accessibility (colourblind-safe colour, a second cue besides colour, a written takeaway, the numbers in a table, and a screen-reader description on every figure) isn’t a separate topic here; it’s simply part of communicating well, because a finding only half your audience can read is only half communicated.

Everything runs in your browser. The first time you run a Python or R cell, that language’s runtime is downloaded (this takes a few seconds); from then on your code executes locally, and nothing is sent to a server. Edit any cell and re-run it to experiment.

We use one dataset throughout: the Palmer Penguins — body measurements for three penguin species. Both languages read the exact same CSV file, so the two tracks line up cell for cell.

Setup

The two cells below load each language’s tools and read the shared data file (data/penguins.csv). They run automatically when the page loads. Each prints the first few rows so you can see we are working from identical data.

Python

R

Same data, both languages

Both cells read data/penguins.csv — the identical file, mounted into each in-browser engine. Any difference you see later is about the chart, never the data.

Principles

A chart is an argument made with ink. These six ideas run through every example below; you’ll see each one break in a weak chart and work in a strong one.

Choose the chart by intent. Decide what question the reader should be able to answer at a glance, then pick the form that answers it:

Intent	Question	Typical chart
Relationship	Do two variables move together?	scatter
Comparison	Which category is bigger?	bar
Distribution	How is one variable spread out?	histogram / density
Trend	How does something change over time?	line
Part-to-whole	How do pieces sum to a total?	stacked bar

Pick the wrong form and even correct data misleads.

Honest axes. Start bar charts at zero; don’t truncate or distort a scale to exaggerate a difference. The geometry should be proportional to the numbers.
Direct labelling over legends. A legend makes the eye bounce between a key and the data. Put the label on the thing it names whenever you can.
A title that states the takeaway. “Bill length vs depth” names the axes you can already read. “Bill shape separates the three species” tells the reader what to conclude.
Declutter. Every gridline, border, and tick competes with the data for attention. Remove what doesn’t help the reader answer the question.
Colourblind-safe palettes. Red/green confusion is the most common form of colour-vision deficiency (roughly 8% of men). We use viridis, which is perceptually uniform and colourblind-safe, in every strong chart.
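To make the principles concrete, here is a minimal matplotlib sketch that applies four of them to one bar chart: a zero baseline, direct labels, a takeaway title, and decluttering. The group names and counts are made up for illustration and are not the penguins data.

```python
import matplotlib.pyplot as plt

# Made-up counts for three hypothetical groups (not the penguins data).
labels = ["Group A", "Group B", "Group C"]
counts = [152, 68, 124]

fig, ax = plt.subplots()
bars = ax.bar(labels, counts, color="#2C728E")

# Honest axes: pin the bar baseline at zero so height is proportional to count.
ax.set_ylim(bottom=0)

# Direct labelling: write each value on its bar instead of relying on a key.
for bar, count in zip(bars, counts):
    ax.text(bar.get_x() + bar.get_width() / 2, count, str(count),
            ha="center", va="bottom")

# Takeaway title: state the conclusion, not the axis names.
ax.set_title("Group A has more than twice as many records as Group B")
ax.set_ylabel("Count")

# Declutter: remove the top and right borders, which carry no information.
for side in ["top", "right"]:
    ax.spines[side].set_visible(False)
```

Call plt.show() (or fig.savefig(...)) to render the figure. The single hue is deliberate: the x-axis already names each group, so colour has no extra work to do.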

Chart 1 — Don’t hide your data behind a summary

The problem. A boxplot crushes a whole distribution into five numbers (median, quartiles, whiskers). That’s tidy — but it can hide the very things you care about: a gap in the middle, a skew, a small sample. Strikingly, two completely different distributions can produce the same box (Weissgerber et al., 2015). So a summary is a starting point, not the whole story.
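You can check the "same box, different data" claim yourself with a few lines of standard-library Python. The two samples below are made up for illustration (they are not penguin measurements), and five_number_summary is a hypothetical helper name, not part of any library.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max): the five numbers a boxplot draws."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

# Two made-up samples: one evenly spread, one clumped around 3 and 7.
even = [1, 2, 3, 4, 5, 6, 7, 8, 9]
clumped = [1, 3, 3, 3, 5, 7, 7, 7, 9]

print(five_number_summary(even))     # (1, 3.0, 5.0, 7.0, 9)
print(five_number_summary(clumped))  # (1, 3.0, 5.0, 7.0, 9)
```

Both samples would draw an identical box, yet one is flat and the other has two clumps. Only a plot of the raw values or of the density can tell them apart.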

The weak version — a bare boxplot

Body mass by species. It reads cleanly, but you’re trusting five summary numbers per group: you can’t see the shape of each distribution or how many penguins sit behind each box.

The strong version — show the distribution and the data

Keep a summary, but add two layers: a violin (a mirrored density curve — the fatter it is, the more penguins at that mass) and the raw points, jittered sideways so they don’t stack. Now you see the summary, the shape, and every data point — a lightweight “raincloud” plot (Allen et al., 2019).

New ggplot2 pieces: geom_violin() draws the density; geom_jitter() is geom_point() with a small random nudge so points don’t overlap (it jitters in both directions by default, so set height = 0 to keep the nudge horizontal and leave the measured values untouched); stat_summary(fun = median, ...) drops a dot at each group’s median. In matplotlib the equivalents are ax.violinplot() plus an ordinary scatter whose x-values we jitter by hand.
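A minimal matplotlib sketch of that layering, under stated assumptions: the three groups are synthetic stand-ins drawn from normal distributions with made-up centres and spreads, not the real penguin measurements, and the jitter width of 0.08 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-ins for body mass (g); the centres and spreads are made up.
groups = {
    "Group A": rng.normal(3700, 450, 60),
    "Group B": rng.normal(3750, 380, 30),
    "Group C": rng.normal(5050, 500, 50),
}

fig, ax = plt.subplots()
positions = np.arange(1, len(groups) + 1)

# Layer 1: the violin (mirrored density), without the min/max bars.
ax.violinplot(list(groups.values()), positions=positions, showextrema=False)

for pos, values in zip(positions, groups.values()):
    # Layer 2: raw points, jittered horizontally by hand; y is left untouched.
    x = pos + rng.uniform(-0.08, 0.08, size=len(values))
    ax.scatter(x, values, s=10, alpha=0.5, color="0.3")
    # Layer 3: one summary worth keeping, a dot at the median.
    ax.scatter(pos, np.median(values), s=60, color="black", zorder=3)

ax.set_xticks(positions)
ax.set_xticklabels(list(groups))
ax.set_ylabel("Body mass (g)")
```

Jittering only the x-values matters: the vertical position of each point is a measurement, so it should never be moved.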

What changed, and why

Violin + points replace the bare box: the shape and the sample size are now visible, not just five numbers. The median dot keeps the one summary worth keeping.
Viridis, colourblind-safe. Here species is also on the x-axis, so colour is a bonus signal, not the only one — which is exactly why we don’t also vary the shape. Redundant encoding earns its place only when colour would otherwise be the sole cue (you’ll see shape used as a backup cue in Chart 4). Reaching every reader is part of communicating well, not a bolt-on.
No legend — the x-axis already names each group (direct labelling).
Title states the finding.

What this shows: Gentoo penguins are clearly the heaviest (median ≈ 5000 g), while Adelie and Chinstrap overlap heavily around 3700 g — something the bare boxplot’s tidy boxes understated.

The numbers behind it

For readers using a screen reader — and anyone who just wants the values — here is the same result as a table. (quarto-live renders a returned data frame as an HTML table automatically, so no extra package is needed.)
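On the Python side, a table like this can be sketched with a pandas groupby. The data frame below is a tiny made-up sample that borrows the lesson’s column names (species, body_mass_g); the values are illustrative, not the real measurements.

```python
import pandas as pd

# Tiny made-up sample with the lesson's column names; None marks a missing value.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Chinstrap",
                "Gentoo", "Gentoo", "Gentoo"],
    "body_mass_g": [3500, 3700, None, 3600, 3800, 4900, 5000, 5200],
})

# One row per species: sample size plus the summary values a reader may want.
summary = (
    penguins.dropna(subset=["body_mass_g"])
    .groupby("species")["body_mass_g"]
    .agg(n="count", median="median", min="min", max="max")
    .reset_index()
)
print(summary)
```

Including n in the table matters as much as the median: it tells the reader how many animals stand behind each number.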

Quick check — distribution honesty. You’ve just seen why a tidy boxplot can still mislead. Test the idea before moving on:

A reviewer says your boxplot of body mass by species is fine. Why might you still prefer a violin or raincloud plot?

a) The boxplot can hide multimodality and the real shape of the distribution; a violin or raincloud shows it.

b) Violin plots are always more accurate than boxplots.

c) Boxplots can't show the median or quartiles.

d) The boxplot uses the wrong colour palette.

Chart 2 — Compare across groups without clutter

The problem. Put several groups in one panel and they overplot — and you end up leaning on colour alone to tell them apart. That’s hard for anyone to read, and impossible for a reader who can’t distinguish the colours. Small multiples — the same chart repeated once per group — give each group room to breathe (Tufte, 1983).

The weak version — everything in one panel

Bill length vs depth for all three species at once, separated only by colour (and a red/green palette at that). The clouds overlap, and colour is the only thing telling species apart.

The strong version — small multiples

Give each species its own panel, with the same axes across all three, so you compare shapes directly without untangling colours. Because the panel itself names the species, colour is no longer doing essential work — so we drop it to a single ink-saving hue.

New ggplot2 piece: facet_wrap(~ species) splits the plot into one panel per species and, by default, keeps the axes identical across panels. In matplotlib, plt.subplots(1, 3, sharex=True, sharey=True) gives the same effect — a row of panels sharing their scales.
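Here is a minimal sketch of the matplotlib route, assuming three synthetic point clouds in place of the real species (the centres and spreads are made up). The key line is the plt.subplots() call with sharex and sharey.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Synthetic (x, y) clouds standing in for the three species; values are made up.
clouds = {
    "Group A": (rng.normal(39.0, 2.5, 50), rng.normal(18.3, 1.1, 50)),
    "Group B": (rng.normal(49.0, 3.0, 30), rng.normal(18.4, 1.1, 30)),
    "Group C": (rng.normal(47.5, 3.0, 45), rng.normal(15.0, 1.0, 45)),
}

# One row of three panels that share both scales.
fig, axes = plt.subplots(1, 3, sharex=True, sharey=True, figsize=(9, 3))

for ax, (name, (x, y)) in zip(axes, clouds.items()):
    ax.scatter(x, y, s=12, color="#2C728E", alpha=0.7)  # one hue for all panels
    ax.set_title(name)                                   # the panel names the group
    ax.set_xlabel("Bill length (mm)")

axes[0].set_ylabel("Bill depth (mm)")
```

Because the scales are shared, a cloud that sits lower or further right in its panel really does have smaller or larger values; nothing is an artefact of per-panel rescaling.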

What changed, and why

One panel per species (small multiples) — no overplotting; you read three simple clouds instead of one tangled one.
Shared axes across panels, so differences in position are real, not an artefact of each panel rescaling itself.
Colour dropped to one hue. Separating by panel means colour was no longer the only cue, so keeping it would be pure decoration. (When you must keep groups in one panel, add shape as a backup cue — that’s Chart 4.)

What this shows: the three species occupy distinct, barely-overlapping bands of bill shape — which is exactly why a penguin’s bill measurements alone classify its species so well.

Chart 3 — Never show a mean without its spread

The problem. A bar of group means is one of the most common charts in science — and one of the most misleading. It squeezes every group to a single height and hides how spread out the data are; two very different distributions can produce the same bar (Weissgerber et al., 2015). The mean is only half the story — the spread is the other half.

The weak version — a bar of means

Mean flipper length per species. Three tidy bars, but you can’t tell whether the penguins cluster tightly around each mean or vary widely, nor how much the groups overlap.

The strong version — mean and spread

Plot the raw penguins (jittered), then lay the mean on top with an interval of ± 1 standard deviation. Now the reader sees the average and how much the data scatter around it — including where the groups overlap.

New ggplot2 piece: we precompute a small summary table (mean and SD per species) and draw it with geom_errorbar() (the interval) and geom_point() (the mean). Each of those layers takes the summary table through its own data argument, and inherit.aes = FALSE stops it from inheriting aesthetics that point at raw-data columns the summary table doesn’t have. In matplotlib, ax.errorbar(..., yerr=sd) draws the same mean-with-interval marker.
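A minimal matplotlib sketch of the same idea, assuming synthetic flipper lengths (made-up centres and spreads, not the real data): raw points first, then the mean with a ± 1 SD interval on top.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

# Synthetic flipper lengths (mm); the centres and spreads are made up.
groups = {
    "Group A": rng.normal(190, 6.5, 60),
    "Group B": rng.normal(196, 7.0, 30),
    "Group C": rng.normal(217, 6.5, 50),
}

fig, ax = plt.subplots()
for pos, values in enumerate(groups.values()):
    # Raw data, jittered sideways so points don't stack.
    x = pos + rng.uniform(-0.1, 0.1, size=len(values))
    ax.scatter(x, values, s=10, alpha=0.4, color="0.5")

    # Mean with +/- 1 sample standard deviation (ddof=1) drawn on top.
    mean = values.mean()
    sd = values.std(ddof=1)
    ax.errorbar(pos, mean, yerr=sd, fmt="o", color="black",
                capsize=4, zorder=3)

ax.set_xticks(range(len(groups)))
ax.set_xticklabels(list(groups))
ax.set_ylabel("Flipper length (mm)")
```

Drawing the interval last, with a higher zorder, keeps it visible above the cloud of raw points.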

What changed, and why

Raw points + mean + interval replace the bare bar: the average is still there, but now so are the spread and the overlap between species.
± 1 SD shows the spread of the data — not the uncertainty of the mean. A standard error would be far narrower and answer a different question; pick the interval that matches the claim you’re making.
Viridis; species is also on the x-axis, so colour is a bonus cue.
Title states the finding.
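The difference between the two intervals is easy to see in numbers. This sketch uses a made-up sample of nine values (not penguin data) and only the standard library.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up sample of nine measurements (not penguin data).
sample = [180, 185, 190, 195, 200, 205, 210, 215, 220]

sd = stdev(sample)             # spread of the individual values
se = sd / sqrt(len(sample))    # uncertainty of the mean; shrinks as n grows

print(f"mean = {mean(sample)}")
print(f"SD = {sd:.2f}, SE = {se:.2f}")  # SD = 13.69, SE = 4.56
```

With n = 9 the standard error is exactly one third of the standard deviation, and it keeps shrinking as the sample grows even though the animals themselves vary just as much. That is why an SE bar cannot stand in for the spread of the data.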

What this shows: Gentoo flippers are clearly longest, but Adelie and Chinstrap overlap heavily — a bar of means made all three look cleanly distinct when two of them barely are.

Quick check — spread vs. uncertainty. The interval you draw makes a claim. Make sure it’s the claim you mean:

You want your error bars to show how much individual penguins vary in body mass within each species. Which do you plot?

a) ± 1 standard deviation

b) ± 1 standard error of the mean

c) ± 1 standard error — it's the more rigorous choice

d) a 95% confidence interval

Chart 4 — Make the finding the point

The problem. A technically-correct chart can still be mute: it shows the data but leaves the reader to work out what matters. Communicating a result (“formidling”) means doing that work for them — say the finding in the title, highlight the group it’s about, and point at it.

The weak version — clean, but it says nothing

Flipper length against body mass. It’s a perfectly tidy scatter showing a positive relationship — but the title just names the axes, and every point looks equally important. What should the reader take away?

The strong version — argue the conclusion

Same data, but now it makes a case: Gentoos are pushed to the front (in colour and a different marker shape), everyone else recedes to grey, the title states the finding, and an arrow points straight at it.

This is where redundant encoding earns its place. In one panel the highlighted group is told apart from the rest by colour and by shape — so the emphasis survives for a colourblind reader, or in greyscale print.

New pieces: in matplotlib, ax.annotate(text, xy=…, xytext=…, arrowprops=…) draws the labelled arrow; in ggplot2, annotate("curve", …, arrow = …) draws the arrow and a separate annotate("text", …) supplies the label (grid::arrow() shapes the arrowhead; grid ships with R, so there is nothing to install).
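Here is a minimal matplotlib sketch of the highlight-and-annotate pattern. The two clouds are synthetic (made-up centres, slope, and noise), and the label position passed as xytext is an arbitrary choice you would tune by eye.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Synthetic data: a grey "context" cloud and one highlighted group.
x_rest = rng.normal(190, 6, 80)
y_rest = 3700 + 30 * (x_rest - 190) + rng.normal(0, 300, 80)
x_key = rng.normal(217, 6, 50)
y_key = 5050 + 30 * (x_key - 217) + rng.normal(0, 300, 50)

fig, ax = plt.subplots()

# Context recedes to grey circles; the key group gets colour AND a new shape.
ax.scatter(x_rest, y_rest, color="0.7", marker="o", s=14)
ax.scatter(x_key, y_key, color="#440154", marker="^", s=22)

# The arrow runs from the label (xytext) to the point being named (xy).
ax.annotate("Highlighted group:\nlongest flippers, heaviest",
            xy=(x_key.mean(), y_key.mean()),
            xytext=(178, 5900),
            arrowprops=dict(arrowstyle="->", color="black"))

ax.set_xlabel("Flipper length (mm)")
ax.set_ylabel("Body mass (g)")
ax.set_title("The highlighted group is both longest-flippered and heaviest",
             fontweight="bold")
```

The triangle marker is what keeps the emphasis alive in greyscale: even with the colour removed, the highlighted group still looks different from the circles around it.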

What changed, and why

Title states the finding, not the axes.
The key group is highlighted; the rest recede to grey context, so the eye goes where the argument is.
Redundant encoding: the highlight is carried by colour and shape together — never colour alone — so it holds up for colourblind readers and in greyscale.
A direct annotation names the finding right on the chart, no caption-hunting.

What this shows: a clean chart and a persuasive one can use the identical data — the difference is whether you make the reader find the point or hand it to them.

Quick check — when to double up cues. Redundant encoding (colour and shape) earns its place only sometimes:

When should you encode a category with both colour and shape (redundant encoding)?

a) Only when colour is the sole cue carrying that information — e.g. groups distinguished only by colour, not also by position.

b) Always — more cues are always more accessible.

c) Never — colour alone is fine as long as the palette is colourblind-safe.

d) Only when you have more than three categories.

Your turn — exercises

Now you apply the principles. Each exercise starts from a flawed chart. Edit the code, click Run Code to see your result, and use Hint or Show solution if you get stuck. The data (penguins) and libraries are already loaded from the setup cells above.

Exercise 1 — Make it accessible, twice over (R)

This scatter leans entirely on a hand-picked red / green / blue palette. Red versus green is the single most common colourblind confusion, and colour is the only thing telling species apart. Fix both: (1) swap to a colourblind-safe viridis scale, and (2) add a second cue by mapping shape to species too, so the chart still works in greyscale or for a colourblind reader. Run it, then check the hint and solution.

# (1) ggplot2 has viridis scales built in: scale_colour_viridis_d().
# (2) Add shape to the aes() so each species also gets its own marker:
#     aes(bill_length_mm, bill_depth_mm, colour = species, shape = species)

ggplot(penguins,
       aes(bill_length_mm, bill_depth_mm, colour = species, shape = species)) +
  geom_point() +
  scale_colour_viridis_d()

Exercise 2 — Add a takeaway title and labels (Python)

This scatter shows a real relationship, but it says nothing. Give it axis labels with units and a title that states the takeaway (what the reader should conclude — not just the variable names).

fig, ax = plt.subplots()
ax.scatter(penguins["flipper_length_mm"], penguins["body_mass_g"],
           color="#2C728E", alpha=0.6)
ax.set_xlabel("Flipper length (mm)")
ax.set_ylabel("Body mass (g)")
ax.set_title("Penguins with longer flippers are heavier", fontweight="bold")
for s in ["top", "right"]:
    ax.spines[s].set_visible(False)
plt.show()

Exercise 3 — Pick the right chart type (R)

Someone used a line chart to compare counts across species. A line implies a trend along a continuous axis — but species is a category, so a line drawn between them is meaningless. Switch to the chart type built for comparing categories.

counts <- as.data.frame(table(species = penguins$species))

ggplot(counts, aes(species, Freq)) +
  geom_col()

Capstone — Judging an AI-made chart

Producing a chart is nearly free now: describe what you want and an AI writes the plotting code in seconds. That doesn’t remove the work — it relocates it. The hard part was never typing geom_point(); it’s the judgment around it — choosing the right chart for the question, keeping it honest, making it readable by everyone, and saying what it shows. An AI does exactly what you ask and stops there: unless you tell it otherwise, it won’t pick a colourblind-safe palette, won’t trade the legend for direct labels, won’t write a takeaway title, and — most importantly — won’t warn you when the chart type is wrong for your question.

So the durable skill isn’t making charts; it’s evaluating and repairing them. (A capable model will do most of this if you ask for “an accessible, publication-quality figure” — but you have to know to ask, and how to check what comes back.) Let’s judge a realistic example.

What the AI handed you

Here is a faithful version of what a plain request — “plot bill length vs bill depth by species” — gives you in each language. Nothing here is broken. Run them.

Judge it against what the lesson taught

The AI got the easy calls right: a scatter is the correct form for a relationship, and the axes aren’t distorted. But every judgment call is missing:

Not colourblind-safe. ggplot2’s default hues (and matplotlib’s default colour cycle) aren’t safe — and colour is the only thing separating the species.
A legend, not direct labels — your eye ping-pongs between the key and the points.
No takeaway title — it names the axes you can already read.
Mild overplotting — points sit on top of one another at full opacity.
The ggplot version even prints a “Removed 2 rows” warning: ggplot2 dropped the two penguins with missing bill measurements, and the AI’s code neither handled them nor mentioned it.

None of these are exotic. They’re the gap between code that runs and a chart that communicates.
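The missing-value point is the easiest to repair: drop the incomplete rows yourself and say how many went. A minimal pandas sketch, assuming a tiny made-up table that borrows the lesson’s column names (the values are illustrative):

```python
import pandas as pd

# Tiny made-up table using the lesson's column names; None marks a missing value.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "bill_length_mm": [39.1, None, 46.1, None, 49.0],
    "bill_depth_mm": [18.7, None, 13.2, None, 18.5],
})

# Drop rows that lack either plotted variable, and count what was removed.
needed = ["bill_length_mm", "bill_depth_mm"]
complete = penguins.dropna(subset=needed)
n_dropped = len(penguins) - len(complete)

print(f"Plotting {len(complete)} of {len(penguins)} rows; "
      f"{n_dropped} dropped for missing bill measurements.")
```

Plotting from complete makes the exclusion an explicit, reported decision rather than a side effect of the plotting library.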

The same chart, judged and fixed

Now apply the lesson: a viridis palette, a second cue (marker shape) so colour isn’t load-bearing, direct labels instead of a legend, a takeaway title, and a little alpha for the overplotting. (In Chart 2 we fixed a busy single panel by faceting; this is the other route — keep one panel, but make every encoding pull its weight.)
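A minimal matplotlib sketch of those five repairs in one panel. The three clouds are synthetic stand-ins (made-up centres and spreads), and placing each label just above its cloud is a simple heuristic, not the only way to position direct labels.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)

# Synthetic clouds standing in for the three species; values are made up.
clouds = {
    "Group A": (rng.normal(39.0, 2.5, 50), rng.normal(18.3, 1.1, 50)),
    "Group B": (rng.normal(49.0, 3.0, 30), rng.normal(18.4, 1.1, 30)),
    "Group C": (rng.normal(47.5, 3.0, 45), rng.normal(15.0, 1.0, 45)),
}

# Viridis colours (stopping short of the pale yellow end) plus one marker each.
colours = [tuple(c) for c in plt.cm.viridis(np.linspace(0.0, 0.85, len(clouds)))]
markers = ["o", "^", "s"]

fig, ax = plt.subplots()
for (name, (x, y)), colour, marker in zip(clouds.items(), colours, markers):
    # Second cue (marker shape) and a little alpha for the overplotting.
    ax.scatter(x, y, color=colour, marker=marker, alpha=0.6, s=18)
    # Direct label just above each cloud, in the cloud's own colour; no legend.
    ax.text(x.mean(), y.max() + 0.3, name, color=colour,
            ha="center", fontweight="bold")

ax.set_xlabel("Bill length (mm)")
ax.set_ylabel("Bill depth (mm)")
ax.set_title("Bill shape separates the three groups", fontweight="bold")
for side in ["top", "right"]:
    ax.spines[side].set_visible(False)
```

Every encoding now does a job: colour and shape both identify the group, the label replaces the legend, and the title carries the conclusion.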

Where to go next

Re-read the Principles with these exercises in mind.
Try the other language for an exercise you just solved — the ideas transfer, only the syntax changes.
Swap in your own CSV (same columns) and see which principles still apply.