# Seeing Data

## Plotting geographic data on a world map with Python

Chris McDowall, Aug 12

Last week I promised to write a blog post detailing how I created this public transport animation. On reflection, it’s a topic best dealt with over a few sessions. Let’s start simple. How might you plot lots of geographic data on a map? In this post I will show you how to programmatically create a map of the World’s top ten most populated cities. It will end up looking something like this.

This tutorial involves programming with the Python language and some additional modules. If you have never programmed before it might be a tad confusing. I suggest you first read one of the many fine introductory Python tutorials to get your head around the language. The rest of this post will assume that you understand how to install and import modules, how to write and run a script and the fundamentals of Python data structures.

Before we begin you will need to install the following:

• Python 2.5, 2.6 or 2.7.
• NumPy (a Python extension that adds support for multi-dimensional arrays and a host of whiz-bang high-level mathematical operations).
• Matplotlib (a flexible 2D plotting library for Python that “tries to make easy things easy and hard things possible”).
• Basemap (a Matplotlib extension for plotting data on geographic projections).

Right, let’s make a map. Wikipedia has a list of World metropolitan areas by population that can serve as a nice test dataset. I will demonstrate how to turn this data into a labelled proportional symbol world map.

First we need to import our various libraries:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np

Next we will set up our map. Basemap supports many different geographic projections. For this exercise I am using the Robinson projection.

# lon_0 is central longitude of robinson projection.
# resolution = 'c' means use crude resolution coastlines.
m = Basemap(projection='robin',lon_0=0,resolution='c')
#set a background colour
m.drawmapboundary(fill_color='#85A6D9')

Basemap comes packaged with some global geographic data at four resolutions: ‘crude’, ‘low’, ‘medium’ and ‘high’. Let’s draw the continent and country datasets using the ‘crude’ outlines.

# draw coastlines, country boundaries, fill continents.
m.fillcontinents(color='white',lake_color='#85A6D9')
m.drawcoastlines(color='#6D5F47', linewidth=.4)
m.drawcountries(color='#6D5F47', linewidth=.4)

We can also ask Basemap to draw lines of longitude and latitude.

# draw lat/lon grid lines every 30 degrees.
m.drawmeridians(np.arange(-180, 180, 30), color='#bbbbbb')
m.drawparallels(np.arange(-90, 90, 30), color='#bbbbbb')

Populate three arrays of equal length with the latitude, longitude and population values for each city. Normally we would read the data from a file, database or service but in the interest of simplicity I have typed them directly into the script.

# lat/lon coordinates of top ten world cities
lats = [35.69,37.569,19.433,40.809,18.975,-6.175,-23.55,28.61,34.694,31.2]
lngs = [139.692,126.977,-99.133,-74.02,72.825,106.828,-46.633,77.23,135.502,121.5]
populations = [32.45,20.55,20.45,19.75,19.2,18.9,18.85,18.6,17.375,16.65] #millions
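For readers who would rather load the values than type them, here is a minimal sketch of reading the same three lists from CSV text with the standard `csv` module. It is written for Python 3, and the column names (`lat`, `lng`, `population`), the `read_cities` helper and the `sample_` variable names are illustrative assumptions rather than part of the original script. The inline sample repeats the first three cities from the lists above.

```python
import csv
import io

# Hypothetical CSV layout: one row per city, populations in millions.
SAMPLE_CSV = """lat,lng,population
35.69,139.692,32.45
37.569,126.977,20.55
19.433,-99.133,20.45
"""

def read_cities(csv_file):
    """Return three equal-length lists: latitudes, longitudes, populations."""
    lats, lngs, populations = [], [], []
    for row in csv.DictReader(csv_file):
        lats.append(float(row['lat']))
        lngs.append(float(row['lng']))
        populations.append(float(row['population']))
    return lats, lngs, populations

# io.StringIO stands in for an open file object here.
sample_lats, sample_lngs, sample_pops = read_cities(io.StringIO(SAMPLE_CSV))
```

Passing an open file object instead of the `io.StringIO` wrapper should work the same way, since `csv.DictReader` accepts any iterable of text lines.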

Use our basemap object to convert the latitude/longitude values into map display coordinates.

# compute the native map projection coordinates for cities
x,y = m(lngs,lats)

Multiply each population by itself to create a scaled list of values. These will be our circle display sizes.

#scale populations to emphasise different relative pop sizes
s_populations = [p * p for p in populations]
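It helps to know that Matplotlib's `scatter` interprets the `s` argument as a marker area in points squared, so squaring the populations exaggerates the differences between circles. The sketch below compares the largest and smallest of the ten populations under linear and squared scaling. The `scale_sizes` helper is a hypothetical name introduced only for this illustration.

```python
def scale_sizes(values, exponent=2):
    """Raise each value to a power to obtain scatter marker areas (points squared)."""
    return [v ** exponent for v in values]

# Largest and smallest of the ten populations listed above, in millions
largest, smallest = 32.45, 16.65

linear = scale_sizes([largest, smallest], exponent=1)
squared = scale_sizes([largest, smallest], exponent=2)

# Ratio of the largest marker area to the smallest under each scaling
linear_ratio = linear[0] / linear[1]
squared_ratio = squared[0] / squared[1]
print(round(linear_ratio, 2), round(squared_ratio, 2))
```

With linear scaling the biggest circle has roughly twice the area of the smallest; with squared scaling the gap widens to nearly four times, which is why the differences stand out on the map.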

Use the matplotlib scatter function to plot the circles. Note the use of the zorder parameter. This ensures that the scattered circles will be rendered on top of the continents.

#scatter scaled circles at the city locations
m.scatter(
    x,
    y,
    s=s_populations,  # size
    c='blue',         # color
    marker='o',       # symbol
    alpha=0.25,       # transparency
    zorder=2,         # plotting order
)

Loop through the unscaled population values and the display coordinates. Label each circle with the city population rounded to the nearest million people.

# plot population labels of the ten cities.
for population, xpt, ypt in zip(populations, x, y):
    # round to 0 dp and convert to a text label
    label_txt = str(int(round(population, 0)))
    plt.text(
        xpt,
        ypt,
        label_txt,
        color='blue',
        size='small',
        horizontalalignment='center',
        verticalalignment='center',
        zorder=3,
    )

#add a title and display the map on screen
plt.title('Top Ten World Metropolitan Areas By Population')
plt.show()

Run your script and you should see a map. Lovely. I recommend you experiment further by tweaking parameters, importing other datasets and using alternative plotting methods.

This is the basic overview of how I typically plot geographic data in Python. Next time I’ll take this a step further and show you how to map, export and animate temporal geographic data.

## Animating Auckland’s public transport network – Take Two

Chris McDowall, Aug 04

In late January I created an animation of Auckland’s public transport network with data from the MAXX Auckland transport Google transit feed. As I noted in the post, there were several issues with the video. I carved out a little time this week to revise the animation and I am happy with the progress. The new video is embedded below.

There is a lot of detail in this animation. It’s best viewed in high definition with fullscreen mode on.

Version Two distinguishes between buses (teal), ferries (blue) and trains (red). I also tidied some of the more obvious errors with the ferry route geometry data. This involved manually tweaking the route geometries stored in the transit feed “shapes.txt” table for many of the ferries. I still need to adjust a few (I’m looking at you, Rangitoto Island). I have not corrected the bus data. The main issue with the bus data is the erroneous harbour crossings from the North Shore to the Auckland CBD.

Next week I will provide an overview of how the animation was created.

## How do you Visualise a Conversation?

Chris McDowall, Jun 24

Over the next couple of days I am helping run an online event called Magnetic South – an online game about the long term future of Christchurch. It is part of Christchurch City Council’s Share an Idea suite of initiatives and will feed into the development of the Central City Plan. The game is being run on the ‘Foresight Engine’, a platform created by the Institute for the Future to support people thinking together about issues that are important to them in a way that is both productive and fun. If you want to explore possible futures for Christchurch then please create a player and get started!

A big challenge for the Magnetic South team involves making sense of the many micro-forecasts people contribute during the event. Previous Foresight Engine events have produced thousands of individual contributions. The nature of the game is such that these snippets of text should not be read in isolation. Most player contributions build off earlier forecasts and, in turn, prompt additional responses. To unpack what happened during the Magnetic South event the various micro-forecasts need to be contextualised.

To assist our analysts I have written some code to visualise the structure of the many conversations that occur. My initial design represented each conversation as a sideways tree constructed from squares and lines. Each square represents a contribution someone made to the conversation. Filled squares represent contributions from the person who started the conversation. Unfilled squares represent contributions made by other participants. Lines connect related contributions — the square on the right being a response to the square on the left.

Here is a very simple example based on data created during an earlier Foresight Engine event that explored a future where water is as expensive as energy. The example conversation consists of just three comments. Read from left to right, someone kicks off the conversation with a Dark Imagination forecast. The unfilled square represents someone else responding to the original forecast at some point in the event. The rightmost square signifies the original forecaster replying. There were no further contributions to this conversation.
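The structure behind such a diagram can be sketched in a few lines of Python. This is a rough text approximation rather than the drawing code used for the figures: it assumes each contribution is stored as an `(id, parent_id, author)` tuple, prints `[#]` for the conversation starter and `[ ]` for everyone else, and uses indentation in place of connecting lines. The author names are made up.

```python
# Each contribution: (id, parent_id, author); parent_id is None for the opening forecast.
# A three-comment conversation like the one described above, with made-up authors.
conversation = [
    (1, None, 'starter'),
    (2, 1, 'responder'),
    (3, 2, 'starter'),
]

def render_tree(contributions):
    """Return one text line per contribution, indented by reply depth."""
    children = {}
    authors = {}
    root = None
    for cid, parent, author in contributions:
        authors[cid] = author
        if parent is None:
            root = cid
        else:
            children.setdefault(parent, []).append(cid)
    starter = authors[root]
    lines = []

    def walk(cid, depth):
        # Filled box for the person who started the conversation, empty otherwise
        symbol = '[#]' if authors[cid] == starter else '[ ]'
        lines.append('    ' * depth + symbol)
        for child in children.get(cid, []):
            walk(child, depth + 1)

    walk(root, 0)
    return lines

print('\n'.join(render_tree(conversation)))
```

Each extra level of indentation corresponds to one step to the right in the sideways tree, so a reply always sits one level deeper than the contribution it answers.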

Below is a slightly more complex example. Someone makes a micro-forecast concerning ‘Capturing all sources of energy that are currently not very considered: e.g. muscle power at gyms (rowers etc.), heat from car engines…’ This prompts two responses, one of which (‘They are here and now. The Piezoelectric powered Disco in London is my favourite example’) leads to an extended back-and-forth exploration of the idea.

Although these diagrams showed the scale of a conversation, there was no way to know how many people took part. Yesterday afternoon I wrote some new code to add numeric codes to the boxes. Each time a new person enters a conversation they get a number. The person who started the conversation is number 1. The first person to respond is number 2… and so on. This makes it possible to answer questions like “how many people were involved in the conversation?” or “was this conversation dominated by a couple of people, or were there a whole heap of folks chipping in?” Here is the same conversation rendered with the new code.

It is now possible to see that five people took part in the conversation. Most of the exchanges actually occurred between the second and the third participants. A couple of other people (numbers 4 and 5) made a single contribution.
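The numbering rule is simple enough to sketch directly: walk through the contributions in order and hand out the next number whenever an unseen author appears. The contributor names and the sequence below are hypothetical, chosen only to mimic a five-person conversation dominated by participants 2 and 3.

```python
from collections import Counter

def number_participants(authors_in_order):
    """Map each author to a number by order of first appearance (the starter is 1)."""
    numbers = {}
    for author in authors_in_order:
        if author not in numbers:
            numbers[author] = len(numbers) + 1
    return numbers

# Hypothetical sequence of contributors, in the order they posted
sequence = ['ana', 'ben', 'cat', 'ben', 'cat', 'ben', 'dev', 'cat', 'eli']

numbers = number_participants(sequence)
labels = [numbers[author] for author in sequence]  # the number shown in each box
contributions = Counter(labels)                    # how often each number appears

print(labels)
print(len(numbers), 'participants')
```

Counting the labels afterwards gives a quick answer to the "was this dominated by a couple of people?" question without reading the diagram.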

I want to emphasise that these figures are not intended to be read in isolation from text generated during the game. Instead they are tools to provide researchers with an overview of the structure of the many conversations that occur to help them navigate and contextualise the raw text. Specifically, I am attempting to convey the scale, complexity and rhythm of the various exchanges that happen inside the event. Throughout today’s Magnetic South event I will create similar diagrams and combine them with textual descriptions and share them through the game blog. Please get involved; we would love to hear your ideas on the future of Christchurch!

Here are some more diagrams that I created from the E=H20 event data. Even in the absence of text, they convey a sense of what happened.

## Visualising the New Zealand Budget 2011 with Treemaps

Chris McDowall, May 20

It is really difficult to grasp the significance of lots of big numbers. It is even trickier when the numbers are organised in a hierarchy. For example, yesterday afternoon Bill English, the Minister of Finance, delivered his third budget, outlining the nation’s revenues and expenses. The budget includes details such as how the government plans to spend \$21 billion on social development in the coming year, of which \$9.5 billion will be spent on superannuation, almost \$1.9 billion on the domestic purposes benefit, nearly \$1.6 billion on accommodation assistance… and I’m already lost.

Treemapping is a technique for visually comparing groupings of numbers. A treemap represents a hierarchy of numeric values as a set of nested shapes – usually rectangles. I’m fond of this technique for several reasons, not least because it plays on existing associations. Big rectangles represent big numbers. Rectangles inside other rectangles indicate that one thing is part of another thing. Rectangles can even be shaded to depict an additional variable, for example, an increase or decrease over time.
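To make the "big rectangles represent big numbers" idea concrete, here is a minimal sketch of the simplest treemap layout, often called slice-and-dice: one rectangle is cut into strips whose areas are proportional to the values. It is an illustration only and not necessarily the layout used in the interactive budget treemap. The `slice_layout` function and the 100 by 50 canvas are assumptions, and the figures are the rounded social development amounts quoted above, in billions of dollars.

```python
def slice_layout(values, x, y, width, height, vertical=True):
    """Split a rectangle into strips whose areas are proportional to values.

    Returns a list of (x, y, width, height) tuples, one per value.
    """
    total = float(sum(values))
    rects = []
    offset = 0.0
    for value in values:
        share = value / total
        if vertical:
            rects.append((x + offset, y, width * share, height))
            offset += width * share
        else:
            rects.append((x, y + offset, width, height * share))
            offset += height * share
    return rects

# Rounded figures quoted in the post, in billions of dollars
social_development = 21.0
named = [
    ('superannuation', 9.5),
    ('domestic purposes benefit', 1.9),
    ('accommodation assistance', 1.6),
]
# Everything in the total that the post does not itemise
other = social_development - sum(amount for _, amount in named)

values = [amount for _, amount in named] + [other]
rects = slice_layout(values, 0, 0, 100, 50)
```

Nesting comes from calling the same function again inside one of the returned rectangles, usually with `vertical` flipped so that each level of the hierarchy is cut in the opposite direction.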

Last night, with the assistance of Keith Ng, I created an interactive treemap of the New Zealand budget. It is available here. I encourage you to have a play with it. Unfortunately, given the large volume of data that needs to be processed and rendered, the visualisation struggles on every browser except Google Chrome. If you have Chrome on your computer, I highly recommend that you use it to view the visualisation.

There are two levels to the treemap. The top level shows the breakdown of various expenditure areas. The bigger the rectangle, the more money is being spent. Green rectangles indicate that spending has significantly increased since the last budget; red rectangles indicate a decrease. When you click on any spending area, the visualisation will dive into that area and show a detailed breakdown.

If you are interested in learning more about treemapping, I suggest you read Ben Shneiderman’s account of how he developed this class of visualisation and Thomas Kerwin’s survey of treemap techniques. Wikipedia has a handy list of software if you want to create your own treemaps.

## Ring of Fire – Animated Map of World Earthquakes (Jan 1 – Mar 12, 2011 GMT)

Chris McDowall, Mar 13

This video was made very quickly and could use some work. I post it here in case you find it interesting. I suggest you watch it full-screen, in high definition with scaling off.

The animation depicts two and a half months of 2011 USGS earthquake data. Blue circles represent deep seismic activity recordings (>= 40km deep). Red circles represent shallow seismic events (<40km deep). Each event leaves behind a red dot to show the overall pattern. The animation ends the day after the 8.9 quake that hit Japan on March 11 and includes the shallow 6.3 Christchurch quake.

I am keen to spend some time improving this animation. Perhaps I will find some time in the coming weeks. I intend to release the programming code as an open source project – I would love to see the community build on this stuff.

## Mapping a Day in the Life of Twitter

Chris McDowall, Nov 25

Program a map to display frequency of data exchange,
every thousand megabytes a single pixel on a very large screen.
Manhattan and Atlanta burn solid white.
Then they start to pulse, the rate of traffic threatening to overload your simulation.
Cool it down. Up your scale.
Each pixel a million megabytes.
At a hundred million megabytes per second, you begin to make out certain blocks
in midtown Manhattan, outlines of hundred-year-old industrial parks
ringing the old core of Atlanta...

William Gibson, Neuromancer

Last week I hooked a computer up to the Twitter data streaming API and, over the course of a day and a bit, grabbed every tweet that had geographic coordinates. I wrote a Python script to parse the 2GB of JSON files and used Matplotlib with the Basemap extension to animate 25 hours of data on a world map. The resulting animation plots almost 530,000 tweets — and remember these are just tweets with geo-coordinates enabled.
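The parsing step might look something like the sketch below. It is not the original script. It assumes the stream was saved with one JSON object per line and that geotagged tweets carry a GeoJSON-style `coordinates` field holding `[longitude, latitude]`; both are assumptions about the data layout, and the sample records are made up.

```python
import json

def extract_points(lines):
    """Yield (lng, lat) for every JSON line that carries point coordinates."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            tweet = json.loads(line)
        except ValueError:
            continue  # skip truncated or malformed records
        if not isinstance(tweet, dict):
            continue
        coords = tweet.get('coordinates')
        # Assumed layout: {"type": "Point", "coordinates": [lng, lat]}
        if isinstance(coords, dict) and coords.get('type') == 'Point':
            lng, lat = coords['coordinates']
            yield lng, lat

# Made-up sample records for illustration
sample = [
    '{"text": "hello", "coordinates": {"type": "Point", "coordinates": [174.76, -36.85]}}',
    '{"text": "no location", "coordinates": null}',
    '',
    'not json',
]
points = list(extract_points(sample))
```

Because the function is a generator, it can be pointed at an open file and will work through a multi-gigabyte capture one line at a time rather than loading everything into memory.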

I recommend you full-screen this video, turn scaling off and high definition on.

The animation begins at 5am on November 18, Greenwich Mean Time (United Kingdom). This corresponds to midnight Eastern Standard Time, 9pm Pacific Time (Nov 17) and 6pm in New Zealand (Nov 18).
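The conversions above can be checked with fixed UTC offsets from the standard library. The sketch assumes the year is 2010, which the post does not state, and hard-codes the offsets that apply in mid-November: UTC-5 for Eastern Standard Time, UTC-8 for Pacific Standard Time and UTC+13 for New Zealand Daylight Time.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets in effect in mid-November
EST = timezone(timedelta(hours=-5), 'EST')
PST = timezone(timedelta(hours=-8), 'PST')
NZDT = timezone(timedelta(hours=13), 'NZDT')

# 5am GMT on November 18; the year is an assumption
start = datetime(2010, 11, 18, 5, 0, tzinfo=timezone.utc)

local_starts = {}
for tz in (EST, PST, NZDT):
    local_starts[tz.tzname(None)] = start.astimezone(tz)

for name, moment in local_starts.items():
    print(name, moment.strftime('%b %d %H:%M'))
```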

There are many interesting things to notice. Here are a few:

• It is possible to infer the passage of the sun across the map as data begins to stream out of mobile phones and desktops and previously dark patches of the map begin to glow white.
• At 8:00, 9:00 and 10:00 GMT waves of tweets pass across the United States from East to West. This is an automated Twitter service that tweets local news for specific ZIP codes.
• Turn your attention to Indonesia. Jakarta glows as brightly as New York and San Francisco.
• Note the black spots. With the exception of a few cities, such as Lagos and Johannesburg, Africa remains the dark continent.

Each frame of the animation represents one minute of tweets. The animation runs at ten frames per second. I represent each tweet as a small white circle at two percent opacity. At the moment a tweet occurs I plot it at ten point size. Every minute that passes I drop the marker size by one point until it disappears.
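The fading rule translates into a small function of a tweet's age in minutes. This is a sketch of the rule as described rather than the original animation code; `marker_size` is a hypothetical helper, and the running time is simply what 25 hours of one-minute frames at ten frames per second works out to.

```python
def marker_size(tweet_minute, frame_minute, start_size=10):
    """Marker size in points for a tweet at a given frame, or 0 when not visible."""
    age = frame_minute - tweet_minute
    if age < 0:
        return 0  # the tweet has not happened yet
    # Lose one point of size per minute until nothing is left
    return max(start_size - age, 0)

# Size of a tweet posted at minute 0 over the following frames
sizes = [marker_size(0, frame) for frame in range(12)]

# 25 hours at one frame per minute, played back at ten frames per second
total_frames = 25 * 60
duration_seconds = total_frames / 10
```

Each tweet therefore stays on screen for ten frames, or about one second of video, which is what gives the map its shimmering quality.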

Many thanks to Pierre Roudier for his sage counsel and bug spotting skills. I will post videos focusing on particular parts of the world over the coming days.

## Friday morning video — Watching Packets Fly

Chris McDowall, Nov 12

The video below, by Carlos Bueno, depicts in slow motion what happens when one computer requests an image from another computer over the Internet. I think it’s really neat.

Imagine that the big grey circle on the left is your computer and the circle on the right is some website that you visited. Each flying circle represents a unit of data called a packet. The small green circles represent control packets. The larger blue ones are data packets. In this case it’s an image broken down into many tiny parts.

The exchange begins with a handshake that establishes the rules for communication. We then see a slow ramping up of data being transferred between machines before the full speed download begins. While you watch the video, it might help to imagine the following conversation taking place.

Client: "Hey, Server. Are you there?"
Server: "Yup. I'm here."
Client: "Can you give me that image of a cat playing the piano?"
Server: "Okay. Here's some of the image of a cat playing the piano."
Client: "Got it!"
Server: "Great. Here's some more of the image."
Client: "Got it!"

…and so on.
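The ramping-up part of that conversation can be mimicked with a toy simulation. This is a deliberately simplified model, loosely inspired by the way TCP starts slowly and speeds up, and not a faithful implementation of the protocol: the server sends a burst of packets, waits for the client's "Got it!", then doubles the burst size until it reaches a cap. The function name, the cap of 8 and the 30-packet image are all made-up values.

```python
def simulate_transfer(total_packets, max_burst=8):
    """Return the burst sizes a toy server sends, doubling each round up to max_burst."""
    bursts = []
    burst = 1
    remaining = total_packets
    while remaining > 0:
        sent = min(burst, remaining)
        bursts.append(sent)   # Server: "Here's some more of the image."
        remaining -= sent     # Client: "Got it!"
        burst = min(burst * 2, max_burst)
    return bursts

# A made-up image that takes 30 packets to send
bursts = simulate_transfer(30)
print(bursts)
```

The early rounds carry only a packet or two, which matches the slow start you can see in the video before the stream of blue circles thickens.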

Many thanks to Aaron Hicks for his technical knowledge and fine narration of a hypothetical exchange between machines.

## Sienna Latham on Alchemy, Women and Data Visualisation

Chris McDowall, Nov 10

This evening’s post is a guest contribution from my wife, Sienna Latham. She writes below about the role of data visualisation in her historical research.

I recently submitted my master’s thesis on English women’s chymical activities during the reign of Elizabeth I (1558-1603), exploring the intersecting histories of science, medicine, magic, women and religion. ‘Chymistry’ is a composite term acknowledging the fact that no clear, consistent distinctions were drawn between alchemy and chemistry in the early modern era. Like their male compatriots, my subjects harnessed chymical theories and techniques for both esoteric and pragmatic purposes. They practiced iatrochemistry (medical alchemy), incorporated chymical metaphors into creative works and sought the fabled philosopher’s stone, which promised both riches and freedom from disease. It’s useful to remember that science as we know it did not exist for Elizabethans, who engaged with God’s creation as natural philosophers and explorers; like scripture, the ‘Book of Nature’ was a potential source of divine knowledge and revelation.

So what does historical research have to do with data visualization? Not that much. Evidence of their low-profile relationship is mostly geographical, with maps hinting at the very different worldviews of the long-dead. Genealogical trees have been used to articulate family histories for centuries, while scanned manuscripts and printed images elucidate the corresponding text and ostensibly speak for themselves. That’s about it for the sorts of figures you would expect to find in a history thesis. But as I approached the end of the writing and revision process, it became clear that the complex social and familial relationships I had spent months examining required a different kind of visual representation. I’d expected variety in my subjects’ locations, beliefs, practices and communities, so the patterns that emerged were all the more striking, suggesting the influence of the queen’s own chymical affiliations. Each woman was a Protestant member of the gentry. More significantly, these gentlewomen all had strong ties to the court, known chymists and, indeed, to one another. But how best to convey these connections I had documented throughout the thesis? I turned to my husband Chris for help in compiling and representing this early modern social network.

He asked me to create a spreadsheet delineating my subjects’ relationships to each other, Queen Elizabeth, and the English chymical community surrounding John Dee, mathematician, magus and astrologer to the queen. By way of example, the heiress Margaret Hoby, author of the earliest known diary by an Englishwoman, ties everyone together quite neatly with her three marriages. She first wed Walter Devereux, the younger brother of Robert, who married Frances Walsingham, cousin to Grace Mildmay. Her second husband, Thomas Sidney, was the brother of Mary Sidney Herbert – incidentally, their late brother Philip, the most famous of the Sidneys, had been Frances’ first husband. Through her third husband, Thomas Posthumous, Hoby was linked to Margaret Clifford, whose brother John Russell married her mother-in-law. Though she lived in remote Yorkshire, Margaret Hoby’s childhood training in the home of Henry and Katherine Hastings as well as her close relationship with the chymist John Thornborough and his second wife connected her to the queen. As you can see, multiple marriages, shorter life expectancies, the importance of extended family and courtly affiliations led to extremely complicated relationships.

I used Google Docs to share this spreadsheet with Chris, who performed digital alchemy (er, wrote some code) and created the diagram above. For simplicity’s sake, we omitted the nature of the relationships; I had already described these in the text and felt that the figure’s emphasis should be on the community it portrayed. He assigned a color to each of my subjects and coded the links. In most cases, the connections between these women are indirect, as described earlier. However, direct links display both colors, as you can see with Mary Sidney Herbert and Margaret Clifford’s ties to the queen, whose central location underscores the importance of the court. Family members and peripheral figures appear in grey. While this representation does not contain any new information, it provides a clear and concise reference for the network of relationships described in the pages it follows and precedes. This image also clearly supports my assertion that community played a vital role in the transmission of chymical knowledge during Elizabeth’s reign. These particular women gained access to a male-dominated realm in part because they knew the ‘right’ people. They engaged with a royal court convinced of chymistry’s efficacy (though oftentimes painfully aware of its practitioners’ shortcomings) and skillful at appropriating its evocative images and metaphors for their own purposes, including the queen’s iconography.
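For the curious, here is a minimal sketch of how a spreadsheet of relationships can be treated as a network. It is not the code behind the diagram. It assumes the spreadsheet boils down to pairs of names, uses only relationships from the Margaret Hoby example above, and finds the chain of people linking two women with a breadth-first search.

```python
from collections import deque

# Pairs of connected people, taken from the Margaret Hoby example above.
# The two-column layout is an assumption about the spreadsheet.
links = [
    ('Margaret Hoby', 'Walter Devereux'),       # first marriage
    ('Walter Devereux', 'Robert Devereux'),     # brothers
    ('Robert Devereux', 'Frances Walsingham'),  # marriage
    ('Frances Walsingham', 'Grace Mildmay'),    # cousins
    ('Margaret Hoby', 'Thomas Sidney'),         # second marriage
    ('Thomas Sidney', 'Mary Sidney Herbert'),   # siblings
]

def build_graph(pairs):
    """Turn a list of name pairs into an undirected adjacency mapping."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of names linking start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

graph = build_graph(links)
to_grace = shortest_path(graph, 'Margaret Hoby', 'Grace Mildmay')
to_mary = shortest_path(graph, 'Margaret Hoby', 'Mary Sidney Herbert')
```

A path longer than two names is an indirect connection of the kind described above: the women are linked through husbands, siblings and cousins rather than to each other.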

A few accessible recommendations for anyone new to early modern magic and science and keen to read more: Keith Thomas’s seminal Religion and the Decline of Magic, Deborah Harkness’s fascinating The Jewel House and Charles Webster’s recent biography on Paracelsus, the itinerant Swiss physician-prophet. See also reasonably affordable works on chymistry by Allen Debus, Leah DeVun, William Newman, Tara Nummedal, Lawrence Principe and, so long as you balance them with other perspectives, Frances Yates. My supervisor Glyn Parry has an excellent book on John Dee due out soon through Yale University Press. For those on a budget – and those who aren’t – Adam McLean’s website is a tremendous resource for anyone curious about alchemy.

## The Zen of Open Data

Chris McDowall, Oct 12

This morning I was writing code in a programming language called Python. I hit a sticky problem and turned to an arcane feature of the language known as “The Zen of Python” for guidance. There I read the words, “In the face of ambiguity, refuse the temptation to guess,” and I was enlightened.
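If you have Python installed you can summon the same text yourself: typing `import this` at the interpreter prints the whole poem. The sketch below goes one step further and pulls out the line quoted above. It relies on the module keeping its text ROT13-encoded in an attribute named `s`, which is a detail of the standard CPython distribution rather than a documented interface.

```python
import codecs
import this  # importing this module prints the poem to the screen

# The module stores its text ROT13-encoded in the attribute `s`
zen_lines = codecs.decode(this.s, 'rot13').splitlines()

# Pick out the aphorism about ambiguity
matches = [line for line in zen_lines if 'ambiguity' in line]
print(matches[0])
```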

Open data has been on my mind lately. Open data is a philosophy and practice advocating that data should be freely available to everyone, without restrictions. Following the experience I related above, I began to wonder what “The Zen of Open Data” might look like. I wrote something over morning tea that tries to boil down all the stuff I have heard and read on the topic over the past two years and posted it to the New Zealand Open Government Ninjas forum.

I share it below for anyone who might be interested.

Image by Sienna.

The Zen of Open Data, by Chris McDowall

Open is better than closed.
Transparent is better than opaque.
Simple is better than complex.
Accessible is better than inaccessible.
Sharing is better than hoarding.
Linked is more useful than isolated.
Fine grained is preferable to aggregated.
(Although there are legitimate privacy and security limitations.)
Optimise for machine readability — they can translate for humans.
Barriers prevent worthwhile things from happening.
‘Flawed, but out there’ is a million times better than ‘perfect, but unattainable’.
Opening data up to thousands of eyes makes the data better.
Iterate in response to demand.
There is no one true feed for all eternity — people need to maintain this stuff.

Many people inadvertently contributed to this text. One particularly strong influence was a panel discussion between Nat Torkington, Adrian Holovaty, Toby Segaran and Fiona Romeo at Webstock’09.