Book review: The Visual Display of Quantitative Information
The Visual Display of Quantitative Information, by Edward R. Tufte.
This book sounds kind of boring, but it’s actually the bible of how to represent data visually.
Okay, that may still sound boring.
However, if you consider that it’s hard to actually know something well without looking at the data, and that we are often far better at seeing data visually than we are at digesting it in tables, you’ll see why it helps to understand when a visual display of data is actually deceiving you, and how to make your own displays of data better.
The core ideas I got from reading this book are: Think about what visual elements can be removed while still representing the data truthfully and in full. Think of how multiple sets of data relate to each other and think through the different types of visual displays to see what fits that best. Make sure to look for causes of exceptions in data patterns because that’s often what matters most.
An excellent graph is worth a million words
You may remember my praise for the speech by Hans Rosling in which he talks about the state of the world and how it’s changed over the last few decades (see my earlier post: http://www.mathoda.com/archives/54).
Professor Rosling used some incredible dynamic graphing software, created specifically to fit his needs, that allowed him to show a tremendous amount of data at the same time, and then show how it changed through time.
That software, called Gapminder, and the data Professor Rosling used are now easily accessible here, for anyone to play with. Definitely worth checking out: http://tools.google.com/gapminder/
Finding good statistics online, solved?
Yesterday I noted how hard it is to find good statistics online and then said:
Which got me to thinking that the Internet could really use a database that takes in all kinds of statistical information, that anyone can write or possibly overwrite (Wikipedia style), is searchable, and in which the information can be easily displayed. Something that isn’t so silo’ed inside organizations, but rather is a place everyone can publish to when they have useful statistics. While text and audio and video have all moved to the Internet, statistics seem to be lagging behind.
C’mon Internet, I’ll give you 2 years. Hurry up and make it. A whole world is out there, needing better analysis.
Well, it didn’t take the Internet 2 years. A few hours after I made my post a new website called Swivel went public, which is similar to what I called for. Nicely played, Internet, nicely played.
See a discussion of Swivel here (http://www.techcrunch.com/2006/12/05/swivel-to-launch-this-week-communitize-your-data/) and the website here (http://www.swivel.com). It still needs some good data to populate it, and could be refined a bunch, but it’s on the right track.
Hmm. What should I ask the Internet for next?
December 7, 2006
A need for better data on the Internet
In late October of 2006 I stumbled upon 1997 UNICEF figures that showed street dwellers in Europe greatly exceeded street dwellers in the United States, as a proportion of their overall populations. Europe: 3 million street dwellers out of 460 million people (ratio of 1 to 153); USA: 750,000 street dwellers out of 300 million people (ratio of 1 to 400).
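As a quick check of the arithmetic above, here is a minimal Python sketch that recomputes the two ratios from the figures quoted in this post. The variable and function names are my own, made up for illustration, and the numbers are only as good as the 1997 UNICEF figures they come from.

```python
# Figures as quoted in the post (1997 UNICEF numbers):
# (street dwellers, total population)
regions = {
    "Europe": (3_000_000, 460_000_000),
    "USA": (750_000, 300_000_000),
}


def one_in(street_dwellers, population):
    """Return N such that roughly 1 in N people is a street dweller."""
    return population / street_dwellers


# Hypothetical helper names; this is a sketch, not a statistical method.
ratios = {name: one_in(*figures) for name, figures in regions.items()}

for name, n in ratios.items():
    print(f"{name}: about 1 in {n:.0f}")

# A larger N means a lower rate, so dividing the US N by the European N
# gives how many times higher Europe's rate is.
relative = ratios["USA"] / ratios["Europe"]
print(f"Europe's rate is about {relative:.1f} times the US rate")
```

On these figures, Europe’s rate comes out at roughly two and a half times the US rate, which is the gap I was puzzling over.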
I pondered the situation, asking “What explains this difference? Is it that a freer form of market capitalism has allowed the USA to more productively generate wealth and provide housing to its people, allowing for fewer destitute people, despite (or because of?) a social safety net that isn’t as rigorous? Is it related to the density of people per square mile? If I can find data, comparing the ratio of street dwellers to population in affluent Western European countries seems like it would reveal something interesting.”
The critiques I received from my friends John Maris and Forrest Roche were that (a) the 1997 UNICEF data I looked at may be outdated, with more recent studies perhaps suggesting the US homeless population is 3.5 million, while German government data might mean European homelessness has decreased, (b) how street dwellers are defined matters a lot, (c) one really needs to compare the wealthy European countries to America and not all of Europe to America, and (d) Europe does a better job of equitable income distribution.
I’ve looked at more data and have confirmed my initial suspicion that the statistics vary a lot based on who is doing the measuring, and what they are measuring. Some statistics measure chronic homelessness, others temporary homelessness. None of the studies I read talk about potentially interesting questions, like what is the quality of the homes being provided to the just-out-of-homelessness?
One benefit of my initial use of the UNICEF study was that the Europe vs USA comparison was consistent in what was being measured, and the same organization did the measuring on both sides. Unfortunately it is old, and I haven’t found a more current international comparison.
Finding any relevant statistics was actually surprisingly difficult, given how good the Internet is generally. I did find that UN-HABITAT in 2004 put the number of European homeless at 3 million. Different sources for homelessness in the US show anywhere from 150,000 chronically homeless to 847,000 homeless per year, to 3.5 million homeless per year. (The 3.5 million figure is based on people who fall into homelessness at least once that year, even if just for a day.) What might be a more meaningful statistic is a comparison of how many days someone stays homeless in Europe versus the USA.
Which got me to thinking that the Internet could really use a database that takes in all kinds of statistical information, that anyone can write or possibly overwrite (Wikipedia style), is searchable, and in which the information can be easily displayed. Something that isn’t so silo’ed inside organizations, but rather is a place everyone can publish to when they have useful statistics. While text and audio and video have all moved to the Internet, statistics seem to be lagging behind.
C’mon Internet, I’ll give you 2 years. Hurry up and make it. A whole world is out there, needing better analysis.