Last week, I created revised versions of graphs of car crash statistics by state (including an interactive version), from a post by Mona Chalabi at 538.
Since I was working on those at the last minute in the middle of the night, to be included as an example in a lecture on creating effective figures and tables, I just read the data off printed versions of the bar charts, using a ruler.
I later emailed Mona Chalabi, and she and Andrew Flowers quickly posted the data to github.com/fivethirtyeight/data. (That repository has a lot of interesting data, and if you see data at 538 that you’re interested in, just ask them!)
I was curious to look at how I’d done with my measurements and data entry. Here’s a plot of my percent errors:
Not too bad, really. Here are the biggest problems:
- Mississippi, non-distracted: off by 6%, but that corresponded to 0.5 mm.
- Rhode Island and Ohio, speeding: off by 40 and 35%, respectively. I’d written down 8 and 9 mm rather than 13 and 14 mm.
- Maine and Indiana, alcohol: wrote 15.5 and 14.5 mm, but typed 13.5 and 13 mm. In the former, I think I just misinterpreted my writing; in the latter, I think I wrote the number for the state below (Iowa).
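As a rough check on the arithmetic, here is a minimal Python sketch of a signed percent error, applied to the two speeding measurements from the list above. The function name `percent_error` and the dictionary layout are just illustrative choices, and the sketch assumes the bars start at zero on a linear scale, so that an error in millimeters translates directly into the same percent error in the data.

```python
def percent_error(measured, actual):
    """Signed percent error of a measurement relative to the true value."""
    if actual == 0:
        raise ValueError("actual must be nonzero")
    return 100.0 * (measured - actual) / actual


# Bar lengths in mm, taken from the list above:
# (what was written down, what it should have been)
speeding_mm = {"Rhode Island": (8, 13), "Ohio": (9, 14)}

speeding_errors = {
    state: percent_error(written, correct)
    for state, (written, correct) in speeding_mm.items()
}

for state, err in speeding_errors.items():
    print(f"{state}: {err:.1f}%")
# Rhode Island: -38.5%
# Ohio: -35.7%
```

The negative sign marks an under-estimate, and the magnitudes line up with the roughly 40% and 35% quoted above.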
It’s also interesting to note that my “total” and “non-distracted” were almost entirely under-estimates: probably an error in the measurement of the overall width of the bar chart.
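To illustrate why a mistake in the overall width would push every estimate in the same direction, here is a small Python sketch. All of the numbers in it (the axis maximum, the two widths, the bar lengths) are hypothetical and chosen only for illustration; none come from the actual charts. It assumes each value was recovered as bar length divided by measured axis width, times the axis maximum.

```python
def mm_to_value(bar_mm, axis_mm, axis_max):
    """Convert a measured bar length to data units on a linear axis from zero."""
    return bar_mm / axis_mm * axis_max


# Hypothetical numbers, for illustration only.
AXIS_MAX = 25.0           # value at the right-hand end of the axis
TRUE_AXIS_MM = 100.0      # actual printed width of the axis
MEASURED_AXIS_MM = 102.0  # width as (mis)measured with the ruler

bar_lengths_mm = [20.0, 50.0, 80.0]

scale_errors = []
for bar_mm in bar_lengths_mm:
    true_value = mm_to_value(bar_mm, TRUE_AXIS_MM, AXIS_MAX)
    estimate = mm_to_value(bar_mm, MEASURED_AXIS_MM, AXIS_MAX)
    scale_errors.append(100.0 * (estimate - true_value) / true_value)

print([round(e, 2) for e in scale_errors])
# [-1.96, -1.96, -1.96]
```

Under these assumptions, over-measuring the axis width by 2% shrinks every estimate by the same 2% or so, whatever the bar length, which is the kind of uniform under-estimate described above.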
Also note: @brycem had recommended using WebPlotDigitizer for digitizing data from images.