Introducing Clarify
An Open Source Elections-Data URL Locator and Parser from OpenElections
By Geoff Hing and Derek Willis, for Knight-Mozilla OpenNews Source Learning
State election results are like snowflakes: each state, and often each county, produces its own special website to share the vote totals. For a project like OpenElections, that means finding the results data and then figuring out how to extract it. In many cases, that means scraping.
But in our research into how election results are stored, we found that a handful of sites used a common vendor: Clarity Elections, which is owned by SOE Software. Sites for states that use Clarity generally share a common look and features, including statewide summary results, voter turnout statistics, and a page linking to county-specific results.
The good news is that Clarity sites also include a “Reports” tab that has structured data downloads in several formats, including XML, XLS, and CSV. The results data are contained in .ZIP files, so they aren’t particularly large or unwieldy. But there’s a catch: the URLs aren’t easily predictable. Here’s a URL for a statewide page:
http://results.enr.clarityelections.com/KY/15261/30235/en/summary.html
The first numeric segment—15261 in this case—uniquely identifies this election, the 2010 primary in Kentucky. But the second numeric segment—30235—represents a subpage, and each county in Kentucky has a different one. Switch over to the page listing the county pages, and you get all the links. Sort of.
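To make that URL anatomy concrete, here is a minimal Python sketch that splits a statewide Clarity URL into its parts. The function and class names are hypothetical and are not part of Clarify's API. The sketch assumes the statewide layout shown above (state, election ID, subpage ID as the first three path segments); county URLs may be laid out differently.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class ClarityUrlParts(NamedTuple):
    """Hypothetical container for the pieces of a statewide Clarity URL."""
    state: str
    election_id: str
    subpage_id: str


def split_clarity_url(url: str) -> ClarityUrlParts:
    """Split a statewide Clarity URL into state, election ID, and subpage ID.

    Assumes the path looks like /<STATE>/<election id>/<subpage id>/...,
    as in the example URL from the article.
    """
    # Drop empty strings produced by the leading slash.
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 3:
        raise ValueError(f"Unexpected Clarity URL layout: {url}")

    state, election_id, subpage_id = segments[:3]
    # Both numeric segments should be digits in the assumed layout.
    if not (election_id.isdigit() and subpage_id.isdigit()):
        raise ValueError(f"Expected numeric ID segments in: {url}")
    return ClarityUrlParts(state, election_id, subpage_id)


parts = split_clarity_url(
    "http://results.enr.clarityelections.com/KY/15261/30235/en/summary.html"
)
print(parts.state, parts.election_id, parts.subpage_id)
```

Splitting a URL this way only helps once you already have it. The subpage ID still has to be discovered by visiting the page, which is the hard part.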
The county-specific links, which lead to pages that have structured results files at the precinct level, actually involve redirects, and the secondary numeric segments in their URLs aren’t resolved until we visit them. That means doing a lot of clicking and copying, or scraping. We chose the latter path, although that presents some difficulties as well.
Using our time at OpenNews’ New York Code Convening in mid-November, we created a Python library called Clarify that locates the URLs containing structured election results data and parses the XML version of that data. We’re already using it in OpenElections, and now we’re releasing it for others who work in states that use Clarity software.