New York I love you, but your data is hard…
By Eric Mill
The New York State Board of Elections publishes a large set of election results in a central place. Naturally, they are all PDFs. But these are especially frustrating PDFs:
- They are clearly tabular, which strongly suggests that they lived a past life as spreadsheets.
- For some years, they already publish Excel versions of the election results! This is more than a suggestion: they clearly generate these PDFs from spreadsheets. The spreadsheets for the other years just aren't published.
- Though the numbers are tabular, the candidate names are styled to be diagonal. So, when you extract text from the PDF version, the tables stay intact, but the names are exploded across the sky like stars.
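To make the diagonal-name problem concrete, here is a minimal sketch of how scattered glyphs could be regrouped into names. It assumes a PDF text extractor has already handed you each character with its page coordinates as `(char, x, y)` tuples, and that the names are rotated by roughly 45 degrees. The function name, the tuple shape, the angle, and the sample names are all illustrative assumptions, not anything taken from the Board's files.

```python
import math
from collections import defaultdict


def regroup_diagonal_text(glyphs, angle_degrees=45.0, tolerance=2.0):
    """Rebuild rotated strings from scattered (char, x, y) glyphs.

    Each glyph is projected onto two axes: one running along the assumed
    text direction and one perpendicular to it. Glyphs that share (roughly)
    the same perpendicular offset sit on the same diagonal baseline.
    """
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    lines = defaultdict(list)
    for char, x, y in glyphs:
        along = x * cos_t + y * sin_t     # position along the baseline
        across = -x * sin_t + y * cos_t   # offset perpendicular to it
        # Simple bucketing; glyphs near a bucket edge could be split apart,
        # so real data would likely need a proper clustering step.
        lines[round(across / tolerance)].append((along, char))

    # Order baselines by their offset, and glyphs by position on the baseline.
    return ["".join(char for _, char in sorted(members))
            for _, members in sorted(lines.items())]


# Placeholder names laid out on two parallel 45-degree baselines,
# then reversed to mimic the jumbled order an extractor might produce.
sample = [(c, 5 * i, 5 * i) for i, c in enumerate("SMITH")]
sample += [(c, 40 + 5 * i, 5 * i) for i, c in enumerate("JONES")]
print(regroup_diagonal_text(sample[::-1]))  # ['JONES', 'SMITH']
```

Even with a working version of this, each recovered name would still have to be matched back to the right column of numbers, which is exactly the work a published spreadsheet makes unnecessary.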
I spent a few months trying in vain to find the right staffer who would post the rest of the Excel files, or at least tell me that there were no Excel files to publish. The furthest I got was a one-line, 3-week-delayed email from someone telling me they'd see what they could get for me, with no further response for months.
And so, inspired by Sandra Fish's successful Wyoming public records request, I filed my own FOIL request with the Board of Elections, using their handy web form. Before filing, I called their Public Information office to ask what they thought of the request, and they offered no opinion other than to just go ahead and file it.
In my request on August 26, I asked for "any and all election results, between 1994 and 2012, that the New York State Board of Elections has in spreadsheet form (where the file format is one of: .XLS, .XLSX, .TSV, or .CSV)" and pointed to the existing Excel files for 2010 as an example.
The Director of Public Information is legally obligated to respond within 5 business days, but by September 11, I still hadn't heard back. After an email and two calls over the next few days, they emailed me all of 12 Excel files on September 16.
The spreadsheets they sent me are somewhat mystifying.
- 5 of them, from 2010, are slightly updated versions of only some of the many 2010 Excel files they already publish.
- 1 of them is new from 2008 — the Presidential general election results, currently only published in PDF. Nothing for the many other 2008 elections.
- 6 of them are new from 2012, and contain all 6 general election results for 2012, currently only published in PDF.
It's not clear why I got a mix of spreadsheets that are currently available and unavailable, or why these were the only ones sent. Could they really have thrown away all of their other 2008 general election spreadsheets? Do they never use spreadsheets for primary or special elections?
The primary and special election results the NY Board of Elections publishes are in a different format than the general election results, so perhaps they are managed differently (though they are still suspiciously tabular). But all of their general election results from 2008 onwards use the same format, clearly generated in the same manner.
And what about the spreadsheets they already publish for 2006? The 2004 general election results use the same format as 2006, but no 2004 spreadsheets are available.
Whether it's secrecy or simply disarray, the New York Board of Elections could be doing a much better job of serving the public. I've filed another, more detailed FOIL request with the Board of Elections, this time by mail, to try to get to the heart of how election information flows through the state of New York. It shouldn't take this much work: election results are data. The Board should publish them that way.
Eric Mill builds software, writes on tech policy, and helps manage the Sunlight Foundation's international program. His projects at Sunlight include Congress API, its government search and alert system Scout, and a Congress app for Android phones. He also leads Sunlight's involvement in many of the projects at github.com/unitedstates. Outside of Sunlight, Eric runs a blog at konklone.com, and works on numerous open source projects on GitHub.