This post is a little out of the ordinary for TVHE, but I thought I’d talk a bit about the tools I use every day. If you’re anything like me, you spend an inordinate amount of time cleaning up data. Even something as simple as using GDP in a regression involves finding the right page on the ONS/Stats NZ website, downloading a CSV, then importing the CSV while being careful not to catch any of the extra junk in there. If you’re unlucky enough to get an Excel spreadsheet, you usually also have to spend time cleaning up the sheet before importing it into your favourite program. I have spent many hours wishing that the ONS/Stats NZ provided a FRED-like API so I could just suck in the latest time series from their server in my code. Something’s wrong when it’s easiest to get the latest UK data from the St Louis Fed!
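To make the CSV chore concrete, here is a minimal sketch of the kind of import code I mean. The file layout (title lines on top, footnotes at the bottom), the function name `read_annual_series` and the numbers are all made up for illustration; real downloads from a statistics office will differ, so treat the filtering rule as an assumption to adapt.

```python
import csv
import io
import re

# Hypothetical download: title lines, a header, data rows, then a footnote.
# The layout and the numbers are placeholders, not real statistics.
RAW = """Title,Gross Domestic Product (illustrative)
Source,Example statistics office

Period,Value
2010,100.0
2011,101.5
2012,103.0

Note: figures are placeholders
"""

YEAR = re.compile(r"^\d{4}$")


def read_annual_series(text):
    """Keep only rows shaped like 'YYYY,number' and skip everything else."""
    series = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2 or not YEAR.match(row[0].strip()):
            continue  # title, header, blank line, or footnote
        try:
            series[int(row[0])] = float(row[1])
        except ValueError:
            continue  # a year-like label with a non-numeric value
    return series


print(read_annual_series(RAW))
```

Even in this toy version, most of the code exists only to dodge the junk around the data, which is exactly the part I would rather not write.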
Thankfully, an amazing website called Quandl has recently launched, indexing millions of time series from around the world. Importantly, they offer custom-built packages for interacting with their data in R, Python, Stata, Java, etc. So now I can grab the UK’s GDP series straight from Python.
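As a rough sketch of what such a call can look like, assuming the `quandl` Python package and its `quandl.get` function (older releases of the package were imported as `Quandl`): the `fetch_series` wrapper and the `SOURCE/DATASET_CODE` placeholder below are mine, for illustration only, so substitute the code shown on the dataset's own Quandl page.

```python
def fetch_series(code, get=None, **options):
    """Return the dataset identified by a Quandl code.

    `get` defaults to quandl.get; passing another callable makes it
    possible to try the function without network access.
    """
    if get is None:
        # Assumption: the package is installed, e.g. via `pip install quandl`.
        import quandl
        get = quandl.get
    # Extra keyword options are passed through to the getter unchanged.
    return get(code, **options)


# Usage (placeholder code, not a real dataset):
# gdp = fetch_series("SOURCE/DATASET_CODE")
# print(gdp.tail())
```

The wrapper adds nothing beyond a place to swap in a stand-in getter; in everyday use, calling `quandl.get` directly with the dataset code is all there is to it.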
The first time I did this it felt great, but then I realised I only had the Blue Book’s annual series and I really needed the quarterly series. Unfortunately, Quandl didn’t scrape the Quarterly National Accounts, so I thought I was out of luck. I sent an email to their helpdesk asking about adding new series and, incredibly, got an email back from a dev within a day offering to upload all the datasets I could provide URLs for. So now, a couple of weeks later, if you want the quarterly series, it’s just as easy.
I think this is the beginning of a beautiful friendship for me, but Quandl really need your help. They’re growing fast and keen to expand so, if you know of a dataset they should index, send them an email and make it happen for all of us!
PS. If you’re a Python user you’ll be thrilled to hear that Quandl’s package returns pandas dataframes! You may be surprised at how happy that made me.