While our hack days are a focal point for igniting open data in Bath, the unseen and ongoing work that goes into building a quality data store is arguably the most important thing we do. I thought it’d be good to share what we do behind the scenes.
I’ve tried to condense this into some practical tips that might be useful if you’re involved in your own local open data initiative. We’re all learning about the best way to create local open data projects, so sharing what we know is essential.
1. Know Your Tools
We’ve been working with Socrata to host our data and this has given us an excellent start. The hosted infrastructure means that we’ve been able to very quickly get stuck in to the practical activity of collecting and publishing data.
But whichever data platform you choose, take time to explore its full set of features and understand how it can support what you want to achieve. Data platforms are becoming increasingly sophisticated and, while not everyone may need all of these features immediately, it’s good to understand the options available.
Some of our earlier events have really been about kicking the tires on the platform to understand how it works so we can then tailor it to our specific needs.
2. Understand Your Data
Related to this is the need to have a good understanding of what kinds of data you’re likely to be publishing and how these different data types can best be represented and made available to users. You’re likely to need different workflows to deal with different types of data source. Understanding how datasets connect together, e.g. through shared identifiers, can also help ensure they’re structured in useful ways when published.
For example, we’ve been dealing with fine-grained readings from air quality sensors, transactional data, points of interest and boundaries. All of these are published in different ways and at different frequencies, and each needs to be structured differently in order to be useful.
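To illustrate what connecting datasets through a shared identifier can look like, here is a minimal sketch. It is not our actual loading code: the `sensor_id` column, the field names and the sample values are all made up for the example.

```python
from collections import defaultdict

# Hypothetical records: both datasets carry the same "sensor_id" column.
sensors = [
    {"sensor_id": "S1", "name": "Site A"},
    {"sensor_id": "S2", "name": "Site B"},
]
readings = [
    {"sensor_id": "S1", "no2": 41.0},
    {"sensor_id": "S1", "no2": 39.0},
    {"sensor_id": "S2", "no2": 22.5},
]


def join_on(key, left, right):
    """Attach to each left-hand record the right-hand records sharing its key."""
    grouped = defaultdict(list)
    for row in right:
        grouped[row[key]].append(row)
    return [{**row, "matches": grouped.get(row[key], [])} for row in left]


joined = join_on("sensor_id", sensors, readings)
for site in joined:
    values = [r["no2"] for r in site["matches"]]
    # Guard against sensors that have no readings yet.
    average = sum(values) / len(values) if values else None
    print(site["name"], average)
```

If the two datasets had used different names or formats for the identifier, this join would need a cleaning step first, which is why it helps to agree on identifiers before publishing.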
3. Engage with Your Data Sources
For any city data project, access to local council data is going to be crucial. In Bath we’ve been very lucky that B&NES have been really engaged with what we’re doing. This has made it much easier to get access to useful data. And by working as a combined team, both inside and outside of the council, we’ve been able to share the work of tidying, loading and clearly documenting the data.
But it’s important to remember that not all of the data comes from the council. Data can come from other sources which are too important to overlook:
- central government — there’s a wealth of statistical and other datasets already on data.gov.uk which can be sub-setted to create locally useful versions
- (local) businesses — we’ve been reaching out to businesses operating in the area to engage with them as possible data sources
- the local citizens — I’ve been amazed and delighted at how much local activity already exists around cataloguing and curating local information, we’ve also been reaching out to local curators to offer to host their data in the store. Every area has its local experts and enthusiasts, they just may not think of themselves as data curators (yet!)
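As a rough sketch of the sub-setting mentioned for central government data, the snippet below filters a national CSV down to the rows for one area. The column name `la_code`, the area codes and the values are placeholders invented for the example, not taken from any real data.gov.uk dataset.

```python
import csv
import io

# Stand-in for a downloaded national dataset (placeholder values only).
NATIONAL_CSV = """la_code,la_name,value
LA-001,Example District,10
LA-002,Our Area,25
LA-002,Our Area,30
"""


def subset_rows(csv_text, column, wanted):
    """Return only the rows whose `column` equals `wanted`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == wanted]


local_rows = subset_rows(NATIONAL_CSV, "la_code", "LA-002")
print(len(local_rows), "local rows")
```

In practice you would read from a file rather than a string, and filter on whichever area identifier the national dataset actually uses.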
4. Build a Curation Team
Given the breadth of data and data sources involved, the work of curating and loading that data needs to be shared out. We have a core team of curators who are particularly enthusiastic about getting data opened up in useful ways. This is the team that has been doing much of the behind-the-scenes work to prepare for the hack days and get the datasets loaded.
We’ve been having regular “curators’ nights” (which are open for anyone to attend) to get people together to plan and prioritise the areas we need to focus on and to divide up the work to be done. These events are also a good opportunity for people to share skills and knowledge, as well as drink a pint or two!
5. Define Your Best Practices
If you’ve got a number of people contributing data to your store then it’s a good idea to ensure that people are working consistently. Having some best practices in place for how metadata and documentation will be created, or agreeing on common workflows for data uploads, will help keep things on track.
For example, we’ve written up a short document describing how metadata should be entered for our datasets. This also helps to bring new people on board very quickly.
We’ve also agreed that all our data loading code should be open sourced to make it easy for others to pick up and re-run. That code might also be useful for people in other areas. You can see what we’ve created so far in the BathHacked and DataSulis GitHub accounts.
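A written guideline like this can also be backed up by a small automated check. The sketch below is only an illustration: the list of required fields is an assumption made for the example, not the contents of our metadata document.

```python
# Assumed required fields; substitute whatever your own guidelines specify.
REQUIRED_FIELDS = ("title", "description", "licence", "source")


def missing_metadata(metadata):
    """Return the required fields that are absent or blank in `metadata`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Hypothetical dataset entry with a blank description and no source.
example = {"title": "Example dataset", "description": "  ", "licence": "OGL"}
print(missing_metadata(example))
```

Running a check like this during review makes it easier to spot gaps before a dataset is published.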
6. Be Agile
We’re also operating like a startup: defining short-term objectives (the hack days are a great way to focus attention) and trying to learn and improve as we go. We want Bath: Hacked to be one of the leading examples of open data in the UK. That will take time and effort, but we’ll get there!
We’ve also been using startup tools to help organise the team. As well as GitHub, we’re using Slack for messaging and internal document sharing, and Trello for planning and prioritisation. The majority of these tools have free tiers that make them essential if, like us, you have a big vision but a small budget.
As an example of how we’re using these tools, we’ve just started using Trello to organise our data loading activities. We’ve defined the key stages for working on our datasets, with the process being:
- Ideas — a list of potential new datasets
- Ready for Loading — dataset has an identified source and data is available under an open licence
- In Progress — datasets that people are currently converting or loading
- In Review — datasets ready for internal review so we can check they follow best practices and the licensing is correct
- Done — which is where we raise a glass 🙂
So at any time we can easily see what datasets we have in the pipeline and who amongst our team of amazing volunteers is working on which datasets.
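The stages above can also be modelled in a few lines of code, which may be handy if you want to track the same pipeline in a script or spreadsheet export. This is a sketch under assumptions: it does not use the Trello API, and the dataset names and owners are invented.

```python
from enum import Enum


class Stage(Enum):
    IDEAS = "Ideas"
    READY_FOR_LOADING = "Ready for Loading"
    IN_PROGRESS = "In Progress"
    IN_REVIEW = "In Review"
    DONE = "Done"


STAGES = list(Stage)  # Enum members iterate in definition order.


def advance(stage):
    """Return the next stage, or the same stage if it is already Done."""
    index = STAGES.index(stage)
    return STAGES[min(index + 1, len(STAGES) - 1)]


def open_work(cards):
    """Group unfinished datasets by stage, recording who is working on each."""
    board = {stage.value: [] for stage in Stage if stage is not Stage.DONE}
    for card in cards:
        if card["stage"] is not Stage.DONE:
            board[card["stage"].value].append((card["dataset"], card["owner"]))
    return board


# Hypothetical cards; owner is None until someone picks the dataset up.
cards = [
    {"dataset": "example-air-quality", "stage": Stage.IN_PROGRESS, "owner": "volunteer-1"},
    {"dataset": "example-boundaries", "stage": Stage.IDEAS, "owner": None},
    {"dataset": "example-car-parks", "stage": Stage.DONE, "owner": "volunteer-2"},
]
print(open_work(cards))
```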
Anyway, that’s a short overview of how we’re starting to curate data in Bath: Hacked. Hopefully it might be useful for others who are also bootstrapping their local open data projects. Also, if you’ve got comments or suggestions then let us know. We’d love to hear from you!
Now come get your hands dirty!
Hacked 2.1: Past, Present and Future is now two weeks away and we’re fully booked. If you’re still hoping for a space then you’ll need to get yourself on the waiting list.
We’re really excited that so many people want to come and create amazing things using local open data.