When to make use of Open Source data?

  • 13 April 2022
Many customer projects could benefit from additional data from other sources. However, this comes with its own complications such as:

  • How is it maintained?
  • How is it stored?
  • Are the customers OK with this approach?
  • How is it sourced?

Open source data can be a huge resource for a lot of projects. The shining example of this, to me, is the OpenStreetMap project, which most recently ran an extra drive to increase its map coverage of developing regions in support of COVID-19 efforts, though this is far from a one-off. A whole community of libraries for GIS applications has been built on top of it.

I think it’s worth noting that your question could just as easily apply to pretty much every library you use in programming. Both pandas and NumPy are open-source code bases that anyone can contribute to, and they underpin so many other libraries. They, in turn, sit on top of Python, which is itself an open-source language. Even Linux is not safe from this, as a recent controversy showed. Code can (usually) do far more damage to your system than a data source, where the main risk is an opportunist injecting something that could be interpreted as a system command if read in a specific way.

All of those questions, then, are basically prompts to ask yourself day-to-day whenever you use anything open-source in programming. There is risk and reward in anything open-source.

I haven’t used loads of open source data, but one source I’ve found really useful is the ONS postcodes data. It lets you map between postcodes, longitudes and latitudes, and regions (at various levels). It’s a reliable data set, as it’s gathered by an official government body, and it has the added advantage that it doesn’t change very often, so you don’t need to worry too much about refreshing it regularly.

If the data you’re dealing with only has postcodes, then it can be really useful to use this kind of data to map up to regions or smaller areas. For example, I’ve used it on London postcodes to map to boroughs for some analysis.
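
To make the postcode-to-borough idea a bit more concrete, here is a rough sketch of that kind of lookup using only the Python standard library. Everything in it is an assumption for illustration: the column names (postcode, borough), the helper functions and the sample rows are made up and are not taken from the real ONS file, whose layout you would need to check before adapting this.

```python
import csv
import io

# Hypothetical extract of a postcode lookup file. The column names and
# rows are illustrative placeholders, not the real ONS file layout.
LOOKUP_CSV = """postcode,borough
SW1A 1AA,Westminster
E1 6AN,Tower Hamlets
N1 9GU,Islington
"""


def normalise(postcode):
    """Upper-case and remove all whitespace so 'sw1a 1aa' matches 'SW1A1AA'."""
    return "".join(postcode.split()).upper()


def load_lookup(csv_text):
    """Build a dict from normalised postcode to borough name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {normalise(row["postcode"]): row["borough"] for row in reader}


def add_borough(records, lookup):
    """Return copies of the records with a 'borough' key added.

    Postcodes missing from the lookup get None rather than raising, so
    unmatched rows can be counted and investigated afterwards.
    """
    return [
        {**rec, "borough": lookup.get(normalise(rec["postcode"]))}
        for rec in records
    ]


lookup = load_lookup(LOOKUP_CSV)

# Hypothetical customer data that only carries a postcode.
customers = [
    {"customer_id": 1, "postcode": "sw1a 1aa"},
    {"customer_id": 2, "postcode": "E16AN"},
    {"customer_id": 3, "postcode": "ZZ99 9ZZ"},
]

enriched = add_borough(customers, lookup)
for row in enriched:
    print(row["customer_id"], row["postcode"], row["borough"])
```

Normalising the postcode on both sides matters because customer data rarely uses consistent spacing or capitalisation. On larger data sets the same join could be done as a left merge in pandas on the normalised postcode column.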