I spent some time over the last few weeks playing with the Open Data published by the City of Milan. I did not have a clear goal in mind, except for building an interesting visualization of how well public transport covers the city.
A quick exploration of the catalogue was encouraging: while most of the data was of little use to me, some datasets were promising and worth spending some time on. By the end of the week I was able to get the result I had in mind (the heatmap you can find in this post), but I was left with the lingering dissatisfaction I feel whenever I see a good initiative that could be dramatically improved by changing a few specific features.
Presentation of data
If the purpose of a website is to publish data, data should be at the center. However, while CSV datasets featured a preview option, there was absolutely no way to preview topological data. Of course, geographical displays are a more complex problem to solve, but as of 2013 there are many libraries that can effortlessly visualize geographical features. Topological data is presented in a textual catalogue, with abundant descriptions and numerous metadata fields, but there is no map. The screenshot below is the page on the website that describes the data about Parco Nord (a park where I used to go running). Note that it does not offer any hint about what the data look like. Compare this with the element below: (almost) the same data visualized on GitHub as a GeoJSON file. I believe this format is much more effective in communicating what the data look like, and I suspect you will agree with me. https://gist.github.com/abahgat/6334117
Choosing the right format
Topological data offered by the initiative is encoded in the Shapefile format, introduced in the 1990s for use with desktop GIS software. It is a very rich and powerful format, but it stores data as a set of binary files (usually distributed together in a compressed archive), which makes it unusable in modern web applications without prior processing. While Shapefiles are great for professional GIS users, a text-based format like KML or GeoJSON would have been a wiser choice for an Open Data initiative that wants to reach as many developers as possible, because it lowers the barrier for the general public to consume the data. Both formats are sufficiently rich to encode structured information: the map below is a good example (and the raw file is still human-readable). https://gist.github.com/abahgat/6359868
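To give a feel for how little it takes to produce and read GeoJSON, here is a minimal sketch that uses only the Python standard library. The helper names (`make_point_feature`, `make_feature_collection`) and the sample stops are my own illustrative assumptions, with approximate coordinates; they are not taken from the Milan datasets.

```python
import json


def make_point_feature(name, lon, lat, **properties):
    """Build a GeoJSON Feature for a single point.

    GeoJSON orders coordinates as [longitude, latitude].
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, **properties},
    }


def make_feature_collection(features):
    """Wrap an iterable of Features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Hypothetical sample data: approximate coordinates, for illustration only.
stops = make_feature_collection([
    make_point_feature("Duomo", 9.1900, 45.4642, kind="metro"),
    make_point_feature("Centrale", 9.2040, 45.4860, kind="metro"),
])

# The serialized result is plain text that can be read, diffed, and previewed.
geojson_text = json.dumps(stops, indent=2)
print(geojson_text)
```

The output is an ordinary JSON document, so any language with a JSON parser can consume it without a GIS toolchain.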
The end result
After spending some time on this, I ended up creating a GitHub repository with the data I played with, converted to GeoJSON and ready for use in web applications. I also wrote a simple visualization of how well the public transport network covers the city of Milan (the image you can see at the beginning of this post). Now, it would be great if whoever is responsible for Milan’s Open Data could look into making information available through better formats, leveraging Google Maps Engine or GitHub’s support for GeoJSON. While we wait for that to happen, if you convert more data to GeoJSON, feel free to fork opendata-milano on GitHub and contribute there.
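If you want to experiment with the converted data, one simple way to approximate a coverage heatmap is to count stops per grid cell. The sketch below is not the code behind the image in this post; the function name `grid_counts`, the cell size, and the sample features are all assumptions made for illustration. It also treats degrees of longitude and latitude as if they were equal distances, which is only a rough approximation.

```python
import math
from collections import Counter


def grid_counts(features, cell_deg=0.01):
    """Count GeoJSON Point features per square grid cell.

    Cells are `cell_deg` degrees wide; keys are (column, row) indices.
    """
    counts = Counter()
    for feature in features:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue  # this sketch ignores lines and polygons
        lon, lat = geometry["coordinates"][:2]
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


# Hypothetical features shaped like the contents of a GeoJSON file.
sample_features = [
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Point", "coordinates": [9.191, 45.464]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Point", "coordinates": [9.195, 45.466]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "Point", "coordinates": [9.204, 45.486]}},
    {"type": "Feature", "properties": {},
     "geometry": {"type": "LineString",
                  "coordinates": [[9.19, 45.46], [9.20, 45.47]]}},
]

coverage = grid_counts(sample_features)
for cell, count in sorted(coverage.items()):
    print(cell, count)
```

Each cell count can then be mapped to a color intensity by whichever mapping library you prefer.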