Our Chief Operational Officer (COO) Steven Revill has been delivering the Scottish Government’s Open Data Training Pilot for 9 months now and one of the most common discussions cropping up during the sessions is metadata. Here Steven reflects on the discussions he’s had with training delegates about metadata.
Where we are now
The starting point for a lot of the discussions on metadata is the Scottish Government’s Open Data Strategy and Resource Pack, supplemented by other examples. The Resource Pack sets out a pragmatic three-stage process, moving from an organisation’s internal standard to Dublin Core and then onwards to DCAT (Data Catalog Vocabulary).
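To make the three stages concrete, here is a minimal sketch in Python of how one record might move from an internal format to Dublin Core and then to DCAT. The internal field names, the example values and the mapping itself are assumptions made up for illustration; they are not taken from the Resource Pack. The output is a simplified JSON-LD-style structure, not a fully validated DCAT record.

```python
import json

# Stage 1: a hypothetical internal record. Field names and values are invented.
internal_record = {
    "name": "Library visits by branch",
    "notes": "Monthly visitor counts for each library branch.",
    "owner": "Example Council",
    "tags": ["libraries", "visits"],
    "file_url": "https://example.org/data/library-visits.csv",
}

# Stage 2: an assumed mapping from internal field names to Dublin Core terms.
INTERNAL_TO_DC = {
    "name": "dcterms:title",
    "notes": "dcterms:description",
    "owner": "dcterms:publisher",
    "tags": "dcterms:subject",
}


def to_dublin_core(record):
    """Rename internal fields to Dublin Core terms, dropping unmapped fields."""
    return {INTERNAL_TO_DC[k]: v for k, v in record.items() if k in INTERNAL_TO_DC}


def to_dcat(dc_record, download_url):
    """Stage 3: wrap a Dublin Core record as a DCAT dataset with one distribution."""
    dataset = {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dcterms": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
    }
    for term, value in dc_record.items():
        if term == "dcterms:subject":
            # In this sketch, free-text subjects become DCAT keywords.
            dataset["dcat:keyword"] = value
        else:
            # DCAT reuses Dublin Core terms such as title and description.
            dataset[term] = value
    dataset["dcat:distribution"] = [{
        "@type": "dcat:Distribution",
        "dcat:downloadURL": download_url,
        "dcat:mediaType": "text/csv",
    }]
    return dataset


dc_record = to_dublin_core(internal_record)
dcat_record = to_dcat(dc_record, internal_record["file_url"])
print(json.dumps(dcat_record, indent=2))
```

The point of the sketch is that most of the work sits in the first mapping: once the internal fields line up with Dublin Core terms, the move to DCAT is largely a matter of adding the dataset and distribution structure around them.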
So far, workshop attendees have ranged from those who have hardly used the term metadata to those who have a working knowledge of DCAT. For the majority there is a good understanding of metadata and its benefits, but not necessarily of how that connects to the practical work of creating metadata.
We have been hearing that metadata in organisations is often managed on an individual basis, with no clear common standards in use; in fact, it seems the perceived need for a standard is not really understood. My understanding from discussions is that data is often used only for the purpose for which it was collected, is maintained or updated only for the necessary period, and often no more is done. When data is shared, it is usually shared with practitioners so closely aligned to it that their level of understanding is already high and little needs to be said. Because data sharing is often not the norm in a data landscape, neither are metadata standards.
This is not to say that people are not seeing the value when it comes to open data, especially people we’ve met working in libraries or with maps. One of my favourite quotes from a participant at the training is that ‘DCAT would be the norm on the Internet if the Internet was created by librarians!’
For those with a Geographic Information Systems (GIS) background, the use of metadata to help people access and use data is quite normal. We’ve also trained geographers who already deliver a type of open data, for example by delivering against the EU INSPIRE Directive and applying the UK Gemini 2 Specification for metadata. But whilst these groups or teams have data sharing agreements or may have implemented standards around metadata, what they use is often not a standard such as Dublin Core.
Overall, the tone of the training indicates to me that most people understand the concept of metadata and how it applies to open data but when it comes to open data delivery, they are more concerned about other elements like a simple agreement for the publication of datasets or deciding where to host them.
Interestingly, metadata is rarely mentioned by delegates in post-training questionnaires, and maybe that is because it is an issue that can be resolved either by embedding new processes or by providing an ad-hoc process at the point of publication. As one delegate said, 'Applying new metadata standards on historic data sets would in itself be incorrect'. Regardless, it is an issue which can, to a certain extent, be parked until the point at which it becomes absolutely necessary.
Discussions I’ve had during training indicate that moving towards wider adoption of metadata use and publication will get the best traction and results if it is embedded in data governance processes throughout the organisation, and that open data needs to be part of those processes from the outset. In my opinion, all tools used to store and manage data should assist this process, and all tools used to publish the data should simplify the creation of open data metadata and allow it to be mandated.
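As a rough illustration of what mandating metadata in a publishing tool could look like, the Python sketch below refuses publication until a set of required fields is filled in. The list of required terms is a hypothetical policy chosen for the example, and the function names are invented; a real organisation would define its own through its governance process.

```python
# Hypothetical mandatory fields; a real policy would define its own list.
REQUIRED_TERMS = (
    "dcterms:title",
    "dcterms:description",
    "dcterms:publisher",
    "dcterms:license",
)


def missing_metadata(record):
    """Return the required terms that are absent or blank in the record."""
    missing = []
    for term in REQUIRED_TERMS:
        value = record.get(term)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(term)
    return missing


def can_publish(record):
    """A dataset is publishable only when no required term is missing."""
    return not missing_metadata(record)


# An incomplete draft: the description is blank and two fields are absent.
draft = {
    "dcterms:title": "Library visits by branch",
    "dcterms:description": "  ",
}

print(missing_metadata(draft))
print(can_publish(draft))
```

A check like this is deliberately small. Its value comes from where it sits: if it runs at the point of publication, metadata stops being something that can be parked indefinitely.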
As with open data, there is more to be done, and whilst embedding metadata and processes for open data publication should be the long-term goal, there is also a benefit in getting started now to further fuel the demand. Making data open means putting data on the Internet to be 'useable by all', as noted in the Scottish Government’s Open Data Strategy. For that to work, 'all' need to understand the data, so the data provider must supply enough additional information describing the data to allow people to use and reuse it. We call this data about the data 'metadata'.