About

What is open data?

The Open Data Institute defines open data as follows:

Open data is data that anyone can access, use and share. Governments, businesses and individuals can use open data to bring about social, economic and environmental benefits.

For open data to be useful, it must be made available to the public in an unrestricted, machine-readable format, and that is what we seek to achieve with this Portal.

How is the Valls City Council Open Data Portal structured?

Data is presented in resources: tables, files or maps in open, structured formats such as CSV or JSON. These resources are grouped into datasets according to their nature. Datasets, in turn, are classified into categories according to the area or areas to which they relate. A resource can belong to only one dataset, while a dataset can belong to one or more categories.

For instance, the Population dataset contains several resources with demographic data and belongs to a category (or group) called Demographics, while the Budget dataset contains resources with data on the City Council budget and belongs to the Economy and Public Sector categories.
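The relationship between resources, datasets and categories can be sketched in Python as follows. This is only an illustrative model, not the Portal's actual code: the class names, the helper datasets_in and the resource names are hypothetical. Each resource lives inside exactly one dataset, while a dataset carries a list of one or more categories.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A single table, file or map, for example a CSV file."""
    name: str
    fmt: str


@dataclass
class Dataset:
    """Groups resources of the same nature; may sit in several categories."""
    name: str
    categories: list[str]
    resources: list[Resource] = field(default_factory=list)

    def add(self, resource: Resource) -> None:
        # A resource is owned by the dataset that contains it.
        self.resources.append(resource)


def datasets_in(category: str, datasets: list[Dataset]) -> list[str]:
    """Names of the datasets classified under the given category."""
    return [d.name for d in datasets if category in d.categories]


# Resource names below are hypothetical placeholders.
population = Dataset("Population", ["Demographics"])
population.add(Resource("population-by-age", "CSV"))
budget = Dataset("Budget", ["Economy", "Public Sector"])
budget.add(Resource("budget-figures", "CSV"))

catalog = [population, budget]
print(datasets_in("Economy", catalog))
```

The model expresses the one-dataset rule by containment only; nothing in this sketch stops the same Resource object from being added to two datasets.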

Implementation

To implement the Open Data Portal we used open source tools running in Docker containers, which makes them easier to install, move and scale.

We built an infrastructure of interconnected containers that ingests and transforms data and uploads it to the Portal automatically and unattended, following an established schedule.

Finally, we developed an in-house application, OpenDChain, which replicates all the Portal's data to the distributed file network IPFS to guarantee its availability, and stores its hash on a blockchain to ensure traceability.

The Portal

The Portal's engine is CKAN, an open source tool, installed in a Docker container and extended with some plugins for visualizing geographic data.

CKAN provides an API that allows us to upload and modify data and its metadata remotely and unattended.
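As an illustration of what a remote, unattended call might look like, the following minimal sketch builds a request for CKAN's Action API, which accepts JSON POST requests under /api/3/action/ and reads the API token from the Authorization header. The portal URL, token, dataset name and the helper ckan_request are assumptions made for the example; the actual calls used by the Portal are not described here.

```python
import json
import urllib.request


def ckan_request(base_url: str, action: str, payload: dict,
                 api_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a CKAN Action API call."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": api_token,
        },
        method="POST",
    )


# Hypothetical portal URL, token and dataset name.
req = ckan_request(
    "https://opendata.example.org",
    "package_patch",
    {"id": "population", "notes": "Updated description"},
    "my-api-token",
)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

Here package_patch updates only the dataset fields included in the payload, which suits small unattended metadata changes.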

How is the Portal fed?

Data is obtained from the City Council's internal databases and from external websites, such as AOC's Transparency Portal or a meteorology service.

The tool chosen for this task is Logstash, also open source. It allows us to configure an ETL pipeline with specialized plugins for each of its three phases: input, filter and output.

Input plugins fetch the data from the sources at established intervals. Here we have used plugins for SQL and HTTP sources.

Filter plugins adapt the data, where necessary, to CKAN's needs. For instance, they convert date and time data to the proper format, add fields, or replace commas with dots.

Finally, output plugins upload the data to CKAN through its API, generate a .CSV file and call OpenDChain's API to notify it that there is new data to process.
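The kinds of adaptation mentioned above can be illustrated with a short Python sketch. Logstash filters are written in Logstash's own configuration language, so this is not the Portal's pipeline, only an equivalent transformation on one row. The field names (date, amount, source), the input date format and the function adapt_record are all assumptions made for the example.

```python
from datetime import datetime


def adapt_record(record: dict, source: str) -> dict:
    """Apply the three kinds of adaptation to one input row."""
    adapted = dict(record)
    # Date and time: day/month/year text to ISO 8601 (input format is assumed).
    parsed = datetime.strptime(record["date"], "%d/%m/%Y %H:%M")
    adapted["date"] = parsed.isoformat()
    # Replace the comma with a dot, so "1234,5" becomes "1234.5".
    adapted["amount"] = record["amount"].replace(",", ".")
    # Add a field that was not present in the source row.
    adapted["source"] = source
    return adapted


row = {"date": "31/01/2020 08:30", "amount": "1234,5"}
result = adapt_record(row, "internal-db")
print(result)
```

The function returns a new dictionary and leaves the input row untouched, which keeps each step easy to check in isolation.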

Permanence of data

Upon receiving Logstash's call, OpenDChain replicates the .CSV file to the distributed file network IPFS.

Traceability of data

Once the .CSV has been replicated, OpenDChain retrieves its hash and stores it in a blockchain transaction on the Ropsten network, one of the Ethereum project's test networks.

Then, using CKAN's API, it modifies the dataset's metadata to add the URLs of the IPFS file and of the blockchain transaction, so that this information is available directly from the Open Data Portal.
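One way such a metadata update could be prepared is sketched below. It assumes, without confirmation from the Portal's code, that the two links are stored as CKAN "extras" (free key/value pairs on a dataset) under the hypothetical keys ipfs_url and blockchain_tx_url, and that the payload is then sent to CKAN's package_patch action. Because the extras list is submitted as a whole, the sketch merges the new links into the existing extras instead of sending only the two new entries. The URLs and the function with_trace_links are placeholders.

```python
def with_trace_links(dataset: dict, ipfs_url: str, tx_url: str) -> dict:
    """Return a package_patch payload that adds or updates the two links."""
    new = {"ipfs_url": ipfs_url, "blockchain_tx_url": tx_url}
    # Keep every existing extra except the two keys being rewritten.
    extras = [e for e in dataset.get("extras", []) if e["key"] not in new]
    extras += [{"key": k, "value": v} for k, v in new.items()]
    return {"id": dataset["id"], "extras": extras}


# Hypothetical dataset as it might be returned by CKAN's package_show action.
current = {
    "id": "population",
    "extras": [
        {"key": "update_frequency", "value": "daily"},
        {"key": "ipfs_url", "value": "https://ipfs.example.org/ipfs/OLD_CID"},
    ],
}
payload = with_trace_links(
    current,
    "https://ipfs.example.org/ipfs/EXAMPLE_CID",
    "https://explorer.example.org/tx/EXAMPLE_TX",
)
print(payload)
```

Running the update again with a new file replaces the previous links rather than accumulating duplicates.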