DAFNI’s data roadmap
My role at DAFNI is to structure the data and its description so that emerging data flows deliver high-level DAFNI features. I also work with users to understand their needs, look at ways to enrich datasets with additional information about what they contain, and explore ways to connect data together. With the team, I’m looking further ahead too – making data ready for machine processing, for example.
Establishing the data foundations will drive the delivery of the features we can offer. This means obtaining the right kind of data, in line with industry best practice on security and permissions, to build up a library of datasets. Compiling data in ways that best serve DAFNI users, and making the best possible technological and architectural choices, affects what we can do today and tomorrow.
For example, by adding geospatial information (for instance, the latitude and longitude of electrical substations), we can easily deliver additional features: we can draw an area on a map and filter the data to highlight the substations within it.
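To make the idea concrete, here is a minimal sketch of filtering point assets by a drawn area, using a standard ray-casting point-in-polygon test. It is not DAFNI's implementation: the function names, the record layout and the substation coordinates are all hypothetical, and it treats latitude and longitude as flat coordinates, which is only reasonable for small areas.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray going east from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge meets the point's latitude
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_by_area(assets: List[Dict], polygon: List[Point]) -> List[Dict]:
    """Keep only the assets whose coordinates fall inside the drawn polygon."""
    return [a for a in assets if point_in_polygon((a["lon"], a["lat"]), polygon)]


# Hypothetical substations (made-up names and coordinates)
substations = [
    {"name": "Substation A", "lat": 51.5, "lon": -1.3},
    {"name": "Substation B", "lat": 53.0, "lon": -2.0},
    {"name": "Substation C", "lat": 51.6, "lon": -1.2},
]

# Hypothetical area drawn on the map, as (lon, lat) vertices
drawn_area = [(-1.5, 51.4), (-1.0, 51.4), (-1.0, 51.8), (-1.5, 51.8)]

selected = filter_by_area(substations, drawn_area)
print([s["name"] for s in selected])
```

In practice a geospatial library or a spatially indexed database would do this work; the point of the sketch is only that storing coordinates alongside each record is what makes this kind of filter possible.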
Geospatial information also lets us carry out operations that combine two datasets: imagine one containing Met Office flood predictions and another containing important infrastructure assets. DAFNI users could then highlight potentially affected assets – this would be an example of a high-level feature.
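A two-dataset operation of this kind is essentially a spatial join. The sketch below assumes, purely for illustration, that each flood prediction is simplified to a rectangular bounding box and that each asset is a single point; the class names, fields and values are hypothetical and do not reflect the Met Office's or DAFNI's actual data formats.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FloodZone:
    """Hypothetical flood prediction, simplified to a bounding box."""
    zone_id: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


@dataclass
class Asset:
    """Hypothetical infrastructure asset located at a single point."""
    name: str
    kind: str
    lon: float
    lat: float


def assets_at_risk(assets: List[Asset], zones: List[FloodZone]) -> List[Tuple[Asset, FloodZone]]:
    """Pair every asset with each flood zone that contains it."""
    return [(a, z) for a in assets for z in zones if z.contains(a.lon, a.lat)]


# Made-up sample data for both datasets
zones = [FloodZone("Z1", -1.5, 51.4, -1.0, 51.8)]
assets = [
    Asset("Substation A", "electricity", -1.3, 51.5),
    Asset("Pumping station B", "water", -2.0, 53.0),
    Asset("Rail depot C", "rail", -1.1, 51.7),
]

affected = assets_at_risk(assets, zones)
print([(a.name, z.zone_id) for a, z in affected])
```

The nested loop compares every asset with every zone, which is fine for a sketch; large datasets would normally rely on a spatial index rather than exhaustive comparison.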
My team is very diverse and talented and we are constantly brainstorming useful analytics features, from simple logical operations to artificial intelligence. We always aim to be flexible and avoid making decisions that we can’t backtrack on later. We also try to maintain high standards of service whilst being ready to embrace change – anticipating the required features and changing data formatting, storage and visualisation standards of the future.
The total number of files currently held on DAFNI is already approaching 14,000. Even if the files are compressed at source, we always store them decompressed for maximum convenience to our users – so that when they click on them they can avoid the time and processing power required to decompress the dataset.
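As a rough illustration of decompressing at ingest time, the sketch below expands a gzip file once when it is stored, so later reads need no decompression. It assumes gzip as the compression format and uses hypothetical paths and function names; it is not a description of DAFNI's actual ingest pipeline.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def store_decompressed(source: Path, store_dir: Path) -> Path:
    """Copy `source` into `store_dir`, expanding it first if it is a .gz file."""
    store_dir.mkdir(parents=True, exist_ok=True)
    if source.suffix == ".gz":
        target = store_dir / source.stem  # "sample.csv.gz" -> "sample.csv"
        with gzip.open(source, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams the data in chunks
    else:
        target = store_dir / source.name
        shutil.copyfile(source, target)
    return target


# Demonstration with a throwaway compressed file
with tempfile.TemporaryDirectory() as tmp:
    incoming = Path(tmp) / "incoming"
    incoming.mkdir()
    compressed = incoming / "sample.csv.gz"
    with gzip.open(compressed, "wt", encoding="utf-8") as fh:
        fh.write("id,value\n1,42\n")

    stored = store_decompressed(compressed, Path(tmp) / "store")
    stored_name = stored.name
    stored_text = stored.read_text(encoding="utf-8")

print(stored_name)
```

The trade-off is more storage used in exchange for faster access, since the decompression cost is paid once at upload rather than on every read.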
Open data essentials
When we started planning our data strategy, one of our first actions was to make UK Government open data available. One example is the 2011 Census, which covers many subjects, from finance to population and migration. We grabbed the data in bulk and then built custom datasets from it – so users can search using an official ID or a dataset.
Successfully uploading the whole census as a bulk service was a great test of the robustness of the platform. I built the data handling in DAFNI so that users can track datasets and upload files, and uploading the census in bulk was a good test of how to automate the whole process in a seamless way.
We’ve also taken many datasets from government sources and agencies which are required by law to be open and public, including Companies House data and the UK House Price Index. I’m working on ways of automatically getting datasets like these, and others not limited to Open Government Licensed data, into DAFNI as soon as they are published. I’m also working on ways of offering users more intelligent and useful ways of accessing the data, so that they can easily find relevant datasets. We can then plan further and varied access, according to the needs of users.
I’ve also added data from Newcastle University’s Urban Observatory Pilot Project, whose data is updated in real time by sensors all over Newcastle. We have used their interface to access the data and store some of it on DAFNI. We currently offer sample assets from the Observatory on DAFNI, and plan to examine user feedback to learn how best to develop the dataset further. So far, I have compiled all the data for a month and then divided it into different units, so for DAFNI users the datasets are already classified into air quality, pedestrians, drains, etc., to make it more convenient.
Geospatial datasets, so essential to our main theme of infrastructure, are a large category of the data that we offer. The assets our users need to map or deal with, from power stations to roads and houses, are located nationwide.
We see maps as part of the presentation layer (visualisation) fed by a series of data layers. We want to make working with these layers and the data flows emanating from their analysis convenient enough for the researchers to gain useful insights as seamlessly as possible. There should be comprehensive support for mapping intelligence.
What’s valuable to DAFNI’s researchers is giving them the tools to associate data from one source with other datasets – be they datasets that we hold at DAFNI, or that they already have.
We’re currently working with commercial entities and holders of proprietary data, such as utilities and other infrastructure networks including roads, rail and telecoms, to explore how we might host some of their datasets on DAFNI. These sorts of datasets present unique challenges: they contain valuable information but, with privacy and security guidelines and regulations to adhere to, among other considerations, they are more complex to host than open data. We are also developing ways to further enrich the data, and working on the creation of increasingly sophisticated data linking.
Our objective is to position and future-proof DAFNI as the leading modelling and analytics platform for infrastructure research.
A follow-up blog will explore how we are future-proofing DAFNI’s capacity to hold and manipulate data, our use of standards such as ISO and W3C, and how we’re preparing for GQL (the future graph querying standard).