Opportunities and barriers to data sharing identified at UKCRIC workshops

Introduction

As part of its data sharing project ‘Data Infrastructure for National Infrastructure’ (DINI), DAFNI commissioned two workshops on data sharing with the Stakeholder Advisory Group and Urban Observatory network of UKCRIC (the UK Collaboratorium for Research on Infrastructure and Cities). The DINI project, which aimed at better and safer use of data in research, was funded by the Department for Science, Innovation and Technology (DSIT) and by UK Research and Innovation (UKRI)’s Digital Research Infrastructure (DRI) programme, a £129m initiative aimed at developing a system that’s interconnected, human, FAIR and sustainable.

UKCRIC is a multidisciplinary network of UK universities, connecting research with policy and practice in infrastructure and urban systems. It works with stakeholders to better understand and address complex infrastructure challenges, essential to tackling aging infrastructure in a period of population growth, sustainability concerns, and the impacts of climate change.

The first workshop ran in May 2024 and the second in September 2024, with the final report submitted to DAFNI in December 2024.

Key challenges the UKCRIC workshops aimed to address

The first event was chaired by Mark Enzer, Chief Technical Officer at Mott MacDonald. He expertly marshalled members of UKCRIC’s Stakeholder Advisory Group, comprising professional practitioners in the infrastructure and urban systems sectors, to discuss the challenges of, and opportunities for, data sharing between industry and academia.

Joanne Leach, Executive Manager, UKCRIC, chaired the second event, focusing on the challenges of, and opportunities for, data sharing in urban observatory settings. She led a group of delegates invited for their expertise in urban/infrastructure observatories or living labs, comprising mainly academics specialising in data, urban systems and infrastructure.

The approach

Participation in both events was by invitation only. UKCRIC delivered the sessions online to maximise attendance from across the UK. Before the events, attendees were given information about the DINI project, its aims and its purpose, together with potential discussion points and questions to be asked during the events. Questions were prepared to guide discussion on the need for a national research cloud to support data sharing.

Both workshops focused upon the same central question – “How can the UK transform its data into research assets that can be used to benefit society?”

What are current practices in data sharing?

Current best practice in data sharing focuses on enabling FAIR (Findable, Accessible, Interoperable, Reusable) data practices.

There are many questions in play. For example, there is a seemingly ever-present discussion about the intricacies and difficulties of data sharing, such as ‘Who is liable if poor-quality data results in harm?’. Other areas of regular discussion in the data sharing community include:

  • If data is shared once, is the sharer committed to sharing future versions of the data?
  • Can shared data be un-shared?
  • What legal agreements are needed?

Proportionality and purpose are established cornerstones of data gathering. However, the proliferation of digital and big data is testing these, together with increased recognition that the question being asked of data is not always known at the time it is collected.

Without a purpose, the sharing of data is unlikely to succeed. The importance of data descriptions and purpose is increasing, driven by data proliferation and big data.

Traditionally, data would not be collected without a decent description and knowledge of the structure itself. First and foremost, data ontology is constructed by a user coming from a singular viewpoint, which is then reliant upon subsequent users understanding that very viewpoint.

Nevertheless, for Artificial Intelligence (AI) and machine learning, unstructured data can be more useful, especially in terms of looking for patterns. It is also true that the structure of data is not always known beforehand and it can be that AI can expose these structures over time. A data creator may lose sight and control of their data. This is viewed variously as a concern and an opportunity.

Urban observatories

For the purpose of the second workshop, urban observatories were defined using a model devised by UKCRIC. UKCRIC created six urban observatories throughout the UK, located in Birmingham, Bristol, Cranfield, Newcastle, Sheffield, and Manchester.

For each, research teams placed static and mobile sensors in the city to gather data about the sustainability and resilience of the urban environment. Sensors gathered continuous and near-continuous data on air quality, people and vehicle movements, water quality, and weather. The data was made publicly available, and users could see, analyse and download the data.

Much has changed since UKCRIC first set up its urban observatories over five years ago. Cities are now used to having sensors in the fabric of their built environments. These sensor networks are more expansive than those first implemented by UKCRIC and are often maintained by local authorities, providing consistent data over time. These same local authorities are also more willing to allow researchers to access data from their sensors, and this is opening up novel collaborative research opportunities.

Urban observatories also now include infrastructure sensors, allowing for improved condition monitoring, essential for hard-to-access areas such as bridges.

Benefits of data sharing

At the first event, participants acknowledged that distinctions should be made between pre-commercial and post-commercial data sharing. Pre-commercial data sharing addresses issues that are faced by an entire group or sector, the solving of which benefits the whole group or sector. In contrast, post-commercial data sharing has the potential to deliver market advantage to a specific organisation.

Each data sharing use-case has a different benefit profile, but the distinction between the two is not always clear or considered in data sharing paradigms.

Data sharing from commercial and government entities is often pre-commercial, leading to improvements in engineering standards, decreased risk and, by extension, decreased overengineering. The group concluded that increased use of pre-commercial data in a national data cloud has the potential to increase the impact of data and the capacity and capability for data analysis as a nation. This would enable data to be used in new and innovative ways and ultimately contribute to better government policies.

At the second event, it was emphasised that sharing data makes others aware of the data you have available, and this can prevent duplication of effort and lead to new collaborations. Coming together in a geographical space, such as an urban observatory, can lead to new partnerships – and bringing together data from multiple sources into a single computational structure can enable multidisciplinary research.

The main recommendations from the workshops

The workshops identified six key functionality components of a data cloud:

  1. Custodianship: Being not simply a keeper and controller of data, but a caretaker of it.
  2. Skills: Highly qualified people working on signposting and curating data, including protecting against data that is poor quality, and removing data that is out-of-date.
  3. Signposting: Enabling services to support easier ways for users to find and access data, establish benchmarks, and conduct insights and analytics.
  4. Horizon scanning: Assessing future data and data needs and making missing data explicit.
  5. Reliable and sustainable systems: Monitoring systems run by experts to assure the quality of the processes that generate and extract data before they are sent to the data cloud.

The main outcomes of the workshops

Both the industry representatives, who participated in the first session, and the academics, who participated in the second, face pressures that influence their ability to share data. However, they also recognise the opportunities afforded by sharing data and see the potential in taking a ‘data-sharing first’ approach to their work. The participants see value in establishing a national data cloud and believe that the challenges to doing so are surmountable and that now is the right time to tackle them.

An important feature of a national data cloud is set to be the enabling of modelling across datasets and across local, regional and national scales. This will require data that is standardised and interoperable. The necessary groundwork to make data interoperable, available and knowable is fundamental to the data cloud’s long-term sustainability. Without it, the data within the data cloud will not be sharable.

There is substantial work still to be done to establish confidence in data. The emerging field of ‘computational epistemology’, a term coined to describe data that includes additional information for the purpose of providing confidence in the data itself, has a role to play.

The data cloud’s default position should be that shared data will be made public. From this starting point, questions can be asked about why certain data can’t be made public, what would need to be put in place to make it public and, if it can’t be made public, what the next least-restrictive sharing model is.
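
One way to picture this ‘public by default’ reasoning is as a walk down an ordered list of sharing models, stopping at the first one that nothing rules out. The Python sketch below is illustrative only: the tier names and the choose_sharing_model helper are assumptions made for this example and were not specified by the workshops.

```python
# Illustrative sketch only: the tier names below are assumed for this example
# and are not taken from the workshops or the report.
# Tiers are ordered from least restrictive to most restrictive.
SHARING_MODELS = ["public", "registered_users", "named_partners", "not_shared"]


def choose_sharing_model(blockers):
    """Return the least-restrictive sharing model with no recorded blocker.

    `blockers` maps a tier name to the reason that tier cannot be used,
    for example {"public": "contains personal data"}.
    """
    for model in SHARING_MODELS:
        if model not in blockers:
            return model
    # Every tier has a blocker, so fall back to the most restrictive one.
    return SHARING_MODELS[-1]
```

With no blockers recorded, the function returns "public". Each recorded blocker documents why a tier was ruled out, which is also the natural starting point for asking what would need to be put in place to remove it.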

Understanding data to be brought into a national data cloud is the first step in developing the data cloud. The cloud could take the form of a centralised data depository, a federation of knowledge stores (an option favoured by the workshop participants), or a catalogue of available data with pointers to their locations.
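
To make the third option more concrete, a catalogue of this kind could hold only a description of each dataset and a pointer to where it lives, leaving the data itself with its custodian. The following Python sketch is a minimal illustration under that assumption; the CatalogueEntry fields, the find_entries helper and the example.org locations are hypothetical and do not describe any existing DAFNI or UKCRIC system.

```python
from dataclasses import dataclass


@dataclass
class CatalogueEntry:
    """One dataset record: describes the data and points to where it is held."""
    title: str
    description: str
    location: str  # pointer to the data's home; the catalogue stores no data


def find_entries(catalogue, keyword):
    """Return entries whose title or description mentions the keyword."""
    keyword = keyword.lower()
    return [
        entry for entry in catalogue
        if keyword in entry.title.lower() or keyword in entry.description.lower()
    ]


# Placeholder entries for illustration; the locations are not real datasets.
catalogue = [
    CatalogueEntry("Air quality readings", "City sensor measurements",
                   "https://example.org/air-quality"),
    CatalogueEntry("Bridge condition monitoring", "Infrastructure sensor data",
                   "https://example.org/bridges"),
]

matches = find_entries(catalogue, "sensor")
```

Because each entry carries only a pointer, the same catalogue could sit in front of either a centralised store or a federation of stores.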

Whichever is the case, investment in hardware, software and people will be needed. The proposed national data cloud, and data sharing infrastructure more generally, must be recognised as a class of infrastructure that falls under the remit of the National Infrastructure and Service Transformation Authority (NISTA). Data sharing infrastructure cannot be owned by one group and must be a shared resource.

The future

UKCRIC continues to research data sharing, further to the publication of the DAFNI-DINI report, which draws together recommendations and suggestions as to what DAFNI can do to deliver these next steps. Further insights from the UKCRIC workshops and the full UKCRIC report are available on the UKCRIC website.