WSL: How data sharing can identify leakages in vital water infrastructure
Introduction
The Water Systems Leakage (WSL) project was led by DAFNI Champion and Strategy Board Member Professor Liz Varga as part of the 2024-2025 data sharing project ‘Data Infrastructure for National Infrastructure’ (DINI). This exciting programme of work was instigated by the Department for Science, Innovation and Technology (DSIT), aimed at better and safer use of data in research and funded by UK Research and Innovation (UKRI)’s Digital Research Infrastructure (DRI) programme. The ‘DRI’ is a £129m initiative aimed at developing a research data system that’s resilient, interconnected, FAIR and sustainable.
With almost 20% of water resources in the UK wasted in seepage and escapes, this project used data to enhance leakage detection in water distribution systems, identify barriers to data sharing, and propose solutions that would facilitate cooperation between stakeholders. The research identified barriers to high-quality data collection and standardisation and the need for data confidentiality and high-performance computing in collaborative Trusted Research Environments, to enable access to infrastructure data for effective detection of leaks. In addition, the study provides a comprehensive data description framework for leakage detection, serving as a foundation and reference for the development of machine learning and other advanced data-driven methods.
The DAFNI-DINI project allowed Professor Varga and her team at University College London (UCL) to further explore the efficacy of water systems, building on a Manchester Prize gAIn Water project with DSIT, which involved creating algorithms for predicting water systems leakage.
The challenges
Water is critical infrastructure, providing clean, potable water to homes, industry and agriculture. However, there are barriers to accessing and sharing water data, with concerns regarding data protection and the potential to expose security vulnerabilities.
Data from utilities is shared via individual trusted working partnerships. “Academics need to invest in building and developing relationships with utilities on the use cases for data sharing,” explained Professor Liz Varga, Chair in Complex Systems from the Department of Civil, Environmental and Geomatics Engineering, UCL. “We had to build relationships and restart earlier conversations in order to convince water companies that exploratory work on new methods for systems leakage management was worthy of investigation.”
Water companies do collect data, but it can be project-focused and so becomes out of date quickly. Sharing data insights at the right scale is also critical, to avoid pinpointing individual high water consumers. Even with the right water distribution system data, other data covering the same areas and time periods needs to be examined to arrive at meaningful recommendations for the prioritisation of leakage management. The selection of these complementary datasets should be guided by water companies’ domain expertise, which itself constitutes an important dimension of the broader data-sharing challenge.
Professor Varga says, “The key research gaps became quickly apparent. The absence of a comprehensive method to overcome data-sharing barriers for water companies was coupled with an absence of tailored data standards for leakage management, and an urgent need for a holistic framework to assess benefits and prioritise barriers.”
The approach
Fellow DINI project, Icebreaker One (IB1), explored the requirements and impact of supporting improved sharing of national infrastructure data with publicly funded researchers, focusing on energy, water and transportation. IB1 devised a framework outlining the many different types of barriers to data sharing, such as commercial sensitivity and legality. The WSL team used IB1’s framework and literature review, supplemented with their own review of grey literature from public databases and water supply companies, and validated these barriers via interviews with utilities, academics, and technicians. In parallel, the WSL team developed a structured data description for leakage detection, providing a reference foundation to support future AI application developments in this domain. The insights enabled the compilation of data standards and an ontology, and uncovered potential solutions for the further adoption of AI-based tools for leakage detection.
Key findings
With the identification of 22 barriers and 19 solutions to data sharing in water systems, the team decided to create a comprehensive reference library (wiki) with links to all the different types of data that would be required by companies to manage leakage. Findings covered data on customer complaints of leaks, soil types (some soils have a propensity to exacerbate leakage), and repair work (which identified a very wide range of leakage types).
Cutting losses from water leakage is just one aspect of meeting future water demand. Other strategies include building new reservoirs and treatment plants, desalination and pumping over large distances, refurbishing and extending existing facilities, and interconnectors to enhance drought resilience and supply security. However, these strategies are much more expensive than focusing on reducing leakage, especially when water being distributed already comes with a cost (from treatment, pumping, etc.).
The team spoke to organisations, such as the Open Data Institute and STREAM, based at Northumbria University. STREAM is a collaboration between UK water companies, supported by industry and civil society partners with a vision to unlock the potential of water data to benefit customers, society, and the environment. Tellingly for both, their missions for open data have so far been unsuccessful due to the barriers identified. However, STREAM has been successful in creating a collaborative network with most of the UK’s water companies.
Water flow is measured in pre-determined districts called district metered areas (DMAs), although district boundaries can change, especially when there are building developments. A DMA consists of hundreds of properties and can include industrial, commercial and agricultural use. Water companies have insight when billing does not match pumped water, but because DMAs are sufficiently large, it is unclear where these losses occur. Comparing DMAs with similar types and numbers of consumers allows closer examination of potential leaks. At a property level, the homeowner may be paying for leakage at the mains without knowing it. It is only when bills are compared with those of similar sized households that such losses come to light and investigations can be made into water consumption and losses. Smart meters at property level can also provide better water usage data. Overall, the water company must charge for all the pumped water, so utility bills have to compensate for losses.
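The DMA comparison described above can be illustrated with a minimal Python sketch. It is not the WSL team's method: the function names, the 10-percentage-point margin and every figure are hypothetical, and it assumes that each DMA has a supplied (pumped) volume and a billed volume for the same period and that the DMAs being compared are broadly similar.

```python
from statistics import median


def loss_fraction(supplied_m3, billed_m3):
    """Share of supplied water that is not accounted for by billing."""
    if supplied_m3 <= 0:
        raise ValueError("supplied volume must be positive")
    return (supplied_m3 - billed_m3) / supplied_m3


def flag_dmas(dmas, margin=0.10):
    """Return DMAs whose loss fraction exceeds the peer median by more than `margin`.

    `dmas` maps a DMA name to a (supplied_m3, billed_m3) pair.
    The margin is an illustrative threshold, not an industry standard.
    """
    fractions = {name: loss_fraction(s, b) for name, (s, b) in dmas.items()}
    typical = median(fractions.values())
    return sorted(name for name, f in fractions.items() if f - typical > margin)


# Hypothetical volumes in cubic metres for four comparable DMAs.
example_dmas = {
    "DMA-A": (1000.0, 850.0),
    "DMA-B": (1200.0, 1020.0),
    "DMA-C": (900.0, 540.0),
    "DMA-D": (1100.0, 880.0),
}

flagged = flag_dmas(example_dmas)
print(flagged)
```

A flag of this kind only indicates which district deserves closer examination. It does not locate the leak within the DMA, which is the limitation noted above.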
“The public ultimately overpays for leakage in water infrastructure,” explained Professor Varga. “Using proven methods from Artificial Intelligence to identify and prioritise leakage management leads to direct savings as well as positive environmental impact.”
Benefits of data sharing
Data sharing would lead to improved water leakage detection, with data from real-time sensors enabling early and proactive discovery, rather than a reliance on customer reports. It would also foster innovation and allow researchers to validate algorithms and models against real-world scenarios from diverse places, allowing consideration of soil types, pipe age and pipe construction materials.
Transparent access to leakage data would inform evidence-based decision-making for water companies and contribute to sustainable development goals, such as cost-effective maintenance scheduling. Standardisation of datasets would allow consistent solutions to be used in different regions and drive forward the potential for mutual aid and exchange between companies.
Barriers to data sharing
The 22 barriers identified were both cultural, such as discoverability, and technical, such as reliability. GDPR and commercial sensitivities limit access to business data, reducing possibilities for collaboration and transparency. Data access for both academic researchers and the public is often extremely limited, affecting the validation of methods. In addition, sensor data is often incomplete, due to transmission and equipment failure issues, reducing its usefulness for advanced analysis and modelling.
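As a rough illustration of what incomplete sensor data means in practice, the Python sketch below measures how complete a series of readings is and where the gaps fall. It assumes a hypothetical sensor that should report every 15 minutes on a fixed grid. The function names and timestamps are invented for the example and do not come from the project.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval):
    """Return (before, after) pairs where consecutive readings are more than `interval` apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > interval]


def completeness(timestamps, start, end, interval):
    """Fraction of expected reading slots between start and end that hold a reading.

    Assumes readings fall exactly on the expected grid of timestamps.
    """
    expected = int((end - start) / interval) + 1
    seen = {t for t in timestamps if start <= t <= end}
    return len(seen) / expected


# Hypothetical flow sensor reporting every 15 minutes, with two readings lost.
step = timedelta(minutes=15)
start = datetime(2024, 8, 1, 0, 0)
readings = [start + step * i for i in (0, 1, 2, 5, 6)]
end = start + step * 6

gaps = find_gaps(readings, step)
share = completeness(readings, start, end, step)
print(gaps, round(share, 2))
```

A check like this could be run before any modelling, so that gaps are known and handled deliberately.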
The main recommendations from the 19 solutions
- A legal framework must be developed for data sharing agreements that protect commercial interests and allow researchers access.
- Standardisation initiatives need to be adopted for units, terminology and sampling protocols.
- Anonymisation techniques and datasets must be used to balance privacy concerns with data availability.
- Stakeholders need to be trained on the benefits and methods of data sharing, eliminating cultural resistance.
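One simple way to balance privacy with data availability is to share totals per area and withhold any area with too few properties. The Python sketch below shows that idea only. It is aggregation with small-group suppression, not a complete anonymisation method and not the approach taken by the WSL team. The record layout, the function name and the minimum group size are assumptions made for the example.

```python
from collections import defaultdict


def aggregate_by_area(records, min_properties=10):
    """Sum consumption per area and drop areas with fewer than `min_properties` properties.

    `records` is an iterable of (area_id, property_id, consumption_m3) tuples.
    Small areas are withheld so that a single property cannot be singled out.
    """
    totals = defaultdict(float)
    properties = defaultdict(set)
    for area, prop, m3 in records:
        totals[area] += m3
        properties[area].add(prop)
    return {
        area: {"properties": len(properties[area]), "total_m3": totals[area]}
        for area in totals
        if len(properties[area]) >= min_properties
    }


# Hypothetical records: area "Y" holds a single high consumer and is withheld.
example_records = [
    ("X", "p1", 10.0),
    ("X", "p2", 12.0),
    ("X", "p3", 11.0),
    ("Y", "p4", 250.0),
]

shared = aggregate_by_area(example_records, min_properties=3)
print(shared)
```

The minimum group size would need to be agreed with the water companies, since a threshold that is too low can still expose an individual high consumer.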
How could this work benefit society as a whole?
The main and immediate benefit would be cheaper water bills as households are currently paying for leaks inherent in existing infrastructure. With leakages stemmed, different uses for water savings could be realised, such as creating green hydrogen for cleaner energy systems.
Next steps
On the DAFNI platform, researchers can access the spreadsheet of data or ‘wiki’ as termed by the team, as well as the methodology for use cases. A journal paper has been published, which contains proposals for recommendations – ‘Addressing data sharing challenges for leakage management in water distribution networks: a multi-criteria decision-based (MCDM) assessment of barriers and solutions’, Journal of Environmental Management, 391, September 2025. DOI: https://doi.org/10.1016/j.jenvman.2025.126481
The team are now exploring potential collaborations and funding opportunities.
“I think the main impact of the WSL project has been in raising the profile of the genuine challenges in obtaining data about water systems challenges,” says Professor Varga. “The DAFNI platform has great potential to help with sharing data, using standards, ensuring data is securely managed, and that privacy is maintained. DAFNI enables sharing into one trusted research platform for use by multiple universities. Society would reap benefits from such controlled data sharing.”
Who’s involved?
Ruoqing Yin, a PhD student at UCL, led the project with Professor Varga and doctoral students Haonan Xu and Jiaqian Wei, also from UCL. Professor Varga was at UCL at the time this project took place and is now Professor of Complex Systems at Loughborough University.
When did the project run?
The project started in August 2024 and was completed in January 2025.