Part 2
Landscape analysis

ESFRI RIs for Digital Transformation

The digital transformation of Europe’s economies and societies is accelerating. It is entering a new phase, in which technologies are gradually blurring the boundaries between the physical, digital and biological spheres and pushing the frontier of what computers are capable of doing. These new technologies, which are progressively coming to maturity and affecting every sector of our lives and of the economy, build on the use of data and often require a critical mass of data, users and connected nodes to be viable.

ESFRI is an active player in this European dynamic. As already stated in the ESFRI Roadmap 2018 (page 117), the pan-European e-Infrastructures for Networking, High-Performance Computing and High-Throughput Computing are well established and provide production services used by international research projects and Research Infrastructures.

The fundamental principles of Open Science form the basis of the European Open Science Cloud (EOSC) initiative, which will offer researchers a virtual environment with open and seamless services for the storage, management, analysis and re-use of research data across borders and scientific disciplines by federating existing data infrastructures. EOSC will deploy a European Research Data Commons where data are findable, accessible, interoperable and reusable (FAIR), and as open as possible.

CONTRIBUTION TO EOSC

All the ESFRI RIs are at the forefront of data science. As providers of thematic, high-quality data and services that are FAIR compliant or working towards this objective, they have made significant contributions to the cultural change towards open/FAIR data, open science and innovation, which is a main underlying concept of EOSC. At the same time, as key consumers of EOSC data and services within and across scientific domains, RIs and their communities are central to EOSC development, quality and sustainability. The broader the federation of thematic RIs in EOSC and the wider the uptake of EOSC generic (horizontal) services, the better the chances for EOSC to be sustained in the longer term. Research Infrastructures are thus central to the research lifecycle and to all aspects of Open Science and FAIR data and services within the European Research and Innovation Area.

These activities are supported at EU level by the EOSC cluster projects, which also support the RIs’ participation in, and input to, the development of the European Open Science Cloud. There are five thematic cluster projects (ENVRI-FAIR, EOSC-Life, ESCAPE, PaNOSC and SSHOC), which coordinate their actions towards EOSC and are core partners in the EOSC Future project. EOSC Future will integrate the services developed in the cluster projects into EOSC, making them fully compatible with and accessible to the entire EOSC ecosystem. The five science clusters bring together 72 world-class Research Infrastructures from the ESFRI Roadmap and beyond. The coordinators of the science cluster projects meet regularly to discuss joint activities, exchange views on technology choices and debate how the outputs of the projects can best be sustained once the projects end. The current dynamic between the science clusters may lead to a long-term collaboration for cross-disciplinary open science.

This venture is therefore a great opportunity, and it is in the interest of all stakeholders, including ESFRI RIs, thematic clusters and EOSC, to make the most of it. The experience gathered by ESFRI, the ESFRI RIs and the ESFRI Clusters should be used to the fullest extent in the EOSC implementation, especially in the current second phase, fully reflecting the engagement and responsibility of RIs in and for Open Science. EOSC highlights the potential of the RIs, with their data, software and services, and their broader potential impact. The participation of RIs in the development of EOSC is therefore a particularly vivid example of digital transformation in the research and innovation field.

OPEN SCIENCE

Most of the RIs on the ESFRI Roadmap are at the forefront of the Open Science movement and make important contributions to the digital transformation by reshaping the whole research process according to the Open Science paradigm. Research Infrastructures guarantee the quality of data and enable the exploration and use of the data and code produced by their users, which can be the source of new approaches to research questions or, at least, reduce research time and costs. There are, however, differences in how Research Infrastructures implement Open Science, and these differences are specific to each scientific field: the methods of producing research data vary widely between fields and may have developed before the concept of Open Science was adopted.

Astronomy has a long history of openly sharing data and scientific findings, allowing a cumulative development of knowledge about our Cosmos and supporting transparency and international cooperation in science. In the modern era, astronomy is at the forefront of digitalisation for Open Science. ESO was an early leader in this regard, establishing an open-access ESO Science Archive and setting up cooperative partnerships with other Research Infrastructures to ensure interoperability.

CTA is working to build an EOSC for astronomy, astroparticle and particle physics and its application to science projects, including a science analysis platform where the scientific community can access and combine data and analysis software from multiple ESFRIs and stage them for innovative analysis workflows, e.g. to perform multi-messenger analysis, thereby pushing the digital transformation forward. Open science, and specifically the ability to find, access, interoperate with and re-use both SKAO data and software, is at the heart of SKAO’s software development culture and practices. The scientific return of a project at the scale of SKAO will only be maximised if the data products can be accessed by future users to answer as yet unforeseen questions.

All the RIs from the environmental research field have a long tradition in providing open access to data. In particular, ACTRIS, IAGOS, ICOS ERIC and EURO-ARGO ERIC contribute with their data products to Copernicus, the European Union’s Earth observation programme, which provides open data and services to benefit all European citizens.

EPOS ERIC provides: (a) a portal (Integrated Core Services-Central hub, ICS-C) that integrates, in a FAIR manner, digital assets from ~250 asset suppliers; (b) an appropriate governance, legal and financial framework addressing long-term sustainability to secure the assets and the ICS-C; and (c) a suitable approach for sharing best practices among data providers, domain-specific geoscience organisations and EPOS ERIC. In EPOS, assets comprise data, scientific products, services and software for solid Earth science. The EPOS federated approach aims to engage national and international Research Infrastructures in sharing data and making them accessible and usable.

In the health domain, ECRIN ERIC has developed tools and services to optimise data management in clinical research. ECRIN developed a data centre certification programme, based on about 100 criteria and 3-day site audits, to ensure compliance with GCP, FDA and EU regulations. About 15 centres are currently certified in Europe, and the certification programme is now going global, with certified centres in Asia (two in Japan, one in Korea).

In the Social Sciences, the principle of open science, especially in relation to data access, has been strong for many decades. The European Social Survey ERIC, for example, makes all of its data freely available for non-commercial use around the world, without privileged access, and has over 160,000 registered data users.

BIG DATA VOLUME

Research Infrastructures are central actors in the production and processing of research data, because the majority of them produce, manipulate, process and/or exchange data. The massive growth in demand for computing resources in recent years calls for a coherent and ambitious strategy at the level of infrastructure capacity (networks, computing and processing, storage and archiving), associated services and, more generally, a rethinking of the place of research data.

User experiments at the European XFEL generate vast amounts of data in a very short period of time. The maximum burst data rate per scientific instrument that must be captured is currently 8 TB/s which, when scientific instruments operate in parallel, translates into a sustained data rate of over 40 GB/s. For a typical user experiment running over 6 days, this can lead to user datasets in excess of 1 PB. Together with the scientific user community, European XFEL is developing computational methods to analyse the data efficiently, both in real time and after recording, in order to derive meaningful scientific results.
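To give a sense of the scale of these figures, the quoted sustained rate can be related to petabyte-scale volumes with a back-of-the-envelope calculation. This is only an illustrative sketch under stated assumptions: decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes), a constant rate of exactly 40 GB/s, and the helper names `hours_to_accumulate` and `upper_bound_volume` are hypothetical, not part of any European XFEL software.

```python
# Back-of-the-envelope sketch (assumptions: decimal SI units, constant rate).
GB = 10**9   # bytes
PB = 10**15  # bytes

SUSTAINED_RATE = 40 * GB              # bytes/s, the sustained rate quoted in the text
EXPERIMENT_SECONDS = 6 * 24 * 3600    # a 6-day user experiment, in seconds


def hours_to_accumulate(volume_bytes: float, rate_bytes_per_s: float) -> float:
    """Hours of uninterrupted recording at a constant rate needed to reach volume_bytes."""
    return volume_bytes / rate_bytes_per_s / 3600


def upper_bound_volume(rate_bytes_per_s: float, seconds: float) -> float:
    """Volume recorded if the rate were sustained without pause for the whole period."""
    return rate_bytes_per_s * seconds


if __name__ == "__main__":
    print(f"1 PB at 40 GB/s: {hours_to_accumulate(PB, SUSTAINED_RATE):.1f} h")
    print(f"6 days at 40 GB/s, no pauses: "
          f"{upper_bound_volume(SUSTAINED_RATE, EXPERIMENT_SECONDS) / PB:.1f} PB")
```

Under these assumptions, about 6.9 hours of continuous recording at 40 GB/s already yields 1 PB, while six uninterrupted days would give roughly 20.7 PB. The latter is only an upper bound: the 40 GB/s figure refers to several instruments operating in parallel, and experiments do not record continuously, which is why a single user dataset of "in excess of 1 PB" sits well below it.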

The HL-LHC project presents unprecedented challenges in terms of data processing and storage. CERN operates a number of FAIR-data-related services that represent major contributions to EOSC, including access to petabytes of LHC experiments’ data, together with associated training material and software, via the CERN Open Data Portal (http://opendata.cern.ch/), according to a published open data policy (CERN Open Data Policy for the LHC Experiments, http://opendata.cern.ch/docs/cern-open-data-policy-for-lhc-experiments).

An important activity to develop data compression algorithms is currently being launched at the ESRF. This is of high interest to all photon sources, because all of them face a steady increase in the volume of data produced by fast, high-resolution detectors.

In the health domain, advances over recent decades in technologies such as genome sequencing and mass spectrometry have resulted in ever larger volumes of valuable research data. The computational biology this has enabled has transformed our understanding of life at all levels and in all forms. ELIXIR is a distributed, virtual infrastructure through which users access online the many hundreds of digital services run by ELIXIR Nodes. These include databases, software tools, computing services, interoperability resources and standards, and training in how to use them.

Moreover, new methods in bioimage informatics, including machine-learning approaches and artificial intelligence, are developing at breathtaking speed, opening new and exciting possibilities to fully exploit FAIR image data for life scientists and beyond. Biomedical and life science researchers currently produce large-scale image data and therefore have acute needs for advanced image data analysis and imaging bioinformatics. However, these researchers often have limited computational resources and only basic informatics skills. Consequently, they are not yet able to implement and use complex analysis workflows or to make their image data FAIR.

Euro-BioImaging ERIC is working to change this. Through Euro-BioImaging, users can already archive their image data and access community tools for image analysis and processing that can be used via the cloud (as part of EOSC-Life). Euro-BioImaging aims to fuel the new discipline of imaging bioinformatics and to integrate data research across scientific domains in order to address larger questions and key societal challenges, e.g. health and ageing, climate change and food security.

RIs in the environmental domain produce vast amounts of data with, on average, a good level of FAIRness. However, studying complex phenomena such as the Earth System or the Climate System requires increased High-Performance Computing resources.

In data-intensive industries, access to very large volumes of high-quality data is of primary importance. The most important tech companies are not based in Europe; consequently, their activities aimed at acquiring huge amounts of data do not always meet European standards, especially with regard to privacy and data protection. CLARIN ERIC offers a more sustainable alternative, emphasising the quality of data (which are accompanied by metadata and annotations) over quantity. CLARIN data and tools can be used to develop, train and evaluate many language-related, data-intensive technologies, such as Machine Translation, Automatic Text Simplification, Automatic Text Summarisation, Automatic Text Generation, Knowledge Extraction and Machine Learning. The multilingual and multimodal character of the data and resources available through CLARIN promotes a culture of global citizenship and appreciation of cultural diversity, which is another contribution to shaping the digital transformation in line with Europe’s values.

DIGITAL INNOVATION

The European strategy for shaping Europe’s digital future can be strongly supported by large-scale RIs, which have already proven to be an excellent environment for creating digital innovations. The cutting-edge research they host is always driven by novel technological opportunities, including digital technologies. Big Data, data processing and analysis, data access, high data rates and modern computing technologies are nowadays extremely important for facing many global challenges (climate, environment and health, including COVID-19). These are common RI tools that are constantly evolving into innovations within RIs. A great recent example of digital innovation serving both competitiveness and the Green Deal is a direct spin-off of the FAIR project: the Green IT Cube, an energy-efficient and sustainable data centre that is currently one of the most efficient scientific computing centres in the world. It uses an innovative, patented water-cooling system for its racks, making it a strong example of the digital sector’s policy of minimising carbon emissions.

As a major analytical research facility, the ILL relies heavily on IT infrastructure to convert its experimental output into scientific knowledge. In this context, digital transformation holds the promise of major disruptive innovation. Machine learning can be applied to recognise patterns within the experimental data, making it possible to optimise the measurement strategy and fine-tune instrument parameters during operation and development. The ESRF has started an ambitious programme on machine learning methods. Machine learning is important for reducing noise in the data, for automatically detecting patterns that would otherwise go unnoticed, and for optimising experimental conditions so as to shorten data acquisition time.

ACTRIS provides users with access to a Virtual Research Environment to conduct specific experiments, simulations and online data processing. LifeWatch ERIC seeks to understand the complex interactions between species and the environment, taking advantage of High-Performance, Grid and Big Data computing systems and developing advanced modelling tools to implement management measures aimed at preserving life on Earth. ELIXIR activities are at the heart of the digital transformation of life science research.

ELIXIR connects, coordinates and integrates bioinformatics resources across Europe, building a coherent life science infrastructure for the digital age and supporting everyone from expert bioinformaticians to life science generalists, and users from academia to industry.

Digital transformation is also a cornerstone of personalised medicine. In this context, EATRIS ERIC and ECRIN ERIC are developing methodological standards that include generating multi-omic data through stratification cohorts, followed by machine-learning stratification to identify homogeneous patient clusters; clinical trials will then test treatment options driven by this complex profiling.

DARIAH ERIC’s mission is to empower research communities with digital methods to create, connect and share knowledge about culture and society. Its distributed structure is well suited to supporting digital transformation across its network.

In its 2021 round, the European Social Survey ERIC is including a special module on ‘digital social contacts at work and in family life’. The module will include items on opportunities for access to digital communication (e.g. Internet access at home), the need for it (e.g. lower co-residence) and trust in digital social contact (e.g. privacy concerns), complementing questions on workplace culture and available country information (e.g. on work-related state policies). These factors are likely to shape individuals’ ability to establish digital social contact in a way that facilitates work-life balance and encourages relationship quality or well-being. The ESS is also building infrastructure for a future online panel, with the aim of linking a representative sample within a digitally designed Research Infrastructure of the future.