SENSIBLE (PRIN PNRR 2022)
Small-data Early warNing System for viral pathogens In puBLic hEalth
SENSIBLE, PRIN PNRR 2022 by Italian MUR, grant n. P2022CNN2J
Visit https://sensible-prin.github.io/ for more info!
The objectives
SENSIBLE aims to address the following three objectives that are tightly linked with the project concept and methodology:
1) Derive effective methods for data-driven identification of emerging viral pathogens;
2) Build an objective framework for genomic surveillance in current and future epidemics; and
3) Implement an early warning system, to assist decision making in healthcare.
The team
SENSIBLE was born as a collaboration between my research group (Anna Bernasconi, project’s PI) at the Department of Electronics, Information, and Bioengineering (Politecnico di Milano) and the group of Prof. Matteo Chiara (co-PI) at the Department of Biosciences (University of Milan).
The two research units will liaise on the three objectives of SENSIBLE, by co-designing methods for the data-driven identification of emerging viral pathogens, building the genomic surveillance framework, and implementing the early warning system.
Our units build on a long-standing collaboration rooted in shared research topics, which we address from complementary perspectives – allowing an organic approach to research. Our joint work has already produced ViruClust, an application for comparing SARS-CoV-2 genomic sequences and lineages in space; VariantHunter, which monitors the evolution of mutations and indicates possible emerging variants; and RecombinHunt, a method for identifying recombination events in different viral species.
Funding and duration
SENSIBLE is funded within the MUR PRIN PNRR 2022 scheme; it has received about 240K euros (186K to the Politecnico di Milano Research Unit and 54K to the University of Milan Research Unit). The research will be carried out from December 2023 to November 2025.
Motivation and context
In March 2020, COVID-19 was declared a global pandemic. The research community mounted an unprecedented effort to understand the disease and its etiological agent, deliver effective diagnostics, plan vaccination programmes, and inform decision making and public health policies. In this context, genomic surveillance, the study of the evolution of a pathogen through the sequencing of its genome, was universally recognized as a first line of defense against the pandemic. All viruses mutate as they replicate and spread in a population; the majority of mutations are not relevant from an epidemiological perspective. However, epidemiologically relevant mutations might confer a selective advantage and become rapidly fixed in the pathogen genome, leading to the emergence of “variants of interest” or “variants of concern”.
Global pandemics are an accelerating threat: loss of habitats, urbanization, and globalization create an environment conducive to infectious disease outbreaks and spread. In the wake of climate change, disease vectors such as Asian tiger mosquitoes are now endemic in Europe – and are linked to disease outbreaks (e.g., Zika virus, Chikungunya virus).
SENSIBLE aims to leverage the knowledge gained on COVID-19 for building novel methods that can handle and analyze pathogens’ genome sequencing data in current and future viral epidemics, and implement an early warning system based on data-driven analysis.
SENSIBLE will develop an integrated framework for genomic surveillance of human pathogens, combining: 1) data-based analysis to summarize patterns of evolution through space and time; 2) data and knowledge-based analysis (retrieval, computation, or prediction) to formulate testable biological hypotheses and identify epidemiologically relevant evolutionary events (positive selection, change in protein function/affinity, immune escape). The framework will be developed and validated using a selection of use cases from COVID-19; a final assessment will be performed on independent data from the recent Monkeypox (2022), Zika (2015-2016), and Ebola (2013-2016) epidemics.
SENSIBLE will advance the state of the art in understanding the various facets of genomic surveillance, depending on the available data and the domain context. We will merge the experience gathered on COVID-19 with the considerable knowledge corpus that has become openly available for viral pathogen research. The project results will have a substantial impact on the early characterization of novel viral pathogens and of the risk they pose in terms of prevalence, infectivity, and transmissibility. More importantly, the framework developed by SENSIBLE will provide a practical tool to assist decision makers in healthcare.
Our assets
The interplay of genomic data, clinical data, and epidemiological data offers a fertile ground for data science problems. We are interested in problems with well-grounded hypotheses that correspond to big data challenges. During the three years of the COVID-19 pandemic, we have familiarized ourselves with the bioinformatics aspects of virology and engaged in several discussions with experts in the domain. Important work has been produced on data integration and visualization systems, as well as on data analysis of the immunological impact of mutations, the correlation between mutation co-occurrence and viral evolution, recognition of variants in their early stages, and monitoring of the evolution of mutations [see Publications]. Collectively, these results demonstrate a great opportunity for data-driven services for viral genomic surveillance. This line of research will continue to grow thanks to the recently started SENSIBLE PRIN project. Our latest work targets the identification of viral recombination events on a pandemic scale (e.g., SARS-CoV-2 and Monkeypox) and the detection of reassortment events (e.g., in Influenza viruses), in collaboration with the renowned virologist Prof. Ilaria Capua (Johns Hopkins University).
Objectives
SENSIBLE aims to empower pandemic preparedness by building a general system for the genomic surveillance of pathogens in current and future human viral epidemics. The main outcome of the project will be an automated early warning system, based on data-driven analysis of small data, which will act as early as possible to identify emerging viral pathogens and/or novel lineages of known pathogens that might pose an immediate risk to human health.
SENSIBLE aims to pave the way to a better strategy of genomic surveillance. To pursue this ambitious aim, we will target two settings:
– Regional settings, which need to properly monitor epidemics in their local territory and organize alerting methods for the bigger (country-level) organizations. Lombardia, the region of Italy where the COVID-19 pandemic originally spread, will provide the ideal use case for this scenario.
– Low-resource settings. The WHO has recently analyzed the context of a number of African countries where attempts to monitor COVID-19 and SARS-CoV-2 evolution have been undermined by understaffed infrastructures and lack of resources.
Expected results
SENSIBLE aims to shift current paradigms in pathogen genomic surveillance. We move away from the traditional approach, which relies on big data and purely on monitoring the prevalence of viral variants and lineages. Instead, we propose an integrated approach that leverages a collection of key evolutionary and epidemiological features. This will allow a more complete understanding of an epidemic and produce more informative results, without the need for large amounts of data.
We will tackle two key epidemiological questions and identify key metrics for raising alerts and early warnings in both scenarios:
– Minimal actionable data. What is the minimal amount of data production/availability required to set up an effective surveillance system? Can genomic surveillance be applied even in low-resource settings? Notwithstanding the recent experience with COVID-19, these questions remain largely unanswered. We aim to provide resource-aware recommendations/guidelines and quantifiable metrics to assist health authorities in the set-up of “minimal” pathogen surveillance systems.
– Prioritization/ranking of emerging pathogens. If a new mutation or pattern of mutations arises in a human pathogen, how does this impact its epidemiological features? Our first focus in this case is on changes that may provoke increased transmissibility, contagiousness, infectivity or immune system evasion. We plan to build a scoring system to rank and prioritize emerging viruses/viral variants with enhanced epidemiological features.
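
To make the two questions above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method SENSIBLE will deliver. The first function uses the standard binomial detection formula (assuming simple random sampling of sequenced genomes) to estimate how many sequences are needed to observe a variant at least once. The second part shows one possible shape for a scoring system: a weighted sum over feature scores. The feature names, the [0, 1] scaling, the weights, and the variant names are all hypothetical placeholders.

```python
import math
from dataclasses import dataclass


def min_sequences_to_detect(prevalence: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one variant genome among n samples) >= confidence.

    Assumes simple random sampling, so P(detect) = 1 - (1 - prevalence) ** n.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


@dataclass(frozen=True)
class VariantSignal:
    """Hypothetical per-variant feature scores, each assumed scaled to [0, 1]."""
    name: str
    growth_advantage: float
    immune_escape: float
    infectivity: float


# Illustrative weights only; real weights would need to be derived and validated.
WEIGHTS = {"growth_advantage": 0.5, "immune_escape": 0.3, "infectivity": 0.2}


def priority_score(variant: VariantSignal) -> float:
    """Weighted sum of the feature scores listed in WEIGHTS."""
    return sum(weight * getattr(variant, feature) for feature, weight in WEIGHTS.items())


def rank_variants(variants: list[VariantSignal]) -> list[VariantSignal]:
    """Return the variants sorted from highest to lowest priority score."""
    return sorted(variants, key=priority_score, reverse=True)


if __name__ == "__main__":
    # Sequences needed to see a variant circulating at 1% prevalence at least once.
    print(min_sequences_to_detect(0.01))

    # Placeholder variants with made-up feature scores.
    candidates = [
        VariantSignal("variant_C", 0.1, 0.1, 0.1),
        VariantSignal("variant_A", 0.9, 0.2, 0.1),
        VariantSignal("variant_B", 0.2, 0.9, 0.5),
    ]
    for v in rank_variants(candidates):
        print(v.name, round(priority_score(v), 2))
```

Under these assumptions, the required number of sequences grows quickly as the target prevalence drops, which is one way to express the trade-off between sequencing effort and detection sensitivity in low-resource settings. The weighted sum is the simplest possible ranking rule; any real scoring system would have to justify both its features and its weights.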