Data-driven Discovery

UMC develops methods for data-driven discovery, facilitating the analysis of high volumes of incoming reports of potential side effects of medicines.

Detailed review of large numbers of reports may not be a feasible or effective approach to signal detection and assessment. Methods for data-driven discovery help guide the attention of human experts towards the most interesting and relevant case series to review, and reveal important reporting patterns that affect signal assessment.

Key data-driven research at UMC

Constellations of clinically related adverse events

An area of intense research focus for UMC is methods to better represent and reflect information on adverse events. With vigiGroup cluster analysis, we seek to discover patterns in data from adverse event reports based on all recorded signs, symptoms, and diagnoses. The algorithm groups adverse event reports by the clinical conditions they describe, much as human experts would.

With vigiVec distributional semantics, we can obtain data-driven vector representations of medicines and adverse events based on reporting patterns in VigiBase. This enables grouping of adverse events that tend to be reported in similar contexts, based on their position in the data-driven vector space and independent of their position in a hierarchical terminology like MedDRA.
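To illustrate the idea of grouping by position in a vector space, here is a minimal sketch using cosine similarity. It is not the vigiVec implementation: the three-dimensional vectors are invented for illustration, and the function names `cosine_similarity` and `nearest_neighbour` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy vectors (assumption): real representations would be learned
# from reporting patterns and have many more dimensions.
toy_vectors = {
    "nausea": [0.9, 0.1, 0.0],
    "vomiting": [0.8, 0.2, 0.1],
    "rash": [0.0, 0.1, 0.9],
}

def nearest_neighbour(term, vectors):
    """Return the other term whose vector is most similar to that of `term`."""
    return max(
        (other for other in vectors if other != term),
        key=lambda other: cosine_similarity(vectors[term], vectors[other]),
    )

print(nearest_neighbour("nausea", toy_vectors))
```

In this toy data, terms with similar vectors end up as neighbours regardless of where they would sit in a terminology hierarchy.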

Learn more:

Uppsala Reports article, 2020

Detecting the unexpected (YouTube presentation), 2020

Consensus clustering paper in Artificial Intelligence in Medicine, 2021

Improving signal detection with vigiGroup, 2021 podcast

Infographic - evaluating vigiGroup

Infographic - vigiGroup overview

Learn more:

A method for data-driven exploration to pinpoint key features in medical data and facilitate expert review, 2017

A feasibility study of drug-drug interaction signal detection in regular pharmacovigilance, 2020

Risk factor considerations in statistical signal detection, 2020

Risk factor identification

Knowing which individuals are at risk of developing specific side effects can help patients and healthcare professionals make wiser therapeutic decisions in their use of medicines. Researchers at UMC are actively proposing and evaluating approaches to highlight patterns in observational medical data that may reflect risk factors related to specific side effects. Examples include drug-drug interactions, pharmaco-ethnic vulnerabilities, age- and gender-associated risks, pregnancy, and body mass index. UMC has also developed vigiPoint, a method for data-driven exploration of reporting patterns that can be used for this and several other purposes.

Statistical signal detection

Effective analysis of large collections of individual case reports may rely on statistical signal detection methods to direct and facilitate expert clinical review. Since the early 2010s, UMC has utilised vigiRank – a predictive model to help select case series for clinical review – in our signal detection work. It combines different aspects of strength of evidence, including quality and clinical content of individual reports, as well as trends in time and geographic spread. A similar algorithm for drug-drug interaction surveillance has also been developed and used.

Learn more:

The development and evaluation of triage algorithms for early discovery of adverse drug interactions, 2013

Improved statistical signal detection in pharmacovigilance by combining multiple strength of evidence aspects in vigiRank, 2014

vigiRank for statistical signal detection in pharmacovigilance: first results from prospective real-world use, 2017

Learn more:

A Bayesian neural network method for adverse drug reaction signal generation, 1998

Shrinkage observed-to-expected ratios for robust and transparent large-scale pattern discovery, 2013

Disproportionality analysis for pharmacovigilance signal detection in small databases or subsets, 2020

Disproportionality analysis

UMC was one of the first organisations to develop and deploy database-wide disproportionality analysis in the 1990s. Disproportionality analysis offers a systematic approach to identifying combinations of medicines and adverse events with higher reporting rates than may be expected and is based on how often each medicine and adverse event is reported in a database. It remains a core component of UMC’s approach to data-driven discovery in pharmacovigilance and is one of the aspects considered by vigiRank.
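As a rough illustration of the observed-versus-expected comparison described above, the sketch below computes an expected count from the marginal reporting frequencies and a shrunk log ratio of observed to expected. The shrinkage constant of 0.5 is one commonly published choice. The function names and counts are hypothetical, and this is not presented as UMC's production implementation.

```python
import math

def expected_count(n_drug, n_event, n_total):
    """Expected number of reports on the combination if the medicine and the
    adverse event were reported independently of each other."""
    return n_drug * n_event / n_total

def information_component(observed, expected, shrinkage=0.5):
    """Shrunk log2 observed-to-expected ratio. The shrinkage term pulls the
    value towards zero when counts are small (assumed constant: 0.5)."""
    return math.log2((observed + shrinkage) / (expected + shrinkage))

# Hypothetical counts: 1,000 reports on the medicine, 2,000 on the event,
# 1,000,000 reports in total, and 20 reports listing both.
expected = expected_count(1_000, 2_000, 1_000_000)
ic = information_component(20, expected)
print(f"expected={expected:.1f}, IC={ic:.2f}")
```

A positive value means the combination is reported more often than the marginal frequencies alone would suggest. It does not by itself establish a causal relationship.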

Confounding and heterogeneity

Disproportionality analysis is vulnerable to bias and confounding. Reporting patterns in large collections of individual case reports differ over time, by geographic region, and across demographic groups, which can lead to both missed and artificial associations. UMC has developed and evaluated several methods to mitigate these effects. They include shrinkage logistic regression to reduce masking by and signal leakage from co-reported medicines and vaccines, systematic use of subgrouping and stratification to reveal patterns unique to specific categories of reports, as well as a simple approach to unmasking associations that have been hidden by massive reporting of related medicine–adverse event combinations.
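The effect of stratification can be sketched with a small example. Here the expected count is computed within each stratum (for instance, each country) and then summed, so that a medicine and an adverse event that are both mostly reported in the same stratum do not appear associated for that reason alone. All counts and function names are hypothetical, and this is only one simple form of stratified analysis.

```python
import math

def stratified_expected(strata):
    """Sum of within-stratum expected counts.
    `strata` is a list of (n_drug, n_event, n_total) tuples, one per stratum."""
    return sum(
        n_drug * n_event / n_total
        for n_drug, n_event, n_total in strata
        if n_total > 0
    )

def shrunk_log_ratio(observed, expected, shrinkage=0.5):
    """Shrunk log2 observed-to-expected ratio (assumed constant: 0.5)."""
    return math.log2((observed + shrinkage) / (expected + shrinkage))

# Hypothetical data: both the medicine and the event are reported mainly
# in stratum A, which holds only 1,000 of the 10,000 reports overall.
strata = [
    (900, 900, 1_000),  # stratum A
    (100, 100, 9_000),  # stratum B
]
observed = 815

crude_expected = 1_000 * 1_000 / 10_000  # ignores the strata
crude_ic = shrunk_log_ratio(observed, crude_expected)
stratified_ic = shrunk_log_ratio(observed, stratified_expected(strata))
print(f"crude={crude_ic:.2f}, stratified={stratified_ic:.2f}")
```

In this constructed case the crude analysis suggests a strong association, while the stratified value is close to zero, showing how an artificial association can arise from differences between strata.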

Learn more:

Large-scale regression-based pattern discovery: the example of screening the WHO global drug safety database, 2010

Robust discovery of local patterns: subsets and stratification in adverse drug reaction surveillance, 2012

Outlier removal to uncover patterns in adverse drug reaction surveillance – a simple unmasking strategy, 2013

Performance of stratified and sub-grouped disproportionality analyses in spontaneous databases, 2016