Data-driven Discovery

UMC develops methods for data-driven discovery that support the analysis of the high volume of incoming reports of potential side effects of medicines.

When reports arrive in large numbers, reviewing each one in detail may be neither a feasible nor an effective approach to signal detection and assessment. Methods for data-driven discovery help direct the attention of human experts towards the most interesting and relevant case series to review, and reveal important reporting patterns that affect signal assessment.

Key data-driven research at UMC

Constellations of clinically related adverse events

An area of intense research focus for UMC is the development of methods that better represent and reflect information on adverse events. With vigiGroup cluster analysis, we seek to discover patterns in adverse event reports based on all the signs, symptoms, and diagnoses recorded in them. The aim is for the algorithm to group reports according to the clinical conditions they describe, much as a human expert would.
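
To make the idea of grouping reports by their recorded terms concrete, the sketch below links two reports when the sets of adverse event terms they contain overlap strongly (Jaccard similarity) and then returns the connected groups. This is a deliberately simple single-linkage clustering swapped in for illustration; it is not vigiGroup's algorithm, and the function names, the 0.5 threshold, and the example reports are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of terms two reports have in common (0 = none, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_reports(reports: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Hypothetical helper: link reports whose term sets overlap by at least
    `threshold`, then return the connected groups (single-linkage clustering)."""
    parent = {rid: rid for rid in reports}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for r1, r2 in combinations(reports, 2):
        if jaccard(reports[r1], reports[r2]) >= threshold:
            parent[find(r1)] = find(r2)

    groups: dict[str, set[str]] = {}
    for rid in reports:
        groups.setdefault(find(rid), set()).add(rid)
    return list(groups.values())


# Hypothetical reports, each reduced to the set of adverse event terms it records.
example_reports = {
    "r1": {"rash", "fever", "lymphadenopathy"},
    "r2": {"rash", "fever", "eosinophilia"},
    "r3": {"headache", "nausea"},
    "r4": {"nausea", "headache", "dizziness"},
}
print(sorted(sorted(g) for g in group_reports(example_reports)))
# [['r1', 'r2'], ['r3', 'r4']]
```

A production method would need to cope with much larger and noisier data, for example by using a more robust clustering scheme than a single similarity threshold.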

With vigiVec distributional semantics, we can obtain data-driven vector representations of medicines and adverse events based on reporting patterns in VigiBase. This makes it possible to group adverse events that tend to be reported in similar contexts, based on their positions in the data-driven vector space rather than their positions in a hierarchical terminology such as MedDRA.
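
The core idea of distributional semantics, that terms reported in similar contexts end up with similar vectors, can be illustrated with a count-based co-occurrence model. In the sketch below, each term is represented by counts of the other terms it appears with on reports, and similarity is measured with cosine similarity. This is an assumed, simplified stand-in; vigiVec may learn its vectors quite differently, and all names and example data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(reports: list[set[str]]) -> dict[str, Counter]:
    """Represent each term by counts of the other terms it co-occurs with on reports."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for terms in reports:
        for term in terms:
            for other in terms:
                if other != term:
                    vectors[term][other] += 1
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical reports listing medicines and adverse events together.
example_reports = [
    {"drug_A", "hepatitis"},
    {"drug_A", "jaundice"},
    {"drug_B", "hepatitis"},
    {"drug_B", "jaundice"},
    {"drug_C", "insomnia"},
]
vecs = context_vectors(example_reports)
print(round(cosine(vecs["hepatitis"], vecs["jaundice"]), 3))  # 1.0
print(cosine(vecs["hepatitis"], vecs["insomnia"]))            # 0.0
```

Note that "hepatitis" and "jaundice" never appear on the same report here, yet they come out as maximally similar because they are reported with the same medicines. This is the kind of grouping that does not depend on where the terms sit in a hierarchy such as MedDRA.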

Risk factor identification 

Knowing which individuals are at risk of developing specific side effects can help patients and healthcare professionals make better-informed decisions about the use of medicines. Researchers at UMC are actively proposing and evaluating approaches to highlight patterns in observational medical data that may reflect risk factors for specific side effects. Examples of such factors include drug-drug interactions, pharmaco-ethnic vulnerabilities, age- and gender-associated risks, pregnancy, and body mass index. UMC has also developed vigiPoint, a method for data-driven exploration of reporting patterns that can be used for this and several other purposes.
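
One simple way to explore whether a feature (for example, patient age over 65) is over-represented in a case series is to compare its frequency there with a reference set of reports using a log odds ratio. The sketch below adds a pseudo-count to each cell of the 2x2 table so that sparse counts do not produce extreme or undefined values. It is written in the spirit of vigiPoint's exploratory comparisons but is an assumption, not vigiPoint's published method; the pseudo-count of 0.5, the function name, and the example counts are hypothetical.

```python
import math


def log_odds_ratio(a: int, b: int, c: int, d: int, k: float = 0.5) -> float:
    """Log2 odds ratio comparing a report subset with a reference set.

    a, b: reports in the subset with / without the feature of interest
    c, d: reports in the reference set with / without the feature
    k:    pseudo-count added to every cell (an assumed shrinkage constant)
    """
    return math.log2(((a + k) * (d + k)) / ((b + k) * (c + k)))


# Hypothetical example: 30 of 40 reports in a case series concern patients
# over 65, compared with 200 of 1000 reports in the reference set.
print(round(log_odds_ratio(a=30, b=10, c=200, d=800), 1))  # 3.5
```

A positive value suggests the feature is more common in the subset than in the reference set. In practice, an exploratory tool would also need an uncertainty measure and a rule for which differences are large enough to highlight.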

Statistical signal detection

Effective analysis of large collections of individual case reports may rely on statistical signal detection methods to direct and facilitate expert clinical review. Since the early 2010s, UMC has used vigiRank, a predictive model that helps select case series for clinical review, in our signal detection work. It combines different aspects of the strength of evidence, including the quality and clinical content of individual reports, as well as temporal trends and geographic spread. A similar algorithm for drug-drug interaction surveillance has also been developed and used.
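
A predictive model that combines several aspects of evidence into one score can be sketched as a logistic function over features of each case series, with the series then sorted by score for review. The features below loosely follow the aspects named above (disproportionality, report quality, clinical content, recent reporting, geographic spread), but their exact definitions and the coefficients are hypothetical placeholders, not vigiRank's fitted model.

```python
import math
from dataclasses import dataclass


@dataclass
class CaseSeries:
    """Hypothetical summary features for one drug-event case series."""
    name: str
    disproportionality: float     # e.g. a lower bound of a disproportionality measure
    share_well_documented: float  # fraction of reports judged well documented (0-1)
    share_with_narrative: float   # fraction of reports with a free-text narrative (0-1)
    recent_reports: int           # reports received in the most recent period
    n_countries: int              # number of countries the reports come from


# HYPOTHETICAL coefficients for illustration only; not vigiRank's fitted values.
INTERCEPT = -3.0
WEIGHTS = {
    "disproportionality": 0.8,
    "share_well_documented": 1.5,
    "share_with_narrative": 1.0,
    "log_recent": 0.5,
    "log_countries": 0.7,
}


def score(cs: CaseSeries) -> float:
    """Logistic score in (0, 1); higher suggests the series is more worth reviewing."""
    z = (INTERCEPT
         + WEIGHTS["disproportionality"] * cs.disproportionality
         + WEIGHTS["share_well_documented"] * cs.share_well_documented
         + WEIGHTS["share_with_narrative"] * cs.share_with_narrative
         + WEIGHTS["log_recent"] * math.log1p(cs.recent_reports)
         + WEIGHTS["log_countries"] * math.log1p(cs.n_countries))
    return 1.0 / (1.0 + math.exp(-z))


def rank(series: list[CaseSeries]) -> list[CaseSeries]:
    """Order case series for clinical review, highest score first."""
    return sorted(series, key=score, reverse=True)


candidates = [
    CaseSeries("drug_B / event_Q", -0.5, 0.1, 0.0, 1, 1),
    CaseSeries("drug_A / event_P", 2.0, 0.8, 0.6, 12, 9),
]
for cs in rank(candidates):
    print(cs.name, round(score(cs), 2))
```

In a real model the weights would be estimated from historical case series with known outcomes of clinical review, rather than set by hand as here.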

Disproportionality analysis

UMC was one of the first organisations to develop and deploy database-wide disproportionality analysis, in the 1990s. Disproportionality analysis offers a systematic way to identify combinations of medicines and adverse events that are reported more often than expected, where the expected count is based on how often each medicine and each adverse event is reported overall in the database. It remains a core component of UMC's approach to data-driven discovery in pharmacovigilance and is one of the aspects considered by vigiRank.
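
As a worked example of comparing observed with expected reporting, the sketch below computes the expected count for a medicine-event pair under independence and then a shrunk observed-to-expected ratio on a log2 scale. This form corresponds to the information component (IC) as commonly described in the pharmacovigilance literature, with 0.5 added to both observed and expected counts; the function names and example counts are hypothetical, and the sketch omits the credibility interval that is usually reported alongside the point estimate.

```python
import math


def expected_count(n_drug: int, n_event: int, n_total: int) -> float:
    """Reports expected for the pair if the medicine and the event were reported independently."""
    return n_drug * n_event / n_total


def information_component(observed: int, expected: float) -> float:
    """Shrunk observed-to-expected ratio on a log2 scale (positive = reported more than expected)."""
    return math.log2((observed + 0.5) / (expected + 0.5))


# Hypothetical database: 1000 reports, 100 mention the medicine,
# 50 mention the event, and 20 mention both.
e = expected_count(n_drug=100, n_event=50, n_total=1000)  # 5.0
print(round(information_component(20, e), 2))  # 1.9
```

The added 0.5 pulls estimates for rarely reported pairs towards zero, so that a single unexpected report does not produce a dramatic value.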

Confounding and heterogeneity

Disproportionality analysis is vulnerable to bias and confounding. Reporting patterns in large collections of individual case reports vary over time, across geographic regions and between demographic groups, which can lead both to missed associations and to spurious ones. UMC has developed and evaluated several methods to mitigate these effects. They include shrinkage logistic regression to reduce masking by, and signal leakage from, co-reported medicines and vaccines; systematic subgrouping and stratification to reveal patterns specific to particular categories of reports; and a simple approach to unmasking associations that have been hidden by massive reporting of related medicine-adverse event combinations.
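
Stratification can be illustrated by computing the expected count separately within each stratum (for example, each region) and summing, instead of computing it once from database-wide totals. The sketch below uses the same shrunk observed-to-expected measure as in common disproportionality analysis; the strata, counts, and names are hypothetical, and the choice of stratification variables in practice is an assumption here.

```python
import math
from typing import NamedTuple


class Stratum(NamedTuple):
    """Hypothetical report counts within one stratum (e.g. one region)."""
    n_total: int
    n_drug: int
    n_event: int


def crude_expected(strata: list[Stratum]) -> float:
    """Expected count from database-wide totals, ignoring strata."""
    n_total = sum(s.n_total for s in strata)
    n_drug = sum(s.n_drug for s in strata)
    n_event = sum(s.n_event for s in strata)
    return n_drug * n_event / n_total


def stratified_expected(strata: list[Stratum]) -> float:
    """Sum of the expected counts computed within each stratum."""
    return sum(s.n_drug * s.n_event / s.n_total for s in strata)


def information_component(observed: int, expected: float) -> float:
    """Shrunk observed-to-expected ratio on a log2 scale."""
    return math.log2((observed + 0.5) / (expected + 0.5))


# Both the medicine and the event are reported far more often in region 1.
strata = [Stratum(n_total=1000, n_drug=500, n_event=200),
          Stratum(n_total=1000, n_drug=100, n_event=20)]
observed = 102

print(crude_expected(strata), stratified_expected(strata))  # 66.0 102.0
print(information_component(observed, stratified_expected(strata)))  # 0.0
```

Here the crude analysis would flag the pair (observed 102 against 66 expected), although within each region it is reported exactly as often as expected. The apparent association comes entirely from regional differences in reporting.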

Last modified on: February 22, 2024