Accelerating patient engagement using Clinical Trial Big Data and Machine Intelligence

Session: Leveraging Pharma Intelligence Data and Statistical Modeling to Inform Project Strategies
Shameer Khader, PhD
Senior Director, Data Science and Artificial Intelligence
AstraZeneca

@kshameer
2
Disclaimer – Content Slide
The views and opinions expressed in the following PowerPoint
slides are those of the individual presenter and should not be
attributed to Drug Information Association, Inc. (“DIA”), its directors,
officers, employees, volunteers, members, chapters, councils,
Communities or affiliates, or any organization with which the
presenter is employed or affiliated.
These PowerPoint slides are the intellectual property of the
individual presenter and are protected under the copyright laws of
the United States of America and other countries. Used by
permission. All rights reserved. Drug Information Association, Drug
Information Association Inc., DIA and DIA logo are registered
trademarks. All other trademarks are the property of their
respective owners.

3
From Biology to Therapy to Healthcare via Data

Traditional Approach
• Traditional data types
• Centralized
• GBs or TBs in size
• Structured
• Stable data model
• Low-dimensional
• Statistical approaches
• Cohort size (~10K)
• Hypothesis-driven

Data-driven Approach
• Evolving data types
• Decentralized
• Petabytes, exabytes…
• Semi-structured or unstructured
• Evolving, flat data model
• High-dimensional
• Machine or deep learning
• Large cohort size (>10K)
• Data-driven

[Diagram labels: Healthcare, Biology, Medicine, Chemistry, Pharmacology]

K. Shameer et al. (2018) Machine learning in Cardiovascular Medicine, Are we there yet?
BMJ Heart pii: heartjnl-2017-311198. doi: 10.1136/heartjnl-2017-311198.
4
Role of AI in Clinical Trial Design, Planning, and Execution using Machine Intelligence
• Drug, Target or Companion Diagnostics Development
• Clinical Trial Planning, Costing and Optimization
• Cohort Composition, Phenotyping, and Patient Identification
• Forecasting Clinical Acuity of Patient in Clinical Trials
• Novel data modalities-driven drug development
• Phenotyping and preemptive Prioritization of Pharma – HCP

Artificial Intelligence in Cardiology; J Am Coll Cardiol. 2018 Jun, 71 (23) 2668-2679.
5
Data sources for real-world biomedical, healthcare and
pharma data

BioMe™ BioBank Program

Table from K. Shameer et al.: http://bib.oxfordjournals.org/content/early/2016/02/13/bib.bbv118.long

6
Emerging Technology landscape
[Figure, three panels: Data from nursing flow sheets; Text-based description of neurological status of patients; Mapping to AVPU Schema]

MEASURE_VALUE (free-text neurological status) and record count
Alert 277,536
Wide Awake 51,390
Lethargic 24,551
Responds to noxious stimuli 16,194
Sleeping, Responds to stimuli 12,954
Alert;Age appropriate 9,862
Dozing 5,565
Other (Comment) 4,868
Comatose 3,337
Drowsy 3,022
Age appropriate 1,354
Alert;Responds to noxious stimuli 437
Lethargic;Responds to noxious stimuli 399
Alert;Lethargic 311
Age appropriate;Alert 223
Alert;Responds to noxious stimuli;Age ap… 194
Responds only to tactile stimuli 160
Alert;Other (Comment) 149
Lethargic;Other (Comment) 76
Responds to noxious stimuli;Lethargic 75
Lethargic;Alert 58
Responds to noxious stimuli;Alert 47
Unresponsive 38
No response to tactle stimuli 35
Responds to noxious stimuli;Other (Comme… 35
Lethargic;Age appropriate 32
Patient able to count months backwards f… 27
Alert;Age appropriate;Responds to noxiou… 17
Alert;Patient able to count backwards fr… 15
Patient able to count backwards from 20 … 13
Lethargic;Comatose 12
Comatose;Other (Comment) 10
Other (Comment);Alert 10
Other (Comment);Responds to noxious stim… 10
Patient able to count backwards from 20 … 10
Patient able to count months backwards f… 10
Alert;Age appropriate;Patient able to co… 9
Responds to noxious stimuli;Age appropri… 9
Lethargic;Responds to noxious stimuli;Ag… 7
Other (Comment);Lethargic 6
Alert;Comatose 5
Alert;Lethargic;Comatose 5
Age appropriate;Other (Comment) 3
Alert;Age appropriate;Other (Comment) 3
Alert;Age appropriate;Patient able to co… 3
Alert;Lethargic;Age appropriate 3
Alert;Patient able to count months backw… 3
Comatose;Responds to noxious stimuli 3
Alert;Lethargic;Responds to noxious stim… 2
Alert;Patient able to count backwards fr… 2
Comatose;Lethargic 2
Lethargic;Responds to noxious stimuli;Ot… 2
Other (Comment);Comatose 2
Responds to noxious stimuli;Comatose 2
Age appropriate;Alert;Lethargic 1
Age appropriate;Responds to noxious stim… 1
Age appropriate;Responds to noxious stim… 1
Alert;Age appropriate;Patient able to co… 1
Alert;Lethargic;Other (Comment) 1
Alert;Responds to noxious stimuli;Age ap… 1
Alert;Responds to noxious stimuli;Lethar… 1
Alert;Responds to noxious stimuli;Other … 1
Alert;Responds to noxious stimuli;Patien… 1
Alert;Responds to noxious stimuli;Patien… 1
Comatose;Responds to noxious stimuli;Oth… 1
Mute 1
Other (Comment);Alert;Age appropriate 1
Responds to noxious stimuli;Alert;Lethar… 1
Responds to noxious stimuli;Alert;Other … 1
Responds to noxious stimuli;Lethargic;Co… 1
l;Lethargic 1
Missing 1,390

MEASURE_VALUE after mapping to AVPU Schema
Alert 339,899
Voice 46,588
Pain 16,963
Unresponsive 3,448
Missing 7,616

• AI
• Phenotyping
• NLP/Text-Mining
• NoSQL
• Data Model
• Integration
• Governance
• Security

[Figure: matrix schematic with rows S1, S2, S3, … Sn and columns T1, T2, … Ti]
Badgeley MA, Shameer K, et al. BMJ Open. 2016 Mar 24;6(3):e010579. doi: 10.1136/bmjopen-2015-010579
K. Shameer et al. Mount Sinai Health Base (Unpublished data; CONFIDENTIAL)
Brief Bioinform. 2016 Feb 14. pii: bbv118. PubMed PMID: 26876889.
7
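The mapping from free-text nursing flow sheet entries to the AVPU schema can be illustrated with a small rule-based sketch. The slide does not state the actual mapping rules, so the keyword lists, the "most severe level wins" tie-break for multi-valued entries, and the function name `map_to_avpu` below are all assumptions made for illustration.

```python
# Hypothetical keyword rules, checked from most to least severe so that a
# multi-valued entry such as "Alert;Lethargic" resolves to the more severe level.
AVPU_RULES = [
    ("Unresponsive", ("comatose", "unresponsive", "no response")),
    ("Pain", ("noxious", "tactile")),
    ("Voice", ("lethargic", "sleeping", "dozing", "drowsy")),
    ("Alert", ("alert", "wide awake", "age appropriate")),
]


def map_to_avpu(measure_value):
    """Map one free-text MEASURE_VALUE string to an AVPU level (assumed rules)."""
    if not measure_value or not measure_value.strip():
        return "Missing"
    text = measure_value.lower()
    for level, keywords in AVPU_RULES:
        if any(keyword in text for keyword in keywords):
            return level
    # Entries such as "Other (Comment)" or "Mute" match no rule in this sketch.
    return "Missing"


for value in ["Wide Awake", "Alert;Lethargic", "Responds to noxious stimuli", "Comatose"]:
    print(value, "->", map_to_avpu(value))
```

A production mapping would need clinical review of every rule; this sketch only shows the shape of the normalization step.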
Endless possibilities; Two Key ideas

• Predictive Modeling of Trial-level and Enrollment Trends

• Trial-level and Individual-Patient Level Dropout Modeling and Driver Analytics

8
Enhancing Clinical Trial Enrollment Lifecycle using
Algorithm Driven Approaches
Aim 1: Develop predictive methods to improve patient enrollment rates across therapeutic areas

Aim 2: Reduce screen failures and improve randomization rates through machine learning techniques

Aim 3: Reduce dropout rates through machine learning based dropout prediction models
9
1) Predictive Modeling of Trial-level and Individual-Patient Enrollment Trends

10
Need for Predictive Modeling of Patient Enrollment

11
Predictive Modeling Workflow

12
Feature selection, Cross-validation and Model evaluation

13
Results
Rate of Enrollment is negatively affected by Adverse Events

14
2) Trial-Level and Patient-Level Dropout Modeling and Driver Analytics

15
Motivation
• High patient attrition rates result in insufficient clinical sample size and low power of treatment efficacy testing

• Early identification of hidden factors of attrition is important

• Constructive retention strategies with more effective interventions

• More efficient planning for clinical trial interim analysis

• Predicting patient level attrition provides significant information during trial planning and execution

16
Trial-level Patient Dropout Modeling and Driver Analytics

17
Submitted to NPJ Digital Health
Analytics Workflow for Patient Dropout Modeling &
Analytics

18
Cohorting: Mining Clinical Trial Databases (AACT)
o Reduce the number of categories in the AACT Drop Withdrawal Table by utilizing modern NLP tools.
o We employed the following techniques:
• NLP Preprocessing
• “Fuzzy Matching” (Levenshtein Distance) in both Python and using Elastic
• Spelling Correction
• Semantic Matching using Facebook AI’s FastText vector-space model
o Achieved
o 93% reduction in the complexity of terms used for drop-withdrawal
o 98% coverage of the dataset using the top 20 unique mapped categories

[Figure panel: Elastic Fuzzy Mapping (Case Normalization, Fuzzy Mapping)]
AACT: 16,191 unique Reasons; 14,482 unique Reasons
22.9% Reduction @ Threshold of 5
Elastic Fuzzy Matching to Top 20 categories has 74.6% coverage

Example Matches for “Protocol Deviation” (matched text and score)
High
Protocol Deviation 12.296675
protocol deviation 11.855679
Protocol deviations 11.429338
Protocol deviation 11.250852
Mid
Inclusion Criteria Deviation 6.9321914
Eligibility Criteria Deviation 6.508074
Deviation with contract date 6.2381854
Inclusion/exclusion criteria deviation 6.22583
Deviation from the Eligible Criteria 5.660916
Deviation from exclusion criteria 5.6546254
Low
Did Not Qualify for Part D Per Protocol 2.5466282
Patient unwilling to comply with protoco 2.54537
Protocol closed due to lack of accrual. 2.5376623
protocol changed due SC changed to cord 2.5376623
Follow up phase competed as per protocol 2.5376623
non-protocol change to ESA or IV iron 2.4748828
Treated 1 early and 1 late out of protoc 1.5384884

[Figure panel: Levenshtein Distance (Fuzzy Matching) Example: “Protocol Deviation”]
AACT: 16,191 unique Reasons
Case Normalization (11% Reduction): 14,482 unique Reasons
Fuzzy Matching (29% Reduction): 11,536 unique
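The case normalization and Levenshtein-distance fuzzy matching steps named on this slide can be sketched in plain Python. This is a minimal illustration and not the authors' pipeline: the function names, the `max_distance=2` threshold, and the tiny example lists are assumptions, and the Elastic-based variant is not shown.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def normalize(reason):
    """Case normalization plus whitespace cleanup."""
    return " ".join(reason.lower().split())


def map_to_category(reason, categories, max_distance=2):
    """Return the closest category if within max_distance, else the cleaned reason."""
    cleaned = normalize(reason)
    best = min(categories, key=lambda category: levenshtein(cleaned, category))
    return best if levenshtein(cleaned, best) <= max_distance else cleaned


def top_k_coverage(mapped_reasons, k=20):
    """Fraction of records that fall into the k most frequent categories."""
    counts = Counter(mapped_reasons)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(mapped_reasons)


# Hypothetical inputs for illustration only.
reasons = ["Protocol Deviation", "protocol deviation", "Protocol deviations",
           "Protocol  deviation", "Lost to follow-up"]
categories = ["protocol deviation", "lost to follow-up"]
mapped = [map_to_category(reason, categories) for reason in reasons]
print(sorted(set(mapped)), top_k_coverage(mapped, k=1))
```

The threshold trades recall against false merges: a larger `max_distance` collapses more variants but may merge distinct reasons.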

19
Uncovering Machine Learning-Ready Data from Public Clinical Trial Resources: A case-study on normalization across Aggregate Content of ClinicalTrials.gov (2021) IEEE BIBM
Semantic matching w/ Vector Space Models
Vector Space Model (FastText)*
Example reason: “study was terminated due to pi departure”
Post fuzzy matching: 11,536 unique
VSM Model (Semantic Similarity): 1,057 unique (93% Reduction)
Semantic Mapping to Top 20 categories has 98% coverage

* 1 million word vectors trained with subword information on Wikipedia 2017, UMBC webbase corpus and statmt.org news dataset (16B tokens).
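Semantic matching in a vector space can be illustrated as a nearest-category lookup by cosine similarity over averaged word vectors. The sketch below is an assumption-laden stand-in: the three-dimensional `TOY_VECTORS`, the category names, and the `min_similarity` threshold are invented for illustration and replace the pretrained FastText vectors the slide refers to (which also handle unseen words through subword information, unlike this toy).

```python
import math

# Hypothetical toy word vectors standing in for pretrained FastText embeddings.
TOY_VECTORS = {
    "investigator": (0.9, 0.1, 0.0),
    "pi": (0.8, 0.2, 0.0),
    "departure": (0.7, 0.0, 0.3),
    "left": (0.7, 0.1, 0.2),
    "protocol": (0.0, 0.9, 0.1),
    "deviation": (0.1, 0.8, 0.1),
    "violation": (0.1, 0.9, 0.0),
    "adverse": (0.0, 0.1, 0.9),
    "event": (0.1, 0.0, 0.9),
}
CATEGORIES = ["investigator left", "protocol deviation", "adverse event"]


def embed(phrase):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_VECTORS[word] for word in phrase.lower().split() if word in TOY_VECTORS]
    if not vectors:
        return None
    return tuple(sum(dimension) / len(vectors) for dimension in zip(*vectors))


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def semantic_match(reason, categories, min_similarity=0.8):
    """Return the most similar category, or None if nothing is similar enough."""
    query = embed(reason)
    if query is None:
        return None
    scored = [(cosine(query, embed(category)), category)
              for category in categories if embed(category) is not None]
    if not scored:
        return None
    best_score, best_category = max(scored)
    return best_category if best_score >= min_similarity else None


print(semantic_match("study was terminated due to pi departure", CATEGORIES))
```

Unlike edit distance, this kind of lookup can group reasons that share meaning but few characters, which is the motivation for running it after fuzzy matching.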
20
Feature selection, Cross-validation and Model evaluation

21
Key factors driving patient attrition in clinical trials:
• Higher prevalence of Adverse Events in Cancer, Fibrosis, Pulmonary Hypertension
• Longer Trials and Longer Treatment cause higher Patient Attrition
• Attrition differs by Disease, i.e. higher in Cancer, Fibrosis, Pulmonary Hypertension than others

Other points on the slide:
• Patient Level Attrition Modelling using IPLD
• Expansion to other TAs (Oncology, CVRM)
• Explore and predict point hazard in patient attrition for multiple studies
• Incorporate more potential factors and complicated associations for point hazard
• Integrate the model into Clinical Control Tower Dashboard

22
Patient Attrition: From Trial-level to Patient-level (Time to Event) Models

23
Predictive Modeling of Personalized Clinical Trial Attrition using Time-to-event Approaches – ECML/PKDD/PharML 2020
Time to Attrition Modeling and Exploration

24
Summary
• Conclusion
  • Patient-level attrition modeling can be formulated within a time-to-event modeling framework
  • The hazard mechanism in this case cannot be captured by flexible parametric models; it may need more flexible nonparametric modeling approaches
  • The Cox model has the potential to be applied in this business use case
  • Only limited signal was found: few factors significantly affect patient attrition hazard

• Future Directions
  • Data augmentation: sample size, feature size
  • Advanced modeling: nonparametric hazard modeling, Cox random forest, deep neural network survival
25
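Formulating patient attrition as a time-to-event problem means each patient contributes a follow-up time and a flag saying whether dropout was observed; patients who complete the study are treated as censored. As a minimal nonparametric illustration, the sketch below computes a Kaplan-Meier retention curve in plain Python. It is not the Cox or other models named in the summary, and the function name and example data are hypothetical.

```python
def kaplan_meier(times, dropped):
    """Kaplan-Meier estimate of the probability of remaining in the trial.

    times:   follow-up time per patient (e.g. weeks on study)
    dropped: True if the patient dropped out at that time, False if censored
    Returns a list of (time, retention probability) at each observed dropout time.
    """
    records = sorted(zip(times, dropped))
    at_risk = len(records)
    retention = 1.0
    curve = []
    index = 0
    while index < len(records):
        current_time = records[index][0]
        same_time = [flag for time, flag in records if time == current_time]
        events = sum(same_time)
        if events:
            # Multiply by the conditional probability of not dropping out now.
            retention *= 1 - events / at_risk
            curve.append((current_time, retention))
        at_risk -= len(same_time)
        index += len(same_time)
    return curve


# Hypothetical data: five patients, two of them censored (completed or still on study).
example_times = [2, 3, 3, 5, 8]
example_dropped = [True, True, False, True, False]
for week, probability in kaplan_meier(example_times, example_dropped):
    print(f"week {week}: retention {probability:.2f}")
```

Censoring is what distinguishes this from a plain dropout rate: a patient followed for only three weeks still counts as at risk during those weeks.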
Conclusions & Future Outlook
• Developing a range of AI-driven approaches to accelerate different facets of drug development and clinical trials

• Focus on augmenting manual processes and transforming them with AI, using data- and algorithm-driven approaches: developing a clinical-trial-specific vector embedding (ClinicalTrials2Vec) and graph modeling (TrialGraph) for rapid mining of clinical trial data assets (for example, inclusion/exclusion criteria, protocol/patient similarity, trial-level results) may help to digitally accelerate the adoption of data-driven phenotyping.

• For example, digital phenotyping is currently an emerging approach, but developing validated e-phenotyping algorithms is not scalable for thousands of diseases

• Leveraging insights from historical data using a predictive framework to improve the success metrics of future trials and enhance patient and provider engagement

• We are always looking for collaborators!

26
Thank You

Shameer Khader, PhD


Senior Director
Data Science and Artificial Intelligence
@kshameer

Join the conversation #DIA2021

27
