Professional Documents
Culture Documents
#DD4SAIS
CONTENTS
pipeline = pyspark.ml.Pipeline(stages=[
document_assembler,
tokenizer,
stemmer, Spark NLP annotators
normalizer,
stopword_remover,
tf, Spark ML featurizers
idf,
lda]) Spark ML LDA implementation
Single execution plan for
topic_model = pipeline.fit(df)
the given data frame
Case study: Demand Forecasting of Admissions from ED
Features from Structured Data Reason for visit Current wait time
Age Number of orders
• How many patients will be admitted today? Gender Admit in past 30 days
• Data Source: EPIC Clarity data Vital signs Type of insurance
Case study: Demand Forecasting of Admission from ED
ML with NLP
github.com/melcutz/nlu_tutorial
Tokenizer Normalizer
Different Vocabulary Lemmatizer Fact Extraction
ShARe /
90.30% CLEF Disease & problem norm.
92.29% NCBI Disease norm. in literature
Dataset Metric
94.17% Mirco-averaged F1
4th i2b2/VA
79.76% Marco-averaged F1
althomas@indeed.com david@pacific.ai
in/alnith/ in/davidtalby
@davidtalby
#DD4SAIS 20