You are on page 1of 2

ZoomRx | Data Science Case Study

Background
ZoomRx is working on a product which requires to accurately process and classify biomedical
literature. Mapping biomedical short forms (​eg: HIV​) with their long forms (​eg: Human
Immunodeficiency Virus​) is an important component of this process.

Case Objective
We want you to come with a scalable solution which recognizes, extracts, and maps the short
form with the corresponding long form in a given sentence.

Training Dataset
Input Output

Id Sentence Short Form Long Form

1 The cerebrospinal fluid (CSF) is contained in the CSF c​erebro​s​pinal ​f​luid


brain ventricles and the cranial and spinal
subarachnoid spaces.

2 Another momentous occasion was the creation of PWA P​eople​ ​w​ith​ ​A​IDS
the People with AIDS Coalition (PWA Coalition) Coalition Coalition
in 1983 out of a meeting of HIV/AIDS patients in
SF, USA.

3 GH-releasing peptide (GHRP) was shown to GHRP GH​-​r​eleasing ​p​eptide


possess strong GH-releasing activity both in vitro
and in vivo.

4 To study the use of plasma concentrations (PCs) PCs p​lasma


as a tool for monitoring the effectiveness and c​oncentration​s
safety of vancomycin treatment.

5 With this knowledge, we examined the utility of GVG g​amma-​v​inyl​G​ABA


gamma-vinylGABA (GVG), a selective and
irreversible inhibitor of GABA-transaminase.

6 This antibody reacts with Bromodeoxyuridine BrdU Br​omo​d​eoxy​u​ridine


(BrdU) in single stranded DNA.

7 The main active ingredient in both Harvoni® and NA NA


Epclusa® is Sofosbuvir.
Evaluation Process
Once you are ready, we expect your application to read an input csv file with sentences and
populate the short form and long form columns with reliable data.

Input CSV File Layout & Sample Data

Id Sentence Short Form Long Form

1 The Food and Drug Administration (FDA) is a government


agency established in 1906 with the passage of the Federal
Food and Drugs Act.

2 Hepatitis C (Hep C) is a chronic liver disease caused by the


Hepatitis C virus (HCV).

3 The hydrolysis of lipids in human high-density lipoprotein


(HDL) by esterases first separated on a polyvinylidene
fluoride membrane.

You might also like