Pilot study for ontology-based analysis of INPC data:
Final report
T. Schleyer, A. Ruttenberg, B. Duncan,
F. Smith, A. Roberts
Original project goals
• select several representative research
questions that use INPC data
• model the data needed for these questions in
an ontology
• replicate data retrieval/analysis using SPARQL
and R
• compare understandability, documentation,
query complexity, workflow and extensibility
INPC data analysis – current workflow
• Receive request for data
– Often a list of terms/criteria or a brief written description
– Sometimes a spreadsheet of codes (ICD9, CPT, etc.) to search for
• Search for data
– Find any codes needed (e.g. look up medications by name or class)
– Map between coding systems (e.g. ICD9 to Regenstrief dictionary)
– Have requestor review codes
– Perform search across numerous tables, some of which duplicate information
– Iterative process: refine and re-run query
• Return results to requestor
Challenges with current process
• data managers = “Greek Oracle”
• relational database a technical/idiosyncratic
construct (e.g. naming constraints,
normalization, performance)
• meaningful, real-time interaction about data
difficult
• little or no opportunity to leverage external
data representation resources
• hard to detect problems

Relational databases and hidden meaning
sys_id is coding system, such as ICD9,
local codes, SNOMED, LOINC.
code is the actual code, like ICD 920.1.
service_code is the question that the record is the answer to.
top_parent_service_code is the code of the parent question.
value_type… indicates what the type is for the data (coded, numeric, etc.).
Sample query: Vending Machine:10 Breast cancer

WHERE
SERVICE_SYS_ID=1 and ((SERVICE_CODE in
('189' /*DX and COMPLAINTS*/, '4569' /*E.R. DIAGNOSIS*/, '4966' /*HOSP DX*/,
'7076' /*DX LISTS*/, '7686' /*HOSP HX*/, '7909' /*DISCH DX*/,
'9950' /*REHAB DX*/, '9951' /*ORTHO DX*/, '9952' /*SURG DX*/,
'9953' /*ENT DX*/, '9954' /*EYE DX*/, '9955' /*DERM DX*/,
'9956' /*NEURO DX*/, '7909' /*Disch Dx*/, '14360' /*OB Discharge Diagnosis*/,
'36129' /*Axis IV Discharge Dx*/, '3871' /*Initial Dx*/,
'16501' /*Discharge Dx/Prob*/, '19825' /*Ekg.Cart.Dx*/, '21827' /*DIDS DX*/,
'21669' /*ANDROLOGY DIAGNOSIS*/, '22813' /*VISIT DIAGNOSIS*/,
'21237' /*Primary Care Dx*/, '19788' /*Preoperative Diagnosis*/,
'37081' /*OB Triage Admission Diagnoses*/,
'37086' /*OB Triage Discharge Diagnoses*/)
and (upper(VALUE_TEXT_FOR_DISPLAY)='BREAST CA') ) …
What are ontologies?
• represent domains through classes and their
relationships
• Each class in an ontology has a defined and
unique meaning.
• Properties are semantic relationships among
classes, e.g.:
– simple: “Patient” “has” “gender,” “age”
– complex: “is_a,” “is_treated_by,” etc.
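The idea of classes linked by properties can be sketched in plain Python, without any ontology library. This is only an illustration: the class names (Molar, Tooth, AnatomicalStructure, Provider) and the helper functions are assumptions made for the example, not terms taken from any ontology used in the project.

```python
# Toy triple store: (subject, property, object).
# All class names here are illustrative assumptions.
TRIPLES = {
    ("Molar", "is_a", "Tooth"),
    ("Tooth", "is_a", "AnatomicalStructure"),
    ("Patient", "has", "gender"),
    ("Patient", "has", "age"),
    ("Patient", "is_treated_by", "Provider"),
}


def superclasses(cls, triples=TRIPLES):
    """Return every class reachable from cls by following is_a links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, prop, obj in triples:
            if subject == current and prop == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def values_of(cls, prop, triples=TRIPLES):
    """Return the sorted objects linked to cls through the given property."""
    return sorted(o for s, p, o in triples if s == cls and p == prop)
```

Because is_a is followed transitively, asking for the superclasses of Molar also returns AnatomicalStructure, which is the kind of inference a flat code list cannot provide.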

Example: Oral Health and Disease Ontology

(http://code.google.com/p/ohd-ontology/, http://www.ontobee.org/browser/index.php?o=OHD)
OHD - Caries finding

OHD – tooth restoration procedure

OHD – tooth

Reusing other ontologies

Finding breast cancer drugs
• First we find cancer patients by querying for patients that:
– have a cancer diagnosis ICD9 code
– have a concept code in clinical variable that identifies a cancer
diagnosis
• Found a total of ~1500 patients for the 1 year of records we
have.
• We search the pharmacy_order table for prescriptions to
cancer patients:
– About 39,000 total – 26,000 have NDC codes, 13,000 don’t!
– The 13,000 prescriptions comprise ~400 prescription types
– Examples include: MORPHINE SUL TAB 30MG ER, NAMENDA
TAB 5MG, NITROFURANTN CAP 100MG
– Note that queries done at Regenstrief typically will miss the prescriptions
without NDC codes, i.e. about 1/3 of the total (13,000 of 39,000).
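The split between prescriptions with and without NDC codes can be sketched as follows. The record layout, the field names `drug` and `ndc`, the placeholder NDC value and the third drug name are hypothetical; only the counts 39,000 and 13,000 and the first two drug names come from the slide.

```python
from fractions import Fraction

# Hypothetical records: field names and the NDC placeholder are illustrative.
PRESCRIPTIONS = [
    {"drug": "MORPHINE SUL TAB 30MG ER", "ndc": None},
    {"drug": "NAMENDA TAB 5MG", "ndc": None},
    {"drug": "EXAMPLE DRUG TAB 10MG", "ndc": "00000-0000-00"},
]


def split_by_ndc(prescriptions):
    """Partition prescriptions into those with and without an NDC code."""
    with_ndc = [p for p in prescriptions if p.get("ndc")]
    without_ndc = [p for p in prescriptions if not p.get("ndc")]
    return with_ndc, without_ndc


def missed_fraction(total, without_ndc):
    """Share of prescriptions that a code-only search would not find."""
    return Fraction(without_ndc, total)
```

With the reported counts, `missed_fraction(39000, 13000)` gives exactly 1/3, matching the estimate above.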

Components
• Cancer patients
• Prescriptions for them
• Diagnoses of them
• ICD9 Hierarchy
• NDF-RT OWL translation
• Mapping of NDC to RxNorm
• Mapping of RxNorm to NDF-RT
Representation choices
• Codes are information artifacts, about whom or
what they are coded.
• Patients are actual patients.
• NDF-RT are actual drugs.
• Prescriptions are directive information entities.
• OBO Ontologies: OBI, IAO, OGMS, OMRSE
• Other ontologies/documents: NDF-RT, ICD9
• Web services: RxNorm API
• Store: OWLIM-SE, OWL-Horst
Key Leverage
• Use of NDF-RT hierarchies and relations
– Ingredients
– Physiological effects
– Therapeutic classes
– Cause ‘May treat’, Mechanism of Action
• Use of ICD9, limited as it is
– Leverage classification to be able to compute
“malignant neoplasm” = neoplasms – “benign
neoplasms”
• Transparency of data artifacts
– Data team has learned about structure in process.
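The "neoplasms minus benign neoplasms" computation is a set difference over a classification hierarchy. A minimal sketch, assuming a toy child-to-parent map: the hierarchy below is deliberately tiny and is not the real ICD9 structure, and the function names are hypothetical.

```python
# Toy child -> parent map. Illustrative only, not the actual ICD9 hierarchy.
PARENT = {
    "benign neoplasms": "neoplasms",
    "217": "benign neoplasms",   # benign neoplasm of breast
    "174": "neoplasms",          # malignant neoplasm of female breast
    "174.9": "174",
}


def descendants(root, parent=PARENT):
    """Return all nodes that have root somewhere above them."""
    result = set()
    for node in parent:
        current = node
        while current in parent:
            current = parent[current]
            if current == root:
                result.add(node)
                break
    return result


def malignant_codes(parent=PARENT):
    """Everything under 'neoplasms' that is not the benign branch."""
    benign_branch = descendants("benign neoplasms", parent) | {"benign neoplasms"}
    return descendants("neoplasms", parent) - benign_branch
```

The point of the sketch is that the malignant set is derived from the classification rather than maintained by hand as a list of codes.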

RxNorm to NDF-RT
• Restricted to cancer patients in 1 year
• Find all prescription NDC codes
• Use internal concept mapping to get 1037
RxNorm codes
• Use NDF-RT to get 47488 NDF-RT<->RxNorm
mappings using SPARQL against OWL NDF-RT


PREFIX rxcui: <http://purl.obolibrary.org/n/NDFRT_C818>
SELECT ?class ?rxnorm
WHERE {
  # "rxcui:" (prefix with an empty local name) is used as the full property IRI
  ?class rxcui: ?rxnorm .
}




RxNorm to NDF-RT
• 328 RxNorms not in NDF-RT derived map
• Use the RxNorm web API to find:
– more general term
– or, remapped term
– more general term of remapped term
– remapped, remapped term
– more general term of remapped, remapped term
• and add mapping if found
• Leaving: 21 unmapped terms
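The fallback search above can be sketched as a chain of candidate terms tried in order. This is an assumption-laden sketch: `more_general` and `remapped` stand in for whatever RxNorm web API lookups were actually used, and are passed in as plain callables returning a code or `None`. No real API call is shown, and the candidate order simply follows the order of the bullets.

```python
def candidates(code, more_general, remapped):
    """Yield fallback candidates for code, in the order listed above."""
    yield more_general(code)
    first = remapped(code)
    yield first
    if first is not None:
        yield more_general(first)
        second = remapped(first)
        yield second
        if second is not None:
            yield more_general(second)


def resolve(code, ndfrt_map, more_general, remapped):
    """Return the NDF-RT term for an RxNorm code, or None if unmapped.

    ndfrt_map is the RxNorm -> NDF-RT map derived from NDF-RT itself;
    more_general and remapped are hypothetical lookup callables.
    """
    if code in ndfrt_map:
        return ndfrt_map[code]
    for candidate in candidates(code, more_general, remapped):
        if candidate is not None and candidate in ndfrt_map:
            return ndfrt_map[candidate]
    return None
```

Passing the lookups in as callables keeps the sketch testable with dictionaries, e.g. `resolve("A", {"G": "N1"}, {"A": "G"}.get, {}.get)`.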

RxNorm to NDF-RT mapping

1037 tried

1016 successful
• 9 have RxNorm codes that can’t be resolved
– 207982,309937,311945,314058,314265,404282,562715,845521,966533
• 12 were not mapped
– 0.5 ML Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 0.12 MG/ML / Influenza A virus vaccine, A-Victoria-361-2011
(H3N2)-like virus 0.12 MG/ML / Influenza B virus vaccine, B-Wisconsin-1-2010-like virus 0.12 MG/ML Prefilled Syringe [Fluzone
High-Dose 2012-2013 Formula]
– Coal Tar 200 MG/ML Topical Solution
– Influenza A virus vaccine, A-California-7-2009 (H1N1)-like virus 0.03 MG/ML / Influenza A virus vaccine, A-Victoria-361-2011
(H3N2)-like virus 0.03 MG/ML / Influenza B virus vaccine, B-Wisconsin-1-2010-like virus 0.03 MG/ML Injectable Suspension [Fluzone
2012-2013 Formula]
– Isopropyl Alcohol 0.7 ML/ML Medicated Pad [BD Alcohol]
– Isopropyl Alcohol 0.7 ML/ML Medicated Pad
– POLYETHYLENE GLYCOL 3350 105 MG/ML / Potassium Chloride 0.00497 MEQ/ML / Sodium Bicarbonate 0.017 MEQ/ML / Sodium
Chloride 0.0479 MEQ/ML Oral Solution [NuLytely]
– POLYETHYLENE GLYCOL 3350 105 MG/ML / Potassium Chloride 0.00497 MEQ/ML / Sodium Bicarbonate 0.017 MEQ/ML / Sodium
Chloride 0.0479 MEQ/ML Oral Solution [TriLyte]
– POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride
0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution [Gavilyte-G]
– POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride
0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution [Golytely]
– POLYETHYLENE GLYCOL 3350 59 MG/ML / Potassium Chloride 0.01 MEQ/ML / Sodium Bicarbonate 0.02 MEQ/ML / Sodium Chloride
0.025 MEQ/ML / sodium sulfate 0.04 MEQ/ML Oral Solution
– Prednisone 10 MG Oral Tablet
– hydrocortisone acetate 10 MG/ML / Pramoxine hydrochloride 10 MG/ML Topical Foam [Epifoam]
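The reported counts can be cross-checked with a few lines of arithmetic. The sketch assumes that every RxNorm code found in the NDF-RT derived map counted as a direct success; the function and key names are hypothetical.

```python
# RxNorm codes reported as unresolvable.
UNRESOLVED_RXNORM = [207982, 309937, 311945, 314058, 314265,
                     404282, 562715, 845521, 966533]


def mapping_tally(tried=1037, not_in_derived_map=328, not_mapped=12):
    """Reconcile the mapping counts reported on the slides."""
    direct = tried - not_in_derived_map            # found in the derived map
    failed = len(UNRESOLVED_RXNORM) + not_mapped   # unresolvable + unmapped
    recovered = not_in_derived_map - failed        # rescued via the fallback
    return {
        "direct": direct,
        "recovered": recovered,
        "failed": failed,
        "successful": direct + recovered,
    }
```

Under that assumption the numbers are consistent: 709 direct plus 307 recovered gives the 1016 successful mappings, and 9 unresolvable plus 12 unmapped gives the 21 remaining terms.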
Lessons learned
• discovery of data quality issues, such as missing
results and data irregularities
• maintaining classes easier than maintaining
queries and sets
• leveraging other people's work reduces your own
• transparency of data discovery/query refinement
process
• inherent documentation in ontologies (as
opposed to information in Faye's head)
Thank you for your attention.

Questions?
