Process automation for efficient translational research on Endometrioid Ovarian Carcinoma

Monika Ahuja, MCA¹, Thomas Bair, PhD², Michael J Goodheart, MD³, Boyd Knosp, MS¹

¹Institute for Clinical & Translational Science, ²Iowa Institute for Human Genetics, ³Department of Obstetrics & Gynecology, University of Iowa Hospitals & Clinics, Iowa City, IA, United States, 52242

Abstract
This process provides an automated, integrated platform that combines study data with patient EMR data from EPIC and facilitates ongoing, accurate extraction of phenotypic information as well as the sharing of de-identified data for processing and genetic analysis. It reduces the manual effort needed to integrate data and the number of manual data entry errors that can corrupt an entire dataset and produce misleading results.

• Integration of EMR data consisting of histology, diagnosis, demographics, follow up visit details, survival time, and percentages.

In order to enable informatics collaborations, the Institute for Clinical and Translational Science (ICTS) at the University of Iowa is developing an automated environment that enables the sharing of phenotypic information retrieved from the Research Data Warehouse (RDW) and integrated with study specific data, BioBank, and domain specific knowledge.

On the surface this process seemed simple. After analyzing the clinical data, however, we found that staging data is stored only in clinical notes and therefore cannot be extracted accurately. To extract staging and grading data accurately, we are now working with our EMR customization team to enable the functionality required to capture these data discretely. Significant patient information is stored in the University of Iowa Cancer Registry; however, because of a recent technical migration of the Registry data, we have not yet succeeded in automating data extraction from the Cancer Registry, and the process therefore depends on receiving an extracted data file from the Cancer Center.

Use Case: mRNA analysis for Endometrioid Ovarian Carcinoma
The project goal is to integrate mRNA expression profiling data with study-specific, EMR, and BioBank data. The project requires that only de-identified data be shared with bioinformatics team members, who study the molecular profile of serous and endometrioid ovarian carcinomas. To enable ongoing collaboration between the clinical experts, the BioBank team, and the bioinformatics team, we are designing an automated Extract, Transform and Load (ETL) process. This system integrates the relevant specimen data and the custom prognosis data received from clinical experts with data from the patient's EMR system (EPIC), including staging, histology, and survival data. To share the integrated data with team members, the processed, de-identified data is transferred by FTP to a secured location for statistical processing. Team members compare histologically benign ovaries with 24 cancer samples from patients diagnosed with endometrioid ovarian cancer. Further comparisons use TCGA datasets to obtain gene expression profiles of samples from serous carcinoma patients (n=54).
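The integrate-then-de-identify step described above can be sketched roughly as follows. This is a minimal illustration, not the SSIS-based process the team built: the field names (`mrn`, `name`, `date_of_birth`), the function names, and the use of a keyed hash to derive a study ID are all assumptions made for the example, and a production workflow would need a fuller de-identification review.

```python
import hashlib
import hmac

# Hypothetical direct identifiers to drop before sharing; not taken from the poster.
IDENTIFIER_FIELDS = {"mrn", "name", "date_of_birth"}


def pseudonym(mrn: str, secret_key: bytes) -> str:
    """Derive a stable study ID from the MRN with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, mrn.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def integrate_and_deidentify(specimens, prognosis, emr, secret_key):
    """Join three record sources on MRN, then strip direct identifiers."""
    prognosis_by_mrn = {row["mrn"]: row for row in prognosis}
    emr_by_mrn = {row["mrn"]: row for row in emr}
    shared = []
    for specimen in specimens:
        mrn = specimen["mrn"]
        # Later sources win on key collisions: EMR, then prognosis, then specimen.
        merged = {**emr_by_mrn.get(mrn, {}), **prognosis_by_mrn.get(mrn, {}), **specimen}
        record = {key: value for key, value in merged.items() if key not in IDENTIFIER_FIELDS}
        record["study_id"] = pseudonym(mrn, secret_key)
        shared.append(record)
    return shared
```

Because the study ID is derived deterministically from the MRN and a secret key, a refreshed extract maps each patient to the same ID, which keeps the shared file linkable across refreshes without exposing the MRN itself.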

FIGURE 1: Processing in the past
Manually curated study data file where multiple teams manually gathered, exchanged and recorded data. (Diagram labels: manual chart extraction; manually created file; manually de-identified file.)

Issues in manual data extraction, integration and de-identification:
• Dependencies on multiple teams to extract data.
• Time and effort involved in chart extraction and in integrating the dataset.
• Errors and the resulting rework in data management and processing.
• Managing multiple versions of the dataset.

Current Processes
• Extracting clinical notes and sharing them with the research team to minimize time spent searching for required notes in the EMR system.
• Tracking of the staging and grading data for participants by the researchers.

Future Enhancements
1. Building customizations in the EMR system to capture cancer staging and grading data.
2. Building a direct interface with the University of Iowa Cancer Registry.
3. Capturing currently un-integrated data elements, such as staging and grading information.


This project allows a research group to perform mRNA analysis and attach domain-specific terms not supported by the clinical system, such as non-standard classifications, sample preparation variables, and other research parameters, while maintaining the useful ongoing relationship to the clinical data. The automation reduces the time and manual effort needed to integrate data, and it reduces the chance of manual data entry errors that could corrupt the statistical analysis with incorrect categorical variables. The automated system allows near real-time extraction and de-identification of data from a clinical database, enabling an accurate comparison of clinically derived samples with existing samples from the TCGA repository. The study is expected to be extended to include clinically relevant samples from endometrioid adenocarcinomas of uterine origin, which may molecularly resemble ovarian endometrioid adenocarcinomas more closely than the serous ovarian histology initially studied. Age- and stage-matched uterine tumors will be examined in the same fashion and compared with those from GEO (Gene Expression Omnibus). This project serves as a prototype for near real-time, dynamic, robust, and secure data integration and will serve as a template for similar collaborations.
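One way to guard against the incorrect categorical variables mentioned above is to check each integrated record against a controlled vocabulary before it reaches the statistical analysis. The sketch below is illustrative only: the vocabularies, field names, and function name are assumptions, since the poster does not list the study's actual categories.

```python
# Hypothetical controlled vocabularies; the study's real categories are not listed in the poster.
ALLOWED_VALUES = {
    "histology": {"endometrioid", "serous", "benign"},
    "stage": {"I", "II", "III", "IV"},
}


def find_invalid_categories(records, allowed=ALLOWED_VALUES):
    """Return (row_index, field, value) for every value outside its vocabulary."""
    problems = []
    for index, record in enumerate(records):
        for field, vocabulary in allowed.items():
            value = record.get(field)
            # Missing values are skipped here; only out-of-vocabulary entries are reported.
            if value is not None and value not in vocabulary:
                problems.append((index, field, value))
    return problems
```

Run as part of each refresh, a check like this would surface a stray entry such as `"Serous "` or `"2"` before it silently creates an extra category in the analysis.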


[Diagram labels: study-consented patient data; Data Warehouse; data extracted in file from EMR system; Cancer Registry data extracted in file; ETL; integrated/de-identified file]

Clinical data extraction depends heavily on clinical notes. To extract and analyze data reliably, EMR systems need to be customized so that enough discrete data elements exist to capture the essential content of those notes.
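To illustrate what a discrete data element for staging and grading might look like, as opposed to free text in a note, here is a minimal sketch. The record layout, the four top-level stage groups, and the three-level grade scale are assumptions chosen for the example and do not describe the actual EPIC customization.

```python
from dataclasses import dataclass

STAGES = ("I", "II", "III", "IV")  # assumed top-level stage groups
GRADES = (1, 2, 3)                 # assumed grade scale


@dataclass(frozen=True)
class StagingRecord:
    """One discrete staging/grading entry, as opposed to free text in a note."""
    study_id: str
    stage: str
    grade: int

    def __post_init__(self):
        # Reject values outside the fixed vocabularies at entry time.
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")
        if self.grade not in GRADES:
            raise ValueError(f"unknown grade: {self.grade!r}")
```

Validating at the point of capture means an extract can read stage and grade directly as fields, instead of inferring them from the wording of a note.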

FIGURE 2: Current Processing
Replacing manual data entry with complex Extract, Transform and Load (ETL) processing

• Automated merging and de-identification of data from multiple sources into a single file.
• Reduced errors and less time spent on manual chart extraction, data correction, and discussions with the participant.
• Automated data refresh.

• Special thanks to Todd Burstain, Stuart Wood, and Christopher Richard for their contributions in clinical data extraction and SSIS knowledge sharing.
• Department of Obstetrics & Gynecology at the University of Iowa Hospitals & Clinics
• Iowa Institute for Human Genetics
• National Center for Advancing Translational Sciences and the National Institutes of Health (NIH) through Grant 2 UL1 TR000442-06
