
AZ CDISC Implementation

A brief history of CDISC implementation
Stephen Harrison

Overview

Background
CDISC Implementation Strategy
First steps
Business as usual
ADaM or RDB?
Lessons learned
Summary

Background

Seven R&D sites all operating in their own environments
Creating and maintaining similar tools across the R&D sites
Continuous duplication of effort across regions

A&RT Initiative

Project initiation April 2003
Objective: Harmonise the A&R process and environment across ALL R&D sites within AZ
Multiple workstreams looking at technology, process and standards
Reporting Database (RDB) workstream
  Deliver standardized reusable code or macros to automate production of analysis- and report-ready datasets

Data Flow Process

CRFs

Module Package RAW Data

Analysis Datasets/ RDB

CSR/ HLD Output

Previous data flow process was a simple route from existing CRFs to Clinical Study Reports/Higher Level Document outputs
Reporting Database is created directly from the Module Package
Remit of project was to use existing internal data standards
Opportunity to implement CDISC standards

CDISC Implementation Strategy

CRFs

Module Package RAW Data

Analysis Datasets/ RDB

CSR/ HLD Output

SDTM

RDB completely described in terms of SDTM source (good for reviewer)
No need to construct SDTM at the end of the process
Linear process fulfils the requirement of traceability

Longer term strategy

CRFs

Module Package RAW Data

Analysis Datasets/ RDB

CSR/ HLD Output

New CRFs/ CDASH

SDTM

Modified CRFs
Underlying RAW data standards are SDTM friendly
Transformation process is simplified
CDASH: Clinical Data Acquisition Standards Harmonization

Longer term strategy

CRFs

Module Package RAW Data

Analysis Datasets/ RDB

CSR/ HLD Output

New CRFs/ CDASH

SDTM

ADaM

ADaM
  Adopt ADaM model, replacing internal data standards
  Utilise industry standard transformation and derivation processes

First steps

Global team set up August 2005 to specify AZ business rules
Application of SDTM Implementation Guide v3.1.1 from an AZ point of view
Two team members also part of CDISC SDS team
  Inside track to SDTM
Scope: all corporate and TA standard modules (>200)
Mapping exercise took nearly 18 months to complete!

Manual mapping document


Business as usual

Web Interface developed
Metadata driven process
RAW to SDTM and SDTM to RDB mapping function
Inherits corporate data standards and maps down to project or study level
Metadata used by code builder to create executable code


RAW Data Metadata

[Diagram: RAW data metadata import. Labels shown: Windows, PMPL, CSV file (variables), Import, A&RT Web Interface, A&RT Application Database, Data Standards, Project (datasets, variables), Study (datasets, variables).]

Standards and Reuse of Code

Corporate

Data standards Mappings

Locked dataset definitions Locked Corporate map

Therapy Area

Data standards Mappings

Locked dataset definitions Locked TA map

Project

Data standards Mappings

Locked dataset definitions Locked Project map

Study

Data standards Mappings


Inheritance

RAW Metadata is mapped to SDTM Metadata (SDTM mapping) at each level:

Level             RAW Metadata                 SDTM Metadata
Corporate         DEM, AELOG, HISM             DM, AE, MH
TA (Respiratory)  PULM, RESPHIS                PF, CF
Project           DEM, AELOG, PULM, RESPHIS    DM, PF, AE, MH, CF
Study 1           DEM, AELOG, PULM             DM, PF, AE
Study 2           DEM, AELOG, RESPHIS          DM, AE, CF, MH
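
The inheritance on this slide can be sketched in Python. This is only an illustration under stated assumptions, not the A&RT implementation: the function names `resolve_standards` and `study_view`, the dictionary layout, and the rule that a lower level may add datasets but not redefine a locked one are all hypothetical. The description given for PF is likewise a placeholder.

```python
# Hypothetical sketch of level-by-level inheritance of SDTM dataset definitions.
# Domain codes follow the slide; the data layout and locking rule are assumed.

LEVELS = ["Corporate", "TA", "Project"]


def resolve_standards(definitions):
    """Merge dataset definitions from Corporate down to Project.

    A definition inherited from a higher level is treated as locked:
    a lower level may add new datasets but may not redefine one.
    """
    resolved = {}
    for level in LEVELS:
        for name, description in definitions.get(level, {}).items():
            if name in resolved:
                owner = resolved[name]["level"]
                raise ValueError(f"{name} is locked at {owner} level")
            resolved[name] = {"level": level, "description": description}
    return resolved


def study_view(resolved, selected):
    """A study picks the subset of inherited datasets it needs."""
    missing = [name for name in selected if name not in resolved]
    if missing:
        raise KeyError(f"not defined at any higher level: {missing}")
    return {name: resolved[name] for name in selected}


definitions = {
    "Corporate": {
        "DM": "Demographics",
        "AE": "Adverse Events",
        "MH": "Medical History",
    },
    "TA": {
        "PF": "Respiratory TA domain (placeholder description)",
        "CF": "Clinical Findings",
    },
}

project = resolve_standards(definitions)
study1 = study_view(project, ["PF", "AE", "DM"])
print(sorted(study1))
```

Raising an error on redefinition is one way to model "locked"; a real tool might instead allow controlled overrides at project level.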

Example RAW to SDTM map

Define Simple Mapping


Define Macro Mapping


Transposition Groups


A&RT Mapping Process

Web Interface (Oracle), backed by the A&RT Application Database
  Import RAW Data Metadata
  Create Mapping Metadata RAW to SDTM
  Create Mapping Metadata SDTM to RDB
Program Builder
  Generates SAS code from the mapping metadata
UNIX (SAS)
  Load RAW Data into RAW Database
  Execute job: RAW Database to SDTM Database
  Execute job: SDTM Database to Reporting Database
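
The slides say the Program Builder turns mapping metadata into SAS code but do not show what it generates. The Python sketch below shows the general idea only: it renders a SAS DATA step from a list of simple mappings. The function name `build_sas_step`, the metadata layout, the library names `raw` and `sdtm`, and the RAW variable names `STUDY`, `SUBJECT` and `GENDER` are all assumptions made for illustration.

```python
# Hypothetical sketch: render a SAS DATA step from simple mapping metadata.
# The metadata layout and the library names (raw, sdtm) are assumptions.


def build_sas_step(target, source, mappings, out_lib="sdtm", in_lib="raw"):
    """Return SAS code that maps `source` variables to `target` variables.

    `mappings` is a list of (target_variable, sas_expression) pairs.
    """
    lines = [f"data {out_lib}.{target};", f"  set {in_lib}.{source};"]
    for target_var, expression in mappings:
        lines.append(f"  {target_var} = {expression};")
    # Keep only the mapped target variables in the output dataset.
    keep = " ".join(target_var for target_var, _ in mappings)
    lines.append(f"  keep {keep};")
    lines.append("run;")
    return "\n".join(lines)


# Invented RAW variable names; a real map would come from the metadata store.
dm_mappings = [
    ("USUBJID", "catx('-', STUDY, SUBJECT)"),
    ("SEX", "upcase(GENDER)"),
]

sas_code = build_sas_step("DM", "DEM", dm_mappings)
print(sas_code)
```

Keeping the mapping as data rather than hand-written code is what allows the same map to be inherited and reused across projects and studies.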

ADaM or RDB?

Well established reporting requirements
AZ Reporting Database standards defined and in use before CDISC considered
Perception that ADaM model still quite unstable and subject to significant change
Unlike SDTM, no regulatory pressure to implement ADaM

Reporting Database

[Diagram, left to right]
Study Database (RAW Data)
  Module Package datasets: WBDC, LAB, CRO RAW, AMOS, GRand, etc.
Mapping to SDTM
  SDTM data domains
  Supplemental qualifiers
Reporting Database
  Superset (R_AE, R_DM, R_VS, etc.): unaltered source data in SDTM format, supplemental qualifiers, derived variables
  New dataset (RD_xx, RH_xx, etc.): key ID variables, derived variables, derived observations

Reporting Datasets (R_)

Datasets must remain fundamentally unchanged from the SDTM source data.
An R_ dataset is a superset of the SDTM dataset.

[Diagram: on the SDTM side, VS and SuppVS; on the RDB side, R_VS (superset) contains VS plus the SuppVS content, extended with further variables and observations.]

Original SDTM dataset name retained, but prefixed with R_
All information from SUPP-- datasets re-attached to parent RDB dataset
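
Re-attaching SUPP-- information to its parent can be sketched as follows. The column names `IDVAR`, `IDVARVAL`, `QNAM` and `QVAL` are the standard SDTM supplemental qualifier variables; the function name `reattach_supp`, the sample values and the qualifier name `VSREAS` are invented for illustration, and the sketch assumes every supplemental record identifies its parent through a populated `IDVAR`.

```python
# Sketch: fold SUPPVS records back into their parent VS records to build R_VS.
# Sample data and the QNAM "VSREAS" are invented for illustration.


def reattach_supp(parent_rows, supp_rows):
    """Return copies of parent rows with supplemental qualifiers as columns."""
    result = [dict(row) for row in parent_rows]  # never modify the SDTM source
    for supp in supp_rows:
        for row in result:
            same_subject = row["USUBJID"] == supp["USUBJID"]
            same_record = str(row[supp["IDVAR"]]) == str(supp["IDVARVAL"])
            if same_subject and same_record:
                row[supp["QNAM"]] = supp["QVAL"]
    return result


vs = [
    {"USUBJID": "S1", "VSSEQ": 1, "VSTESTCD": "SYSBP", "VSORRES": "120"},
    {"USUBJID": "S1", "VSSEQ": 2, "VSTESTCD": "SYSBP", "VSORRES": "118"},
]
suppvs = [
    {"USUBJID": "S1", "RDOMAIN": "VS", "IDVAR": "VSSEQ", "IDVARVAL": "2",
     "QNAM": "VSREAS", "QVAL": "REPEAT MEASUREMENT"},
]

r_vs = reattach_supp(vs, suppvs)
print(r_vs[1])
```

Working on copies keeps the original VS rows untouched, in line with the rule that the SDTM source must remain unchanged.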


RDB General Conventions

All reporting must take place directly from Reporting Database defined at study level
All variables used for reporting must be created in relevant reporting dataset
Subject datasets must have at least 1 observation per randomized subject
All SDTM data must be present in Reporting Database
Original SDTM data cannot be amended, but new variables or observations can be created as needed (e.g., imputing dates)
All naming conventions defined by SDTM must be followed when generating additional variables
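
Two of these conventions (all SDTM data present, original SDTM data not amended) lend themselves to an automated check. The sketch below is one possible way to express it, not a description of any AZ tool: the function name `check_superset`, the key variables chosen and the imputed-date variable `AESTDTI` are hypothetical.

```python
# Hypothetical check: every SDTM record and value must survive unchanged
# in the reporting dataset; extra variables and records are allowed.


def check_superset(sdtm_rows, rdb_rows, keys):
    """Return a list of problems; an empty list means the check passed."""
    index = {tuple(row[k] for k in keys): row for row in rdb_rows}
    problems = []
    for row in sdtm_rows:
        key = tuple(row[k] for k in keys)
        target = index.get(key)
        if target is None:
            problems.append(f"{key}: SDTM record missing from RDB")
            continue
        for variable, value in row.items():
            if variable not in target:
                problems.append(f"{key}: variable {variable} dropped")
            elif target[variable] != value:
                problems.append(f"{key}: variable {variable} amended")
    return problems


ae = [{"USUBJID": "S1", "AESEQ": 1, "AESTDTC": "2006-08"}]

# Acceptable: partial date left as collected, imputed date in a NEW variable.
r_ae_ok = [{"USUBJID": "S1", "AESEQ": 1, "AESTDTC": "2006-08",
            "AESTDTI": "2006-08-01"}]

# Not acceptable: the original SDTM value has been overwritten.
r_ae_bad = [{"USUBJID": "S1", "AESEQ": 1, "AESTDTC": "2006-08-01"}]

print(check_superset(ae, r_ae_ok, keys=("USUBJID", "AESEQ")))
print(check_superset(ae, r_ae_bad, keys=("USUBJID", "AESEQ")))
```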

RDB Common Dataset Features

Datasets taken from source database: name prefixed with R_ (e.g., DM becomes R_DM)
New derived datasets: name prefixed with RD_ (e.g., RD_SUBJ)
Transposed datasets: name prefixed with RH_ (e.g., R_LB becomes RH_LB)
Datasets must contain Key variables to uniquely identify every observation
Duplication of variables across multiple datasets should be avoided (except for Key and Cross variables)
Duplication of source (SDTM) variables should be avoided
Variables defined at a higher level must not have attributes changed, except in the following circumstances:
  Length may be increased
  Algorithm may be project-specific

RDB Use of Codes and Decodes

Historically, codes and decodes used widely
  Associated using SAS formats
  Loses all meaning outside of SAS
SDTM does not use codes and decodes
  Variables defined using explicit text values to describe observations
  Clear, unambiguous and interpretable irrespective of the tools or software used
RDB based on SDTM
  Codes and decodes not used in final reporting datasets
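
As a small illustration of the difference, the sketch below replaces coded severity values with explicit text so that the data no longer depends on a format catalogue. The 1/2/3 to mild/moderate/severe pairing is the example used later in this deck; the function name `decode` and the sample records are assumptions.

```python
# Sketch: replace coded values with explicit text values.
# The codelist mirrors the deck's own example (1, 2, 3 vs. mild, moderate, severe).

SEVERITY = {1: "MILD", 2: "MODERATE", 3: "SEVERE"}


def decode(rows, variable, codelist):
    """Return copies of rows with `variable` replaced by its text value."""
    decoded = []
    for row in rows:
        new_row = dict(row)
        code = new_row[variable]
        if code not in codelist:
            raise ValueError(f"no decode defined for {variable}={code!r}")
        new_row[variable] = codelist[code]
        decoded.append(new_row)
    return decoded


raw_ae = [
    {"USUBJID": "S1", "AESEV": 1},
    {"USUBJID": "S2", "AESEV": 3},
]
ae = decode(raw_ae, "AESEV", SEVERITY)
print(ae)
```

Failing on an unknown code, rather than passing it through, keeps undocumented values from reaching the reporting datasets.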


Transposed Datasets

RAW datasets may be transposed to contain re-structured RAW data (e.g., RH_dataset = horizontal structure, RV_dataset = vertical)
Normally only considered for Findings domains
Original dataset must still exist as R_dataset
May make reporting easier (e.g., lab parameters reported as columns)

Transposed Datasets

Carefully consider whether transposed data is essential and/or appropriate
Duplicates data
Variable names driven by --TESTCD can be meaningless, e.g.:

Variable  Label
USUBJID   Unique subject Identifier
VISIT     Visit name
L01101    Alanine Aminotransferase (ukat/L)
L01118    Albumin (g/L)
L01104    Alkaline Phosphatase (ukat/L)
L01102    Aspartate Aminotransferase (ukat/L)

Significant loss of information, e.g., original results, units, reference ranges, analysis flags, etc.
Contravenes CDISC SDTM convention to store units as a separate variable qualifier to the test result
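
The loss of information is easy to see in a minimal transposition sketch. The test codes L01101 and L01118 come from the slide; the function name `to_horizontal`, the subject and visit values and the numeric results are invented for illustration.

```python
# Sketch: transpose a vertical (one row per test) lab dataset to a
# horizontal one (one column per --TESTCD). Result values are invented.


def to_horizontal(rows, id_vars=("USUBJID", "VISIT"),
                  name_var="LBTESTCD", value_var="LBSTRESN"):
    """Return one row per id_vars combination, one column per test code."""
    wide = {}
    for row in rows:
        key = tuple(row[v] for v in id_vars)
        record = wide.setdefault(key, dict(zip(id_vars, key)))
        record[row[name_var]] = row[value_var]
    return list(wide.values())


r_lb = [
    {"USUBJID": "S1", "VISIT": "VISIT 1", "LBTESTCD": "L01101",
     "LBSTRESN": 0.5, "LBSTRESU": "ukat/L"},
    {"USUBJID": "S1", "VISIT": "VISIT 1", "LBTESTCD": "L01118",
     "LBSTRESN": 42.0, "LBSTRESU": "g/L"},
]

rh_lb = to_horizontal(r_lb)
print(rh_lb)
```

Only one value per test survives in the horizontal row: the units (`LBSTRESU`) and every other qualifier are gone, which is why the original R_ dataset must still exist alongside any RH_ dataset.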

Example SDTM to RDB map


Lessons learned

Mapping takes a lot of effort!
  Ambiguity in guidance
  Individual opinions and interpretations
Get your conventions right
  Often had to revisit decisions as experience grew
Big differences between CRF and SDTM standards:
  Purpose: data collection vs. data storage
  Coding: codes vs. text (e.g., 1, 2, 3 vs. mild, moderate, severe)
  Structure: horizontal vs. vertical
SDTM IG v3.1.2 a big improvement
  Introduction of Clinical Findings (CF) domain really helped with many difficult mappings

Changes for SDTM IG v3.1.2 CF

[Diagram: SDTM structure. General Observation Classes: Interventions, Events, Findings. Special Purpose Datasets: Demographics, Comments. Also shown: Related Records, Supplemental Qualifiers, Trial Design.]

Clinical Findings (CF) Domain
  Findings about Events or Interventions that don't fit in SDTM domain variables for those classes
  CFOBJ (Object of Measurement): Event or Intervention that is the subject of the test evaluation
    Mandatory, but won't necessarily have a parent record in another domain

Changes for SDTM IG v3.1.2 CF


[Annotated CRF: MHCAT, MHTERM, MHSTDTC, MHOCCUR, MHPRESP]

Changes for SDTM IG v3.1.2 CF


[Annotated CRF: CFCAT; CFOBJ; CFTESTCD = OCCUR; CFORRES = answer provided in checkbox]

Changes for SDTM IG v3.1.2 CF


[Annotated CRF: CFCAT, CFOBJ, CFTEST, CFORRES]

Changes for SDTM IG v3.1.2 CF

Example
Row  USUBJID      CFSEQ  CFOBJ                  CFTEST                  CFTESTCD  CFDTC
1    D06-608-123  1      HYPERTENSION           OCCURRENCE              OCCUR     2006-08-28
2    D06-608-123  2      MYOCARDIAL INFARCTION  OCCURRENCE              OCCUR     2006-08-28
3    D06-608-123  3      MYOCARDIAL INFARCTION  DATE OF MOST RECENT MI  MY_LDAT   2006-06-20
4    D06-608-123  4      MYOCARDIAL INFARCTION  NUMBER OF MI            MYNO      2006-08-28

(continued)
Row  USUBJID      VISITNUM  CFORRES     CFSTRESC    CFCAT
1    D06-608-123  1         CURRENT     CURRENT     SPECIFIC CV MEDICAL AND SURGICAL HISTORY
2    D06-608-123  1         PAST        PAST        SPECIFIC CV MEDICAL AND SURGICAL HISTORY
3    D06-608-123  1         2006-06-20  2006-06-20  SPECIFIC CV MEDICAL AND SURGICAL HISTORY
4    D06-608-123  1         2           2           SPECIFIC CV MEDICAL AND SURGICAL HISTORY
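
The step from a horizontal CRF page to vertical CF records like those in the example can be sketched as below. The variable names and values are taken from the example; the function name `build_cf_records` and the input layout are assumptions, CFSTRESC is simply copied from CFORRES because the two are identical in the example, and CFDTC is left out because the slides do not say how it is derived.

```python
# Sketch: turn CRF answers into vertical CF records, one per object and test.
# Variable names follow the example above; the input layout is assumed.


def build_cf_records(usubjid, visitnum, category, answers):
    """`answers` is a list of (CFOBJ, CFTESTCD, CFTEST, result) tuples."""
    records = []
    for seq, (obj, testcd, test, result) in enumerate(answers, start=1):
        records.append({
            "USUBJID": usubjid,
            "CFSEQ": seq,
            "CFOBJ": obj,
            "CFTESTCD": testcd,
            "CFTEST": test,
            "CFCAT": category,
            "VISITNUM": visitnum,
            "CFORRES": result,
            "CFSTRESC": result,  # assumed identical, as in the example
        })
    return records


answers = [
    ("HYPERTENSION", "OCCUR", "OCCURRENCE", "CURRENT"),
    ("MYOCARDIAL INFARCTION", "OCCUR", "OCCURRENCE", "PAST"),
    ("MYOCARDIAL INFARCTION", "MY_LDAT", "DATE OF MOST RECENT MI", "2006-06-20"),
    ("MYOCARDIAL INFARCTION", "MYNO", "NUMBER OF MI", "2"),
]
cf = build_cf_records(
    "D06-608-123", 1, "SPECIFIC CV MEDICAL AND SURGICAL HISTORY", answers
)
for record in cf:
    print(record["CFSEQ"], record["CFOBJ"], record["CFTESTCD"], record["CFORRES"])
```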


Summary

CDISC Implementation is a huge task
AZ strategy allows for step-wise implementation
  CDASH
  ADaM
Mapping tool really assists process
  Easy inheritance
  Reuse of standards and code
SDTM IG v3.1.2 big improvement

Questions and Answers


Thank You