NESUG 2010 Pharmaceutical Applications

Outsourced Data Integration Project

with CDISC SDTM & ADaM Deliverables

Christine Teng, Merck Research Labs, Merck Sharp & Dohme Corp., Rahway, NJ

Margaret Coughlin, Merck Research Labs, Merck Sharp & Dohme Corp., Rahway, NJ


The Clinical Data Interchange Standards Consortium (CDISC) has established platform-independent data standards that enable information systems interoperability to improve clinical research and related areas of healthcare. Many pharmaceutical companies have started implementing CDISC clinical trial data models such as the Study Data Tabulation Model (SDTM) and the Analysis Data Model (ADaM). The purpose of this paper is to discuss some experience gained from working with a CRO on a data integration project that converts several studies into a common SDTM structure in support of eCTD and Integrated Summary of Safety analyses. As this is a relatively early project working with a CRO on SDTM integration, processes of working with CROs in this area are still evolving.

This paper briefly describes the important components that a sponsor should provide to a CRO to facilitate mapping data to the CDISC data models. It also discusses review activities that help verify that CRO-converted SDTM datasets comply with CDISC standards.

SAS®9, Windows®, Intermediate Level Key Words: CDISC, SDTM, ADaM


The data integration project comprised several studies that required an integrated safety analysis for a filing.

These studies were not conducted internally, so the eCRFs were not designed uniformly, and it is unlikely that eventual conversion of the data to SDTM was ever considered. Standard submission data in SDTM format provide a basis for integration that also benefits regulatory reviewers, as the FDA has put considerable effort into developing a repository for all submitted trial data and a suite of standard review tools to access, manipulate, and view the tabulations.


Accurate mapping of study data to SDTM format is critical for a successful SDTM conversion and integration process.

To facilitate a smooth mapping process, it is helpful to establish some basic expectations with the CRO. The expectations provided by the sponsor should give a clear picture of what a successful SDTM conversion and integration will look like and help the partner work toward that goal.

Each company has its own standards and SOPs. It is helpful to provide specific basic expectations to ensure that the deliverables (output datasets, tables/listings/figures, and programs) can be reproduced internally if the intention is to reuse the programs developed by the CRO at a later time. Situations to consider when deciding whether internal reproducibility is necessary include whether the sponsor is responsible for responses to agency questions, publications, or other external presentations. For example, to minimize modifications to the CRO-supplied programs, no hard-coded library or file path should be defined inside the programs. Also, configuration should be done in one autoexec-like program so that changes can be made in one location instead of in many programs. This is important if there is a need to reuse the CRO-supplied programs to verify the deliverables.
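As a minimal sketch of the single-configuration idea, written in Python rather than the SAS used on the project, all locations can be resolved once from a single root so that individual programs never contain a literal path. The environment variable `PROJECT_ROOT`, the folder layout, and the helper names `load_config` and `dataset_path` are hypothetical and are not taken from the project.

```python
import os
from pathlib import Path

# Hypothetical environment variable naming the project root; not from the paper.
ROOT_ENV_VAR = "PROJECT_ROOT"


def load_config(env=None):
    """Resolve every library location in one place, relative to a single root."""
    env = os.environ if env is None else env
    root = Path(env.get(ROOT_ENV_VAR, "."))
    # Illustrative folder layout; a real project would follow its own SOPs.
    return {
        "raw": root / "data" / "raw",
        "sdtm": root / "data" / "sdtm",
        "adam": root / "data" / "adam",
        "output": root / "output",
    }


def dataset_path(config, library, name):
    """Build a dataset path from a library key, never from a literal path."""
    return config[library] / f"{name}.xpt"


# Moving the project to another location only requires changing the root.
config = load_config({"PROJECT_ROOT": "/studies/integration"})
print(dataset_path(config, "sdtm", "dm"))
```

With this arrangement, a program received from the CRO can be rerun in the sponsor's environment by changing one setting instead of editing each program.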

The high level process map to create the deliverables is depicted as follows:

• Beginning in the upper right-hand box, the original annotated CRF referenced to the legacy data are converted to reference SDTM domains and variables. The SDTM annotated CRFs are created for this purpose.

• Next, the SDTM mapping specifications, in xls format, are created based on the SDTM annotated CRFs.

• The SDTM metadata specifications are define.xls-like documents that are used internally by the sponsor.

• The conversion programs, based on information created above and the latest MedDRA Encoding, are used to build the SDTM datasets from the legacy data.

• For submission, the define.xml is generated by the validation tool from the SDTM datasets.

• ADaM specifications are built from the SDTM datasets for integrated analysis purposes.

• ADaM datasets are created using ADaM specifications.

Further definitions of the items presented here are found in the next section.


Both the sponsor and the CRO need to work closely and collaboratively to have a successful filing. Adequate information should be provided to the CRO as early as possible to allow time to clarify questions. If data integration is the focus, then the standardization of common items such as Treatment Names, Treatment Codes, Visit Names, Visit Numbers, MedDRA Encoding Version, and coding for Disposition Status should be noted up front. Using the SDTM structure has helped standardize the data structure. The table below further describes some materials that are helpful for the CRO to implement the SDTM conversion.

The above items are some materials the sponsor and CRO found helpful to share, when available, in order to facilitate SDTM conversions of data in non-SDTM formats.
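The up-front standardization of common items described earlier in this section can be sketched as a shared lookup that every conversion program consults. This is only an illustration in Python: the study identifiers, treatment labels, and codes below are invented, and `standardize_treatment` is a hypothetical helper.

```python
# Hypothetical agreed standard: (study, label as collected) -> (name, code).
# The study IDs, labels, and codes are invented for illustration only.
TREATMENT_STANDARD = {
    ("STUDY01", "Drug A 10mg"): ("Drug A 10 mg", 1),
    ("STUDY02", "DRUG-A 10 MG"): ("Drug A 10 mg", 1),
    ("STUDY01", "PBO"): ("Placebo", 0),
    ("STUDY02", "Placebo"): ("Placebo", 0),
}


def standardize_treatment(study_id, collected_label):
    """Return the integrated (treatment name, treatment code) for one study's label."""
    key = (study_id, collected_label.strip())
    try:
        return TREATMENT_STANDARD[key]
    except KeyError:
        # Fail loudly so an unmapped label becomes a question for the sponsor.
        raise ValueError(f"No standard treatment agreed for {key}") from None


print(standardize_treatment("STUDY02", " DRUG-A 10 MG "))
```

Raising an error for an unknown label, instead of passing it through, keeps unplanned values from silently entering the pooled data.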

Informal reviews of the conversion were done during development to ensure the CRO did not deviate from the requirements and standards. Certain verifications are recommended for SDTM deliverables. The high-level review process is done in two steps. Please note that the review process described below is for illustration and was used only for this specific project, since company standards were evolving at the time. Only some of the key review points are described below.

1) Mapping verification

A SAS program is used to verify that all CRF raw datasets and all collected variables were mapped. An Excel workbook is created by loading the mapping specifications into the SAS program to facilitate the review process by matching variables in the specification with the real datasets (See Table 1 below).

The SDTM annotated CRF is also reviewed by the sponsor's internal standards team to confirm that the mapped domains and SUPPQUAL variables agree with CDISC standards and practices. For example, verifications include whether topic variables are mapped, whether the correct class of domain is used, and whether each SUPPQUAL variable has a parent record.
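The mapping verification in step 1 was done with a SAS program that is not reproduced in the paper. As a rough Python sketch of the same idea, the check below compares the variables actually present in the raw datasets with the rows of a mapping specification. The specification column names (`raw_dataset`, `raw_variable`, `sdtm_domain`, `sdtm_variable`) and the function name are assumptions made for illustration.

```python
def unmapped_variables(raw_datasets, mapping_spec):
    """Return (dataset, variable) pairs collected in raw data but absent from the spec.

    raw_datasets: dict of raw dataset name -> list of variable names
    mapping_spec: list of dicts, one per specification row
    """
    mapped = {
        (row["raw_dataset"].upper(), row["raw_variable"].upper())
        for row in mapping_spec
    }
    collected = {
        (dataset.upper(), variable.upper())
        for dataset, variables in raw_datasets.items()
        for variable in variables
    }
    return sorted(collected - mapped)


# Mock inputs; names are invented for illustration.
raw_datasets = {
    "demog": ["SUBJID", "SEX", "BRTHDT"],
    "ae": ["SUBJID", "AETERM"],
}
mapping_spec = [
    {"raw_dataset": "DEMOG", "raw_variable": "SUBJID", "sdtm_domain": "DM", "sdtm_variable": "USUBJID"},
    {"raw_dataset": "DEMOG", "raw_variable": "SEX", "sdtm_domain": "DM", "sdtm_variable": "SEX"},
    {"raw_dataset": "AE", "raw_variable": "SUBJID", "sdtm_domain": "AE", "sdtm_variable": "USUBJID"},
    {"raw_dataset": "AE", "raw_variable": "AETERM", "sdtm_domain": "AE", "sdtm_variable": "AETERM"},
]

for dataset, variable in unmapped_variables(raw_datasets, mapping_spec):
    print(f"Not mapped: {dataset}.{variable}")
```

Comparing names case-insensitively avoids false alarms when the specification and the datasets use different capitalization.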

2) SDTM conversion verification

After the conversion is implemented based on the specification, the SDTM datasets are reviewed. In addition to verifying the mapping of legacy data to SDTM for a random sample of subjects based on pre-specified test cases, another diagnostic program is written to display certain key variables for overall verification of the mapping content.

Tables 2 to 4 below show the generated output (the data are mock-ups).
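Separately from the diagnostic output in the paper's tables, one dataset-level check named in the review, that each SUPPQUAL record has a parent record, can be sketched in Python as follows. The sketch assumes the usual SUPPQUAL linkage through USUBJID, IDVAR, and IDVARVAL, compares values as plain strings, and assumes the parent records passed in already belong to the domain named in RDOMAIN; the function name and mock data are hypothetical.

```python
def orphan_supp_records(supp_records, parent_records):
    """Return supplemental qualifier records that match no parent record.

    A record matches when USUBJID agrees and, if IDVAR is populated, the
    parent's value of that variable equals IDVARVAL (compared as strings).
    """
    orphans = []
    for supp in supp_records:
        idvar = supp.get("IDVAR") or ""
        has_parent = any(
            parent["USUBJID"] == supp["USUBJID"]
            and (idvar == "" or str(parent.get(idvar, "")) == str(supp["IDVARVAL"]))
            for parent in parent_records
        )
        if not has_parent:
            orphans.append(supp)
    return orphans


# Mock data for illustration only.
ae = [
    {"USUBJID": "S01-001", "AESEQ": 1},
    {"USUBJID": "S01-001", "AESEQ": 2},
]
suppae = [
    {"USUBJID": "S01-001", "RDOMAIN": "AE", "IDVAR": "AESEQ", "IDVARVAL": "2", "QNAM": "AETRTEM"},
    {"USUBJID": "S01-002", "RDOMAIN": "AE", "IDVAR": "AESEQ", "IDVARVAL": "1", "QNAM": "AETRTEM"},
]

for record in orphan_supp_records(suppae, ae):
    print("No parent record:", record["USUBJID"], record["IDVAR"], record["IDVARVAL"])
```

The nested scan is adequate for a review-sized sample; for full integrated datasets an index keyed on the identifying variables would be faster.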

The Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM) is an emerging industry standard for submitting analysis datasets to regulatory agencies, such as the U.S. Food and Drug Administration (FDA). While it provides several solutions for the construction of analysis datasets, it is also subject to interpretation by users during implementation. The details of ADaM analysis variable naming conventions and usage are documented in the implementation guide and will not be repeated here.

ADaM IG presents metadata for two standard structures, as follows:

• ADSL – Subject Level analysis dataset, one record per subject

• ADXXX – Multiple-record-per-subject basic data structure

Metadata for the standard ADaM variables are presented in Section 3 of the ADaM IG, and the ADaM basic data structure and variables are discussed and illustrated in Section 4. The ADaM datasets displayed below are built from SDTM domains, and their naming conventions follow the implementation guide.

Below, three ADaM datasets are presented for illustration (See Tables 5a/5b – 7a/7b below). Due to space limitations, only part of the mapping specifications and variables associated with the datasets is shown.

(b) ADLB

ADaM datasets should provide variables and metadata that fulfill the criteria below:

• Identify observations that exist in the submitted study tabulation data (e.g. SDTM).

• Identify observations that are derived within the ADaM analysis dataset.

• Identify the method used to create derived observations.

• Identify observations used for analyses, in contrast to observations that are not used for analyses yet are included to support traceability or future analysis.
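As a hedged illustration of how record-level variables can answer the four criteria above, the Python sketch below classifies mock ADLB-like records. It assumes that SRCDOM and SRCSEQ point back to the source SDTM record, that a populated DTYPE marks a derived record and names the derivation method, and that ANL01FL flags records used for analysis. The data are invented and the project's actual variables may differ.

```python
# Mock ADLB-like records. SRCDOM/SRCSEQ, DTYPE, and ANL01FL are used here as
# assumed examples of traceability, derivation-type, and analysis-flag variables.
adlb = [
    {"USUBJID": "S01-001", "PARAMCD": "ALT", "AVISIT": "Week 4", "AVAL": 31.0,
     "SRCDOM": "LB", "SRCSEQ": 12, "DTYPE": "", "ANL01FL": "Y"},
    {"USUBJID": "S01-001", "PARAMCD": "ALT", "AVISIT": "Week 8", "AVAL": 31.0,
     "SRCDOM": "LB", "SRCSEQ": 12, "DTYPE": "LOCF", "ANL01FL": "Y"},
    {"USUBJID": "S01-001", "PARAMCD": "ALT", "AVISIT": "Unscheduled", "AVAL": 45.0,
     "SRCDOM": "LB", "SRCSEQ": 13, "DTYPE": "", "ANL01FL": ""},
]


def classify(record):
    """Summarize where a record came from and whether it is analyzed."""
    derived = record["DTYPE"] != ""
    return {
        "origin": "derived in ADaM" if derived else "present in SDTM",
        "method": record["DTYPE"] if derived else None,
        "source": (record["SRCDOM"], record["SRCSEQ"]),
        "analyzed": record["ANL01FL"] == "Y",
    }


for rec in adlb:
    print(rec["AVISIT"], classify(rec))
```

In this mock data the third record is kept only for traceability: it exists in the source tabulation but is not flagged for analysis.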

(c) Time to event analysis of safety data - one record per subject per event of interest

Since most analysis datasets are derived from SDTM datasets, some level of traceability is expected between the SDTM dataset(s) and the analysis dataset(s). In general, the CDISC ADaM standards recommend including as much supporting data in the traceability records and variables as possible, except where it is not practical to do so, such as with eDiary data.

To keep analysis programs efficient, the length of each variable should be optimized, in addition to keeping only the variables that are needed, since integrated datasets are usually large. Reducing lengths speeds up jobs during development and debugging as well as when supporting post-production agency requests. There are several other ways to improve efficiency. However, since the CRO was responsible for the programming activities, this is one check on which the partner can easily identify issues and provide suggestions.

Table 10 lists the ADaM datasets and their associated variable lengths. 'Max_Len' contains the actual maximum length across all values of a specific variable. 'Defined_len' contains the assigned length of each variable. Some variables defined with a length of 200 could clearly be shortened.
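The comparison behind 'Max_Len' and 'Defined_len' can be sketched in Python as follows. This is not the program used on the project: `length_report` is a hypothetical helper, the records are mock data, and lengths are counted in characters, whereas SAS variable lengths are in bytes.

```python
def length_report(records, defined_lengths):
    """Compare each variable's defined length with its longest actual value."""
    report = []
    for variable, defined_len in defined_lengths.items():
        # Missing values count as length 0; an empty dataset also gives 0.
        max_len = max(
            (len(str(record.get(variable) or "")) for record in records),
            default=0,
        )
        report.append({
            "variable": variable,
            "Defined_len": defined_len,
            "Max_Len": max_len,
            "can_shorten": max_len < defined_len,
        })
    return report


# Mock data for illustration only.
records = [
    {"PARAM": "Alanine Aminotransferase (U/L)", "AVISIT": "Week 4"},
    {"PARAM": "Glucose (mg/dL)", "AVISIT": "Baseline"},
]
defined_lengths = {"PARAM": 200, "AVISIT": 200}

for row in length_report(records, defined_lengths):
    print(row)
```

A report of this kind only flags candidates; the final length should still leave room for values expected in later data transfers.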

In addition to the expectations, documentation, and requirements, it is important to communicate progress and concerns periodically to facilitate work and perform a timeline checkup. An issue log for each study was found to be helpful to document issues identified during the review process; this tool helps communicate the issues in detail to assist both sponsor and CRO in documenting their understanding. Recurring meetings help in addressing issues from different functional areas and assist in keeping the deliverables on track.


In summary, a successful filing package is a collaborative effort between the CRO and the sponsor. The sponsor should provide clear expectations and all necessary information to help the CRO perform the SDTM mapping and set up the analysis datasets. If the data are expected to be loaded into the sponsor's internal database, certain guidelines and standards also need to be considered. It is expected that the CRO will follow its internal SOPs, understand the sponsor's expectations, and raise any questions it may have to keep its activities on target.

CDISC standardization helps achieve overall structural consistency when data from multiple vendors with different data structures need to be pooled. Since CDISC standards are still evolving, internal policies and procedures developed to comply with them may need to be adapted as the standards change. In a similar manner, the sponsor needs to establish clear roles and responsibilities with a CRO and be responsive to its needs for guidance and flexibility.


Version 3.1.1 (V3.1.1) of the CDISC Study Data Tabulation Model Implementation Guide. http://www.cdisc.org/models/sdtm/v1.1/index.html

ADaM Implementation Guide, Version 1.0 (ADaMIG v1.0) Draft. http://www.cdisc.org/models/adam/V2.1_Draft/index.html

SDTM Validation – how can we do it right?

