XML4PharmaWiki

I have helped quite a few organizations (sponsors, CROs, technology vendors) with implementation of the CDISC standards. Some of these have been able to obtain CDISC ODM certification thanks to this.

Others, however, did not consult a specialist and, I am afraid, did not read the ODM specification very well, so that their systems generate non-compliant ODM files. Some implementers think that they are safe once their generated ODM files validate against the XML Schema. This is not the case: one still needs to read and implement the specification document itself. Validation that goes beyond the schema can easily be done using the CDISC ODM Checker, which is freely available to CDISC members.

In my career as a CDISC consultant, I have seen a great many ODM files and, unfortunately, encountered many “bad practices”. This contribution lists a number of them.

Copying the value of the “OID” to the “Name” attribute

Although the ODM specification does not say exactly what the “Name” attribute (which is present on StudyEventDef, FormDef, etc.) should contain, it should be obvious that one should give it a meaningful value. Unfortunately, some EDC vendors just copy the value of the OID into the Name attribute. For example:

  <ItemDef OID="IT.XYZ" Name="IT.XYZ" ...>...</ItemDef>

Although this is not a violation of the standard, I consider it bad practice. The reason the ODM team added the “Name” attribute (and even made it mandatory) is that it should contain a short name or description of the Item, ItemGroup, Form, etc., so that someone inspecting the ODM file (using a viewer or not) can quickly understand what the Form, Item, CodeList … is about.

For example (good practice):

 <ItemDef OID="IT.XYZ" Name="Systolic blood pressure" ...>...</ItemDef>

The worst example I have seen was something like this:

 <ItemDef OID="IT.XYZ" Name="IT.XYZ" Comment="Systolic blood pressure" ...>...</ItemDef>

Here the “Comment” attribute has been abused to hold what should go into the “Name” attribute, even though the specification states: “The SDSVarName, Origin, and Comment attributes carry submission information as described in the CDISC SDTM”. Of course, the intention might have been that the SDTM variable should get that value as its “Comment”, but this is again a misinterpretation of the standard, as the “Comment” attribute in define.xml is meant for adding other information at submission time.

Note that the question itself, as it should appear on the CRF, goes into the “Question” element, which fully supports internationalization. For example:

  <Question>
      <TranslatedText xml:lang="en">Systolic blood pressure</TranslatedText>
      <TranslatedText xml:lang="fr">tension artérielle systolique</TranslatedText>
      <TranslatedText xml:lang="de">Systolischer Blutdruck</TranslatedText>
  </Question>

As of ODM 1.3, more elaborate descriptions of the Item, Form, Codelist etc., can be added using the “Description” element, which is also fully internationalized.
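The copied-OID pattern described in this section is easy to detect automatically. The following is a minimal sketch in Python (standard library only), not part of any CDISC tool; the function names and the sample metadata, including the OID "MDV.1", are my own illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, so the check works with or without the ODM namespace."""
    return tag.rsplit("}", 1)[-1]


def names_copied_from_oid(odm_xml):
    """Return (element name, OID) pairs for elements whose Name is just a copy of the OID."""
    root = ET.fromstring(odm_xml)
    hits = []
    for elem in root.iter():
        oid, name = elem.get("OID"), elem.get("Name")
        if oid is not None and oid == name:
            hits.append((local_name(elem.tag), oid))
    return hits


# Hypothetical sample metadata, for illustration only.
sample = """
<MetaDataVersion OID="MDV.1" Name="Version 1">
  <ItemDef OID="IT.XYZ" Name="IT.XYZ" DataType="integer"/>
  <ItemDef OID="IT.SYSBP" Name="Systolic blood pressure" DataType="integer"/>
</MetaDataVersion>
"""

print(names_copied_from_oid(sample))
```

Only the first ItemDef is reported. Whether a Name is actually meaningful remains a matter for human review; the sketch only catches the literal copy.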

Not using reusability

What should one think of the following snippet:

  <StudyEventData StudyEventOID="enroll"> 
      <FormData FormOID="frmABCDEFGH"> 
          <ItemGroupData ItemGroupOID="frmABCDEFGH.sctSysEnr"> 
              <ItemData ItemOID="frmABCDEFGH.sctSysEnr.itmSubjID" Value="1234" />

which I encountered in an ODM export from a major EDC vendor.

It uses “composite” OIDs for forms, itemgroups and items, i.e. the OID of the Item is composed of the OID from the form + the OID of the itemgroup + its own identifier.

It is clear that such items are not reusable: if the field “Subject ID” is used in another form, a new Item needs to be defined. So if there are 40 forms, there will be 40 ItemDefs, each defining the metadata for “Subject ID”. One of the major strengths of the ODM, however, is exactly the reusability of forms, itemgroups, items, codelists, etc. through the “def-ref-data” mechanism: an item is defined once and can be used repeatedly in different (sub)forms, just by referencing it:

  <ItemGroupDef OID="IG.HEADER" Name="form header" ...>
      <ItemRef ItemOID="IT.SUBJID" Mandatory="Yes"/>
      ...
  </ItemGroupDef>

  <ItemDef OID="IT.SUBJID" Name="Subject ID" ...>...</ItemDef>

The subform “form header” can now be used in any form, as can the item “Subject ID”. Both have been defined once, but can be reused many times.

This is not possible when using the “composite” OIDs.
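As a rough heuristic, “composite” OIDs can be spotted by checking whether the OID of a referenced item starts with the OID of the itemgroup that references it. The sketch below is Python (standard library only) and assumes metadata without an XML namespace and a dot as separator; the function name and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def composite_item_oids(metadata_xml):
    """Return ItemOIDs that are prefixed with the OID of the referencing ItemGroupDef."""
    root = ET.fromstring(metadata_xml)
    hits = []
    for group in root.iter("ItemGroupDef"):
        prefix = group.get("OID", "") + "."  # assumed separator: a dot
        for ref in group.iter("ItemRef"):
            item_oid = ref.get("ItemOID", "")
            if item_oid.startswith(prefix):
                hits.append(item_oid)
    return hits


# Hypothetical sample: one composite OID, one reusable OID.
sample = """
<MetaDataVersion OID="MDV.1" Name="Version 1">
  <ItemGroupDef OID="frmABCDEFGH.sctSysEnr" Name="Enrollment" Repeating="No">
    <ItemRef ItemOID="frmABCDEFGH.sctSysEnr.itmSubjID" Mandatory="Yes"/>
  </ItemGroupDef>
  <ItemGroupDef OID="IG.HEADER" Name="form header" Repeating="No">
    <ItemRef ItemOID="IT.SUBJID" Mandatory="Yes"/>
  </ItemGroupDef>
</MetaDataVersion>
"""

print(composite_item_oids(sample))
```

An item such as IT.SUBJID, referenced from IG.HEADER, is not reported, because its OID does not depend on where it is used.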

Invalid values in codelists

Another frequently encountered error (this time even being a violation of the standard) is shown in the following example:

  <CodeList OID="CL.XYZ" Name="XYZ test score" DataType="integer">
      <CodeListItem CodedValue="1">...</CodeListItem>
      <CodeListItem CodedValue="2">...</CodeListItem>
      <CodeListItem CodedValue="3">...</CodeListItem>
      <CodeListItem CodedValue="">...</CodeListItem>
  </CodeList>

What is meant here is that giving no answer to the question (i.e. leaving the field blank in the CRF) is a valid option. However, the codelist has been defined as being of data type “integer”, and the value “” (blank) is surely not a valid integer! So the ODM Validator will give the error message “'' is not a valid value of type integer”. The specification states: “The DataType restricts the values that can appear in the CodeList whether internal or external”.

So what is the good practice here?

Simply by making the Item non-mandatory (i.e. optional) when referencing it. For example:

  <ItemGroupDef OID="IG.XYZ" Name="my tests" ...>
      <ItemRef ItemOID="IT.XYZ" Mandatory="No" />
  </ItemGroupDef>

  <ItemDef OID="IT.XYZ" Name="my first test" DataType="integer">
      <Question>...</Question>
      <CodeListRef CodeListOID="CL.XYZ"/>
  </ItemDef>

Note the Mandatory=“No” in the ItemRef element. It declares that the question may remain unanswered. The implementation in the eCRF can remain the same, i.e. one of the options in a dropdown is the empty value. However, that option should be deduced from the Mandatory=“No” and NOT from an empty value for a codelist item.

Also note that the value of the “DataType” attribute of the “CodeList” and that of the “ItemDef” referencing it must correspond.
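The codelist rule from this section can be checked with a few lines of code. The following Python sketch (standard library only) handles just the “integer” case and assumes metadata without an XML namespace; the regular expression is my own approximation of a valid integer, and the function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Approximation of a valid integer: optional sign followed by decimal digits.
INTEGER = re.compile(r"[+-]?[0-9]+")


def invalid_integer_coded_values(metadata_xml):
    """Return (CodeList OID, CodedValue) pairs that are not integers in integer codelists."""
    root = ET.fromstring(metadata_xml)
    hits = []
    for codelist in root.iter("CodeList"):
        if codelist.get("DataType") != "integer":
            continue  # other data types are out of scope for this sketch
        for item in codelist.iter("CodeListItem"):
            value = item.get("CodedValue", "")
            if not INTEGER.fullmatch(value):
                hits.append((codelist.get("OID"), value))
    return hits


sample = """
<CodeList OID="CL.XYZ" Name="XYZ test score" DataType="integer">
  <CodeListItem CodedValue="1"/>
  <CodeListItem CodedValue="2"/>
  <CodeListItem CodedValue="3"/>
  <CodeListItem CodedValue=""/>
</CodeList>
"""

print(invalid_integer_coded_values(sample))
```

Only the blank CodedValue is reported; the three numeric values pass.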

Uniqueness of OIDs

The ODM specification lists a number of rules for the uniqueness of the OIDs. Some implementors have misunderstood these rules. For example, I once encountered the following snippet:

  <ItemDef OID="RACE" Name="Race of the subject" DataType="integer">
      <Question>...</Question>
      <CodeListRef CodeListOID="RACE" />
  </ItemDef>

  <CodeList OID="RACE" Name="Race" DataType="integer">
  ...
  </CodeList>

What we see is that the ItemDef and the CodeList both received the same OID. The specification says: “The OID for a MeasurementUnit, MetaDataVersion, StudyEventDef, FormDef, ItemGroupDef, ItemDef, CodeList, Presentation, ConditionDef, and MethodDef must be unique within a single study”. Some implementors have (wrongly) interpreted this to mean that each ItemDef must have a unique OID within the set of ItemDefs, and each CodeList a unique OID within the set of CodeLists. This is however not what was meant by the ODM developers. Although some systems may have no trouble with duplicate OID values in different contexts, others may, which makes this kind of construct non-portable. Given that the standard is meant to enable portability, such constructs should be avoided.
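A check for OIDs that are shared between different kinds of definitions could look like the following Python sketch (standard library only). It assumes that the input contains a single MetaDataVersion, since definitions repeated across several versions of the same study would otherwise be reported as well; the function name and the sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Element names taken from the uniqueness rule quoted above.
DEF_ELEMENTS = {
    "MeasurementUnit", "MetaDataVersion", "StudyEventDef", "FormDef",
    "ItemGroupDef", "ItemDef", "CodeList", "Presentation",
    "ConditionDef", "MethodDef",
}


def shared_oids(study_xml):
    """Map each OID used by more than one definition to the element names using it."""
    root = ET.fromstring(study_xml)
    owners = defaultdict(list)
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # ignore an XML namespace, if present
        oid = elem.get("OID")
        if tag in DEF_ELEMENTS and oid is not None:
            owners[oid].append(tag)
    return {oid: tags for oid, tags in owners.items() if len(tags) > 1}


sample = """
<MetaDataVersion OID="MDV.1" Name="Version 1">
  <ItemDef OID="RACE" Name="Race of the subject" DataType="integer">
    <CodeListRef CodeListOID="RACE"/>
  </ItemDef>
  <ItemDef OID="IT.SEX" Name="Sex of the subject" DataType="text"/>
  <CodeList OID="RACE" Name="Race" DataType="integer"/>
</MetaDataVersion>
"""

print(shared_oids(sample))
```

For the sample, the OID "RACE" is reported as used by both an ItemDef and a CodeList.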

Many vendors use “smart OIDs”, i.e. from the value of the OID alone, one can already deduce to what kind of element it belongs (good practice!). For example:

FM.DEMOG, IG.DEMOG, IT.RACE, CL.RACE

or: FORM_DEMOG, IG_DEMOG, IT_RACE, CL_RACE

with “FM” (or “FORM”) standing for “Form”, “IG” for “ItemGroup”, “IT” for “Item” and “CL” for “CodeList”. Other abbreviations may of course also be used.

In an export from EDC systems that were not built on the ODM standard, it is not always possible to assign OIDs that immediately give an idea of what the item, form, etc. is about, e.g. because each item is just a row in a database table identified by a row number. Even then one can generate “semi-smart OIDs”, like:

IT.001 IT.002 IT.003 … CL.001 CL.002 etc.

At least it is then clear that IT.001 is the OID of an “ItemDef”.

Missing values in datasets

The ODM specification clearly states: “one should not use ItemData elements with IsNull set to “Yes” to indicate uncollected data. The better practice is to transmit only collected data”.

Suppose a subform consists of three questions. The OIDs of the Items are “it_001”, “it_002” and “it_003”. For a specific subject, only the first two questions are answered; the third is not (the field is left blank). In that case the following is bad practice:

  <ItemGroupData ItemGroupOID="igr_001">
      <ItemData ItemOID="it_001" Value="100"/>
      <ItemData ItemOID="it_002" Value="headache"/>
      <ItemData ItemOID="it_003" Value=""/>
  </ItemGroupData>

According to the specification, the good practice is to only transmit collected data. “it_003” was not collected, so the good practice is just to omit the corresponding ItemData element in the export:

  <ItemGroupData ItemGroupOID="igr_001">
      <ItemData ItemOID="it_001" Value="100"/>
      <ItemData ItemOID="it_002" Value="headache"/>
  </ItemGroupData>
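In an export routine, this comes down to skipping uncollected values instead of writing them with an empty “Value”. A minimal Python sketch (standard library only), assuming that the collected data arrive as a dictionary in which None or an empty string means “not collected”; the function name and this input convention are my own assumptions:

```python
import xml.etree.ElementTree as ET


def item_group_data(group_oid, values):
    """Build an ItemGroupData element with ItemData only for collected values."""
    group = ET.Element("ItemGroupData", ItemGroupOID=group_oid)
    for item_oid, value in values.items():
        if value is None or value == "":
            continue  # not collected: omit the ItemData element entirely
        ET.SubElement(group, "ItemData", ItemOID=item_oid, Value=str(value))
    return group


# Assumed input convention: None marks a field that was left blank.
row = {"it_001": 100, "it_002": "headache", "it_003": None}
print(ET.tostring(item_group_data("igr_001", row), encoding="unicode"))
```

The resulting ItemGroupData contains ItemData elements for “it_001” and “it_002” only.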

The spec also says something about the “IsNull” attribute. It is mutually exclusive with the “Value” attribute, so only one of the two is allowed (this rule will be implemented in the Schematron that is currently being developed). The “IsNull” attribute is NOT meant to indicate uncollected data. It is meant to indicate specifically that a value must be set to “NULL” (whatever that means) in the database. As such, my opinion is that it is especially useful for updating data points. For example, a data point was collected:

  <ItemData ItemOID="itemSysBp" Value="1000"/>

Obviously, this is an incorrect systolic blood pressure. So later, it is decided to set the value to NULL in the database, as it is not known what the actually measured value was. One can then use (but this is subject to agreement between the parties):

  <ItemData ItemOID="itemSysBp" TransactionType="Update" IsNull="Yes">
      <AuditRecord>
          <UserRef .../>
          <LocationRef .../>
          <DateTimeStamp>...</DateTimeStamp>
          <ReasonForChange>The originally entered value was an impossible value, but the real measured value is not known, so it was decided to set the value to NULL</ReasonForChange>
      </AuditRecord>
  </ItemData>

Note that using:

  <ItemData ItemOID="itemSysBp" TransactionType="Remove">...</ItemData>

would have meant “remove the data point from the database, as it was not collected”.
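Until such a rule is available in a Schematron, the mutual exclusivity of “IsNull” and “Value” can be checked with a small script. The following Python sketch (standard library only) assumes clinical data without an XML namespace; the function name and the item “itemDiaBp” in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def isnull_value_conflicts(clinical_xml):
    """Return the ItemOIDs of ItemData elements carrying both IsNull and Value."""
    root = ET.fromstring(clinical_xml)
    return [
        item.get("ItemOID")
        for item in root.iter("ItemData")
        if item.get("IsNull") is not None and item.get("Value") is not None
    ]


# Hypothetical sample: the second ItemData violates the rule.
sample = """
<ItemGroupData ItemGroupOID="IG.VS">
  <ItemData ItemOID="itemSysBp" TransactionType="Update" IsNull="Yes"/>
  <ItemData ItemOID="itemDiaBp" IsNull="Yes" Value="80"/>
</ItemGroupData>
"""

print(isnull_value_conflicts(sample))
```

Only “itemDiaBp” is reported, as it carries both attributes at once.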

Missing values - SAS

Another error I have seen, fortunately only a few times, comes from users who have generated ODM files from their SAS database. I do not know whether this error is a result of the SAS procedures for export to XML (I doubt it), or just comes from people who developed software or procedures assuming that in ODM, too, a dot “.” means a “missing value”.

An example I found is:

  <ItemData ItemOID="IT.SYSBP" Value="."/>

Missing values are indeed often displayed as a dot in SAS (the dot character, hexadecimal 0x2E). That is however not at all the case in ODM: “the better practice is to transmit only collected data”.

Why is “.” not a valid indicator for a missing value in ODM? The simple reason is that ODM is based on XML, and XML is increasingly moving towards native XML datatypes for values. This is already implemented in ODM 1.3, which allows the use of typed ItemData elements (such as ItemDataInteger).

If the “.” is given as the value of such a typed element, and the file is validated against the XML Schema, the schema processor will immediately protest that “'.' is not a valid value against its datatype 'integer'”.
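Leftover SAS missing-value markers can be found before an ODM file is sent out. The Python sketch below (standard library only) assumes clinical data without an XML namespace. Besides the plain dot, it also looks for the SAS special missing values “._” and “.A” to “.Z”, which is my own addition to the case described above; the function name and the sample OIDs are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: SAS missing values written out as text look like ".", "._" or ".A" to ".Z".
SAS_MISSING = re.compile(r"\.[A-Za-z_]?")


def sas_missing_item_data(clinical_xml):
    """Return the ItemOIDs of ItemData elements whose Value is a SAS missing-value marker."""
    root = ET.fromstring(clinical_xml)
    return [
        item.get("ItemOID")
        for item in root.iter("ItemData")
        if SAS_MISSING.fullmatch(item.get("Value", "").strip())
    ]


# Hypothetical sample: only the first ItemData carries a SAS-style missing value.
sample = """
<ItemGroupData ItemGroupOID="IG.VS">
  <ItemData ItemOID="IT.SYSBP" Value="."/>
  <ItemData ItemOID="IT.DIABP" Value="80"/>
  <ItemData ItemOID="IT.TEMP" Value="36.6"/>
</ItemGroupData>
"""

print(sas_missing_item_data(sample))
```

The reported ItemData elements should simply be omitted from the export, in line with the rule to transmit only collected data.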
