ODM CodeLists Good Practice

Jozef Aerts 2009/11/21 10:53

Author: Jozef Aerts, XML4Pharma

Applicable to: ODM version 1.3, 1.2, 1.1

CodeLists and ODM

CodeLists are used in ODM to limit (“enumerate”) the possible answers to a question that appears on an (e)CRF or an ePRO page (other uses are not excluded). They also make it possible to “localize” the text of the possible answers for different languages and cultures.

In ODM, CodeLists can only be referenced from “ItemDef” elements. Essentially, an “ItemDef” corresponds to a question on a form or a data point measured by an instrument. For example:

  <ItemDef OID="IT.007" Name="feeling today" DataType="integer">
      <Question>
          <TranslatedText xml:lang="en">How do you feel today?</TranslatedText>
          <TranslatedText xml:lang="de">Wie fühlen Sie sich heute?</TranslatedText>
          <TranslatedText xml:lang="fr">Comment vous sentez-vous aujourd'hui?</TranslatedText>
      </Question>
      <CodeListRef CodeListOID="CL.019"/>
  </ItemDef>

The item (data point) here is about how the subject feels today. The question text is given in three languages: English, German and French. For more information about internationalization in ODM, please see our previous contribution “CDISC ODM in multi-language studies”. A reference to a CodeList with OID “CL.019” is made, limiting (“enumerating”) the possible answers. The codelist could, for example, be defined as follows:

  <CodeList OID="CL.019" Name="CodeList for feeling today" DataType="integer">
      <CodeListItem CodedValue="1">
          <Decode>
              <TranslatedText xml:lang="en">I feel good</TranslatedText>
              <TranslatedText xml:lang="de">Ich fühle mich gut</TranslatedText>
              <TranslatedText xml:lang="fr">Je me sens bien</TranslatedText>
          </Decode>
      </CodeListItem>
      <CodeListItem CodedValue="2">
          <Decode>
              <TranslatedText xml:lang="en">I feel pretty well</TranslatedText>
              <TranslatedText xml:lang="de">Ich fühle mich ziemlich gut</TranslatedText>
              <TranslatedText xml:lang="fr">Je me sens assez bien</TranslatedText>
          </Decode>
      </CodeListItem>
      <CodeListItem CodedValue="3">
          <Decode>
              <TranslatedText xml:lang="en">I don't feel so well</TranslatedText>
              <TranslatedText xml:lang="de">Ich fühle mich nicht so gut</TranslatedText>
              <TranslatedText xml:lang="fr">Je ne me sens pas très bien</TranslatedText>
          </Decode>
      </CodeListItem>
      <CodeListItem CodedValue="4">
          <Decode>
              <TranslatedText xml:lang="en">I feel pretty bad</TranslatedText>
              <TranslatedText xml:lang="de">Ich fühle mich ziemlich schlecht</TranslatedText>
              <TranslatedText xml:lang="fr">Je me sens assez mal</TranslatedText>
          </Decode>
      </CodeListItem>
      <CodeListItem CodedValue="5">
          <Decode>
              <TranslatedText xml:lang="en">I feel terrible</TranslatedText>
              <TranslatedText xml:lang="de">Ich fühle mich sehr schlecht</TranslatedText>
              <TranslatedText xml:lang="fr">Je me sens très mal</TranslatedText>
          </Decode>
      </CodeListItem>
  </CodeList>

So there are only 5 possible answers to the question, and for each of them, the answer texts are given for the three languages that the study will be deployed in. The values that are stored in the clinical database can only be one of “1”, “2”, “3”, “4” and “5” - no other values are allowed. So, if one encounters a data point:

  <ItemData ItemOID="IT.007" Value="8"/>

then this is not only a violation of the standard; it is also totally unclear what this value represents.
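
For comparison, a data point that respects the codelist might look like the following minimal sketch. The value “3” is chosen arbitrarily for illustration and corresponds to “I don't feel so well” in the CodeList above:

  <ItemData ItemOID="IT.007" Value="3"/>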

In an eCRF or on an ePRO page, the user will usually never see these values, but only the possible answers in his/her own language (also see our contribution on internationalization).

DataTypes for CodeLists

Each CodeList element has a mandatory attribute “DataType”. The possible values are “integer”, “float”, “text” and “string”. The data type “float” does not make much sense and is seldom used (although some people like to use it to create subcategories of possible answers, e.g. “1.1”, “1.2”, “2.1” …). There are a number of rules for the value of “DataType” that are often implemented incorrectly.

The first rule states that the value of the “DataType” attribute of the “ItemDef” that references the “CodeList” must be identical to that of the “CodeList” itself. So the following example is a violation of the standard:

  <ItemDef OID="IT.007" Name="feeling today" DataType="integer">
      ...
      <CodeListRef CodeListOID="CL.019"/>
  </ItemDef>
  <CodeList OID="CL.019" Name="CodeList for feeling today" DataType="text">
      ...
  </CodeList>
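
A conforming variant of this snippet, as a sketch, simply gives both elements the same data type. The choice of “integer” here is an assumption; “text” on both elements would satisfy the rule equally well:

  <ItemDef OID="IT.007" Name="feeling today" DataType="integer">
      ...
      <CodeListRef CodeListOID="CL.019"/>
  </ItemDef>
  <CodeList OID="CL.019" Name="CodeList for feeling today" DataType="integer">
      ...
  </CodeList>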

The second rule, that each “CodedValue” must be a valid value for the “DataType” of its CodeList, is very often violated, especially in an attempt to designate that the user did not (want to) answer the question:

  <CodeList OID="CL.019" Name="CodeList for feeling today" DataType="integer">
      <CodeListItem CodedValue="1">...</CodeListItem>
      <CodeListItem CodedValue="2">...</CodeListItem>
      <CodeListItem CodedValue="3">...</CodeListItem>
      <CodeListItem CodedValue="4">...</CodeListItem>
      <CodeListItem CodedValue="5">...</CodeListItem>
      <CodeListItem CodedValue="">
          <Decode>
              <TranslatedText xml:lang="en">No response to the question</TranslatedText>
          </Decode>
      </CodeListItem>
  </CodeList>

The last “CodeListItem” attempts to state that a “null” value must be added to the database in case the user did not answer the question. However, the empty string for “CodedValue” is NOT an acceptable value for the type “integer”. This is a violation of the standard.

There are two possible solutions (best practices) for this. The first is to declare the item as not mandatory where it is referenced:

  <ItemGroupDef OID="IG.001" Name="Questions about feeling today" Repeating="No">
      <ItemRef ItemOID="IT.007" Mandatory="No"/>
  </ItemGroupDef>
  <ItemDef OID="IT.007" Name="feeling today" DataType="integer"> ...

The 'Mandatory="No"' attribute means that the question may simply be left unanswered (i.e. the data point is not collected), so that no value (or a null value) may appear in the database. In the clinical part of the ODM, this means that the “ItemData” element for that specific data point will NOT appear; as the specification says: “The better practice is to transmit only collected data”. So, with 'Mandatory="No"', when the subject did not answer the question (i.e. the data point was not collected), the following statements would be invalid:

  <ItemData ItemOID="IT.007" Value=""/>
  <ItemData ItemOID="IT.007"/>

The second solution is to add an explicit coded value for “not collected” to the CodeList:

  <CodeListItem CodedValue="6">
      <Decode>
          <TranslatedText xml:lang="en">Not collected</TranslatedText>
      </Decode>
  </CodeListItem>

In that case, it is (consequently) advised to set “Mandatory” to “Yes” when referencing the ItemDef.
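
To make the difference concrete, the clinical data for the two approaches (an optional item versus an explicit coded value “6” for “Not collected”) might look roughly as follows. This is only a sketch: the second item “IT.008” and its value are hypothetical and serve only to show a group that contains other collected data. With the optional item, the unanswered question leaves no “ItemData” element at all:

  <ItemGroupData ItemGroupOID="IG.001">
      <!-- no ItemData for IT.007: the question was not answered -->
      <!-- IT.008 is a hypothetical second item in the same group -->
      <ItemData ItemOID="IT.008" Value="2"/>
  </ItemGroupData>

With the explicit “Not collected” code, the unanswered question is transmitted as the coded value “6”:

  <ItemGroupData ItemGroupOID="IG.001">
      <ItemData ItemOID="IT.007" Value="6"/>
      <!-- IT.008 is a hypothetical second item in the same group -->
      <ItemData ItemOID="IT.008" Value="2"/>
  </ItemGroupData>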

The third rule (“CodeListItems within a single CodeList must not have duplicate CodedValues”) is a pretty simple and logical one: one cannot have two identical values (as interpreted by the data type) for the coded value. So the following examples are invalid and violate the standard:

  <CodeList OID="CL.019" Name="CodeList for feeling today" DataType="integer">
      <CodeListItem CodedValue="1">...</CodeListItem>
      <CodeListItem CodedValue="1">...</CodeListItem>
  </CodeList>
  <CodeList OID="CL.019" Name="CodeList for feeling today" DataType="float">
      <CodeListItem CodedValue="1">...</CodeListItem>
      <CodeListItem CodedValue="1.0">...</CodeListItem>
  </CodeList>

In the second case, the values “1” and “1.0” are identical when interpreted as floats.

The following however is correct:

  <CodeList OID="CL.019" Name="CodeList for feeling today" DataType="text">
      <CodeListItem CodedValue="1">...</CodeListItem>
      <CodeListItem CodedValue="1.0">...</CodeListItem>
  </CodeList>

In this case, the data type is “text”, so “1” and “1.0” are different values.

CodeLists and the Length attribute

Another error (or at least misinterpretation of the specification) is demonstrated using the following snippet:

  <ItemDef OID="IT.RACE" Name="Race" DataType="integer" Length="42">
      ...
      <CodeListRef CodeListOID="CL.RACE"/>
  </ItemDef>
  <CodeList OID="CL.RACE" Name="Race codelist" DataType="integer">
      <CodeListItem CodedValue="1">
          <Decode><TranslatedText>Caucasian</TranslatedText></Decode>
      </CodeListItem>
      <CodeListItem CodedValue="2">
          <Decode><TranslatedText>Black or African American</TranslatedText></Decode>
      </CodeListItem>
      <CodeListItem CodedValue="3">
          <Decode><TranslatedText>Native Hawaiian and Other Pacific Islander</TranslatedText></Decode>
      </CodeListItem>
      ...
  </CodeList>

As one can see, a Length of 42 has been defined on the ItemDef, on the assumption that the longest decode string (“Native Hawaiian …”) has 42 characters. This is, however, a misinterpretation of the specification. The “Length” attribute denotes the length needed to store the coded value in the database. So in this case, a Length of “1” suffices, as the coded values are all lower than 10.
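
Under that reading, a corrected ItemDef for this example would look like the following sketch, in which only the “Length” has been changed:

  <ItemDef OID="IT.RACE" Name="Race" DataType="integer" Length="1">
      ...
      <CodeListRef CodeListOID="CL.RACE"/>
  </ItemDef>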

EnumeratedItem

As of ODM 1.3, it is possible to make a simple enumeration list without different answer texts for different languages. This is useful only when the way the possible answers are displayed is culture- and language-independent.

A typical example of this is the question “how many alcoholic drinks did you have yesterday”, with the possible answers “0”, “<=1”, “1-2”, “>2”.

One can then (though there is no obligation) use the EnumeratedItem element as follows:

  <CodeList OID="CL.ALCOHOL" Name="Alcohol consumption" DataType="text">
      <EnumeratedItem CodedValue="0"/>
      <EnumeratedItem CodedValue="&lt;=1"/>
      <EnumeratedItem CodedValue="1-2"/>
      <EnumeratedItem CodedValue="&gt;2"/>
  </CodeList>

The same rules as mentioned above apply. There is, however, an extra rule saying that “CodeListItems and EnumeratedItems may not be mixed within a single codelist”. So the following is invalid:

  <CodeList OID="CL.ALCOHOL" Name="Alcohol consumption" DataType="text">
      <CodeListItem CodedValue="0">
          <Decode>
              <TranslatedText xml:lang="en">None</TranslatedText>
              <TranslatedText xml:lang="fr">Aucune</TranslatedText>
              <TranslatedText xml:lang="de">Keine</TranslatedText>
          </Decode>
      </CodeListItem>
      <EnumeratedItem CodedValue="&lt;=1"/>
      <EnumeratedItem CodedValue="1-2"/>
      <EnumeratedItem CodedValue="&gt;2"/>
  </CodeList>
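
One way to repair this codelist, sketched below, is to use “CodeListItem” for every entry (the other being the pure “EnumeratedItem” list shown earlier). Only English decodes are given, and the decode texts for the last three entries are illustrative assumptions rather than wording from a real study:

  <CodeList OID="CL.ALCOHOL" Name="Alcohol consumption" DataType="text">
      <CodeListItem CodedValue="0">
          <Decode><TranslatedText xml:lang="en">None</TranslatedText></Decode>
      </CodeListItem>
      <CodeListItem CodedValue="&lt;=1">
          <Decode><TranslatedText xml:lang="en">At most one</TranslatedText></Decode>
      </CodeListItem>
      <CodeListItem CodedValue="1-2">
          <Decode><TranslatedText xml:lang="en">One to two</TranslatedText></Decode>
      </CodeListItem>
      <CodeListItem CodedValue="&gt;2">
          <Decode><TranslatedText xml:lang="en">More than two</TranslatedText></Decode>
      </CodeListItem>
  </CodeList>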

Also note the use of the XML entities '&lt;' and '&gt;', so that the XML parser does not mistake these characters for the start or end of an XML tag.

ExternalCodeList

In some cases, one may use published, publicly available codelists. For example, there is the “Common Terminology Criteria for Adverse Events” codelist. In version 4, it contains nearly 800 terms for adverse event descriptions.

If one wants to use such a CodeList, the “ExternalCodeList” element should be used. For example:

  <CodeList OID="CL.CTCAE" Name="CTCAE CodeList" DataType="integer">
      <ExternalCodeList Dictionary="Common Terminology Criteria for Adverse Events" 
          Version="v4.0" href="http://evs.nci.nih.gov/ftp1/CTCAE/About.html"/>
  </CodeList>

The “Dictionary” and “Version” attributes are mandatory; the “href” attribute is optional (but should be used if the codelist has been published on the internet). If there is a local electronic instance of the dictionary, the “href” attribute must be replaced by a “ref” attribute.
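
As a sketch of the local-instance case, the same codelist might then be written as follows. The value of “ref” is a purely hypothetical placeholder; how a local dictionary instance is actually identified depends on the system in use:

  <CodeList OID="CL.CTCAE" Name="CTCAE CodeList" DataType="integer">
      <!-- "local-ctcae-v4" is a hypothetical identifier for a local dictionary instance -->
      <ExternalCodeList Dictionary="Common Terminology Criteria for Adverse Events"
          Version="v4.0" ref="local-ctcae-v4"/>
  </CodeList>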

It would be ideal if such published codelists also had an API, so that one could run requests against them, e.g. using a Web Service. Unfortunately, this is very seldom the case. As a result, it is often not easy to check whether the value given in the “ItemData” element is an acceptable value of that codelist.

Unfortunately, the “ExternalCodeList” element is not used often enough. I have seen ODM files that use the CTCAE codelist without saying so through the “ExternalCodeList” element. Mapping such data to SDTM then becomes a very dangerous exercise, as there is no guarantee at all that the values in the ODM clinical data really are the ones one assumes them to be.