==== ODM CodeLists Good Practice ====
--- //[[Jozef.Aerts@XML4Pharma.com|Jozef Aerts]] 2009/11/21 10:53//
Author: Jozef Aerts, XML4Pharma
Applicable to: ODM version 1.3, 1.2, 1.1
== CodeLists and ODM ==
CodeLists are used in ODM to limit ("enumerate") the possible answers to a question that appears on an (e)CRF or an ePRO page (other uses are not excluded).
They also make it possible to "localize" the text of the possible answers for different languages and cultures.
In ODM, CodeLists can only be referenced from "ItemDef" elements. Essentially, an "ItemDef" corresponds to a question on a form or a data point measured by an instrument.
For example:
* English: How do you feel today?
* German: Wie fühlen Sie sich heute?
* French: Comment vous sentez-vous aujourd'hui ?
The item (data point) here is about how the subject feels today. The question text is given for three languages: English, German and French. For more information about internationalization in ODM, please see our previous contribution [[http://www.xml4pharmaserver.com/XML4PharmaWiki/doku.php?id=cdisc_odm_in_multi-language_studies|CDISC ODM in multi-language studies]].
A reference to a CodeList with OID "CL.019" is made, limiting ("enumerating") the possible answers.
The codelist could be defined as follows (coded value, followed by the answer text in English, German and French):
* "1": I feel good / Ich fühle mich gut / Je me sens bien
* "2": I feel pretty well / Ich fühle mich ziemlich gut / Je me sens assez bien
* "3": I don't feel so well / Ich fühle mich nicht so gut / Je ne me sens pas très bien
* "4": I feel pretty bad / Ich fühle mich ziemlich schlecht / Je me sens assez mal
* "5": I feel terrible / Ich fühle mich sehr schlecht / Je me sens très mal
So there are only 5 possible answers to the question, and for each of them, the answer texts are given for the three languages that the study will be deployed in.
The values that are stored in the clinical database can only be one of "1", "2", "3", "4" and "5"; no other values are allowed. So, if one encounters a data point with any other value, this is not only a violation of the standard, it is also totally unclear what the value represents.
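The check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the fragment below is a simplified, namespace-free rendering of codelist "CL.019" with English texts only (real ODM files declare the ODM namespace), and the "Name" value and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: no ODM namespace, English texts only.
CODELIST_XML = """
<CodeList OID="CL.019" Name="Wellbeing" DataType="integer">
  <CodeListItem CodedValue="1"><Decode><TranslatedText xml:lang="en">I feel good</TranslatedText></Decode></CodeListItem>
  <CodeListItem CodedValue="2"><Decode><TranslatedText xml:lang="en">I feel pretty well</TranslatedText></Decode></CodeListItem>
  <CodeListItem CodedValue="3"><Decode><TranslatedText xml:lang="en">I don't feel so well</TranslatedText></Decode></CodeListItem>
  <CodeListItem CodedValue="4"><Decode><TranslatedText xml:lang="en">I feel pretty bad</TranslatedText></Decode></CodeListItem>
  <CodeListItem CodedValue="5"><Decode><TranslatedText xml:lang="en">I feel terrible</TranslatedText></Decode></CodeListItem>
</CodeList>
"""


def allowed_values(codelist):
    """Collect the CodedValue of every CodeListItem in the codelist."""
    return {item.get("CodedValue") for item in codelist.findall("CodeListItem")}


def is_allowed(codelist, value):
    """Return True when the stored value is one of the enumerated coded values."""
    return value in allowed_values(codelist)


codelist = ET.fromstring(CODELIST_XML)
print(is_allowed(codelist, "3"))
print(is_allowed(codelist, "6"))
```

The comparison is done on the raw strings, which is enough for this codelist; a stricter check would compare the values as interpreted by the data type.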
In an eCRF or on an ePRO page, the user will usually never see these values, but only the possible answers in his/her own language (also see our [[http://www.xml4pharmaserver.com/XML4PharmaWiki/doku.php?id=cdisc_odm_in_multi-language_studies|contribution on internationalization]]).
== DataTypes for CodeLists ==
Each CodeList element has a mandatory attribute "DataType". The possible values are "integer", "float", "text" and "string". The data type "float" does not make much sense and is seldom used (although some people like to use it to make subcategories of possible answers, e.g. "1.1", "1.2", "2.1" ...).
There are a number of rules for the value of the "DataType" that are often incorrectly implemented:
* The DataType attributes of the referenced CodeList and the containing ItemDef must be the same
* The CodedValue must be an acceptable value of the DataType of the containing CodeList
* CodeListItems within a single CodeList must not have duplicate CodedValues
The first rule states that the value of the "DataType" attribute of the "ItemDef" that references the "CodeList" must be identical to that of the "CodeList" itself. So the following example is a **violation of the standard**:
...
...
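A minimal sketch of how the first rule could be checked automatically. The metadata fragment is hypothetical (made-up OIDs, no ODM namespace, "Decode" elements omitted for brevity) and deliberately contains a mismatch; the function name is also just an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with a deliberate mismatch: ItemDef is "text", CodeList is "integer".
METADATA_XML = """
<MetaDataVersion OID="MDV.1" Name="Example">
  <ItemDef OID="IT.FEEL" Name="Feeling" DataType="text">
    <CodeListRef CodeListOID="CL.019"/>
  </ItemDef>
  <CodeList OID="CL.019" Name="Wellbeing" DataType="integer">
    <CodeListItem CodedValue="1"/>
  </CodeList>
</MetaDataVersion>
"""


def datatype_mismatches(mdv):
    """List (ItemDef OID, ItemDef DataType, CodeList OID, CodeList DataType) for each mismatch."""
    codelists = {cl.get("OID"): cl for cl in mdv.findall("CodeList")}
    problems = []
    for item in mdv.findall("ItemDef"):
        ref = item.find("CodeListRef")
        if ref is None:
            continue  # this ItemDef does not use a codelist
        codelist = codelists.get(ref.get("CodeListOID"))
        if codelist is not None and codelist.get("DataType") != item.get("DataType"):
            problems.append((item.get("OID"), item.get("DataType"),
                             codelist.get("OID"), codelist.get("DataType")))
    return problems


mismatches = datatype_mismatches(ET.fromstring(METADATA_XML))
print(mismatches)
```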
The second rule is very often violated, especially in attempts to indicate that the user did not (want to) answer the question:
...
...
...
...
...
No response to the question
The last "CodeListItem" attempts to state that a "null" value must be added to the database in case the user did not answer the question. However, the empty string for "CodedValue" is **NOT** an acceptable value for the type "integer". This is a **violation** of the standard.
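The second rule can be illustrated with a small Python sketch. The patterns below are a simplification chosen for this example, not the formal ODM data format definitions, and the function name is hypothetical.

```python
import re

# Simplified patterns (an assumption for this sketch, not the formal ODM definitions).
PATTERNS = {
    "integer": re.compile(r"[+-]?[0-9]+"),
    "float": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}


def conforms(coded_value, data_type):
    """Return True when coded_value is an acceptable value for data_type."""
    pattern = PATTERNS.get(data_type)
    if pattern is None:
        # "text" and "string": this sketch accepts any value, including the empty string.
        return True
    return pattern.fullmatch(coded_value) is not None


print(conforms("5", "integer"))
print(conforms("", "integer"))
print(conforms("", "text"))
```

The empty string fails for "integer" but passes for "text", which is why declaring both the ItemDef and the CodeList as "text" makes such a CodeListItem formally acceptable.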
There are several possible solutions (best practices) for this.
* The first solution is to define both the ItemDef and the CodeList as being of data type "text". This is the less elegant solution.
* The second solution starts from the observation that the subject or user is apparently allowed to leave the question unanswered. This should not be defined in the CodeList, however, but in the reference to the question itself, i.e. in the "ItemRef" element, using the "Mandatory" attribute:
...
The 'Mandatory="No"' attribute means that the question may simply be left unanswered (i.e. the data point is not collected), so that no value (or a null value) appears in the database.
In the clinical part of the ODM, this means that the "ItemData" element for that specific data point will **NOT** appear, as the specification says: "The better practice is to transmit only collected data".
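As an illustration of "transmit only collected data", the following Python sketch builds an "ItemGroupData" element and simply skips any item that was not answered. The OIDs and the function name are hypothetical, and the ODM namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_item_group(answers):
    """Create ItemGroupData with one ItemData per collected value (None means not collected)."""
    group = ET.Element("ItemGroupData", ItemGroupOID="IG.WELLBEING")  # hypothetical OID
    for item_oid, value in answers.items():
        if value is None:
            continue  # not collected: emit no ItemData element at all
        ET.SubElement(group, "ItemData", ItemOID=item_oid, Value=value)
    return group


# "IT.FEEL" was left unanswered, "IT.SLEEP" was answered with coded value "2".
group = build_item_group({"IT.FEEL": None, "IT.SLEEP": "2"})
print(ET.tostring(group, encoding="unicode"))
```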
So, if in the above case ('Mandatory="No"') the subject did not answer the question (i.e. the data point was not collected), the following statements would be **invalid**:
* a third solution is of course to have a CodedValue "6" for which the English text is "not collected":
Not collected
In that case, it is (consequently) advised to set "Mandatory" to "Yes" when referencing the ItemDef.
The third rule ("CodeListItems within a single CodeList must not have duplicate CodedValues") is a pretty simple and logical one: one cannot have two identical values (as interpreted by the data type) for the coded value.
So, the following examples are **invalid**, and do **violate the standard**:
...
...
...
...
In the second case, the values "1" and "1.0" are identical when interpreted as floats.
The following however is correct:
...
...
In this case, the data type is "text", so "1" and "1.0" are different values.
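The phrase "as interpreted by the data type" is the key point of the third rule. A minimal Python sketch, with hypothetical function names, that compares coded values after converting them according to the data type:

```python
from decimal import Decimal


def normalize(coded_value, data_type):
    """Convert a coded value to the form in which it is compared for its data type."""
    if data_type == "integer":
        return int(coded_value)
    if data_type == "float":
        return Decimal(coded_value)  # Decimal("1") == Decimal("1.0")
    return coded_value  # "text" and "string" are compared literally


def duplicate_coded_values(coded_values, data_type):
    """Return pairs (first occurrence, duplicate) of values that are equal for data_type."""
    seen = {}
    duplicates = []
    for value in coded_values:
        key = normalize(value, data_type)
        if key in seen:
            duplicates.append((seen[key], value))
        else:
            seen[key] = value
    return duplicates


print(duplicate_coded_values(["1", "1.0", "2"], "float"))
print(duplicate_coded_values(["1", "1.0", "2"], "text"))
```

For "float" the pair "1" and "1.0" is reported as a duplicate; for "text" the same list contains no duplicates.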
== CodeLists and the Length attribute ==
Another error (or at least misinterpretation of the specification) is demonstrated using the following snippet:
...
Caucasian
Black or African American
Native Hawaiian and Other Pacific Islander
...
As one can see, a Length of 42 has been defined on the ItemDef, on the assumption that the Length must accommodate the longest string ("Native Hawaiian ..."), which has 42 characters. This is however a misinterpretation of the specification.
The "Length" attribute denotes the length needed to store the **coded** value in the database.
So in this case, a Length of "1" suffices as the coded values are all lower than 10.
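The difference can be made concrete with a short Python sketch. The coded values "1", "2" and "3" are an assumption for this example (the text only states that they are all lower than 10); the decoded texts are those of the snippet above.

```python
def required_length(coded_values):
    """Length needed to store the longest coded value."""
    return max(len(value) for value in coded_values)


# Assumed coded values; the decoded texts are taken from the example.
coded_values = ["1", "2", "3"]
decodes = [
    "Caucasian",
    "Black or African American",
    "Native Hawaiian and Other Pacific Islander",
]

print(required_length(coded_values))      # what "Length" should be based on
print(max(len(text) for text in decodes)) # the misinterpreted value
```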
== EnumeratedItem ==
As of ODM 1.3, it is possible to make a simple enumeration list without different answer texts for different languages. This is useful only when the way the possible answer is displayed is culture- and language-independent.
A typical example for this is the question "how many alcoholic drinks did you have yesterday", with the possible answers "0", "<=1", "1-2", ">2".
One can then (but there is no obligation) use the EnumeratedItem element as follows:
The same rules as mentioned above apply.
There is however an extra rule saying that "CodeListItems and EnumeratedItems may not be mixed within a single codelist". So the following is invalid:
None
Ne rien
Keine
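A check for the "no mixing" rule could look like the following Python sketch. The fragment is a hypothetical, namespace-free example of an invalid codelist and the function name is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical invalid codelist: one CodeListItem mixed with EnumeratedItems.
MIXED_XML = """
<CodeList OID="CL.DRINKS" Name="Drinks" DataType="text">
  <CodeListItem CodedValue="0">
    <Decode><TranslatedText xml:lang="en">None</TranslatedText></Decode>
  </CodeListItem>
  <EnumeratedItem CodedValue="&lt;=1"/>
  <EnumeratedItem CodedValue="1-2"/>
</CodeList>
"""


def mixes_item_kinds(codelist):
    """Return True when both CodeListItem and EnumeratedItem children are present."""
    kinds = {child.tag for child in codelist
             if child.tag in ("CodeListItem", "EnumeratedItem")}
    return len(kinds) > 1


print(mixes_item_kinds(ET.fromstring(MIXED_XML)))
```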
Also note the use of the XML entities '&lt;' and '&gt;' for the characters '<' and '>', so that the XML parser does not mistake them for the start or end of an XML tag.
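When the XML is generated by software, the escaping is normally done by the XML library. The following Python sketch builds the codelist for the alcoholic drinks question with the standard library; the OID and Name are made up, and the ODM namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical OID and Name; the coded values are the answers from the example question.
codelist = ET.Element("CodeList", OID="CL.DRINKS", Name="Drinks", DataType="text")
for value in ["0", "<=1", "1-2", ">2"]:
    ET.SubElement(codelist, "EnumeratedItem", CodedValue=value)

# The serializer replaces '<' and '>' in attribute values by entities.
xml_text = ET.tostring(codelist, encoding="unicode")
print(xml_text)

# Parsing the text again gives back the original characters.
round_trip = [e.get("CodedValue") for e in ET.fromstring(xml_text).findall("EnumeratedItem")]
print(round_trip)
```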
== ExternalCodeList ==
In some cases, one may use published, publicly available codelists.
For example, there is the "[[http://ctep.cancer.gov/protocolDevelopment/electronic_applications/ctc.htm|Common Terminology Criteria for Adverse Events]]" codelist. In version 4, it contains nearly 800 terms for adverse events descriptions.
If one wants to use such a CodeList, the "ExternalCodeList" element should be used. For example:
The "Dictionary" and "Version" attributes are mandatory, the "href" attribute is optional (but should be used in case the codelist has been published on the internet). In case there is an electronic local instance of the dictionary, the "href" attribute must be replaced by a "ref" attribute.
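A receiving system can use these attributes to find out which dictionary a codelist refers to. A minimal Python sketch follows; the fragment is hypothetical (the "Dictionary" and "Version" values, the OID and the missing ODM namespace are assumptions), and only the URL is taken from the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; attribute values other than the URL are assumptions.
EXTERNAL_XML = """
<CodeList OID="CL.CTCAE" Name="CTCAE" DataType="text">
  <ExternalCodeList Dictionary="CTCAE" Version="4.0"
      href="http://ctep.cancer.gov/protocolDevelopment/electronic_applications/ctc.htm"/>
</CodeList>
"""


def external_dictionary(codelist):
    """Return the ExternalCodeList attributes as a dict, or None when the element is absent."""
    external = codelist.find("ExternalCodeList")
    if external is None:
        return None
    return {name: external.get(name) for name in ("Dictionary", "Version", "href", "ref")}


info = external_dictionary(ET.fromstring(EXTERNAL_XML))
print(info)
```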
It would be ideal if such published codelists also had an API, so that one could run requests against them, e.g. using a Web Service. Unfortunately, this is very seldom the case. As such, it is often not very easy to check whether the given value in the "ItemData" element is an acceptable value of that codelist.
Unfortunately, the "ExternalCodeList" element is not sufficiently used. I have seen ODM files which use the CTCAE codelist, but without saying so using the "ExternalCodeList" element. When then doing a mapping to SDTM, this becomes a very dangerous exercise, as there is no guarantee at all that the values in the ODM clinical data really come from the codelist one has in mind.