
Jozef Aerts 2008/12/31 11:36

Uniqueness of OIDs - Good Practice

Author: Jozef Aerts, XML4Pharma

Applicable to: ODM version 1.3

The OID uniqueness rules

The ODM standard uses the 'OID' attribute as the unique identifier for studies, units of measurement and versions of the metadata, down to codelists and method definitions. The rules for the uniqueness of OIDs are described in section 2.11 (Element Identifiers and References) of the ODM specification. In essence, the rules are:

  • The OIDs of different studies from the same sponsor must be unique
  • Within a Study, the OIDs for User, Location and SignatureDef (these are all OIDs from the 'AdminData' section) must be unique.
  • The OIDs for MeasurementUnit, MetaDataVersion, StudyEventDef, FormDef, ItemGroupDef, ItemDef, CodeList, Presentation, ConditionDef and MethodDef must be unique within a single study (that is, within a related series of ODM documents). The definitions of StudyEventDef, FormDef, ItemGroupDef, ItemDef, CodeList, Presentation, ConditionDef and MethodDef can, however, be overridden using the 'MetaDataVersion' versioning mechanism (see below). Note that MeasurementUnits are not versionable.
  • The OID for an ArchiveLayout must be unique within a single form definition.
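These rules lend themselves to automated checking. The following is a minimal Python sketch (standard library only) that reports any OID occurring more than once for the same element type inside a single MetaDataVersion. The function name `find_duplicate_oids` and the selection of checked element types are illustrative assumptions. The sketch does not check MeasurementUnits (which are not children of MetaDataVersion), AdminData, ArchiveLayouts, or uniqueness across a series of files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Element types checked here (assumption: taken from the rules listed above).
CHECKED_TYPES = {
    "StudyEventDef", "FormDef", "ItemGroupDef", "ItemDef",
    "CodeList", "Presentation", "ConditionDef", "MethodDef",
}


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.cdisc.org/ns/odm/v1.3}'."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_oids(metadata_version: ET.Element) -> dict[str, list[str]]:
    """Return {element type: [OIDs used more than once]} for one MetaDataVersion."""
    counts: dict[str, Counter] = {}
    for child in metadata_version:  # definitions are direct children of MetaDataVersion
        kind = local_name(child.tag)
        if kind in CHECKED_TYPES and child.get("OID") is not None:
            counts.setdefault(kind, Counter())[child.get("OID")] += 1
    return {
        kind: sorted(oid for oid, n in counter.items() if n > 1)
        for kind, counter in counts.items()
        if any(n > 1 for n in counter.values())
    }


mdv = ET.fromstring("""
<MetaDataVersion OID="MV.001" Name="example">
    <ItemDef OID="IT.SEX" Name="Sex" DataType="text" Length="10"/>
    <ItemDef OID="IT.SEX" Name="Gender" DataType="text" Length="10"/>
    <CodeList OID="CL.SEX" Name="Sex" DataType="text"/>
</MetaDataVersion>""")
print(find_duplicate_oids(mdv))  # {'ItemDef': ['IT.SEX']}
```

Because the counting is done per element type, an ItemDef and a CodeList that share an OID are not reported by this check. As discussed later, the specification does not forbid that.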
Versioning using the MetaDataVersion element

As already stated, the elements StudyEventDef, FormDef, ItemGroupDef, ItemDef, CodeList, Presentation, ConditionDef and MethodDef can be versioned. All of these are child elements of the 'MetaDataVersion' element. Each MetaDataVersion must have its own OID, so within a study there can never be two 'MetaDataVersion' instances with the same OID.

Different MetaDataVersions can be used, for example, to define two branches (usually called 'arms') of a study. They can, however, also be used to define different versions of an entity (StudyEventDef, FormDef, etc.), where the later version overrides the previous one. Assume, for example, that we first defined the ItemDef “Sex” with OID “IT.SEX” as free text in MetaDataVersion 'MV.001':

  <MetaDataVersion OID="MV.001" Name="first version of the metadata">
      <ItemDef OID="IT.SEX" Name="Sex" DataType="text" Length="10"/>
  </MetaDataVersion>

Some time later, we decide that “Sex” should be stored in the database as “0” (for male) or “1” (for female), and that these are the only allowed values. We then define a second version of the metadata:

  <MetaDataVersion OID="MV.002" Name="second version of the metadata">
      <ItemDef OID="IT.SEX" Name="Sex" DataType="integer" Length="1">
          <CodeListRef CodeListOID="CL.SEX"/>
      </ItemDef>
      <CodeList OID="CL.SEX" Name="CodeList for sex" DataType="integer">
          <CodeListItem CodedValue="0">
              <Decode>
                  <TranslatedText xml:lang="en">Male</TranslatedText>
              </Decode>
          </CodeListItem>
          <CodeListItem CodedValue="1">
              <Decode>
                  <TranslatedText xml:lang="en">Female</TranslatedText>
              </Decode>
          </CodeListItem>
      </CodeList>
  </MetaDataVersion>

The second version of the metadata (for the same study) must come later in the series (chain) of ODM files than the first version. The order of the files can be determined by following the chain formed by the “FileOID” and “PriorFileOID” attributes of the ODM documents: each document's “PriorFileOID” refers to the “FileOID” of the document that precedes it.

For example, the first document may have:

  <ODM ... FileOID="MyStudyFile_001">...

and the second document may have:

  <ODM ... FileOID="MyStudyFile_002" PriorFileOID="MyStudyFile_001">...
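Following such a chain can be automated. Below is a minimal Python sketch, assuming a single unbranched chain in which exactly one document has no PriorFileOID. The function name `order_by_prior_file_oid` is hypothetical. The example ODM root elements are simplified and omit the other attributes a real ODM file carries.

```python
import xml.etree.ElementTree as ET


def order_by_prior_file_oid(documents: list[ET.Element]) -> list[str]:
    """Return the FileOIDs of a series of ODM documents, oldest first.

    Assumes one unbranched chain: exactly one document has no PriorFileOID,
    and every other document names its predecessor in PriorFileOID.
    """
    successor: dict[str, str] = {}  # PriorFileOID -> FileOID of the next document
    first = None
    for odm in documents:
        file_oid, prior = odm.get("FileOID"), odm.get("PriorFileOID")
        if prior is None:
            if first is not None:
                raise ValueError("more than one document has no PriorFileOID")
            first = file_oid
        elif prior in successor:
            raise ValueError(f"two documents claim {prior!r} as their prior file")
        else:
            successor[prior] = file_oid
    if first is None:
        raise ValueError("no document without PriorFileOID")

    ordered = [first]
    while ordered[-1] in successor:
        if len(ordered) > len(documents):  # guard against cyclic chains
            raise ValueError("PriorFileOID chain contains a cycle")
        ordered.append(successor[ordered[-1]])
    if len(ordered) != len(documents):
        raise ValueError("chain is broken or contains unrelated documents")
    return ordered


# Simplified ODM root elements, deliberately given out of order.
docs = [
    ET.fromstring('<ODM FileOID="MyStudyFile_002" PriorFileOID="MyStudyFile_001"/>'),
    ET.fromstring('<ODM FileOID="MyStudyFile_001"/>'),
]
print(order_by_prior_file_oid(docs))  # ['MyStudyFile_001', 'MyStudyFile_002']
```

Raising an error on a broken or branched chain is a design choice. In that situation the "later version overrides earlier version" rule cannot be applied unambiguously.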

Alternatively, when one has defined a large number of entities in the first MetaDataVersion and only wants to change one definition while reusing all the others, one may use the “Include” mechanism:

  <MetaDataVersion OID="MV.002" Name="second version of the metadata">
      <Include StudyOID="MyStudy" MetaDataVersionOID="MV.001"/>
      <ItemDef OID="IT.SEX" Name="Sex" DataType="integer" Length="1">
          <CodeListRef CodeListOID="CL.SEX"/>
      </ItemDef>
      ...
  </MetaDataVersion>
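The effect of “Include” can be pictured as a merge. Definitions from the included MetaDataVersion are taken over, and a local definition with the same element type and OID replaces the included one. The following Python sketch resolves this under that assumption. It ignores the Include's StudyOID (a single study is assumed) and treats the function name `effective_definitions` and the item “IT.BRTHDAT” as hypothetical. The CodeListRef of “IT.SEX” is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.cdisc.org/ns/odm/v1.3}'."""
    return tag.rsplit("}", 1)[-1]


def effective_definitions(
    mdv_oid: str, versions: dict[str, ET.Element], _visiting: frozenset = frozenset()
) -> dict[tuple[str, str], ET.Element]:
    """Resolve {(element type, OID): definition} for one MetaDataVersion.

    Included definitions are collected first; the version's own definitions
    then replace any included definition with the same type and OID.
    """
    if mdv_oid in _visiting:
        raise ValueError(f"Include cycle through {mdv_oid!r}")
    mdv = versions[mdv_oid]
    result: dict[tuple[str, str], ET.Element] = {}
    for include in (c for c in mdv if local_name(c.tag) == "Include"):
        result.update(effective_definitions(
            include.get("MetaDataVersionOID"), versions, _visiting | {mdv_oid}))
    for child in mdv:
        if child.get("OID") is not None:  # skips Include, which has no OID
            result[(local_name(child.tag), child.get("OID"))] = child
    return result


mv1 = ET.fromstring("""
<MetaDataVersion OID="MV.001" Name="first version of the metadata">
    <ItemDef OID="IT.SEX" Name="Sex" DataType="text" Length="10"/>
    <ItemDef OID="IT.BRTHDAT" Name="Birth date" DataType="date"/>
</MetaDataVersion>""")
mv2 = ET.fromstring("""
<MetaDataVersion OID="MV.002" Name="second version of the metadata">
    <Include StudyOID="MyStudy" MetaDataVersionOID="MV.001"/>
    <ItemDef OID="IT.SEX" Name="Sex" DataType="integer" Length="1"/>
</MetaDataVersion>""")
versions = {m.get("OID"): m for m in (mv1, mv2)}
defs = effective_definitions("MV.002", versions)
print(defs[("ItemDef", "IT.SEX")].get("DataType"))      # integer (local override)
print(defs[("ItemDef", "IT.BRTHDAT")].get("DataType"))  # date (included from MV.001)
```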

More explanation and examples of the “Include” mechanism are given in a separate contribution, “The 'Include' mechanism in CDISC ODM Study descriptions”.

Good practice for assigning OIDs

As OIDs are the unique identifiers in ODM, it makes sense to assign them in a consistent way, ideally so that the OID itself already shows what kind of entity it describes. For example, it can be very useful to use a prefix that immediately shows whether the OID belongs to an ItemDef, a FormDef or a CodeList. For ItemDefs, one may choose OIDs like:

  • “IT.001”, “IT.002”, …
  • or: “ITM_001”, “ITM_002”, …

Similarly, for CodeLists:

  • “CL.001”, “CL.002”, …
  • or: “CDL_001”, “CDL_002”, …

Of course, one may also (although this is less common practice) try to assign meaningful values for the OIDs, such as:

  • “IT.SEX”, “CL.SEX”, “FO.DEMOGRAPHICS” etc.
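Once a prefix convention has been chosen, it can be enforced automatically. The Python sketch below checks the definitions in a MetaDataVersion against a prefix table. The table `EXPECTED_PREFIX` is a hypothetical house convention modelled on the examples above, not something the ODM standard prescribes, and `prefix_violations` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical house convention: one OID prefix per element type.
EXPECTED_PREFIX = {
    "StudyEventDef": "SE.",
    "FormDef": "FO.",
    "ItemGroupDef": "IG.",
    "ItemDef": "IT.",
    "CodeList": "CL.",
}


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.cdisc.org/ns/odm/v1.3}'."""
    return tag.rsplit("}", 1)[-1]


def prefix_violations(mdv: ET.Element) -> list[str]:
    """List definitions in a MetaDataVersion whose OID lacks the expected prefix."""
    problems = []
    for child in mdv:
        kind = local_name(child.tag)
        prefix = EXPECTED_PREFIX.get(kind)
        oid = child.get("OID")
        if prefix and oid is not None and not oid.startswith(prefix):
            problems.append(f"{kind} {oid!r} should start with {prefix!r}")
    return problems


mdv = ET.fromstring("""
<MetaDataVersion OID="MV.001" Name="example">
    <FormDef OID="FO.DEMOGRAPHICS" Name="Demographics" Repeating="No"/>
    <ItemDef OID="IT.SEX" Name="Sex" DataType="integer" Length="1"/>
    <CodeList OID="SEX" Name="Sex" DataType="integer"/>
</MetaDataVersion>""")
print(prefix_violations(mdv))  # ["CodeList 'SEX' should start with 'CL.'"]
```

Element types that are not in the table are simply not checked. A team can therefore adopt the convention gradually.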
Bad practice for assigning OIDs

Secondly, and as a direct consequence of the first 'good practice', it is my opinion that one should not assign the same OID to different entities. For example, I consider the following to be 'bad practice':

  <StudyEventDef OID="DEMOGRAPHICS" ...>
      <FormRef FormOID="DEMOGRAPHICS" .../>
  </StudyEventDef>
  <FormDef OID="DEMOGRAPHICS" ... >
      <ItemGroupRef ItemGroupOID="DEMOGRAPHICS" .../>
  </FormDef>
  <ItemGroupDef OID="DEMOGRAPHICS" ...>
  ...
  </ItemGroupDef>

The ODM specification does not forbid this, but I still consider it bad practice. Although one may argue that an OID is always tied to the element it appears on, using the same OID for different entities is very confusing, especially for less experienced ODM users.

Also, what I have often seen is that an ItemDef and its associated CodeList share the same OID, such as in:

  <ItemDef OID="itemRace" Name="Race" DataType="text">
      <Question>...</Question>
      <CodeListRef CodeListOID="itemRace"/>
  </ItemDef>

A much better practice is to give both a similar OID, but with a different prefix. For example:

  <ItemDef OID="it_Race" Name="Race" DataType="text">
      <Question>...</Question>
      <CodeListRef CodeListOID="cl_Race"/>
  </ItemDef>

In this case, from the value of the OIDs, one can still see that these do belong together, but each of them still has a distinct and clear OID.
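Both patterns described above can be detected by looking for OID values that appear on more than one element type. Below is a minimal Python sketch of such a check. It assumes the whole document (or a MetaDataVersion) is parsed into one tree and uses the illustrative function name `oids_shared_across_types`. Only `OID` attributes are considered. Reference attributes such as FormOID or CodeListOID are ignored, so a definition is not flagged merely because something refers to it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.cdisc.org/ns/odm/v1.3}'."""
    return tag.rsplit("}", 1)[-1]


def oids_shared_across_types(root: ET.Element) -> dict[str, list[str]]:
    """Map each OID value used by more than one element type to those types."""
    types_by_oid: dict[str, set[str]] = defaultdict(set)
    for element in root.iter():  # all descendants, including root itself
        oid = element.get("OID")
        if oid is not None:
            types_by_oid[oid].add(local_name(element.tag))
    return {oid: sorted(types) for oid, types in types_by_oid.items() if len(types) > 1}


mdv = ET.fromstring("""
<MetaDataVersion OID="MV.001" Name="example">
    <StudyEventDef OID="DEMOGRAPHICS" Name="Demographics" Repeating="No" Type="Scheduled"/>
    <FormDef OID="DEMOGRAPHICS" Name="Demographics" Repeating="No"/>
    <ItemDef OID="itemRace" Name="Race" DataType="text"/>
    <CodeList OID="itemRace" Name="Race" DataType="text"/>
    <ItemDef OID="it_Race" Name="Race" DataType="text"/>
    <CodeList OID="cl_Race" Name="Race" DataType="text"/>
</MetaDataVersion>""")
print(oids_shared_across_types(mdv))
# {'DEMOGRAPHICS': ['FormDef', 'StudyEventDef'], 'itemRace': ['CodeList', 'ItemDef']}
```

The prefixed pair “it_Race”/“cl_Race” passes the check, while the unprefixed pairs are reported.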

Another construct that I have seen in ODM files, and that I do not like at all is:

  <StudyEventDef OID="SE.VISIT1" ...>
      <FormRef FormOID="SE.VISIT1_FM.001" .../>
  </StudyEventDef>
  <FormDef OID="SE.VISIT1_FM.001" ...>
      <ItemGroupRef ItemGroupOID="SE.VISIT1_FM.001_IG.001" .../>
  </FormDef>
  <ItemGroupDef OID="SE.VISIT1_FM.001_IG.001" ...>
      <ItemRef ItemOID="SE.VISIT1_FM.001_IG.001_IT.001" .../>
  </ItemGroupDef>
  <ItemDef OID="SE.VISIT1_FM.001_IG.001_IT.001" ...>
      ...
      <CodeListRef CodeListOID="SE.VISIT1_FM.001_IG.001_IT.001_CL.001"/>
  </ItemDef>
  <CodeList OID="SE.VISIT1_FM.001_IG.001_IT.001_CL.001" ...>
      ...
  </CodeList>

That is, each OID is composed of an identifier appended to the OID of its parent element. The problem here is reusability: with such constructs, ItemDefs cannot be reused across forms, and codelists cannot be reused across items.
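This naming pattern can be spotted heuristically. The check looks for OIDs that consist of another OID in the same document, followed by a separator and further text. The Python sketch below assumes “_” as the separator, as in the example above, and uses the hypothetical function name `composite_oids`. It flags a naming habit, not a schema error. It is quadratic in the number of OIDs, which is acceptable for a quick review of a single file.

```python
import xml.etree.ElementTree as ET


def composite_oids(root: ET.Element, separator: str = "_") -> dict[str, str]:
    """Map each OID built as '<other OID><separator>...' to the longest such other OID."""
    oids = {el.get("OID") for el in root.iter() if el.get("OID") is not None}
    found = {}
    for oid in oids:
        parents = [p for p in oids if p != oid and oid.startswith(p + separator)]
        if parents:
            # The longest matching OID is the nearest "parent" in the composition.
            found[oid] = max(parents, key=len)
    return found


mdv = ET.fromstring("""
<MetaDataVersion OID="MV.001" Name="example">
    <StudyEventDef OID="SE.VISIT1" Name="Visit 1" Repeating="No" Type="Scheduled"/>
    <FormDef OID="SE.VISIT1_FM.001" Name="Form 1" Repeating="No"/>
    <ItemGroupDef OID="SE.VISIT1_FM.001_IG.001" Name="Group 1" Repeating="No"/>
    <ItemDef OID="SE.VISIT1_FM.001_IG.001_IT.001" Name="Item 1" DataType="text"/>
    <ItemDef OID="IT.SEX" Name="Sex" DataType="integer"/>
</MetaDataVersion>""")
for oid, parent in sorted(composite_oids(mdv).items()):
    print(f"{oid}  <-  {parent}")
```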

The worst case of this pattern that I have seen was an ODM file with over 80 distinct CodeLists (distinct according to their OIDs), each containing the same possible answers to the associated question: “Yes” and “No”.
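Such redundant codelists can be found by comparing their content rather than their OIDs. The following Python sketch groups the CodeLists of a MetaDataVersion by a signature consisting of their DataType plus the coded values and decodes of their CodeListItems. OID and Name are ignored. The function names are hypothetical. The sketch assumes that item order matters and does not handle EnumeratedItem.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang


def local_name(tag: str) -> str:
    """Strip an XML namespace such as '{http://www.cdisc.org/ns/odm/v1.3}'."""
    return tag.rsplit("}", 1)[-1]


def codelist_signature(codelist: ET.Element) -> tuple:
    """DataType plus (CodedValue, decodes) per CodeListItem; OID and Name are ignored."""
    items = []
    for item in codelist:
        if local_name(item.tag) != "CodeListItem":
            continue  # EnumeratedItem and other children are not handled here
        decodes = tuple(
            (text.get(XML_LANG), (text.text or "").strip())
            for text in item.iter()
            if local_name(text.tag) == "TranslatedText"
        )
        items.append((item.get("CodedValue"), decodes))
    return codelist.get("DataType"), tuple(items)


def duplicate_codelists(mdv: ET.Element) -> list[list[str]]:
    """Group the OIDs of CodeLists in one MetaDataVersion that have identical content."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for child in mdv:
        if local_name(child.tag) == "CodeList":
            groups[codelist_signature(child)].append(child.get("OID"))
    return [oids for oids in groups.values() if len(oids) > 1]


YES_NO = """<CodeList OID="{oid}" Name="{oid}" DataType="text">
    <CodeListItem CodedValue="Y"><Decode><TranslatedText xml:lang="en">Yes</TranslatedText></Decode></CodeListItem>
    <CodeListItem CodedValue="N"><Decode><TranslatedText xml:lang="en">No</TranslatedText></Decode></CodeListItem>
</CodeList>"""

mdv = ET.fromstring(
    '<MetaDataVersion OID="MV.001" Name="example">'
    + "".join(YES_NO.format(oid=f"CL.Q{n}") for n in range(1, 4))
    + '<CodeList OID="CL.SEX" Name="Sex" DataType="integer"/>'
    + "</MetaDataVersion>"
)
print(duplicate_codelists(mdv))  # [['CL.Q1', 'CL.Q2', 'CL.Q3']]
```

Each reported group is a candidate for replacement by a single shared CodeList, referenced from all the ItemDefs concerned.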

These kinds of “monster” ODM files are typically produced by systems that either have no concept of reusable questions and forms themselves, or where the developers of the ODM export made things easy for themselves.

uniqueness_of_oids_-_good_practice.txt · Last modified: 2013/12/21 09:19 (external edit)