Open Rules for CDISC Standards - Implementing the validation rules in your own validation software

With the "Open Rules for CDISC Standards", everyone, and every company, can develop their own software for executing the FDA, PMDA and CDISC validation rules, because the rules are written in XQuery, the vendor-neutral language standardized by the W3C. This makes you independent of a single provider (one such provider is well known for the many bugs and "false positives" in its software). With the "Open Rules for CDISC Standards", you can inspect each rule implementation, improve it, and correct it immediately if necessary (please let us know, so that we can make an update available to everyone within a few hours, with no more waiting until next year's release). You can even use these rule implementations as templates for your own additional, company-specific rules, and deploy them immediately.
In the next sections, we explain how you can easily use these rules in your own software, be it written in Java, C# or any other modern programming language.

Principles

The rules require you to have your SDTM, SEND or ADaM datasets in the modern CDISC Dataset-XML format. The completely outdated SAS Transport 5 (.xpt format) is not supported (this format is not vendor-neutral).
In most cases, you will need to pass two parameters to the software that executes the rules, as explained in detail below. The first parameter is the location of the whole submission (i.e. of its files). This can be:

A file folder: in that case, the path starts with "file:///". For example, on Windows: "file:///C:/MySubmissions/LZZT_Pilot_2013_Dataset-XML/". Don't forget the "/" at the end.
A native XML or other NoSQL database address (a so-called "collection"). For example: "/db/MySubmissions/LZZT_Pilot_2013_Dataset-XML/".
A RESTful web query string that returns the XML. This can be extremely useful in a SOA environment.

The second parameter is the name of the define.xml file or resource within the submission, e.g. "define.xml".

In some cases there is a third parameter, allowing you to pass a single file or resource. For example, for rules that take more time to complete, you might want to validate only one file (e.g. "VS.xml") instead of all findings files (validating all files is the default).
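For a local folder, the "file:///" form with the trailing slash can be produced in Java rather than typed by hand. The following is a minimal sketch; the class and method names are our own (hypothetical), and defineLocation assumes that a rule simply concatenates the base location and the define.xml name, which is exactly why the trailing "/" matters.

```java
import java.nio.file.Path;

/** Hypothetical helper for building the location parameters of a rule. */
final class SubmissionLocation {

    private SubmissionLocation() {}

    /** Converts a local folder into a "file:///" URI string that always ends with "/". */
    static String toBaseUri(Path folder) {
        // Path.toUri() only appends "/" when the folder exists, so enforce it here
        String uri = folder.toAbsolutePath().normalize().toUri().toString();
        return uri.endsWith("/") ? uri : uri + "/";
    }

    /** Location of the define.xml, assuming plain string concatenation of the two parameters. */
    static String defineLocation(String base, String define) {
        return base + define;
    }
}
```

On Windows, toBaseUri(Path.of("C:/MySubmissions/LZZT_Pilot_2013_Dataset-XML")) should yield the "file:///C:/MySubmissions/LZZT_Pilot_2013_Dataset-XML/" form shown above. Without the trailing slash, concatenation would give ".../LZZT_Pilot_2013_Dataset-XMLdefine.xml", which does not exist.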

Passing parameters to the XQuery

When you have downloaded validation rules in XQuery (either using the web service or from the regular download) and inspect what has been returned, you will notice that the hardcoded location of the define.xml has been commented out (everything between "(:" and ":)" is a comment). Instead, the query declares external variables, which allows you to pass parameters to the XQuery. The following parameters are defined:

$base: the base location where the XML documents are stored. This can be a file directory or folder, a URL, or a collection in a native XML database. Do not forget to put "/" at the end when appropriate.
$define: the name of the define.xml document. This is very important, as already stated: "define.xml is leading".
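Because not every rule file declares exactly the same parameters, it can be handy to list the external variables a rule actually declares before binding values. The sketch below (hypothetical class name) relies only on the standard XQuery prolog syntax "declare variable $name external;" and assumes that XQuery comments are not nested; it strips comments first, so commented-out declarations are ignored.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists the external variables declared in a rule's XQuery. */
final class ExternalVariables {

    // Matches "declare variable $name external" and "declare variable $name as type external"
    private static final Pattern DECLARATION = Pattern.compile(
            "declare\\s+variable\\s+\\$([\\w.-]+)(?:\\s+as\\s+[^;]+?)?\\s+external");

    private ExternalVariables() {}

    static Set<String> declaredIn(String xquery) {
        // Remove (non-nested) XQuery comments "(: ... :)" before searching
        String code = xquery.replaceAll("(?s)\\(:.*?:\\)", " ");
        Set<String> names = new LinkedHashSet<>();
        Matcher m = DECLARATION.matcher(code);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }
}
```

A caller could, for example, bind $domain only when the returned set contains "domain".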

In some of the XQueries you will find an additional parameter:

$domain: code for the domain or single dataset on which to perform the validation, e.g. "VS"

This additional parameter has been added for those rules that are either memory or computing intensive.
To improve performance, set $domain to a single domain, for example $domain='LB'. In that case, the validation will only be performed on the datasets of the LB domain.
If you want to validate all domains at once, however, use $domain='ALL'. This is not always a good idea, especially with large datasets. A better method is to iterate over all domains and apply the validation to each of them separately.
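A minimal sketch of such a per-domain loop, assuming a Define-XML 2.0 file (ODM 1.3 namespace) whose ItemGroupDef elements carry a Domain attribute (falling back to Name when it is absent). The class name is hypothetical, and runRule is a placeholder for your own code that binds $domain and executes the rule.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: runs a rule once per domain listed in the define.xml. */
final class DomainIterator {

    static final String ODM_NS = "http://www.cdisc.org/ns/odm/v1.3";

    private DomainIterator() {}

    /** Distinct Domain attributes of all ItemGroupDef elements, in document order. */
    static Set<String> domains(InputSource defineXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList groups = factory.newDocumentBuilder().parse(defineXml)
                    .getElementsByTagNameNS(ODM_NS, "ItemGroupDef");
            Set<String> result = new LinkedHashSet<>();
            for (int i = 0; i < groups.getLength(); i++) {
                Element group = (Element) groups.item(i);
                String domain = group.getAttribute("Domain");
                if (domain.isEmpty()) {
                    domain = group.getAttribute("Name"); // fallback when Domain is absent
                }
                if (!domain.isEmpty()) {
                    result.add(domain);
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot read define.xml", e);
        }
    }

    /** Calls runRule (e.g. code binding $domain and executing the XQuery) once per domain. */
    static <R> Map<String, R> validatePerDomain(Set<String> domains, Function<String, R> runRule) {
        Map<String, R> results = new LinkedHashMap<>();
        for (String domain : domains) {
            results.put(domain, runRule.apply(domain));
        }
        return results;
    }
}
```

Keeping each call separate means a memory-intensive rule only ever holds one domain's datasets at a time, at the cost of re-reading the define.xml per run inside the XQuery.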

Passing the location of your MedDRA files to your implementation

For rules that need to read MedDRA files, you will need to pass the location of your MedDRA folder. The parameter for this is $meddrabase. For example: $meddrabase='C:\meddra_19_0_english\MedAscii'.
Unfortunately, MedDRA is still proprietary, requires a license, and is distributed using 30-year-old technology (ASCII files).
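Since a mistyped $meddrabase is easy to overlook, a quick check of the folder before running the rules can save time. The sketch below assumes that the rules read the low-level term and preferred term files llt.asc and pt.asc of the MedAscii distribution; the class name and the EXPECTED_FILES list are hypothetical and should be adapted to the files your rules actually use.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: checks a MedDRA MedAscii folder before binding $meddrabase. */
final class MeddraFolder {

    // Assumption: these are the MedDRA ASCII files the rules need
    static final List<String> EXPECTED_FILES = List.of("llt.asc", "pt.asc");

    private MeddraFolder() {}

    /** Returns the expected files that are missing from the given folder. */
    static List<String> missingFiles(Path medAscii) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED_FILES) {
            if (!Files.isRegularFile(medAscii.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

If missingFiles(Path.of("C:\\meddra_19_0_english\\MedAscii")) returns an empty list, the same path string can then be passed as $meddrabase.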

Running the XQueries from within your Java software

You will usually want to run the XQueries from validation software (similar to OpenCDISC, but better). Writing such software is pretty straightforward, but we would still like to give you a "jumpstart".

Here is a Java program that we developed for testing, and which you can use as a starting point. It has two methods: one for querying Dataset-XML submission files that are stored as files, and one for querying Dataset-XML submission documents that are stored in a native XML database (eXist-DB in this case).

Let's go through the steps, starting with the libraries that are needed.

We used Saxon-HE 9 for parsing and as the XQuery engine, so you will need saxon9he.jar and saxon9-xqj.jar, which you can get from the Saxon website.

Furthermore, you will need the xmldb.jar library, which you can obtain from several sources. If you already have the eXist native XML database installed, you can find it in the directory /lib/core.
If you prefer to use BaseX as your native XML database, you can probably also find it in its distribution (I haven't tried yet).
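Missing jars usually only surface as a ClassNotFoundException at runtime, so a small startup check can help. This sketch uses only the JDK; the class names in REQUIRED are our assumption of the entry points of the libraries above (the Saxon 9 XQJ data source, the XML:DB API and the eXist-DB driver), and the helper itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical startup check that the required libraries are on the classpath. */
final class ClasspathCheck {

    // Assumed entry-point classes of the libraries mentioned above
    static final List<String> REQUIRED = List.of(
            "net.sf.saxon.xqj.SaxonXQDataSource", // Saxon XQJ (saxon9-xqj.jar)
            "org.xmldb.api.DatabaseManager",      // XML:DB API (xmldb.jar)
            "org.exist.xmldb.DatabaseImpl");      // eXist-DB XML:DB driver

    private ClasspathCheck() {}

    /** Returns the class names that cannot be loaded. */
    static List<String> missing(List<String> classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize = false: only check presence, do not run static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Calling ClasspathCheck.missing(ClasspathCheck.REQUIRED) at startup and printing a non-empty result gives a clearer message than a stack trace halfway through a validation run.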

The method 'runXQueryOnFile' is for the case where the Dataset-XML submission files are stored in a file system:

First, we tell the system that we want to use the Saxon XQuery engine, and we define an InputStream for the file containing the XQuery, in this case the one for validation rule FDAC201.
Then the XQuery engine is set up and the contents of the file are put into an "XQPreparedExpression" (which is similar to a prepared statement in JDBC).
The next lines ("exp.bindObject") pass the location of the define.xml file to the XQuery; also see the $base and $define variables in the XQuery. Note that the XQuery does not need to know where the other Dataset-XML files are located, as their locations are read from the define.xml document (using "def:leaf xlink:href").
Then the XQuery is executed ("exp.executeQuery();") and the program iterates over the results. An example output is:

Finally, do not forget to close the connection.
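To see which Dataset-XML files a rule will actually open, you can read the same "def:leaf xlink:href" references yourself. This JDK-only sketch (hypothetical class name) assumes Define-XML 2.0 (def namespace "http://www.cdisc.org/ns/def/v2.0"; Define-XML 1.0 uses ".../def/v1.0") and only looks at def:leaf elements inside ItemGroupDef, because other def:leaf elements reference documents such as the annotated CRF. Prefixing each href with $base is our assumption of how relative references are resolved.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: lists the dataset locations referenced by a define.xml. */
final class DatasetLocations {

    static final String ODM_NS = "http://www.cdisc.org/ns/odm/v1.3";
    static final String DEF_NS = "http://www.cdisc.org/ns/def/v2.0";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    private DatasetLocations() {}

    static List<String> fromDefine(InputSource defineXml, String base) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(defineXml);
            NodeList groups = doc.getElementsByTagNameNS(ODM_NS, "ItemGroupDef");
            List<String> locations = new ArrayList<>();
            for (int i = 0; i < groups.getLength(); i++) {
                // Only def:leaf inside ItemGroupDef points to a dataset
                NodeList leaves = ((Element) groups.item(i)).getElementsByTagNameNS(DEF_NS, "leaf");
                for (int j = 0; j < leaves.getLength(); j++) {
                    String href = ((Element) leaves.item(j)).getAttributeNS(XLINK_NS, "href");
                    if (!href.isEmpty()) {
                        locations.add(base + href);
                    }
                }
            }
            return locations;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Cannot read define.xml", e);
        }
    }
}
```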

The second method "runXQueryOnEXistDB" demonstrates how to proceed when the define.xml and Dataset-XML files are stored in a native XML database (in this case eXist-DB).
In this case, you will need a few extra Java libraries, namely xmlrpc-client-3.1.3.jar and xmlrpc-common-3.1.3.jar, which you will find in the eXist-DB distribution under /lib/core.

The first lines of the method are:

We define the driver (just as you would for a relational database, but of course a different one) and the connection string (we use XML-RPC for the connection), and then the location of the file with the XQuery (again for rule FDAC201).
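For orientation, here is a sketch of the driver name and the XML-RPC connection string. The driver class org.exist.xmldb.DatabaseImpl is eXist-DB's XML:DB implementation; the URI layout assumes a default eXist-DB installation listening on port 8080 under the /exist context, and the helper collectionUri is hypothetical.

```java
/** Hypothetical constants and helper for connecting to eXist-DB via XML:DB over XML-RPC. */
final class ExistConnection {

    /** eXist-DB's XML:DB driver class, loaded by name like a JDBC driver. */
    static final String DRIVER = "org.exist.xmldb.DatabaseImpl";

    private ExistConnection() {}

    /** Builds an XML:DB URI for a collection, assuming eXist's default XML-RPC endpoint. */
    static String collectionUri(String host, int port, String collectionPath) {
        // e.g. xmldb:exist://localhost:8080/exist/xmlrpc/db/MySubmissions/...
        String path = collectionPath.startsWith("/") ? collectionPath : "/" + collectionPath;
        return "xmldb:exist://" + host + ":" + port + "/exist/xmlrpc" + path;
    }
}
```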

The database is then registered and the XQuery service is obtained. We pass the two variables "base" and "define" to it (see $base and $define in the XQuery file), and read the contents of the XQuery file into a String (the method readFile can be found in the source code).
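The original readFile method is not reproduced here; a minimal equivalent (our own sketch, not the original source code) could look like the following. It assumes the rule files are UTF-8 encoded and, as a defensive design choice, strips a leading byte-order mark that some editors add.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the readFile method mentioned in the text. */
final class RuleFiles {

    private RuleFiles() {}

    /** Reads a rule file (e.g. the XQuery for FDAC201) into a String, decoding it as UTF-8. */
    static String readFile(Path xqueryFile) {
        try {
            return stripBom(Files.readString(xqueryFile, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read rule file " + xqueryFile, e);
        }
    }

    /** Removes a leading byte-order mark, if present. */
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }
}
```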

The query is then executed ("service.query") and the program iterates over all results. As these are XML elements, we use an "XMLResource". Of course, the XML could also be assembled into a single document and visualized, e.g. using a stylesheet.

A typical example output is:

Note that this is "quick and dirty" code, not optimized for performance at all. But I think it is a good start anyway.
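The idea of combining the XML elements returned by a rule into one document and rendering it with a stylesheet can be sketched with the JDK's javax.xml.transform API alone. The class name, the <results> root element and the <error> elements in the usage below are hypothetical; the actual structure of the output is defined by each rule's XQuery.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: assembles rule results and renders them with XSLT. */
final class ResultReport {

    private ResultReport() {}

    /** Wraps the XML fragments returned by a rule into one document under a root element. */
    static String assemble(List<String> fragments) {
        StringBuilder sb = new StringBuilder("<results>");
        fragments.forEach(sb::append);
        return sb.append("</results>").toString();
    }

    /** Applies an XSLT stylesheet to the assembled document and returns the output. */
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

In practice the stylesheet would typically produce an HTML report; a one-line text output is used in the check below only to keep it short.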

If you need help with any of this, please do not hesitate to contact us (info-at-xml4pharma.com).

Courtesy of XML4Pharma - last update: March 2017