Networking government in New Zealand.

8 Data quality best practice

NOTE

Data quality requirements are extremely important from an interoperability perspective. Agreeing on the data structure that xNAL represents solves only half of the problem; the other half is agreeing on a level of data quality that allows the data to be exchanged without errors.

8.1 General requirements

The following data quality requirements apply to all elements and attributes.

8.1.1 Strong data typing

All elements and attributes that contain text nodes (textual data) have strong data types.

Data types used in xNAL are listed in the table below.

| Type name | Namespace | Constraints | Used by |
|---|---|---|---|
| String | xNLb | Limited length, based on xs:string. Special white spacing rules apply (all insignificant white space is collapsed). This type is used for all text nodes containing name or address data, whether the actual value is numeric or alphanumeric. | xNL-basic, xNL, xAL, xNAL |
| Guid | xNLb | An xs:string constrained to the 128-bit GUID pattern XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX | xNL, xAL, xNAL |
| Guids | xNLb | A space-delimited list of GUIDs, each defined by the Guid data type | xAL |

This table provides only a high-level description of the simple data types. Refer to the appropriate schema for full details.
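The constraints in the table can be approximated in a few lines of code, for example to pre-check data before schema validation. The following Python sketch assumes that each X in the GUID pattern stands for a hexadecimal digit and that "collapsed" white spacing follows the XML Schema whiteSpace="collapse" rule. The function names and the max_length parameter are hypothetical; the actual length limit is defined in the schema, and this is not a substitute for validating against it.

```python
import re

# Assumption: every "X" in the GUID pattern is a hexadecimal digit.
GUID_PATTERN = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)


def collapse_whitespace(value: str) -> str:
    """Mimic XML Schema whiteSpace="collapse": tabs, line feeds and carriage
    returns become spaces, runs of spaces become one, both ends are trimmed."""
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ")


def is_valid_string(value: str, max_length: int) -> bool:
    """Check a String value after collapsing white space.
    max_length is a placeholder for the limit defined in the schema."""
    return len(collapse_whitespace(value)) <= max_length


def is_valid_guid(value: str) -> bool:
    """Check a single Guid value against the 8-4-4-4-12 pattern."""
    return GUID_PATTERN.fullmatch(value) is not None


def is_valid_guids(value: str) -> bool:
    """Check a space-delimited Guids list. An empty list passes, as it would
    for an xs:list without a minLength facet."""
    collapsed = collapse_whitespace(value)
    items = collapsed.split(" ") if collapsed else []
    return all(is_valid_guid(item) for item in items)
```

Collapsing before measuring length means that stray tabs or line breaks copied from a legacy system do not count towards the limit, which matches the intent of the String type.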

8.1.2 Case sensitivity

Matching and validation procedures are case-insensitive. Applications should use the formats that would be used in a formal paper-based document, and can change the casing as required for a particular presentation.
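A minimal Python sketch of case-insensitive matching follows. It compares normalised copies only, so the stored values keep their formal casing. It also assumes that white space is collapsed before comparison, in line with the String type described in Section 8.1.1; the function names are hypothetical.

```python
import re


def normalise_for_matching(value: str) -> str:
    """Collapse white space and fold case so that comparisons ignore both."""
    return re.sub(r"[ \t\n\r]+", " ", value).strip(" ").casefold()


def values_match(a: str, b: str) -> bool:
    """Case-insensitive match; the original values are left untouched."""
    return normalise_for_matching(a) == normalise_for_matching(b)
```

str.casefold() is used rather than lower() because it handles more Unicode case distinctions, which matters for names that include macrons or other non-ASCII letters.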

8.1.3 Identifiers

Identifiers (such as the keys discussed in Section 7.1, or xAL references through the AddressDetailsKey attribute) are a useful feature of xNAL, but xNAL neither mandates any particular identifier system nor specifies how to obtain the data that an identifier stands in for. Agreeing on these is the responsibility of the parties involved in an xNAL data exchange.

It is a valid scenario for one party in a data exchange to have no access to the identifier system used by the other party; in that case an identifier will not be understood, and the parties will misinterpret each other. Use identifiers only after confirming that the other party both understands them and actually needs them.
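The rule above can be applied mechanically when building an outgoing message. The following Python sketch assumes a hypothetical, simplified address record and a hypothetical set of identifier systems that the receiving party has confirmed it understands and needs. As a design choice, it always sends the full address and adds the AddressDetailsKey value only when the key system has been agreed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    # Hypothetical, simplified address record (not the xAL structure).
    lines: list[str]
    details_key: Optional[str] = None  # value for the AddressDetailsKey attribute
    key_system: Optional[str] = None   # identifier system the key belongs to


def address_payload(address: Address, agreed_key_systems: set[str]) -> dict:
    """Build the data to send to a partner.

    The full address is always included; the key is added only when the
    partner has confirmed that it understands and needs that key system.
    """
    payload: dict = {"AddressLines": list(address.lines)}
    if address.details_key and address.key_system in agreed_key_systems:
        payload["AddressDetailsKey"] = address.details_key
    return payload
```

Sending the full data alongside the key means that a partner without access to the identifier system can still process the message.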

8.1.4 Full details versus sufficient details

Names and addresses may contain various levels of detail and still be sufficient for the intended use. xNAL does not prescribe what level of detail should be used, but the default rule is that if the intended use of the name or address data is unknown, the highest level of detail should be provided. This default makes the data more reliable, as it is easier to discard redundant or unnecessary parts than to search for missing ones.

8.2 The parser's role in data quality

An xNAL parser plays a critical role in the successful use of xNAL in a distributed environment. An xNAL parser can parse name and address data and return it in a normalised form in the xNAL format. For example, an agency (the user) has a database that stores each name as a single field. This data can be sent to an xNAL parser and returned as xNAL, with the first name, last name and all other name parts in separate xNAL elements. This concept is illustrated in Figure 11.

The xNAL parser may take in a number of formats, but it always returns an xNAL infoset with the input data parsed to the best ability of the parser.
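The following Python sketch shows the idea of splitting a single-field name into parts "to the best ability of the parser". It is deliberately naive: the title list and the output keys (Title, FirstName, MiddleName, LastName, NameLine) are hypothetical, and it returns a Python dict rather than an xNAL infoset. When it cannot split a name reliably, it returns the input unparsed as a NameLine instead of guessing.

```python
# Hypothetical list of recognised titles.
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def parse_person_name(free_text: str) -> dict[str, str]:
    """Naively split a single-field name into separate name parts."""
    tokens = free_text.split()
    title = None
    if tokens and tokens[0].rstrip(".").lower() in TITLES:
        title = tokens.pop(0)
    if len(tokens) < 2:
        # Not enough information to split reliably: leave the name unparsed.
        return {"NameLine": " ".join(free_text.split())}
    parts: dict[str, str] = {}
    if title:
        parts["Title"] = title
    parts["FirstName"] = tokens[0]
    if len(tokens) > 2:
        parts["MiddleName"] = " ".join(tokens[1:-1])
    parts["LastName"] = tokens[-1]
    return parts
```

For example, "Dr Jane Mary Smith" is split into a title, first, middle and last name. A rule this simple will misparse multi-word surnames or family-name-first orders, which is why real parsers rely on much richer reference data.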

An xNAL parser may also provide validation services (making it a validating parser).

The validation services check addresses against a reference dataset and names against common naming rules. Note, however, that name validation may not be 100% effective, as some rare combinations of names are almost impossible to parse, even for a human.
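Validating an address against a reference dataset amounts to a lookup after normalisation. The following Python sketch assumes a hypothetical parsed address with StreetNumber, StreetName and Locality fields and a reference dataset held as a set of keys; a real service would use an authoritative address source and many more fields.

```python
from typing import NamedTuple


class AddressKey(NamedTuple):
    street_number: str
    street_name: str
    locality: str


def normalise(value: str) -> str:
    # Matching is case-insensitive (Section 8.1.2); ignoring extra
    # white space is an additional assumption of this sketch.
    return " ".join(value.split()).casefold()


def make_key(street_number: str, street_name: str, locality: str) -> AddressKey:
    return AddressKey(normalise(street_number), normalise(street_name), normalise(locality))


def validate_address(parsed: dict[str, str], reference: set[AddressKey]) -> bool:
    """Return True if the parsed address exists in the reference dataset."""
    try:
        key = make_key(parsed["StreetNumber"], parsed["StreetName"], parsed["Locality"])
    except KeyError:
        # An incompletely parsed address cannot be validated this way.
        return False
    return key in reference
```

Note that validation depends on parsing: an address whose parts are still in free-text lines fails the lookup even if it is correct.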

NOTE

Selecting a particular parser that meets requirements to be set in future by the New Zealand E-government Unit (EGU) is outside the scope of this document.

8.2.1 Parsed and unparsed data

All xNAL infosets can be separated into three groups:

  • unparsed
  • parsed
  • partially parsed.

Unparsed data

Unparsed data resides as free text in xNLb:NameLine or xAL:AddressLine elements.

The following XML fragment provides an example of an unparsed address. All address data resides in the AddressLines element.

The same data may reside in only one AddressLine element, as in the following example:

Parsed data

Parsed data resides under specific xNAL elements, and no significant data should reside under xNLb:NameLine or xAL:AddressLine elements.

The following XML fragment provides an example of a parsed address. All address data resides in the respective xNAL elements. The AddressLine element contains supplementary information.

Partially parsed data

Partially parsed data resides under specific elements, but some significant data may still reside under xNLb:NameLine or xAL:AddressLine elements.

The following XML fragment provides an example of a partially parsed address. Most of the address data resides in the corresponding xNAL elements. The first AddressLine element contains supplementary information, but the second AddressLine element contains what is supposed to be a flat number/street number. Assume that the parser failed to recognise it and left the data unparsed. This is a valid scenario, but it normally occurs only when the initial data is of very poor quality.
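The three groups can be told apart mechanically, except for deciding which free text is "significant", which ultimately needs business judgement. The following Python sketch classifies a fragment by checking whether any specific element carries data and whether any NameLine or AddressLine text is significant. It assumes simplified, hypothetical element names such as StreetNumber and Street rather than the actual xAL structure, ignores namespace URIs, and uses a deliberately crude heuristic (free text containing a digit is significant) that a real implementation would replace.

```python
import xml.etree.ElementTree as ET
from typing import Callable

FREE_TEXT_TAGS = {"AddressLine", "NameLine"}


def looks_significant(text: str) -> bool:
    # Hypothetical heuristic: free text containing a digit (for example a
    # flat or street number) is significant; anything else is supplementary.
    return any(ch.isdigit() for ch in text)


def classify(fragment: str, is_significant: Callable[[str], bool] = looks_significant) -> str:
    """Return "unparsed", "parsed" or "partially parsed" for an XML fragment."""
    root = ET.fromstring(fragment)
    structured = False
    significant_free_text = False
    for element in root.iter():
        text = (element.text or "").strip()
        if not text:
            continue
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any namespace URI
        if local_name in FREE_TEXT_TAGS:
            significant_free_text = significant_free_text or is_significant(text)
        else:
            structured = True
    if not structured:
        return "unparsed"
    return "partially parsed" if significant_free_text else "parsed"
```

With this heuristic, a fragment whose only data is in AddressLine elements is unparsed, and a fragment with specific elements plus an AddressLine such as "2/10" is partially parsed, mirroring the example above.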

8.2.2 Data parsing (cleansing) and interoperability

xNAL allows users to exchange parsed or unparsed data without violating the schemas. However, exchanging unparsed data may affect interoperability negatively, as the recipient may not be able to fully understand the received data. Figure 12 illustrates possible use cases involving the xNAL parser.

Parsing data before sending

Depending on the application, exchanging unparsed or partially parsed data may be valid in some cases, but the data usually has to be parsed before it can be processed. Figure 13 illustrates a possible exchange sequence where one party uses an xNAL parser to parse and validate some xNAL data and then sends the parsed data to another party.

Sending unparsed data

Figure 14 illustrates another option, where one party sends unparsed data to the other and the recipient has to use an xNAL parser to have the data parsed. This shifts the burden of responsibility for data quality to the service provider (Party B in this case). It is still a valid use case if no parsing service is available to Party A. It is equally valid for Party B to reject any unparsed or partially parsed data; this condition should be agreed and specified in the description of the particular service.
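Party B's options can be expressed as a small acceptance policy. In the following Python sketch, the accepts_unparsed flag stands for the condition published in the service description, and the parser argument stands for whatever xNAL parser Party B uses; both, like the function and class names, are hypothetical.

```python
from enum import Enum
from typing import Callable


class ParseState(Enum):
    PARSED = "parsed"
    PARTIALLY_PARSED = "partially parsed"
    UNPARSED = "unparsed"


class RejectedData(Exception):
    """Raised when the service refuses data that is not fully parsed."""


def receive(record: dict, state: ParseState, accepts_unparsed: bool,
            parser: Callable[[dict], dict]) -> dict:
    """Party B's side of the exchange."""
    if state is ParseState.PARSED:
        return record
    if not accepts_unparsed:
        # The service description says unparsed or partially parsed data is rejected.
        raise RejectedData(f"service does not accept {state.value} data")
    # Party B takes on the burden of parsing the data itself.
    return parser(record)
```

Making the policy an explicit parameter keeps the behaviour tied to the service description rather than hidden in the receiving code.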

8.2.3 Default data-parsing requirements

As discussed above, parties can exchange parsed or unparsed data. Unparsed data, however, can affect interoperability. The default rule is therefore that no unparsed data should be sent to another party except by prior agreement, in which case the exchange is an exemption from this standard. Receiving Agencies do not have to accept unparsed data; it is the Sending Agency's responsibility to submit the data in the best parsed form possible.

It is the responsibility of:

  • the Sending Agency to use an xNAL parser to make sure that the data is parsed
  • each party to decide independently whether to use a validating xNAL parser to make sure that the received data is correct.
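On the sending side, the default rule can be enforced just before data leaves the agency. The following Python sketch assumes a hypothetical representation in which a parser returns a dict of element names to values, and treats a result containing only NameLine or AddressLine entries as unparsed. It interprets "best parsed form possible" as allowing partially parsed data through; that interpretation, like the names used, is an assumption.

```python
from typing import Callable

FREE_TEXT_KEYS = {"NameLine", "AddressLine"}


class UnparsedDataError(Exception):
    """Raised when the Sending Agency would otherwise send unparsed data."""


def is_unparsed(parts: dict[str, str]) -> bool:
    # If every key is a free-text line, nothing was parsed.
    return set(parts) <= FREE_TEXT_KEYS


def prepare_for_sending(raw: str,
                        parser: Callable[[str], dict[str, str]],
                        prior_agreement: bool = False) -> dict[str, str]:
    """Parse the data first; refuse to send unparsed data without prior agreement."""
    parts = parser(raw)
    if is_unparsed(parts) and not prior_agreement:
        raise UnparsedDataError("unparsed data may only be sent by prior agreement")
    return parts  # the best parsed form the parser could produce
```

Passing the parser in as an argument keeps the rule independent of whichever xNAL parser the agency eventually selects.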
