8 Data quality best practice
- Within this section:
- 8.1 General requirements
- 8.2 The parser's role in data quality
NOTE
Data quality requirements are extremely important from an interoperability perspective. Agreeing on the data structure that xNAL represents is only half of the problem. The other half is agreeing on a level of data quality that allows data to be exchanged without errors.
8.1 General requirements
There are a number of data quality requirements that apply to all elements and attributes.
8.1.1 Strong data typing
All elements and attributes that contain text nodes (textual data) have strong data types.
Data types used in xNAL are listed in the table below.
| Type name | Namespace | Constraints | Used by |
|---|---|---|---|
| String | XNLb | Limited length, based on xs:string. Special white spacing rules apply (all insignificant white spacing is collapsed). This type is used for all text nodes containing name or address data regardless of the actual value whether it is numeric or alphanumeric. | xNL-basic, xNL, xAL, xNAL |
| Guid | XNLb | An xs:string constrained with a 128-bit GUID pattern of XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX | xNL, xAL, xNAL |
| Guids | XNLb | A list of space delimited GUIDs defined by guid data type. | xAL |
This table provides only a high-level description of the simple data types. The appropriate schema should be referred to for more detail.
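The constraints in the table can be approximated in code for a quick pre-check before schema validation. The following is a minimal sketch, not the schema definition: it assumes each X in the GUID pattern is one hexadecimal digit, that an empty Guids list is invalid, and it uses a hypothetical `MAX_STRING_LENGTH`, since the real limit is set by the schema.

```python
import re

# Assumption: each X in the documented pattern is one hexadecimal digit.
GUID_RE = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)

# Hypothetical limit; the actual maximum length is defined in the schema.
MAX_STRING_LENGTH = 255


def normalise_string(value: str) -> str:
    """Collapse runs of white space and trim the ends (String type)."""
    return " ".join(value.split())


def is_valid_string(value: str) -> bool:
    """Check the length of the value after white space is collapsed."""
    return len(normalise_string(value)) <= MAX_STRING_LENGTH


def is_valid_guid(value: str) -> bool:
    """Check a single value against the GUID pattern (Guid type)."""
    return GUID_RE.fullmatch(value) is not None


def is_valid_guids(value: str) -> bool:
    """Check a space-delimited list of GUIDs (Guids type).

    Assumption: an empty list is treated as invalid.
    """
    parts = value.split()
    return bool(parts) and all(is_valid_guid(part) for part in parts)
```

A check like this only complements validation against the appropriate schema, which remains the authoritative definition of each type.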
8.1.2 Case sensitivity
Matching and validation procedures are case insensitive. Applications should utilise formats that would be used in a formal paper-based document. Applications can change the casing as required for a particular presentation.
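As an illustration of case-insensitive matching, the sketch below compares two values after collapsing white space and folding case. The function name `same_value` is hypothetical, and the use of Unicode case folding is an assumption; this section does not prescribe a particular comparison algorithm.

```python
def same_value(left: str, right: str) -> bool:
    """Compare two values, ignoring case and insignificant white space."""
    def canonical(value: str) -> str:
        # Collapse white space first, then apply Unicode case folding.
        return " ".join(value.split()).casefold()

    return canonical(left) == canonical(right)
```

The stored value keeps its formal casing; only the comparison is case insensitive, so an application remains free to change the casing for presentation.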
8.1.3 Identifiers
Identifiers (such as keys as discussed in Section 7.1, xAL referencing through the AddressDetailsKey attribute) are a useful feature of xNAL, but xNAL neither mandates any particular identifier system nor specifies how to obtain the data substituted with an identifier. Agreeing on these is the responsibility of the parties involved in an xNAL data exchange.
It is a valid scenario for one party involved in the data exchange to have no access to the identifier system used by the other party. In that case the identifiers cannot be resolved, which leads to misunderstanding between the parties. Use identifiers only when they are absolutely necessary and after confirming that the other party understands them.
8.1.4 Full details versus sufficient details
Names and addresses may contain various levels of detail and still be sufficient for the intended use. xNAL does not prescribe what level of detail should be used, but the default rule is that if the application of the name or address data is unknown, the highest level of detail should be provided. This default rule gives some additional reliability to the data, as it is easier to discard redundant or unnecessary parts than to search for missing bits and pieces.
8.2 The parser's role in data quality
An xNAL parser plays a critical role in the success of xNAL application in a distributed environment. An xNAL parser can parse name and address data and return it in a normalised form in the xNAL format. For example, an agency (the user) has a database that contains name data as one field. This data can be sent to an xNAL parser and returned as xNAL, where first name, last name, and all other name parts sit in separate xNAL elements. This concept is illustrated in Figure 11.

The xNAL parser may take in a number of formats, but it always returns an xNAL infoset with the input data parsed to the best ability of the parser.
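To make the idea of "parsed to the best ability of the parser" concrete, here is a deliberately naive sketch that splits a single-field name into parts. The function `parse_name` and the dictionary keys are hypothetical and do not correspond to xNAL element names; a real parser would apply far richer naming rules and return an xNAL infoset.

```python
def parse_name(raw: str) -> dict:
    """Split a one-field name into parts on white space (toy heuristic).

    If the input cannot be split into at least two tokens, the text is
    returned as a free-text name line, that is, left unparsed.
    """
    tokens = raw.split()
    if len(tokens) < 2:
        return {"name_line": " ".join(tokens)}
    return {
        "first_name": tokens[0],
        "middle_names": tokens[1:-1],
        "last_name": tokens[-1],
    }
```

For instance, `parse_name("John Adam Smith")` yields separate first, middle, and last name parts, while a single-token input stays in the free-text field rather than being guessed at.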
An xNAL parser may also provide validation services (making it a validating parser).
The validation services provide validation of addresses against a reference dataset and names against common naming rules. Note, however, that name validation may not be 100% effective, as there are some rare combinations of names that are almost impossible to parse even with a human level of intelligence.
NOTE
Selection of a particular parser to meet requirements to be set in future by the New Zealand E-government Unit (EGU) is outside the scope of this document.
8.2.1 Parsed and unparsed data
All xNAL infosets can be separated into three groups:
- unparsed
- parsed
- partially parsed.
Unparsed data
Unparsed data resides as free-text in xNLb:NameLine or xAL:AddressLine elements.
The following XML fragment provides an example of an unparsed address. All address data resides in the AddressLines element.

The same data may reside in only one AddressLine element, as in the following example:

Parsed data
Parsed data resides under specific xNAL elements, and no significant data should reside under xNLb:NameLine or xAL:AddressLine elements.
The following XML fragment provides an example of a parsed address. All address data resides in the respective xNAL elements. The AddressLine element contains supplementary information.

Partially parsed data
Partially parsed data resides under specific elements, but some significant data may still reside under xNLb:NameLine or xAL:AddressLine elements.
The following XML fragment provides an example of a partially parsed address. Most of the address data resides in the corresponding xNAL elements. The first AddressLine element contains supplementary information, but the second AddressLine element contains what appears to be a flat number/street number that the parser failed to recognise and left unparsed. This is a valid scenario, but it normally occurs only when the initial data is of very poor quality.
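The three groups can be told apart mechanically once a rule for "significant" free text is chosen. The sketch below is one possible approach under stated assumptions: it looks only at element local names, treats NameLine and AddressLine as the free-text elements, and takes the significance rule as a caller-supplied function, because this section does not define one. The function `classify` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Free-text elements named in this section (local names, prefixes ignored).
FREE_TEXT_ELEMENTS = {"NameLine", "AddressLine"}


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def classify(root: ET.Element, is_significant: Callable[[str], bool]) -> str:
    """Return 'unparsed', 'partially parsed' or 'parsed' for an infoset.

    Assumption: is_significant decides whether free text still holds name
    or address data rather than supplementary information.
    """
    free_text = []
    structured_count = 0
    for element in root.iter():
        text = (element.text or "").strip()
        if not text:
            continue
        if local_name(element.tag) in FREE_TEXT_ELEMENTS:
            free_text.append(text)
        else:
            structured_count += 1
    if structured_count == 0:
        # Only free text (or no text at all) was found.
        return "unparsed"
    if any(is_significant(text) for text in free_text):
        return "partially parsed"
    return "parsed"
```

A crude significance rule, such as "the free text contains a digit", would flag a stray flat or street number while leaving a delivery instruction alone; real rules would need to be agreed between the parties.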

8.2.2 Data parsing (cleansing) and interoperability
xNAL makes it valid for users to exchange parsed or unparsed data without violation of the schemas. Exchanging unparsed data may, however, affect interoperability in a negative way, as the recipient may not be able to fully understand the received data. Figure 12 illustrates possible use cases involving the xNAL parser.
Parsing data before sending
Depending on the application, exchange of unparsed or partially parsed data may be valid in some cases. Usually, however, the data must eventually be parsed before it can be processed. Figure 13 illustrates a possible exchange sequence where one party uses an xNAL parser to parse and validate some xNAL data and then sends the parsed data to another party.
Sending unparsed data
Figure 14 illustrates another option where one party sends unparsed data to the other and the recipient has to use an xNAL parser to have the data parsed. This shifts the burden of responsibility for data quality to the service provider (Party B in this case). It is still a valid use case if there is no parsing service available for Party A. It is also valid behaviour for Party B to reject any unparsed or partially parsed data. This condition should be agreed and specified in the description of the particular service.

8.2.3 Default data-parsing requirements
As discussed above, parties can exchange parsed or unparsed data. Unparsed data, however, can affect interoperability. The default rule therefore is that no unparsed data should be sent to another party unless by a prior agreement, in which case it will be an exemption from this standard. Receiving Agencies do not have to accept unparsed data; it is the Sending Agency's responsibility to submit the data in the best parsed form possible.
It is the responsibility of:
- the Sending Agency to use an xNAL parser to make sure that the data is parsed
- each party individually to use or not to use a validating xNAL parser to make sure that the received data is correct.
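The default rule can be expressed as a simple acceptance check on the receiving side. This is a minimal sketch: the function `should_accept` and its parameters are hypothetical, and treating partially parsed data the same as unparsed data is an assumption based on the receiver's right to reject both.

```python
def should_accept(parse_state: str, prior_agreement: bool = False) -> bool:
    """Apply the default rule for received name and address data.

    parse_state is one of 'parsed', 'partially parsed' or 'unparsed'.
    Assumption: anything short of fully parsed is accepted only when
    the parties have a prior agreement.
    """
    if parse_state not in {"parsed", "partially parsed", "unparsed"}:
        raise ValueError(f"unknown parse state: {parse_state!r}")
    if parse_state == "parsed":
        return True
    return prior_agreement
```

A Receiving Agency would typically publish the `prior_agreement` condition in the description of the particular service, so that a Sending Agency knows in advance whether unparsed data will be rejected.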

