The term “data quality” used to be reserved for highly technical conversations in the bowels of an insurer’s IT division as database developers and administrators discussed a new or existing database. That has all changed over the past several years. Now the term comes up in insurance executive and even board meetings as part of discussions about how an insurer plans, operates and grows in its chosen marketplace. In short, data quality has quickly become synonymous with business quality, complete with the corresponding implications for operational effectiveness, competitive differentiation and profitability.
This three-part series will focus on creating a new data quality management strategy, or improving an existing one, with an implementation approach that accounts for the elevated, strategic importance of data quality management in any insurer. At the core of any such strategy are three components: detection, notification and remediation. Part I of this series will focus on detection.
Whether it has been formally created or has just evolved, every insurer has a data quality management strategy. Any time that data is aggregated, organized, and used for business purposes there is an implicit data quality strategy in place.
For this series of articles, the concepts and approaches described are based on an enterprise data warehouse (EDW) strategy and architecture. That is the most common approach in the insurance industry: various source systems of record (SORs) generate data that is maintained in a separate repository/warehouse (the EDW) for producing reports, visualizations, portals, etc., as part of a business intelligence (BI) stack. No matter the data quality approach, however, at some point something will go wrong, and that is where a sound data quality detection approach is critical.
In that context, data quality exceptions can be seen as both strategic and tactical. In the most severe instance, a strategic data quality exception can be akin to a wildfire—when it’s raging everyone sees it (detection), everyone knows about it (notification) and everyone is mobilized to address it (remediation). Such a scenario, just like an actual wildfire, should be a rare occurrence.
The distinction between a strategic and a tactical data quality approach is that in a strategic approach the methods are proactive and intended to be preventative. The ultimate goal is that every critical data quality exception is detected before consumption in the BI stack. At a minimum, a notification should occur that allows a data steward to begin planning a remediation and to forewarn key stakeholders.
Conversely, data quality exceptions that are ad hoc and unaccounted for and require immediate remediation fall into the category of tactical data quality issues. Such exceptions generally cannot be planned for or prevented due to multiple unknowns and variables. They simply must be dealt with on an ad hoc basis. That said, such exceptions are useful for their educational value toward preventing future data quality exceptions, or at least in identifying methods that can be implemented to enable more proactive data quality control. In any case, any and all data quality exceptions should be managed and cataloged in business rules to prevent their recurrence.
Besides the program bugs and process flaws that should get detected and remediated as part of a pre-production review, there are often latent flaws that appear in the post-production phase of any system implementation. In fact, tactical data quality exceptions can arise even when the detection process is working as designed. That often occurs when the SOR allows the creation of data that violates the implemented business rules.
Detecting a data quality exception can be either intentional (proactive) or accidental (reactive). Generally speaking, an accidental detection becomes a tactical data quality exception and is managed accordingly. The focus of a data quality management strategy as it concerns detection is around intentional detection as this fulfills the strategic data quality objective to proactively prevent the consumption of invalid data.
Intentional data quality detection may occur both before and after invalid data is persisted in the EDW. Once a data quality issue is identified and categorized—and therefore anticipated—a detection method can be implemented in either the pre-EDW or intra-EDW phase. If, however, a data quality issue is unaccounted for, it can be detected as part of a post-EDW audit, where it can then be identified and categorized. In either case, the appropriate notification and remediation steps follow detection.
The assumption that invalid data should never be persisted in an EDW is sound, at least in theory. In real-world scenarios, however, such an approach can be too restrictive and can cause more problems than it solves. There are cases where it is better to allow invalid data to be persisted, at least temporarily, because rejecting it outright would do more damage to the overall EDW process than the invalid data itself. That is why it is essential to create a robust data quality detection process that supports levels of severity based on business rules. Done well, these exception levels control whether and how the EDW process continues after a detection, and the conditions under which detection notification occurs.
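The severity-level idea described above can be sketched as a small dispatch table. This is a minimal illustration only; the level names, the three-level scheme, and the resulting actions are assumptions for the sketch, not a prescribed standard:

```python
from enum import IntEnum


class Severity(IntEnum):
    """Illustrative severity levels; an insurer would define its own."""
    LEVEL_1 = 1  # informational: log only, EDW load continues
    LEVEL_2 = 2  # notify the data steward, EDW load continues
    LEVEL_3 = 3  # critical: halt the EDW load


def dispatch(severity: Severity) -> dict:
    """Map a detected exception's severity to the actions taken."""
    return {
        "continue_load": severity < Severity.LEVEL_3,
        "notify_steward": severity >= Severity.LEVEL_2,
    }
```

With a structure like this, each business rule only needs to declare a severity; the load-control and notification behavior falls out of the dispatch logic rather than being re-decided rule by rule.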
A strong set of business rules is a necessary part of any data management strategy. Business rules govern how data is sourced, transformed, and acted upon when data quality exceptions are detected in the EDW. The rules should be created pre-production but will likely be altered during the intra- and post-production phases as unanticipated scenarios arise. Ultimately, there should be a repository of data quality business rules focused on detecting data quality exceptions.
An example of a sound EDW business rule could be something like:
Valid status codes are characters in the range (‘’, 0–9, A–G); any exception is allowed to flow into the EDW, but a notification must be sent to the data steward as a level 2 exception, including the key of the source record and the value detected.
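That rule could be expressed as a short validation routine. The function and parameter names below are hypothetical, and the notification callback stands in for whatever alerting mechanism the insurer actually uses:

```python
# Per the rule: empty string, digits 0-9, and letters A-G are valid.
VALID_STATUS_CODES = {""} | {str(d) for d in range(10)} | set("ABCDEFG")


def check_status_code(record_key, status_code, notify):
    """Apply the status-code rule: the record always flows into the
    EDW, but an invalid code triggers a level 2 notification carrying
    the source record key and the offending value."""
    if status_code not in VALID_STATUS_CODES:
        notify(level=2, key=record_key, value=status_code)
    return True  # record is persisted either way under this rule
```

Note how the rule's severity and notification payload live alongside the validation itself, which keeps the detection logic traceable back to the business-rule repository.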
Operational procedures that deal with any data-related issues should be handled by a DevOps group. If and when a particular issue is identified as a data quality exception, it must be logged and categorized for remediation. A good data management practice is to track every data quality exception in a purpose-built repository for historical reporting (MS Excel may suffice) and to record each exception with the following attributes:
- Detection Point – as specific as possible, job, script, procedure, SQL, or other
- EDW Target Affected – table, column, other
- Source Object – file, table, column, other
- Exception – Rule ID, DBMS error, other
- Notification Action – none, level, Data Steward, other
- Root Cause – post analysis comments
- Status – to track remediation
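The attribute list above maps naturally onto a simple record type. This Python sketch uses illustrative field names and assumes a script-based tracking repository rather than any particular tool:

```python
from dataclasses import dataclass


@dataclass
class DataQualityException:
    """One entry in the exception repository; fields mirror the
    attributes listed above (all names here are illustrative)."""
    detection_point: str       # job, script, procedure, SQL, or other
    edw_target: str            # table, column, other
    source_object: str         # file, table, column, other
    exception: str             # rule ID, DBMS error, other
    notification_action: str   # none, level, data steward, other
    root_cause: str = ""       # post-analysis comments
    status: str = "open"       # tracks remediation
```

Even if the actual repository is a spreadsheet, defining the record shape up front keeps every logged exception comparable for the historical reporting the article recommends.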
Implementing a process for detecting data quality exceptions is the first foundational pillar toward creating an effective data management strategy.
In part two of this series, the focus will be on the next foundational pillar: Notification.