(Image credit: Adobe Stock.)
In Part I of this article series, we discussed the need for creating and implementing a strategic data quality approach, with a focus on detection. The foundation for such an approach is a data quality management strategy that includes the necessary governance and processes for maintaining effective data quality throughout the data lifecycle. That involves focusing on the major components required to manage data quality: detection, notification and remediation. In part two the focus will turn to notification.
Any enterprise data warehouse (EDW) encompasses many subsystems and components. There is the primary database that business intelligence (BI) users will access for reporting and analytic needs, and there are the many components that are used to build and maintain that database.
The notification process described in this article addresses how various EDW events are communicated to interested stakeholders, such as BI users and others. The components of this event notification solution are intended to accommodate many types of EDW events, including any data quality events or issues. And for purposes of this article, the event notification solution will use the publish-subscribe model (a.k.a. design pattern) to provide these notifications.
Before detailing the solution and its implementation, a short primer describing some of the terms involved may be useful for clarity and shared understanding.
These are various types of events that any EDW is aware of and would be of interest to users of that EDW.
An example of such an EDW event would be the data quality health of an EDW table; let’s use a policy table as an example. What is meant by “data quality health?” This descriptor could indicate many things, so the design used to publish such an event must be done in a way that is meaningful for an intended subscriber.
An example implementation for “data quality health of policy table” might include a three-level status and a descriptive comment. The three-level status would be green, yellow or red, corresponding with a status of good, warning, and problem. The descriptive comment would consist of some details that pertain to the status. The implementation for this single EDW Policy table data quality health event may require multiple steps, e.g., verify table exists, sanity check on row counts, most recent refresh date, validate foreign keys if any, validate primary keys if any, etc. The combination of all these checks will then be evaluated, and the final status and comments would be posted for possible notifications.
Again, EDW events can take many forms, but those intended for publication and subsequent notification must be implemented in a way that allows an EDW subscriber the opportunity to select from a list of events and decide if and how they would like to be notified.
As a starting point, the EDW team should design and implement some basic events. The table health status example from above is a good one. In that example, there would be a separate event for every BI table involved. Of course, there may be—and usually are—special requirements to provide events that address specific BI interests. These would be implemented as enhancements to the EDW.
For event publication purposes a repository that allows all published events to be listed and ultimately subscribed to by a BI user will be required. The options for such a repository are many, and most organizations likely already have incumbent tools or platforms that they can leverage for this purpose.
Whatever forms the repository takes; however, there will probably be a need to create a table to maintain an EDW event list. Regarding what it means to publish an event, that’s relatively straightforward: creating the event and adding it to some repository constitutes publication of the event. Having that process in place allows any BI user to subscribe to any event they choose. However, the notification itself happens when a trigger is fired for an event.
This is the process that will allow a BI User to receive EDW Event Notifications. Ideally, each BI user with access to this solution can manage their list of EDW events including how they will be notified. However, an initial implementation could be done in a more manual way where the IT team would maintain some spreadsheet or database for this purpose until a fully automated system is ready.
Some suggested subscription options include:
- The EDW event trigger
- Any posting/update of that event
- Only if table status is“yellow” or “red”
- Frequency of notification—daily, weekly, other
- Method of notification—email, text, other
- Copy others
So when does an event notification really happen? The basic parameters are:
- An EDW event must get published
- One or more subscriptions must be created
- The event trigger must get fired for the combination of event and subscription criteria
Any notification solution should include internal logging to track both successful and failed notifications. Continuing with the policy table health event example, the trigger may consist of multiple actions:
- First, all the steps that comprise the event itself must complete. The parameter example above lists many steps that would likely be part of a daily Extract-Transform-Load (ETL) job.
- The result of all this must get posted, and the implication is that there is some repository where this gets posted.
- A process must poll this repository to detect the trigger fired, e.g., a value was updated.
- Finally, the result posted will be evaluated for all subscribers to see if that result qualifies to send a notification.
Alternative Simple Solution
The above components are for a very robust notification solution. But what would a simpler solution look like if needed before a more advanced solution can be implemented?
- Create a table that includes Event Name, Result, Comment, Posted Time Stamp, Notified Time Stamp, Notification Email Address
- Create ETL jobs that update this table by Event Name, updating Result and Comment and Posted Time Stamp
- Create an ETL job that periodically checks table for any row with Posted Time Stamp after the Notified Time Stamp, and sends an email for each found, then updates Notified Time Stamp
Of course, this solution offers little control but would be simple to implement, and a table such as this would be manually maintained as far as new Event entry and Notification Email Address. The trigger here is simply that the Posted Time Stamp is greater than the Notified Time Stamp.
This article describes and provides some requirements for the key concepts of an Event Notification Solution. For any solution there will be choices of scope that must be made:
- Should any of these concepts be implemented?
- Should a simplified implementation approach be considered?
- Can existing tools and technologies be leveraged, or is it necessary to build or purchase?
No matter the approach, a data quality management strategy that includes EDW exception, status and event notification is an important milestone on the way to overall data quality health.
Part three of this article will focus on the final step involved in addressing data quality issues: Remediation.