Overcoming Common Roadblocks to Data Vault Development
Data vaults, like data warehouses, require ongoing operations overhead to schedule, execute and monitor the data feeds—including handling failed jobs and restarts, while ensuring everything is processed in the correct order.
November 15, 2017
Barry Devlin is Founder and Principal of 9Sight Consulting.
Data warehouse developers have historically walked a narrow line between data quality and business agility, while also balancing the needs of, and the relationships between, IT and internal business clients. Technology has answered this dilemma with two separate approaches: the data vault, optimized for data warehouse agility, and data warehouse automation, for faster and more reliable development.
Data vault modeling is designed for long-term historical storage of data from multiple operational systems, with an emphasis on auditability, data traceability, loading speed and resilience. Dan Linstedt, the inventor of the data vault, introduced the approach publicly around 2000. Data vault modeling is now in its second generation, Data Vault 2.0.
The data vault is a hybrid of third normal form (3NF) and star-schema designs that offers significant benefits and some interesting challenges. On the plus side, it promises the agility to respond to rapid changes in business needs, separates data ingestion concerns from the various business uses of that data and promotes data quality best practices. However, its structure is enormously complex and involves many thought-provoking design choices.
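The emphasis on history, auditing and traceability shows up clearly in how satellites (the tables holding descriptive attributes over time) are typically loaded. The following is a minimal Python sketch, not a reference implementation. It assumes MD5 hash keys over trimmed, upper-cased business keys and a "hash diff" over the descriptive attributes, a common Data Vault 2.0 convention that teams adapt in practice. The in-memory list stands in for a real satellite table, and all function and column names are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def _md5(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def hash_key(*business_key_parts: str) -> str:
    # Normalize business keys so " cust-001 " and "CUST-001" map to the same hub entry
    return _md5("||".join(part.strip().upper() for part in business_key_parts))


def hash_diff(attributes: dict) -> str:
    # Hash descriptive attributes in a fixed (sorted) column order; case is preserved
    return _md5("||".join(f"{name}={attributes[name]}" for name in sorted(attributes)))


def load_satellite(satellite: list, hub_key: str, attributes: dict,
                   record_source: str, load_ts: datetime) -> bool:
    """Insert-only load: append a new row only when the attributes have changed."""
    diff = hash_diff(attributes)
    history = [row for row in satellite if row["hub_key"] == hub_key]
    if history and max(history, key=lambda row: row["load_ts"])["hash_diff"] == diff:
        return False  # unchanged since the latest row: nothing to insert
    # Existing rows are never updated, so every earlier version stays auditable
    satellite.append({"hub_key": hub_key, "load_ts": load_ts,
                      "record_source": record_source, "hash_diff": diff, **attributes})
    return True


if __name__ == "__main__":
    sat_customer: list = []
    key = hash_key("CUST-001")
    t0 = datetime(2017, 11, 1, tzinfo=timezone.utc)
    load_satellite(sat_customer, key, {"name": "Acme", "city": "Boston"}, "crm", t0)
    load_satellite(sat_customer, key, {"name": "Acme", "city": "Boston"}, "crm", t0 + timedelta(days=1))
    load_satellite(sat_customer, key, {"name": "Acme", "city": "Denver"}, "crm", t0 + timedelta(days=2))
    print(len(sat_customer))  # 2: the unchanged second delivery was skipped
```

Because rows are only ever appended, with a load timestamp and record source on each, the table doubles as an audit trail: any past state of a customer can be reconstructed, and every value can be traced to the system that supplied it.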
Although data vaults are growing in popularity and gaining features such as a new development methodology, developers who want to implement one still face numerous challenges. Below are a few of the most difficult obstacles, along with tips on how to overcome them.
Eliminate the Rift Between IT and Business
The age-old struggle between IT and business poses a particular challenge for data vault projects. An overly engineering-driven mindset in IT may alienate business interests. As data warehouse staff focus on implementing a new data vault model, they may reduce their face-to-face time with the business, leading to poorer, less detailed or delayed delivery of specific business solutions.
Delays can widen the rift between business and IT and prompt the business to look elsewhere for quick-fix solutions. Models and methods must take a back seat to seamless collaboration between business and IT in meeting functional business needs and tight delivery timeframes. An automated approach to data vault design, development, deployment and operation can both accelerate data vault delivery and make it possible to collaborate iteratively with business users early in the project, increasing engagement and trust and improving the odds of delivering value to the business the first time.
Start with Data Sources
The first step in adhering to data vault principles is to understand the source systems: their structures, relationships and underlying data quality. Although this is time-consuming, it is necessary to validate both the model design and the implementation approach. Automated discovery and data quality profiling can shorten both the design phase and the population process. Business and IT can then collaborate in compressed time windows, iterating on model designs and validating them against live data. This approach eliminates assumptions, enables the model to be validated before deployment and ensures the data warehouse can evolve at the pace the business needs.
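As a rough illustration of what automated profiling computes, the following Python sketch counts nulls and distinct values per column of a source extract and flags columns that could serve as business keys (fully populated and unique). It is a hypothetical example, not the profiler of any particular product; `profile`, `ColumnProfile` and the sample data are invented for illustration, and it assumes column values are hashable.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ColumnProfile:
    column: str
    rows: int
    nulls: int
    distinct: int
    most_common: Optional[Any]

    @property
    def candidate_business_key(self) -> bool:
        # Fully populated and unique: a possible hub business key, pending business review
        return self.rows > 0 and self.nulls == 0 and self.distinct == self.rows


def profile(rows: list) -> dict:
    """Profile every column seen in a list of source records (dicts)."""
    columns = sorted({name for row in rows for name in row})
    profiles = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        # Treat None and empty strings as missing values
        present = [v for v in values if v is not None and v != ""]
        counts = Counter(present)
        profiles[column] = ColumnProfile(
            column=column,
            rows=len(values),
            nulls=len(values) - len(present),
            distinct=len(counts),
            most_common=counts.most_common(1)[0][0] if counts else None,
        )
    return profiles
```

Running `profile` over a small customer extract would flag `id` as a candidate key while showing that `email` has gaps, exactly the kind of finding business and IT can review together against live data before committing to a hub design.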
Set Rules and Follow Them
The data vault model involves an extensive framework of rules and recommendations. A data vault’s data objects, from the common hubs, links and satellites to the lesser-known point-in-time (PIT) and bridge helper tables, must adhere to specific standards and definition rules to ensure data vault agility and ease of maintenance. When developers “re-invent” these structures, problems arise that demand rework, both in the initial build and in ongoing operation.
Whether tasks are shared among diverse teams or new team members are being onboarded, sustaining best practices requires strict design standards, documentation, error handling and auditing. When code is generated from shared standards rather than written by hand, the idiosyncrasies of each developer’s coding style disappear: the generated code is consistent across the team and follows the same naming standards, which eases maintenance and future upgrades and speeds the onboarding of new developers.
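One way to get that consistency is to generate every hub, link and satellite from a single metadata definition instead of hand-coding each table. The Python sketch below is illustrative only: the `Entity` class, the HUB_/SAT_/LNK_ prefixes, the HK_ hash-key columns and the column types form a hypothetical house standard, not Data Vault 2.0 requirements or any vendor's generated output.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str                                       # e.g. "customer"
    business_keys: list                             # business key column names
    attributes: list = field(default_factory=list)  # descriptive columns for the satellite


# Hypothetical house standard: every table carries the same audit columns
AUDIT_COLS = ["LOAD_TS TIMESTAMP NOT NULL", "RECORD_SOURCE VARCHAR(50) NOT NULL"]


def _table(name: str, columns: list) -> str:
    return f"CREATE TABLE {name} (\n    " + ",\n    ".join(columns) + "\n);"


def hub_ddl(e: Entity) -> str:
    hk = f"HK_{e.name.upper()}"
    cols = [f"{hk} CHAR(32) NOT NULL PRIMARY KEY"]
    cols += [f"{k.upper()} VARCHAR(100) NOT NULL" for k in e.business_keys]
    return _table(f"HUB_{e.name.upper()}", cols + AUDIT_COLS)


def sat_ddl(e: Entity) -> str:
    hk = f"HK_{e.name.upper()}"
    cols = [f"{hk} CHAR(32) NOT NULL REFERENCES HUB_{e.name.upper()} ({hk})"] + AUDIT_COLS
    cols += ["HASH_DIFF CHAR(32) NOT NULL"]
    cols += [f"{a.upper()} VARCHAR(255)" for a in e.attributes]
    cols += [f"PRIMARY KEY ({hk}, LOAD_TS)"]  # one row per key per load
    return _table(f"SAT_{e.name.upper()}", cols)


def link_ddl(*hubs: Entity) -> str:
    name = "_".join(h.name.upper() for h in hubs)
    cols = [f"HK_{name} CHAR(32) NOT NULL PRIMARY KEY"]
    cols += [f"HK_{h.name.upper()} CHAR(32) NOT NULL "
             f"REFERENCES HUB_{h.name.upper()} (HK_{h.name.upper()})" for h in hubs]
    return _table(f"LNK_{name}", cols + AUDIT_COLS)
```

For example, `hub_ddl(Entity("customer", ["customer_id"], ["name", "city"]))` emits a `HUB_CUSTOMER` table with the same key and audit columns every other hub gets. Changing the standard means changing one function, not every hand-written script.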
Create an Automated Culture of Maintenance
Perhaps the most under-appreciated challenge for a data warehouse team is the ongoing operation, maintenance and upgrade of the environment. Prepare for it now.
Data vaults, like data warehouses, require ongoing operations overhead to schedule, execute and monitor the data feeds, including handling failed jobs and restarts, while ensuring everything is processed in the correct order. Data vaults also face the added complexity of scheduling and managing numerous data and processing objects. Manual approaches are inadequate to address this. A particular problem with manual deployment is that the necessary logging and auditing capabilities are often sidelined when projects fall behind schedule.
Capabilities in automation software, such as integrated scheduling tools and automated logging and auditing, help IT teams meet head-on the complexity of the environment and its continuous need for operational attention.
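To make "processed in the correct order" and "handling failed jobs" concrete, here is a minimal Python sketch of a dependency-aware load runner with retries and an audit log. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The job names, the retry policy and the "skip everything downstream of a failure" rule are assumptions for illustration, not how any particular scheduler behaves.

```python
import logging
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable

log = logging.getLogger("dv_loads")


def run_loads(jobs: dict[str, Callable[[], None]],
              depends_on: dict[str, set[str]],
              max_attempts: int = 3) -> dict[str, str]:
    """Run load jobs in dependency order with retries; return a per-job audit status.

    jobs:        job name -> zero-argument callable that performs the load
    depends_on:  job name -> names of jobs that must succeed first
    """
    sorter = TopologicalSorter()
    for name in jobs:
        sorter.add(name, *depends_on.get(name, ()))
    audit: dict[str, str] = {}
    # static_order() yields predecessors first and raises graphlib.CycleError on cycles
    for name in sorter.static_order():
        if any(audit.get(dep) != "ok" for dep in depends_on.get(name, ())):
            audit[name] = "skipped"  # never load a link or satellite ahead of its hubs
            log.warning("job=%s status=skipped reason=upstream_not_ok", name)
            continue
        for attempt in range(1, max_attempts + 1):
            try:
                jobs[name]()
            except Exception as exc:  # a real scheduler would separate transient from fatal errors
                log.warning("job=%s attempt=%d status=error error=%s", name, attempt, exc)
            else:
                audit[name] = "ok"
                log.info("job=%s attempt=%d status=ok", name, attempt)
                break
        else:
            audit[name] = "failed"
            log.error("job=%s status=failed attempts=%d", name, max_attempts)
    return audit
```

A typical call passes hub loads with no dependencies, satellites that depend on their hub and links that depend on every hub they join. The returned statuses, together with the log lines, give operators a record of what ran, what was retried and what was held back, without relying on anyone remembering to add logging by hand.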
Be Ready for Non-stop, High-speed Change
Businesses are inundated with change—constant, rapid and unpredictable change—sometimes even before the first data warehouse iteration is rolled out. A key driver of the data vault model and methodology is to ease the problems associated with such ongoing change.
At the level of practical implementation, responding to change first requires the ability to carry out extensive and effective impact analysis. Which tables and columns will be affected by changing this code? What are the unintended downstream consequences? How can we reduce risk while simultaneously expediting necessary change? Documentation is supposed to provide the answers, but in reality manual development is seldom accompanied by complete, up-to-date documentation.
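When lineage is captured as metadata rather than left to memory, the question "what will this change affect?" becomes a graph traversal. The following Python sketch walks a hypothetical column-level lineage map breadth-first and returns everything transitively fed by a changed column. The `LINEAGE` contents and table names are invented for illustration; in practice this metadata would come from whatever tool generated the load code.

```python
from collections import deque

# Hypothetical column-level lineage: each column -> columns loaded directly from it
LINEAGE = {
    "crm.customer.email": ["stg_customer.email"],
    "stg_customer.email": ["sat_customer.email"],
    "sat_customer.email": ["pit_customer.email", "mart_marketing.contact_email"],
    "crm.customer.id": ["stg_customer.id"],
    "stg_customer.id": ["hub_customer.customer_id"],
}


def downstream_impact(changed: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every column transitively fed by `changed`."""
    seen, queue, impacted = {changed}, deque([changed]), []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # guard against revisiting shared descendants
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    for column in downstream_impact("crm.customer.email", LINEAGE):
        print(column)
```

Changing the source `email` column would surface the staging column, the satellite, the PIT table and the marketing mart column, including the downstream consumer a developer might otherwise forget.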
Beyond the productivity and standardization gains that come from eliminating the vast majority of the hand-coding required to deliver a data vault, documentation automation may be the most visible and impactful contribution to a project as seen by IT teams. With code and documentation both tied to metadata, change management can be automated and reduced to a hassle-free review rather than the decoding of old, poorly documented code. Such metadata-driven automation is key to keeping pace with ever more rapidly changing business needs.
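A small sketch of documentation tied to metadata: the Python below renders a plain-text data-dictionary entry from table metadata and produces a unified diff between the current and proposed versions for review. The `Table`/`Column` classes and the output layout are hypothetical; only `difflib.unified_diff` is a real standard-library call. Fixed column widths are a deliberate choice so that adding one column does not re-pad, and therefore "change", every other line of the diff.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    data_type: str
    description: str
    source: str  # upstream column this one is loaded from


@dataclass
class Table:
    name: str
    kind: str  # "hub", "link" or "satellite"
    columns: list


def document(table: Table) -> str:
    """Render a plain-text data-dictionary entry from table metadata."""
    lines = [f"{table.name} ({table.kind})"]
    # Fixed widths keep unchanged lines byte-identical between versions
    lines += [f"  {c.name:<20} {c.data_type:<14} {c.description} [source: {c.source}]"
              for c in table.columns]
    return "\n".join(lines)


def review_diff(current: Table, proposed: Table) -> list:
    """Unified diff of the generated documentation, for review before deployment."""
    return list(difflib.unified_diff(
        document(current).splitlines(), document(proposed).splitlines(),
        fromfile=f"{current.name} (current)", tofile=f"{proposed.name} (proposed)",
        lineterm=""))
```

Adding an `EMAIL` column to a customer satellite's metadata would yield a diff containing a single added line, which is something a reviewer can approve in seconds.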
Over the last decade and a half, businesses have gradually adopted the data vault model as a new foundation for their data warehouses. Its design and approach have been instrumental in addressing the growing need for agility in business analytics and decision-making support.
However, many companies have found that the structural complexity of the model can challenge the IT teams charged with implementation. Automation software built for data vault development, such as WhereScape Data Vault Express, can improve collaboration between business and IT, boost developer productivity, increase organizational consistency and standardization, better position teams for change and help organizations reap the benefits of Data Vault 2.0 much more quickly.
Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Informa.