/images/th_background_4.jpg

Data Explorer

Our approach to data model management and cataloguing is transforming how businesses manage and leverage their data. Through a continuous integration, compiler-driven process, we create and verify data models that live seamlessly alongside the source code, making data model management as straightforward as code development.

When a data lake becomes sophisticated - incorporating data from various sources such as trade, reference and legal data - keeping track of it becomes a challenge.

Where documentation exists, it is fragmented, out of date and maintained in a silo separate from the codebase.

/images/data-management.png

To utilise a data lake for upcoming projects and remain compliant with internal policy, the following considerations become essential.

Have a single golden source data catalogue that evolves with the data model.

Clearly demarcate entry and exit points to and from the data lake, and attach custom classifications such as Internal, Confidential and Secret.

Accurately reflect the lineage of data fields and dependencies between structures.

/images/static-doc.png

The Static Documentation Approach

The Traditional Solution

The standard approach is to commission a task force to capture and document all data fields and flows after the fact. This approach has a number of drawbacks.

The Drawbacks of the Static Documentation Approach

The creation of this documentation is manual, error-prone and time-consuming.

As the documentation is decoupled from the source code, it is likely to be out of date as soon as it is produced.

Documentation maintained separately from the source code is typically written by individuals who were not involved in the original design or implementation - leading to an increased risk of misinterpretation.

The TeraHelix Solution

Data model creation and verification are driven by continuous integration and compiler processes. Analysts specify the Data Explorer alongside their source code - Data Catalogue as Code.

This process includes verifying conventions, resolving references and automatically publishing to targets such as Databricks and online documentation.
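As an illustration of the Data Catalogue as Code idea, the sketch below uses plain Python with hypothetical names (Field, Entity, verify); TeraHelix's actual Data Explorer specification format is not shown here. The point is that the model definition lives in the same repository as the code, and a continuous integration job runs the verification step on every change.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, minimal stand-in for a Data Explorer model definition;
# the real TeraHelix specification format may differ substantially.

@dataclass
class Field:
    name: str
    type: str
    description: str = ""
    tags: List[str] = field(default_factory=list)

@dataclass
class Entity:
    name: str
    fields: List[Field]

# The model lives in the same repository as the code that produces the data,
# so every change to it goes through the same build and peer review process.
TRADE = Entity("Trade", [
    Field("trade_id", "string", "Unique trade identifier", ["Internal"]),
    Field("notional", "decimal", "Trade notional amount", ["Confidential"]),
])

def verify(entities: List[Entity]) -> List[str]:
    """Return convention violations; a CI job fails the build if any exist."""
    return [
        f"{e.name}.{f.name}: missing description"
        for e in entities for f in e.fields if not f.description
    ]

print(verify([TRADE]))  # [] - the model passes; otherwise CI rejects the change
```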

/images/tag.png

Data catalogue attribute, tag and annotation management: Users can specify predefined tags and enumerations. The compiler ensures no content violates these constraints, flagging issues during the build and peer review process.
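A minimal sketch of that kind of check, assuming a Python representation and reusing the classification names mentioned above (Internal, Confidential, Secret); the real Data Explorer tag syntax may differ.

```python
from enum import Enum

# Hypothetical enumeration of predefined classification tags.
class Classification(Enum):
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"
    SECRET = "Secret"

ALLOWED = {c.value for c in Classification}

# Field name -> classification tag as written by an analyst in the catalogue.
field_tags = {
    "trade_id": "Internal",
    "counterparty": "Confidental",  # typo: caught at build time, not in production
}

def check_tags(tags: dict) -> list:
    """Flag any tag that is not one of the predefined enumeration values."""
    return [
        f"{name}: unknown classification '{tag}'"
        for name, tag in tags.items()
        if tag not in ALLOWED
    ]

print(check_tags(field_tags))
# ["counterparty: unknown classification 'Confidental'"]
```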

/images/dimension.png

Data model referential schema linking: References and links declared in the Data Explorer / data model are fully resolved by the compiler. This ensures that any 'dangling pointers' are identified early in the process.
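A minimal sketch of reference resolution under the same assumptions, with hypothetical entity names; the compiler-side implementation inside the Data Explorer is not shown here.

```python
# Each entity may link to other entities by name; unresolved ('dangling')
# links are reported before the change is merged.
model = {
    "Trade": {"references": ["Counterparty", "Product"]},
    "Counterparty": {"references": ["LegalEntity"]},
    "Product": {"references": []},
    # "LegalEntity" is not declared, so Counterparty holds a dangling reference.
}

def resolve_references(entities: dict) -> list:
    """Return every reference that does not point at a declared entity."""
    declared = set(entities)
    return [
        f"{name} -> {ref}"
        for name, definition in entities.items()
        for ref in definition["references"]
        if ref not in declared
    ]

print(resolve_references(model))  # ['Counterparty -> LegalEntity']
```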

/images/web-programming-small.png

Data catalogue consistency with compiler technology: The compiler performs a series of checks and validations on its inputs to ensure that the catalogue is consistent and coherent.
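For illustration, a sketch of one such compiler-style check - duplicate field detection - again with hypothetical names and not the actual compiler code.

```python
entities = {
    "Trade": ["trade_id", "notional", "trade_id"],  # 'trade_id' declared twice
    "Counterparty": ["counterparty_id", "legal_name"],
}

def check_consistency(entities: dict) -> list:
    """Return one message per duplicate field declaration within an entity."""
    errors = []
    for entity_name, field_names in entities.items():
        seen = set()
        for field_name in field_names:
            if field_name in seen:
                errors.append(f"{entity_name}: duplicate field '{field_name}'")
            seen.add(field_name)
    return errors

print(check_consistency(entities))  # ["Trade: duplicate field 'trade_id'"]
```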

/images/cloud-computing.png

Model change impact analysis: The potential impact of model changes is flagged as part of the build and peer review process.
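A sketch of the idea, assuming the model is available as two dictionaries (previous and proposed versions) and a hypothetical map of dependent structures; the actual analysis performed by the Data Explorer is not shown here.

```python
previous = {"Trade": {"trade_id": "string", "notional": "decimal", "trader": "string"}}
proposed = {"Trade": {"trade_id": "string", "notional": "float"}}
dependents = {"Trade": ["RiskReport", "RegulatoryFeed"]}  # structures built on Trade

def impact(previous: dict, proposed: dict, dependents: dict) -> list:
    """Flag removed or retyped fields and the structures that depend on them."""
    findings = []
    for entity, old_fields in previous.items():
        new_fields = proposed.get(entity, {})
        for field_name, old_type in old_fields.items():
            new_type = new_fields.get(field_name)
            if new_type is None:
                change = f"{entity}.{field_name} removed"
            elif new_type != old_type:
                change = f"{entity}.{field_name} retyped {old_type} -> {new_type}"
            else:
                continue
            findings.append(f"{change}; affects {', '.join(dependents.get(entity, []))}")
    return findings

for finding in impact(previous, proposed, dependents):
    print(finding)
# Trade.notional retyped decimal -> float; affects RiskReport, RegulatoryFeed
# Trade.trader removed; affects RiskReport, RegulatoryFeed
```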

/images/brainstorm.png

Derivation generation: The Data Explorer supports the generation of integrations, scripts, user interfaces and AI / RAG prompts.
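A sketch of how a single model definition might drive several derivations - here a SQL DDL statement (for example for Databricks) and a plain-text description that could seed documentation or a RAG prompt. The names and output formats are illustrative, not TeraHelix's actual generators.

```python
model = {
    "Trade": [
        ("trade_id", "STRING", "Unique trade identifier"),
        ("notional", "DECIMAL(18,2)", "Trade notional amount"),
    ],
}

def to_ddl(entity: str, fields: list) -> str:
    """Derive a CREATE TABLE statement from the model definition."""
    cols = ",\n  ".join(
        f"{name} {sql_type} COMMENT '{desc}'" for name, sql_type, desc in fields
    )
    return f"CREATE TABLE {entity} (\n  {cols}\n);"

def to_prompt(entity: str, fields: list) -> str:
    """Derive a plain-text description suitable for documentation or an AI prompt."""
    lines = [f"The {entity} table has the following fields:"]
    lines += [f"- {name} ({sql_type}): {desc}" for name, sql_type, desc in fields]
    return "\n".join(lines)

print(to_ddl("Trade", model["Trade"]))
print(to_prompt("Trade", model["Trade"]))
```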