Our approach to data model management and cataloguing is transforming how businesses govern and leverage their data. Through a continuously integrated, compiler-driven process, we create and verify data models that live alongside source code, making data model management as straightforward as code development.
To utilise a data lake for upcoming projects, and to comply with internal policy, the following considerations are essential.
Maintain a single golden-source data catalogue that evolves with the data model.
Clearly demarcate entry and exit points to and from the data lake, and attach custom classifications such as Internal, Confidential and Secret.
Accurately reflect the lineage of data fields and the dependencies between structures.
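The considerations above can be sketched as catalogue definitions that live next to the code producing the data. The following is a minimal illustration, not the actual Data Explorer syntax; the entity and field names (Trade, pnl, etc.) are hypothetical:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

# Classifications form a closed enumeration, so a build step
# can reject any value outside this set.
class Classification(Enum):
    INTERNAL = "Internal"
    CONFIDENTIAL = "Confidential"
    SECRET = "Secret"

@dataclass
class Field:
    name: str
    dtype: str
    classification: Classification
    derived_from: List[str] = field(default_factory=list)  # lineage: upstream field names

@dataclass
class Entity:
    name: str
    fields: List[Field]

# A catalogue entry declared alongside the code that produces the data.
trade = Entity(
    name="Trade",
    fields=[
        Field("trade_id", "string", Classification.INTERNAL),
        Field("counterparty", "string", Classification.CONFIDENTIAL),
        Field("pnl", "decimal", Classification.SECRET,
              derived_from=["Trade.price", "Trade.quantity"]),
    ],
)
```

Because the declarations are ordinary source code, they are versioned, peer-reviewed and built with everything else, which is what keeps the catalogue a single golden source.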
The Static Documentation Approach
The standard approach is to commission a task force to capture and document all data fields and flows after the fact. This approach has a number of drawbacks, chief among them that the documentation is static: it begins to drift from the real model as soon as it is written, and keeping it current requires ongoing manual effort.
Data model creation and verification are driven by continuous integration and compiler processes. Analysts specify the Data Explorer model alongside their source code: Data Catalogue as Code.
This process includes verifying conventions, resolving references, and automated publishing to targets such as Databricks and online documentation.
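As one publishing target, a build step can render the model into documentation. A minimal sketch, assuming a simple dictionary representation of the model (the structure and names here are illustrative, not the real Data Explorer format); the same walk could equally emit Databricks table comments:

```python
# Illustrative in-memory model: entity -> field -> metadata.
model = {
    "Trade": {
        "trade_id": {"type": "string", "classification": "Internal"},
        "pnl": {"type": "decimal", "classification": "Secret"},
    }
}

def to_markdown(model: dict) -> str:
    """Render the model as a Markdown table per entity."""
    lines = []
    for entity, fields in model.items():
        lines.append(f"## {entity}")
        lines.append("| Field | Type | Classification |")
        lines.append("|---|---|---|")
        for name, meta in fields.items():
            lines.append(f"| {name} | {meta['type']} | {meta['classification']} |")
    return "\n".join(lines)

print(to_markdown(model))
```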
Data catalogue attribute, tag and annotation management: Users specify predefined tags and enumerations, and the compiler ensures no content violates these constraints, flagging issues during the build and peer-review process.
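In essence, this is a constraint check at build time. A minimal sketch of the idea, with a hypothetical allowed set and field names:

```python
# Predefined enumeration: the only classifications the catalogue accepts.
ALLOWED_CLASSIFICATIONS = {"Internal", "Confidential", "Secret"}

def check_tags(fields: dict) -> list:
    """Return one error per field whose classification is outside the allowed set."""
    return [
        f"{name}: unknown classification '{meta['classification']}'"
        for name, meta in fields.items()
        if meta["classification"] not in ALLOWED_CLASSIFICATIONS
    ]

errors = check_tags({
    "trade_id": {"classification": "Internal"},
    "pnl": {"classification": "Top Secret"},  # violates the enumeration
})
# A non-empty error list would fail the build and surface in peer review.
```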
Data model referential schema linking: References and links declared in the Data Explorer data model are fully resolved by the compiler, so any 'dangling pointers' are identified early in the process.
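Reference resolution amounts to checking every declared link against the set of declared fields. A minimal sketch with hypothetical field names:

```python
# Fields actually declared in the model.
declared = {"Trade.trade_id", "Trade.price", "Trade.quantity", "Position.pnl"}

# Declared links: derived field -> the fields it references.
references = {
    "Position.pnl": ["Trade.price", "Trade.quantity"],
    "Report.total": ["Position.pnl", "Position.fx_rate"],  # fx_rate is never declared
}

# Any reference to an undeclared field is a 'dangling pointer',
# reported at compile time rather than discovered in production.
dangling = [
    (src, tgt)
    for src, targets in references.items()
    for tgt in targets
    if tgt not in declared
]
```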
Data catalogue consistency with compiler technology: The compiler performs a series of checks and validations on its inputs to ensure the model is consistent and coherent.
Model change impact analysis: Potential impacts of model changes can be flagged as part of the build and peer-review process.
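Because the model records lineage, impact analysis can be a downstream walk over those edges. A minimal sketch, assuming the same hypothetical lineage structure as above:

```python
from collections import deque

# Lineage edges: derived field -> the upstream fields it is computed from.
lineage = {
    "Position.pnl": ["Trade.price", "Trade.quantity"],
    "Report.total": ["Position.pnl"],
}

def impacted_by(changed: str) -> set:
    """Return every field transitively derived from the changed field."""
    # Invert lineage: upstream field -> its direct dependents.
    dependents = {}
    for derived, sources in lineage.items():
        for src in sources:
            dependents.setdefault(src, set()).add(derived)
    # Breadth-first walk downstream from the changed field.
    seen, queue = set(), deque([changed])
    while queue:
        for dep in dependents.get(queue.popleft(), ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

impacted_by("Trade.price")  # {"Position.pnl", "Report.total"}
```

Flagging this set during the build gives reviewers a concrete list of everything a proposed model change could break.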
Derivation Generation: The Data Explorer supports generating integrations, scripts, user interfaces, and AI / RAG prompts from the model.
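A SQL DDL script is one such derivable artefact. A minimal sketch of deriving DDL from a model definition (the table and column names are illustrative, and the real generator would handle far more than column types):

```python
# Illustrative model: table -> column -> SQL type.
model = {
    "Trade": {
        "trade_id": "STRING",
        "price": "DECIMAL(18,4)",
        "quantity": "INT",
    }
}

def to_ddl(model: dict) -> str:
    """Derive CREATE TABLE statements from the model definition."""
    statements = []
    for table, columns in model.items():
        cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns.items())
        statements.append(f"CREATE TABLE {table} (\n  {cols}\n);")
    return "\n\n".join(statements)

print(to_ddl(model))
```

Generating such artefacts from the single golden source keeps integrations, scripts and documentation aligned with the model by construction.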