
Creating A Cost Effective Data Catalog With Tree Schema



A data catalog is a critical piece of infrastructure for any organization that wants to build analytics products, whether internal or external. While there are a number of platforms available for building that catalog, many of them are either difficult to deploy and integrate or expensive to use at scale. In this episode Grant Seward explains how he built Tree Schema to be an easy-to-use and cost-effective option for organizations to build their data catalogs. He also shares the internal architecture, how he approached the design to make it accessible and easy to use, and how it automatically discovers the schemas and metadata for your source systems.


  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of what you have built at Tree Schema?
    • What was your motivation for creating it?
  • At what stage of maturity should a team or organization consider a data catalog to be a necessary component in their data platform?
  • There is a large and growing number of projects and products designed to provide a data catalog, each addressing the problem in a slightly different way. What are the necessary elements of a data catalog?
    • How does Tree Schema compare to the available options? (e.g. Amundsen, a company wiki, Metacat, Metamapper, etc.)
  • How is the Tree Schema system implemented?
    • How has the design or direction of Tree Schema evolved since you first began working on it?
  • How did you approach the schema definitions for defining entities?
  • What was your guiding heuristic for determining how to design the interface and data models?
  • How do you handle integrating with data sources?
  • In addition to storing schema information you allow users to store information about the transformations being performed. How is that represented?
    • How can users populate information about their transformations in an automated fashion?
  • How do you approach evolution and versioning of schema information?
  • What are the scaling limitations of Tree Schema, whether in terms of the technical or the cognitive complexity it can handle?
  • What are some of the most interesting, innovative, or unexpected ways that you have seen Tree Schema being used?
  • What have you found to be the most interesting, unexpected, or challenging lessons learned in the process of building and promoting Tree Schema?
  • When is Tree Schema the wrong choice?
  • What do you have planned for the future of the product?

This article has been published from the source link without modifications to the text. Only the headline has been changed.

