Summary
As data professionals we have a number of tools available for storing, processing, and analyzing data. We also have tools for collaborating on software and analysis, but collaborating on data is still an underserved capability. Gavin Mendel-Gleason encountered this problem firsthand while working on the Seshat databank, leading him to create TerminusDB and TerminusHub. In this episode he explains how the TerminusDB system is architected to provide a versioned graph storage engine that allows for branching and merging of data sets, and how that opens up new possibilities for individuals and teams to work together on building new data repositories. This is a fascinating conversation on the technical challenges involved, the opportunities that such a system provides, and the complexities inherent to building a successful business on open source.
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing what TerminusDB is and what motivated you to build it?
- What are the use cases that TerminusDB and TerminusHub are designed for?
- There are a number of different reasons and methods for versioning data, such as the work being done with Datomic, LakeFS, DVC, etc. Where does TerminusDB fit in relation to those and other data versioning systems that are available today?
- Can you describe how TerminusDB is implemented?
- How has the design changed or evolved since you first began working on it?
- What were the decision process and design considerations that led you to choose Prolog as the implementation language?
- One of the challenges that have faced other knowledge engines built around RDF is that of scale and performance. How are you addressing those difficulties in TerminusDB?
- What are the scaling factors and limitations for TerminusDB? (e.g. volumes of data, clustering, etc.)
- How does the use of RDF triples and JSON-LD impact the audience for TerminusDB?
- How much overhead is incurred by maintaining a long history of changes for a database?
- How do you handle garbage collection/compaction of versions?
- How does the availability of branching and merging strategies change the approach that data teams take when working on a project?
- What are the edge cases in merging and conflict resolution, and what tools does TerminusDB/TerminusHub provide for working through those situations?
- What are some useful strategies that teams should be aware of for working effectively with collaborative datasets in TerminusDB?
- Another interesting element of the TerminusDB platform is the query language. What did you use as inspiration for designing it and how much of a learning curve is involved?
- What are some of the most interesting, innovative, or unexpected ways that you have seen TerminusDB used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while building TerminusDB and TerminusHub?
- When is TerminusDB the wrong choice?
- What do you have planned for the future of the project?
This article has been published from the source link without modifications to the text. Only the headline has been changed.