Automate Feature Extraction With Molecula

Summary

A majority of the time spent in data engineering goes to copying data between systems to make the information available for different purposes. This introduces challenges such as keeping information synchronized, managing schema evolution, and building transformations to match the expectations of the destination systems. H.O. Maycotte was faced with these same challenges but at a massive scale, leading him to question whether there is a better way. After he tasked some of his top engineers with considering the problem in a new light, they created the Pilosa engine. In this episode H.O. explains how, using Pilosa as the core, he built the Molecula platform to eliminate the need to copy data between systems in order to make it accessible for analytical and machine learning purposes. He also discusses the challenges that he faces in helping potential users and customers understand the shift in thinking that this creates, and how the system is architected to make it possible. This is a fascinating conversation about what the future looks like when you revisit your assumptions about how systems are designed.

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of what you are building at Molecula and the story behind it?
    • What are the additional capabilities that Molecula offers on top of the open source Pilosa project?
  • What are the problems/use cases that Molecula solves?
  • What are some of the technologies or architectural patterns that Molecula might replace in a company's data platform?
  • One of the use cases that is mentioned on the Molecula site is as a feature store for ML and AI. This is a category that has been seeing a lot of growth recently. Can you provide some context on how Molecula fits in that market and how it compares to options such as Tecton, Iguazio, Feast, etc.?
    • What are the benefits of using a bitmap index for identifying and computing features?
  • Can you describe how the Molecula platform is architected?
    • How has the design and goal of Molecula changed or evolved since you first began working on it?
  • For someone who is using Molecula, can you describe the process of integrating it with their existing data sources?
  • Can you describe the internal data model of Pilosa/Molecula?
    • How should users think about data modeling and architecture as they are loading information into the platform?
  • Once a user has data in Pilosa, what are the available mechanisms for performing analyses or feature engineering?
  • What are some of the most underutilized or misunderstood capabilities of Molecula?
  • What are some of the most interesting, unexpected, or innovative ways that you have seen the Molecula platform used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned from building and scaling Molecula?
  • When is Molecula the wrong choice?
  • What do you have planned for the future of the platform and business?

This article has been published from the source link without modifications to the text. Only the headline has been changed.

Source link