Summary
When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object storage or querying a database. To speed up the process, why not build the model inside the database so that you don’t have to move the information? In this episode, Paige Roberts explains the benefits of pushing the machine learning processing into the database layer and the approach that Vertica has taken for its implementation. If you are looking for a way to speed up your experimentation, or an easy way to apply AutoML, then this conversation is for you.
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the current state of the market for databases that support in-process machine learning?
- What are the motivating factors for running a machine learning workflow inside the database?
- What styles of ML are feasible to do inside the database? (e.g. Bayesian inference, deep learning, etc.)
- What are the performance implications of running a model training pipeline within the database runtime? (both in terms of training performance boosts, and database performance impacts)
- Can you describe the architecture of how the machine learning process is managed by the database engine?
- How do you manage interacting with Python/R/Jupyter/etc. when working within the database?
- What is the impact on data pipeline and MLOps architectures when using the database to manage the machine learning workflow?
- What are the most interesting, innovative, or unexpected ways that you have seen in-database ML used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on machine learning inside the database?
- When is in-database ML the wrong choice?
- What are the recent trends/changes in machine learning for the database that you are excited for?