As a data engineer, you’re familiar with the process of collecting data from databases, customer data platforms, APIs, and similar sources. At YipitData, they rely on a variety of alternative data sources to inform investment decisions by hedge funds and businesses. In this episode, Andrew Gross, Bobby Muldoon, and Anup Segu describe the self-service data platform they built so that data analysts can own the end-to-end delivery of data projects, and how that has allowed them to scale their output. They share the journey they went through to build a scalable and maintainable system for web scraping, how they made it reliable and resilient to errors, and the lessons they learned in the process. This was a great conversation about real-world experience in building a successful data-oriented business.
- How did you get involved in the area of data management?
- Can you start by giving an overview of what YipitData does?
- What kinds of data sources and data assets are you working with?
- What is the composition of your data teams and how are they structured?
- Given the use of your data products in the financial sector, how do you handle monitoring and alerting around data quality?
- For web scraping in particular, given how fragile it can be, what have you done to make it a reliable and repeatable part of the data pipeline?
- Can you describe how your data platform is implemented?
- How has the design of your platform and its goals evolved or changed?
- What is your guiding principle for providing an approachable interface to analysts?
- How much do your analysts need to know about the guarantees offered by the underlying data and its processing, and about the edge cases to be aware of?
- What are some examples of specific tools that you have built to empower your analysts to own the full lifecycle of the data that they are working with?
- Can you characterize or quantify the benefits that you have seen from training the analysts to work with the engineering tool chain?
- What have been some of the most interesting, unexpected, or surprising outcomes of how you are approaching the different responsibilities and levels of ownership in your data organization?
- What are some of the most interesting, unexpected, or challenging lessons that you have learned from building out the platform, tooling, and organizational structure for creating data products at Yipit?
- What advice or recommendations do you have for other leaders of data teams about how to think about the organizational and technical aspects of managing the lifecycle of data projects?
This article has been published from a wire agency feed without modifications to the text. Only the headline has been changed.