Slack’s Data Engineering and Machine Learning Product Pipeline

January 24, 2020

Josh Wills, a software engineer working on data engineering problems at Slack, discusses Slack’s data architecture and how the team builds and observes its pipelines. Along with color commentary, such as his move from IC to manager (and back), Josh shares recommendations, tips, tools, and lessons that Slack engineering teams learned while building products like Slack Search. The podcast covers machine learning, observability, data engineering, and general practices for building highly resilient software.

Key Takeaways

Slack has a philosophy of building only what it needs: a “don’t reinvent the wheel” mindset.
Slack was originally a PHP monolith. Today, it is largely Hack running on HHVM, along with several Java and Go binaries. On the data side, application logs are in Thrift (there is a plan to migrate to Protobuf). Events flow through a Kafka cluster that handles hundreds of thousands of events per second. Everything is kept in S3 with a large Hive metastore, and EMR clusters are spun up on demand. Presto, Airflow, Slack, Snowflake (business analytics), and Quiver (key-value store) are all used.
ML worked best for Slack when it was used to help people answer questions. Techniques like Learning to Rank (LTR) became the most effective use of ML for Slack.
You can get pretty far with rules. Use machine learning when that’s all that’s left.
A key lesson for Slack in applying observability to its data pipelines was to focus on structured data, tracing, and high-cardinality events. This lets the team use the tools they already knew (ELK, Prometheus, Grafana) to dig deep into what is happening in their systems.
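
To make the last takeaway concrete, here is a minimal Python sketch of what a structured, traceable pipeline event might look like. It is not Slack's implementation: the helper `traced_step`, the `sink` callback, and all field names (`trace_id`, `span_id`, `table`, `partition`, `rows_read`) are hypothetical and chosen only for illustration. The idea is that each pipeline step emits one JSON event carrying a shared trace ID plus whatever high-cardinality context is useful, so that a log-search tool can slice by any field.

```python
import json
import time
import uuid
from contextlib import contextmanager


@contextmanager
def traced_step(name, trace_id, sink, **fields):
    """Time one pipeline step and emit a single structured event for it.

    Hypothetical helper: `sink` is any callable that accepts a JSON string,
    e.g. a logger method or list.append.
    """
    event = {
        "trace_id": trace_id,         # shared by every step in one pipeline run
        "span_id": uuid.uuid4().hex,  # unique per step
        "step": name,
        **fields,                     # high-cardinality context (table, partition, ...)
    }
    start = time.monotonic()
    try:
        yield event                   # the step may attach more fields as it runs
        event["status"] = "ok"
    except Exception as exc:
        event["status"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        sink(json.dumps(event, sort_keys=True))


def run_example():
    """Run one traced step and return the parsed events it emitted."""
    lines = []
    trace_id = uuid.uuid4().hex
    with traced_step("load_partition", trace_id, lines.append,
                     table="events", partition="2020-01-24") as ev:
        ev["rows_read"] = 3           # made-up value for the example
    return [json.loads(line) for line in lines]
```

Calling `run_example()` returns a single event dictionary with the step name, IDs, status, duration, and the extra fields. In a real pipeline the `sink` would more likely be a logger whose output is shipped to a search backend, and every step of a run would reuse the same `trace_id` so the run can be reassembled afterwards.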

This article has been published from a wire agency feed without modifications to the text. Only the headline has been changed.
