Josh Wills, a software engineer working on data engineering problems at Slack, discusses the Slack data architecture and how they build and observe their pipelines. Josh, along with color commentary such as the move from IC to manager (and back), discusses recommendations, tips, tools, and lessons Slack engineering teams discovered while building products like Slack Search. The podcast covers machine learning, observability, data engineering, and general practices for building highly resilient software.

Key Takeaways

  • Slack has a philosophy of building only what they need. They have a don’t reinvent the wheel mindset.
  • Slack was originally a PHP monolith. Today, it is largely Hack-lang, HHVM, and several Java and Go binarys. On the data side, application logs are in Thrift (there is a plan to migrate to protobuf). Events are processed through a Kafka cluster that handles 100,000s of events per second. Everything is kept in S3 with a large Hive metastore. EMR is spun upon demand. Presto, Airflow, Slack, Snowflake (business analytics), Quiver (key-value store) are all used.
  • ML worked best for Slack when it was used to help people answer questions. Things like Learn to Rank (LTR) become the most effective use of ML for Slack.
  • You can get pretty far with rules. Use machine learning when that’s all that’s left.
  • When you start applying observability to your data pipeline, a key lesson for Slack was to really focus on structured data, tracing, high cardinality events. This lets them really use the tools they were already familiar with (ELK, Prometheus, Grafana) and go deep into understanding what’s happening in the systems.

This article has been published from a wire agency feed without modifications to the text. Only the headline has been changed.

Source link