Data Platform at Airbnb:
The Data Platform team at Airbnb advances the state of data at Airbnb and empowers users to intuitively derive insights from this data. To accomplish this goal, the team leverages open source technologies like Kafka, Hadoop, Hive, Presto, and Spark, along with in-house tools, to curate high-quality datasets. The team also builds data tooling and establishes company-wide best practices that empower users throughout the company to build high-quality datasets and data products.
What are examples of work that Data Platform Engineers have done at Airbnb?
- Global Metrics Repo: a widely adopted computation framework that allows users to easily define metrics and dimensions, which can be leveraged for business reporting and evaluating experiment performance.
- Real-time/Online Data Services: a framework that enables online data use cases in Product. Our current infrastructure leverages Spark Streaming and Nebula (our production-facing key/value store) to power numerous production-facing use cases, and is backed by a robust anomaly detection framework powered by Druid.
- Machine Learning Infrastructure: Many products at Airbnb rely on machine learning (ML) to achieve their goals, and we’ve built a common infrastructure for ML that saves significant development time for the company.
- Logging Infrastructure: By clarifying testing procedures and automating the common ones, we provide improved data quality and faster iteration cycles for anyone working with data at Airbnb. To accomplish this, we are building tooling and test infrastructure that identify problems before code is deployed into production.
- Core Data Warehouse: a collection of tables that represent the fundamental concepts used to describe Airbnb's business. These datasets are built and maintained in Hive and are accessible via Hive, Presto, Spark, and Druid. These highly curated datasets drive the majority of analytics use cases in the company.
The following experience is relevant to us:
- 5+ years of full-time industry experience
- Working with data at the petabyte scale
- Design and operation of robust distributed systems
- Experience with Java / Scala is preferred
- Strong scripting ability in Ruby / Python / Bash
- Working knowledge of relational databases and query authoring (SQL)
- A love of using and developing open source technologies like Kafka, Hadoop, Hive, Presto, and Spark
- Rigor in code quality, automated testing, and other engineering best practices
- BS/MS/PhD in Computer Science or a related field (ideal)
Benefits:
- Competitive salaries
- Quarterly employee travel coupon
- Paid time off
- Medical, dental, & vision insurance
- Life insurance and disability benefits
- Fitness Discounts
- Flexible Spending Accounts
- Apple equipment
- Commuter Subsidies
- Community Involvement (4 hours per month to give back to the community)
- Company-sponsored tech talks and happy hours
- Much more...