Software Engineer, Data Infrastructure
Data Infrastructure Engineering at Airbnb:
Data Infrastructure Engineering builds the distributed components, systems, and tools that power decisions at Airbnb. No other travel service participates so broadly in the travel experience, from discovery to booking, to crafting experiences during the stay, to assessing the quality of trips. This gives us an incredibly rich dataset to collect, transform, and analyze in order to improve the effectiveness of our marketplace and create delight for guests and hosts.
We leverage existing open source technologies such as Kafka, Hadoop, Hive, Presto, and Spark, and also write our own; many of the projects listed below have been open sourced. As a member of our team, you would spend time designing and growing our existing infrastructure, democratizing data access, and promoting the correct use of data and analytics across the company.
What are examples of work that Data Infrastructure Engineers have done at Airbnb?
- Real-time streaming infrastructure: To enable teams to move quickly, getting accurate data with minimal delay is a core focus for Data Infrastructure. Currently, we are building out Spark-based infrastructure to allow for easy development of streaming applications.
- Interactive dimensional analysis: Data scientists have a strong need to query data and compute aggregates on various dimensional cuts. To address this, we are building a query tool based on Druid to allow users to interactively slice-and-dice large datasets.
- Cluster management systems: To help manage multiple petabyte-scale clusters, easy-to-use systems to handle security, disaster recovery, and replication are in development. For replication, we have developed and open sourced ReAir.
- Data workflow management: Started at Airbnb, Airflow is a system that enables users to schedule data-related workflows with a code-as-configuration model and web front end.
- Machine learning infrastructure: Many products at Airbnb rely on machine learning (ML) to achieve their goals. A common infrastructure for ML can save significant development time for the company.
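To give a flavor of the code-as-configuration model mentioned above, here is a minimal sketch in plain Python of how a workflow can be declared as code and executed in dependency order. This is a hypothetical toy illustration of the idea, not Airflow's actual API (Airflow defines its own `DAG` and operator classes):

```python
# Toy sketch of the code-as-configuration idea behind Airflow:
# a workflow is declared as ordinary Python objects, then run in
# dependency order. Illustrative only -- not Airflow's real API.

from graphlib import TopologicalSorter  # Python 3.9+


class Task:
    """A named unit of work with upstream dependencies."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.upstream = []

    def set_upstream(self, other):
        """Declare that `other` must run before this task."""
        self.upstream.append(other)


def run(tasks):
    """Execute tasks in an order that respects their dependencies."""
    graph = {t: set(t.upstream) for t in tasks}
    executed = []
    for task in TopologicalSorter(graph).static_order():
        task.fn()
        executed.append(task.name)
    return executed


# Declare a three-step pipeline in code, much like an Airflow DAG file.
extract = Task("extract", lambda: print("pull raw events"))
transform = Task("transform", lambda: print("clean and join"))
load = Task("load", lambda: print("write to warehouse"))
transform.set_upstream(extract)
load.set_upstream(transform)

run([load, transform, extract])  # runs extract, then transform, then load
```

Because the pipeline is ordinary code, it can be reviewed, tested, and versioned like any other software, which is the core appeal of the code-as-configuration model.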
The following experience is relevant to us:
- 2+ years of full-time, industry experience
- Working with data at the petabyte scale
- Design and operation of robust distributed systems
- Experience with Java / Scala (preferred)
- Strong scripting ability in Ruby / Python / Bash
- Working knowledge of relational databases and query authoring (SQL)
- Love to use and develop open source technologies like Kafka, Hadoop, Hive, Presto, and Spark
- Rigor in high code quality, automated testing, and other engineering best practices
- BS/MS/PhD in Computer Science or a related field (ideal)
Benefits:
- Competitive salaries
- Quarterly employee travel coupon
- Paid time off
- Medical, dental, & vision insurance
- Life insurance and disability benefits
- Fitness discounts
- Flexible Spending Accounts
- Apple equipment
- Commuter subsidies
- Community involvement (4 hours per month to give back to the community)
- Company sponsored tech talks and happy hours
- Much more...