Are reliable and robust systems your passion? Do you see SLIs and wonder about SLOs everywhere you look? Do you obsess over concurrency, service uptime, performance numbers, metrics and dashboards? Do you worry if you are unable to account for every millisecond in your product? Do you enjoy building globally distributed teams and seeing your teammates excel in solving complex scalability and reliability requirements? Do you enjoy a hands-on coding culture and player-coach style leadership? On the SRE team at Affirm you will find a unique opportunity to leverage your leadership skills while continuing to be hands-on and responsible for the design and implementation of some of the most innovative software solutions.
Affirm’s engineering team is working on building a large-scale, massively distributed, fault-tolerant global infrastructure shared across multiple customer financial products, merchants and vendors. Ensuring reliability and scalability for such an infrastructure is hard. Our consistency and security requirements make scaling harder. Our Site Reliability Engineering team consists of engineers who combine software and systems engineering to build and run our infrastructure in a cost-effective manner. The team ensures that Affirm's systems meet our users' and partners' performance requirements, while enabling engineering to have immediate, actionable visibility into the software that they build and deploy.

What You'll Do

  • Manage a group of talented Site Reliability Engineers who are solving complex reliability and scalability challenges.
  • Provide technical leadership to major projects executed by product and platform engineering teams to scale our operations globally across multiple AWS regions and cloud providers.
  • Enable and empower your team to drive improved visibility & accountability, error detection and alerting across all products and business functions at Affirm.
  • Design, build and scale our metrics infrastructure, which enables our engineering teams to relentlessly improve our site performance, latencies and reliability.
  • Lead by example, care for your team and establish credibility with the quality of your and your team's technical execution.
  • Recruit, inspire and retain the best talent across the globe.

What We Look For

  • Extensive experience building and owning large-scale, geographically distributed backend systems is a plus.
  • Highly skilled at developing and debugging in one or more programming languages.
  • Python and Linux experience is a plus.
  • Experience with operating system internals, file systems, databases, and networks.
  • Preference for using, enhancing and contributing to open source solutions over building solutions from the ground up.
  • Unquenchable thirst for knowing everything within your platform and learning new technologies.
  • An obsession with performance and metrics.
  • Experience with AWS or comparable cloud providers.

About Affirm

At Affirm we are using technology to re-imagine and re-build core parts of financial infrastructure to enable friendlier and more transparent financial products and services that improve lives.
We believe the financial industry is fundamentally broken. Not only is the core infrastructure built with technology from the 1970s, but there are a dwindling number of people who say "I trust my bank to look out for me". It doesn’t have to be this way, and it’s our mission to fix this problem.
We are based in San Francisco; founded by Max Levchin (founding CTO of PayPal), Jeff Kaditz (CDO DeNA/ngmoco), and Nathan Gettings (founding CTO of Palantir); and building a team of exceptionally talented people to join us on our mission.