Site Reliability Engineers (SREs) at Algolia are both software and systems engineers who ensure we can reliably serve billions of queries every day, for users all around the world, despite datacenters being unavailable and undersea cables being cut. As we operate many services, including our Search API, Places, DocSearch and Analytics, you’ll keep learning new things every day and share what you have learned.
The platform we develop uses both virtual and bare-metal systems spanning over 50 datacenters in 15 different regions, serving millions of users from every corner of the globe. Since search is a critical component of many applications, the SRE team maintains a high level of expertise in system failures in order to prevent them and provide reliable service to our customers.
No two problems are the same because our systems evolve all the time. We expect you to be a resilient problem solver who isn’t afraid to think outside the box and use your knowledge of system interactions to your advantage. You’ll also take ownership of complete projects and execute them.
The team is composed of engineers with different backgrounds and experience in both industry and academia. This diversity works in our favor, and you should increase it by bringing your experience, your knowledge and your point of view. Thinking differently is a plus, not a minus. We’re transparent with each other and with other teams about both our successes and our failures. This way we learn, we accept our weaknesses and we continuously strive to improve both personally and professionally.

RESPONSIBILITIES

  • Work with other teams to identify, troubleshoot, and resolve high impact issues
  • Evaluate performance of current and future systems, both software and hardware
  • Participate in design of new systems
  • Develop and maintain the automation framework used for all systems
  • Participate in on-call rotation to ensure fast response to production issues

REQUIREMENTS

  • 4+ years of software engineering experience
  • Knowledge of Shell scripting and at least one scripting language (Python, Ruby, etc.)
  • Willingness to learn Go (golang)
  • Understanding of Linux systems: I/O, process scheduling, filesystems
  • Understanding of computer networks: TCP/IP, DNS, load-balancing
  • Full professional English proficiency
  • Rigor about code quality, automated testing, and other engineering best practices
  • Ability to make independent decisions and take ownership of them

NICE TO HAVE

  • Knowledge of Go (golang)
  • Ability to use a configuration management tool like Ansible, Puppet or Chef
  • Knowledge of low-level principles of computers and network components
  • Experience with performance profiling of applications in both development and production
  • Knowledge of cloud platforms such as AWS, GCP or Azure