Public summary

Join a remote-first company that develops an open-source performance testing platform used by engineering teams worldwide to keep their systems resilient and high-performing. The role focuses on advancing operational excellence and reliability engineering practices for a large-scale distributed SaaS product, with opportunities to influence architecture and lead product development. The position is remote and aligned with German working hours, within a transparent, innovative culture that emphasizes collaboration, autonomy, and career growth.

Location and work setup

Location: Germany
Remote status: Remote
German language requirement: None detected
Job posting language: English

Salary

EUR 109,709 to 131,651 per year

Responsibilities

Define and scale a culture of operational excellence by establishing reliability standards and coaching teams on ownership and availability. Drive advanced DevOps and SRE practices such as incident management, alerting, observability, runbooks, and release and change management. Implement reliability frameworks, including SLIs, SLOs, and error budgets, and use the resulting metrics to prioritize work and guide engineering decisions. Guide the design, development, and operation of distributed cloud systems. Influence product and system architecture through collaboration and technical leadership. Produce clear documentation and technical communication for both internal and external audiences. Evolve the role into broader application and product leadership as the reliability foundation matures.

Qualifications

Strong experience with DevOps and SRE practices, operating production systems at scale. Proficiency in, or a strong background with, programming languages, primarily Python or Go. Expertise in designing, building, and operating large-scale distributed cloud systems. Deep understanding of reliability engineering concepts, including incident response, observability, and failure modes. Experience with test automation for performance and functional testing. Ability to influence engineering practices through clear communication, code review, and collaboration. Familiarity with modern software engineering processes. Self-driven and comfortable with autonomy and ambiguity. Bonus points for experience with containerization (Docker, Kubernetes), cloud platforms (AWS), observability tools, event-driven or asynchronous systems, and defining and applying SLIs, SLOs, or error budgets.

Skills

DevOps, Site Reliability Engineering (SRE), Python, Go, Distributed Systems, Incident Management, Observability, Test Automation, Performance Testing, Cloud-Native Systems, Docker, Kubernetes, AWS, Technical Leadership, Software Development, Reliability Engineering

Staff Software Engineer - Cloud Performance Testing Platform (Remote Germany)