Team for Career Site

Technology

In short

At On, our technology thrives much like a spirited runner: always moving, always improving. We are building technology that continues to supercharge the growth of On, helping to ignite the human spirit through movement. We’re seeking a Staff Site Reliability Engineer to ensure our digital platforms deliver exceptional performance, reliability, and scalability to support our global customer base.

As a Staff Site Reliability Engineer (SRE) at On, you will play a pivotal role in designing, building, and maintaining our cloud infrastructure to support our e-commerce platforms, customer-facing applications, and internal systems. You will work closely with engineering teams to drive reliability, optimise performance, and implement automation, serving as a technical expert and mentor within the team.

Your mission

– System Reliability & Performance: Ensure high availability (99.99%+ uptime), scalability, and performance of On’s digital platforms through proactive optimisation and robust infrastructure design.
– Infrastructure Development: Build and maintain cloud-based infrastructure using Infrastructure-as-Code (IaC) tools.
– Automation: Develop and implement automation solutions to streamline deployments, reduce toil, and enhance monitoring.
– Incident Response: Lead incident resolution, perform root cause analyses, and implement preventive measures to minimise downtime and improve system resilience.
– Monitoring & Observability: Design and maintain monitoring, logging, and alerting systems to ensure proactive issue detection and resolution.
– Collaboration: Partner with software engineering, product, and security teams to align infrastructure with business objectives and ensure secure, scalable systems.
– Capacity Planning: Analyse and forecast infrastructure needs to support On’s growth, balancing performance and cost efficiency.
– Mentorship: Provide technical guidance and mentorship, fostering a culture of continuous learning and improvement.
– Compliance & Security: Ensure systems meet industry standards for data privacy and security.

Your story

– Extensive experience in site reliability engineering with a track record of managing complex, high-traffic systems.
– Strong expertise in cloud platforms (GCP) and container orchestration (Kubernetes, GKE).
– Proficiency in scripting and programming (e.g., Python, Go) for automation and tooling.
– Experience with CI/CD pipelines (ArgoCD, GitHub Actions) and IaC (Terraform).
– Solid understanding of networking, load balancing, and DNS management.
– Experience with observability and monitoring for cloud-native environments.
– Strong analytical skills with a proactive approach to resolving complex technical challenges.
– Excellent communication skills, with the ability to explain technical concepts to diverse stakeholders.

Nice to Have:
– Experience with e-commerce platforms or high-traffic consumer applications.
– Background in performance engineering, including load testing and capacity optimisation for peak traffic events (e.g., product launches, Black Friday).
– Experience optimising global content delivery networks (CDNs) for low-latency, high-performance user experiences (e.g., Cloudflare).
– Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).

Meet the team

You will join a skilled and dynamic team of cloud and site reliability engineers dedicated to transforming On’s technological foundation. We are crafting scalable, resilient cloud solutions to power internal operations, enhance product performance, and support On’s growth. As a key member of our team, you will shape our cloud infrastructure strategy, ensuring robust, efficient, and sustainable systems that drive innovation. Join us in Berlin to make a lasting impact on On’s digital future!
