Site Reliability Engineer (SRE)

Not Disclosed

Job Description & Details

Site Reliability Engineering is at the heart of modern digital services, ensuring systems stay up and performant 24/7. With businesses increasingly moving to cloud‑native architectures, skilled SREs are in high demand. This six‑month onsite role in Phoenix offers a fast‑track chance to apply your production‑support expertise in a banking‑focused environment.

Job Summary

We are seeking a hands‑on Site Reliability Engineer to monitor system health, troubleshoot production incidents, and drive automation across our cloud‑based services. The role involves building alerts and dashboards, collaborating with development teams to improve reliability, and participating in on‑call rotations.

Top 3 Critical Skills

Skill | Why It's Critical | Mastery Level
Core Java | Foundation for building and debugging Spring Boot services | Senior
Monitoring (Splunk/Kibana/Grafana) | Provides real‑time visibility and enables rapid incident response | Senior
Cloud Platforms (AWS/Azure/GCP) | Enables scalability, reliability, and automation in production | Senior

Interview Preparation

  1. How do you design an alerting strategy to minimize noise while ensuring critical incidents are caught?
    What the interviewer is looking for: Understanding of threshold setting, severity levels, and use of tools like Splunk or Grafana.
  2. Describe a time you performed a root‑cause analysis on a production outage. What steps did you take?
    What the interviewer is looking for: Structured troubleshooting methodology, documentation, and collaboration with dev teams.
  3. Describe how you would implement a CI/CD pipeline for a Spring Boot microservice.
    What the interviewer is looking for: Familiarity with build tools, automated testing, and deployment orchestration.
  4. What are the key differences between L1 and L2 support, and how do you hand off an issue from one tier to the other?
    What the interviewer is looking for: Clear delineation of responsibilities, escalation procedures, and communication skills.
  5. How would you automate the creation of monitoring dashboards for a new service in Grafana?
    What the interviewer is looking for: Use of templating, API integration, and infrastructure‑as‑code concepts.
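To make the ideas behind question 1 concrete, here is a minimal Python sketch of two common noise-reduction techniques: severity tiers (warning vs. critical) and a sustain window, so that an isolated spike does not page anyone, plus deduplication so that a rule notifies only when its severity changes. Everything here is hypothetical and not tied to any Splunk, Kibana, or Grafana API: the `Rule`, `evaluate`, and `Notifier` names, the `checkout-5xx-rate` metric, and the threshold values are illustrative assumptions. In practice the same behavior is usually configured in the monitoring tool's own alert rules rather than written by hand.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical alert rule with two severity levels."""
    name: str
    warning: float   # threshold for a low-urgency notification (e.g. a ticket)
    critical: float  # threshold for paging the on-call engineer
    sustain: int     # consecutive breaching samples required before firing


def evaluate(rule: Rule, samples: list[float]) -> Optional[str]:
    """Return 'critical', 'warning', or None based on the latest `sustain` samples."""
    if len(samples) < rule.sustain:
        return None  # not enough data to judge a sustained breach
    window = samples[-rule.sustain:]
    if all(v >= rule.critical for v in window):
        return "critical"
    if all(v >= rule.warning for v in window):
        return "warning"
    return None  # an isolated spike does not fire


@dataclass
class Notifier:
    """Notifies only when a rule's severity changes, suppressing repeats."""
    active: dict[str, Optional[str]] = field(default_factory=dict)
    sent: list[tuple[str, str]] = field(default_factory=list)

    def update(self, rule: Rule, samples: list[float]) -> None:
        severity = evaluate(rule, samples)
        previous = self.active.get(rule.name)
        if severity != previous:
            # A drop back to None is reported once as a resolution.
            self.sent.append((rule.name, severity or "resolved"))
        self.active[rule.name] = severity


if __name__ == "__main__":
    rule = Rule("checkout-5xx-rate", warning=0.01, critical=0.05, sustain=3)
    notifier = Notifier()
    for window in ([0.0, 0.0, 0.2],      # isolated spike: ignored
                   [0.02, 0.03, 0.02],   # sustained: warning
                   [0.03, 0.02, 0.04],   # still warning: suppressed
                   [0.06, 0.07, 0.08],   # sustained: critical
                   [0.0, 0.0, 0.0]):     # recovered: resolved
        notifier.update(rule, window)
    print(notifier.sent)
```

The design choice is that severity and duration together decide who gets notified: warning-level breaches could be routed to a ticket queue (for example in ServiceNow), while only sustained critical breaches page the on-call engineer, and repeated evaluations at the same severity stay silent.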

Resume Optimization

  • Site Reliability Engineer
  • Production Support
  • Core Java
  • Splunk
  • Kibana
  • Grafana
  • PostgreSQL
  • MongoDB
  • ServiceNow
  • CI/CD

Application Strategy

When reaching out to the recruiter, send a concise email that opens with a friendly greeting, includes your updated resume as an attachment, and clearly maps your experience to the role. Highlight your top skills (such as Core Java, monitoring with Splunk/Kibana/Grafana, and cloud automation) and reference any relevant projects in which you reduced downtime or automated incident response. Mention that you’re eager to discuss how your background aligns with the team’s reliability goals.

Career Roadmap

Current Role | Typical Experience | Core Focus | Next Position
Site Reliability Engineer | 2‑4 years | Incident response, automation, monitoring | Senior Site Reliability Engineer
Senior Site Reliability Engineer | 4‑7 years | Architecture, large‑scale reliability, mentorship | SRE Lead / Reliability Architect
SRE Lead / Reliability Architect | 7+ years | Strategy, cross‑team leadership, budgeting | Director of Reliability Engineering