Job Description & Details
Site Reliability Engineering is at the heart of modern digital services, keeping systems available and performant 24/7. As businesses move to cloud‑native architectures, skilled SREs are in high demand. This six‑month onsite role in Phoenix is a chance to apply your production‑support expertise in a banking‑focused environment.
Job Summary
We are seeking a hands‑on Site Reliability Engineer to monitor system health, troubleshoot production incidents, and drive automation across our cloud‑based services. The role involves building alerts and dashboards, collaborating with development teams to improve reliability, and participating in on‑call rotations.
Top 3 Critical Skills Table
| Skill | Why it's critical | Mastery Level |
|---|---|---|
| Core Java | Foundation for building and debugging services in Spring Boot | Senior |
| Monitoring (Splunk/Kibana/Grafana) | Provides real‑time visibility and rapid incident response | Senior |
| Cloud Platforms (AWS/Azure/GCP) | Enables scalability, reliability, and automation in production | Senior |
Interview Preparation
- How do you design an alerting strategy to minimize noise while ensuring critical incidents are caught?
What the interviewer is looking for: Understanding of threshold setting, severity levels, and use of tools like Splunk or Grafana.
- Explain a time you performed a root‑cause analysis on a production outage. What steps did you take?
What the interviewer is looking for: Structured troubleshooting methodology, documentation, and collaboration with dev teams.
- Describe how you would implement a CI/CD pipeline for a Spring Boot microservice.
What the interviewer is looking for: Familiarity with build tools, automated testing, and deployment orchestration.
- What are the key differences between L1 and L2 support, and how do you transition an issue between them?
What the interviewer is looking for: Clear delineation of responsibilities, escalation procedures, and communication skills.
- How would you automate the creation of monitoring dashboards for a new service in Grafana?
What the interviewer is looking for: Use of templating, API integration, and infrastructure‑as‑code concepts.
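For the Grafana question, one way to show templating and API integration in an answer is to generate the dashboard definition from code. The sketch below is illustrative only: the function name `build_dashboard_payload`, the service and metric names, the `env` variable, and the Prometheus-style `expr` queries are all assumptions, not details from this role. It builds the JSON body that Grafana's dashboard-save endpoint (`POST /api/dashboards/db`) accepts and stops short of sending it.

```python
import json


def build_dashboard_payload(service, metrics):
    """Build a request body for Grafana's dashboard-save endpoint.

    Hypothetical helper: service and metric names are supplied by the caller,
    and the query syntax assumes a Prometheus data source.
    """
    panels = []
    for index, metric in enumerate(metrics):
        panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": f"{service}: {metric}",
            # Two panels per row on Grafana's 24-column grid.
            "gridPos": {"h": 8, "w": 12, "x": (index % 2) * 12, "y": (index // 2) * 8},
            # "$env" is resolved by the dashboard template variable defined below.
            "targets": [{"refId": "A", "expr": f'{metric}{{service="{service}", env="$env"}}'}],
        })

    dashboard = {
        "id": None,   # None lets Grafana create a new dashboard
        "uid": None,
        "title": f"{service} overview",
        "tags": ["generated", service],
        # Assumed template variable so one dashboard covers several environments.
        "templating": {"list": [{"name": "env", "type": "custom", "query": "prod,staging"}]},
        "panels": panels,
    }
    return {"dashboard": dashboard, "overwrite": True}


if __name__ == "__main__":
    payload = build_dashboard_payload("payments", ["http_requests_total", "jvm_memory_used_bytes"])
    print(json.dumps(payload, indent=2))
```

Keeping the definition in a function like this means it can live in version control and run from a CI/CD job for each new service, which is the infrastructure‑as‑code angle the question is probing.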
Resume Optimization
- Site Reliability Engineer
- Production Support
- Core Java
- Splunk
- Kibana
- Grafana
- PostgreSQL
- MongoDB
- ServiceNow
- CI/CD
Application Strategy
When reaching out to the recruiter, send a concise email that starts with a friendly greeting, attaches your updated resume, and clearly maps your experience to the role. Highlight your top skills—such as Core Java, monitoring with Splunk/Kibana/Grafana, and cloud automation—and reference any relevant projects where you reduced downtime or automated incident response. Mention that you’re eager to discuss how your background aligns with the team’s reliability goals.
Career Roadmap
| Current Role | Typical Experience | Core Focus | Next Position |
|---|---|---|---|
| Site Reliability Engineer | 2‑4 years | Incident response, automation, monitoring | Senior Site Reliability Engineer |
| Senior Site Reliability Engineer | 4‑7 years | Architecture, large‑scale reliability, mentorship | SRE Lead / Reliability Architect |
| SRE Lead / Reliability Architect | 7+ years | Strategy, cross‑team leadership, budgeting | Director of Reliability Engineering |