We are seeking an experienced Site Reliability Engineer for a contract-to-hire role in the Austin area. The organization has a collaborative, welcoming work culture and offers competitive compensation along with great benefits and perks.
Job Responsibilities
- Design, build, and support scalable, highly available distributed systems and cloud infrastructure
- Own enterprise monitoring, observability, and incident response practices, with a strong emphasis on Dynatrace
- Develop and maintain CI/CD pipelines and Infrastructure-as-Code automation
- Lead incident management, root cause analysis, and reliability improvements aligned to SLO/SLI frameworks
- Partner with development, architecture, and operations teams to improve platform reliability, performance, and scalability
- Drive automation, self-healing systems, and proactive reliability engineering practices
Required Skills & Qualifications
- 5+ years in SRE, DevOps, or platform engineering
- Observability and monitoring experience with Dynatrace, plus other enterprise tools such as Prometheus, Grafana, Splunk, ELK, Datadog, or similar
- Infrastructure-as-Code (Terraform, Ansible, or similar; Terraform preferred)
- Kubernetes (EKS / AKS / GKE)
- Docker and container orchestration
- CI/CD pipeline development (GitLab, GitHub, Jenkins, etc.)
- Automated deployment, scaling, and incident response
- Programming / Scripting – Python, Go, Java, Bash, PowerShell, or similar
- Kafka, Redis, PostgreSQL, Cassandra, or similar distributed storage and messaging platforms
- Incident command, post-mortems, and root cause analysis


