We are seeking an experienced Site Reliability Engineer for a contract-to-hire role in the Austin area. The organization offers a collaborative, welcoming work culture, competitive compensation, and great benefits and perks.

Job Responsibilities

  • Design, build, and support scalable, highly available distributed systems and cloud infrastructure
  • Own enterprise monitoring, observability, and incident response practices, with a strong emphasis on Dynatrace
  • Develop and maintain CI/CD pipelines and Infrastructure-as-Code automation
  • Lead incident management, root cause analysis, and reliability improvements aligned to SLO/SLI frameworks
  • Partner with development, architecture, and operations teams to improve platform reliability, performance, and scalability
  • Drive automation, self-healing systems, and proactive reliability engineering practices

Required Skills & Qualifications

  • 5+ years of experience in SRE, DevOps, or platform engineering
  • Observability and monitoring experience with Dynatrace, plus other enterprise tools such as Prometheus, Grafana, Splunk, ELK, Datadog, or similar
  • Infrastructure-as-Code (Terraform preferred; Ansible or similar also considered)
  • Kubernetes (EKS / AKS / GKE)
  • Docker and container orchestration
  • CI/CD pipeline development (GitLab, GitHub, Jenkins, etc.)
  • Automated deployment, scaling, and incident response
  • Programming / Scripting – Python, Go, Java, Bash, PowerShell, or similar
  • Kafka, Redis, PostgreSQL, Cassandra, or similar distributed storage and messaging platforms
  • Incident command, post-mortems, and root cause analysis