Full Time
Austin, TX, US
Posted 1 year ago

We are seeking an experienced site reliability engineer for a direct-hire role in the Austin area. The organization has a collaborative, welcoming work culture and offers competitive compensation along with great benefits and perks.

Job Responsibilities

Serve as an Incident Commander and lead incident response
Lead post-mortem retrospective meetings and create relevant post-incident documentation and communications
Implement and administer monitoring and alerting tooling to enable proactive incident response processes
Build and configure integrations between systems for monitoring, alerting and reporting system health
Track, report and effectively communicate system availability and performance metrics
Facilitate the creation of operational runbooks and document common recovery actions
Work with teams to define Service Level Objectives
Collaborate with software developers and architects to identify improvements that will increase the reliability of our systems

Required Skills & Qualifications

Experienced SRE with a strong background acting as Incident Commander and leading post-mortems
Scripting experience is highly preferred. The team relies heavily on PowerShell, but Python use is growing
Experience with Dynatrace or a similar platform is a huge plus
Experience with Ansible and Microsoft Flows in PowerApps is a plus
Experience implementing monitoring, logging or alerting

info@ppaac.com

512-750-0778

Site Reliability Engineer – Incident Commander
