Site Reliability Engineer

Company: Cirrus Group Consulting
Location: Reston
Posted on: February 13, 2026

Job Description:

LOCATION: Reston, VA

SUMMARY OF POSITION

The Site Reliability Engineer (i.e., “SRE”) role is responsible for the optimization and reliability of core technical platforms and platform services, and for exerting significant technical leadership in the continuous improvement of service reliability to platform stakeholders. The SRE will champion the overall health of OF core technical platforms, lead the response to operational incidents, determine root causes, and propose and implement remediations that ensure overall platform viability.

OF IT platforms and infrastructure exist over three locations (i.e., “on-premise”), including Office Headquarters (Reston, VA), Primary Data Center Co-Location (Sterling, VA), and Disaster Recovery Data Center Co-Location (Chicago, IL), as well as a limited set of infrastructure services provided by Microsoft Azure (i.e., “Azure”). The core technical platform is Red Hat OpenShift, with a variety of platform services, including but not limited to Red Hat AMQ, HashiCorp Vault, and Keycloak, that are consumed by various platform stakeholders. This role will span from the OpenShift platform to services provided by Azure.

We’re proud of the way our teammates have a positive impact on everything we do. Our employees are committed to and exemplify our Core Values:

- Integrity through accountability, consistency, transparency, and trust
- Agility through adaptability, continuous improvement, expertise, and flexibility
- Partnership through collaboration, communication, leadership, and teamwork
- Inclusivity through diversity, relationships, respect, and support

PRINCIPAL RESPONSIBILITIES

- Maintain overall health and reliability of core technical platforms and platform services to ensure business continuity and high availability.
- Maintain and improve the end-to-end observability of the platform, so that platform state is at all times understood in context, with supporting information and data that can be quickly marshalled into action.
- Lead incident response, root-cause analysis, and postmortems that advance the overall health of the system and prevent or diminish recurrence of platform issues.
- Partner with development teams to troubleshoot platform issues, including deployment, routing, and configuration challenges.
- Build and maintain automated deployment pipelines that support engineering, development, and data teams.
- Write, test, and deploy solutions that reduce unneeded human intervention and improve quality.
- Lead the delivery of new platform features, services, and capabilities.
- Prioritize, deliver, and operate new platform capabilities, products, and services.
- Develop and maintain accurate and up-to-date documentation, including but not limited to operational procedures, deployment plans, and incident response plans.
- Participate in on-call rotation.
- Assist with other job duties as assigned.

PRINCIPAL JOB REQUIREMENTS

- Bachelor's degree in computer science or related field, or equivalent experience.
- Minimum of 5-7 years of experience in a Site Reliability Engineering and/or Platform Engineering role, with progressively increasing scope of responsibility.
- Extensive hands-on experience and knowledge of the following technologies:
  - Red Hat OpenShift, inclusive of operators, routing/ingress, and cluster management
  - Azure cloud services and solutions
  - Messaging platforms like AMQ, Kafka, Redis
  - HashiCorp Vault
  - Scripting languages like Bash, Python, Go, PowerShell
  - Observability tools like Datadog, Grafana, Prometheus
- Strong scripting and automation skills in Bash and Python.
- Strong prior experience with observability tools and connecting trends, incidents, and alerts with actions.
- Prior experience troubleshooting complex production issues using logs, metrics, traces, packet captures, and Kubernetes debugging tools.
- Prior experience working in a heavily audited environment is preferred, with a focus on mitigating risks and ensuring compliance with policies and procedures.
- Knowledge of enterprise-level technologies and concepts.
- Ability to multi-task in a dynamic environment while continuing to progress on longer-term projects.
- Ability to communicate well, both orally and in writing, including producing thorough documentation of all work.
- Ability to conduct independent technical research and share results with management and/or peers.
- Ability to listen and integrate ideas from different views, build and maintain respectful relationships, collaborate with others, and resolve conflicts constructively.
- Proof of eligibility to work in the United States.



