Overview
Salary: $60.19-$66.88 per hour
Engaging with the implementation and engineering teams
Typical Day in the Role
* Purpose of the Team:
Drive strategic program leadership for a unified monitoring strategy, ensuring governance, optimization, and vendor/stakeholder engagement.
* Key Projects:
o Splunk deployment and integration
o Azure monitoring implementation
o Incident management and documentation/training initiatives
* Typical Task Breakdown & Operating Rhythm:
o Lead technical program management within enterprise infrastructure environments
o Collaborate with cross-functional teams for governance and optimization
o Execute monitoring solutions and deliver quick wins (e.g., Splunk deployment)
o Engage in stakeholder communication and training sessions
Strategic Program Leadership
* Define and execute a multi-phase Splunk deployment strategy aligned with organizational goals.
* Drive program governance, OKRs, and risk management for global observability initiatives.
Unified Monitoring Strategy
* Partner with the Infrastructure teams (Linux, Scheduler, Storage, ETX, and Cloud) to establish a cohesive monitoring framework for the compute, storage, and network layers. Also collaborate with other stakeholders to provide visibility into the environment.
* Align observability metrics with SLOs, SLIs, and incident response objectives.
Splunk Deployment & Integration
* Lead the deployment, configuration and integration of Splunk with existing systems, ensuring scalability and compliance.
Cross-Functional Enablement
* Collaborate with engineering and operations to onboard data sources, standardize alerting, and deliver actionable dashboards.
* Champion best practices for proactive monitoring and automated remediation.
Governance & Optimization
* Establish KPIs, retention policies, and compliance standards for observability data.
* Continuously optimize ingestion, indexing, and search performance for cost efficiency.
Vendor & Stakeholder Engagement
* Manage relationships with Splunk and third-party vendors for licensing, support, and roadmap alignment.
* Communicate program progress, risks, and outcomes to executive stakeholders.
Incident Management Support
* Enable observability platforms to accelerate root cause analysis and reduce MTTR through predictive analytics and automation.
Documentation & Training
* Deliver comprehensive documentation and enablement programs for operational teams.
Top skills: Proven success deploying and managing Splunk (Enterprise or ITSI) and Azure Monitoring at scale; experience with high-performance environments/infrastructure.
Qualifications
* 5+ years in technical program management within enterprise infrastructure environments.
* Proven success deploying and managing Splunk (Enterprise or ITSI) and Azure Monitoring at scale.
* Strong knowledge of HPC environments, infrastructure components, and hybrid cloud architecture.
* Familiarity with observability tools (Prometheus, Grafana, Datadog, Dynatrace).
* Exceptional communication and stakeholder management skills.
* Knowledge of SRE principles and incident management practices.
* Certifications: PMP, Certified Scrum Master (CSM), Splunk Certified Admin/Architect.
Preferred
* Experience with automation/orchestration (Terraform, Ansible, CI/CD).
* Background in Azure, AWS or GCP integrations.