Lead, DevOps Support Engineering - Troy, MI

Posted: 05/11/2025

We are seeking a Lead DevOps Support Engineer to drive automation, system integration, troubleshooting, and monitoring improvements. As a Project Lead, you will play a key role in optimizing DevOps workflows, enhancing system observability, and ensuring seamless incident resolution.

This is a technical leadership role (not a managerial position) that requires hands-on expertise in automating support processes, integrating infrastructure components, improving monitoring dashboards, and creating and tracking tasks for L2 DevOps support engineers. #LI-Hybrid

Key Responsibilities:

Automation & Integration

  • Design and implement automated CI/CD pipelines tailored to embedded software workflows
  • Integrate build systems (e.g., Make, CMake, Bazel) into CI pipelines
  • Configure pipelines for cross-compilation targeting various hardware architectures (ARM, RISC-V, etc.)
  • Automate firmware packaging and secure signing
  • Set up deployment mechanisms for over-the-air (OTA) or USB/SD card updates
  • Validate firmware integrity post-deployment using checksums or digital signatures
  • Automate L2 support processes, incident resolution, and infrastructure management
  • Develop and maintain scripts and automation tools to enhance efficiency and reduce manual work
  • Ensure seamless integration between infrastructure, CI/CD pipelines, and monitoring solutions
  • Optimize deployment processes and automate recurring operational tasks

Troubleshooting & Support

  • Lead DevOps L2 incident response, diagnosing and resolving infrastructure and application issues
  • Perform root cause analysis and implement proactive fixes to prevent recurring incidents
  • Work closely with L1 and L3 teams to streamline support escalations and improve response times
  • Troubleshoot Kubernetes, cloud infrastructure, networking, and deployment failures

Monitoring & Dashboards

  • Design, configure, and optimize monitoring and logging dashboards (Prometheus, Grafana, ELK, etc.)
  • Improve alerting mechanisms to enhance observability and reduce noise
  • Ensure system performance metrics are effectively tracked and visualized for proactive incident management

Process Optimization & Escalation Management

  • Define and optimize support workflows for efficient issue resolution
  • Establish escalation routes to ensure timely handling of critical incidents
  • Evaluate risks associated with deployments and infrastructure changes, and implement mitigation strategies
  • Assist in QA validation of infrastructure changes and automation scripts

Required Skills & Experience:

  • 5+ years of experience in DevOps, SRE, or L2 technical support roles
  • Experience with automated CI/CD pipelines tailored to embedded software workflows
  • Experience creating and tracking tasks for L2 DevOps engineers to drive operational efficiency
  • Strong expertise in automating support processes and troubleshooting complex systems
  • Proficiency in scripting (Bash, Python, or similar) for automation and monitoring
  • Hands-on experience with monitoring and logging tools (Prometheus, Grafana, ELK, Datadog, etc.)
  • Solid understanding of CI/CD pipelines, infrastructure components, and cloud services (AWS, GCP, or Azure)
  • Experience with containerized environments (Docker, Kubernetes) and troubleshooting containerized applications
  • Strong analytical skills for root cause analysis, incident resolution, and risk assessment

MAGNA