Site Reliability Engineer Lead - Contractor

Company: System One
Location: Pittsburgh, PA
Posted on: April 3, 2025

Job Description:

Position Title: SRC Lead (Site Reliability Center Lead)

Location: Pittsburgh, PA; Cleveland, OH; Birmingham, AL; Dallas, TX; or Phoenix, AZ
Hybrid - 2-3 days in office
Years of Experience: 12+ years of applicable experience required

***For immediate consideration, you can reach me at 412.516.1987 / shafique.mohammed@systemone.com***

Team Dynamic: The SRC Lead will oversee a team of global contractors (L1.5 Engineers).


Role Overview:
As an SRC Lead, you'll be at the forefront of ensuring the reliability, availability, and performance of critical enterprise technology and security applications. Your leadership will drive operational excellence, foster collaboration, and elevate the overall reliability of our systems within the Site Reliability Center (SRC). You'll work closely with cross-functional teams, mentor engineers, and contribute to the success of the organization.

NOTE ON SKILLS/TECHNOLOGIES
Be knowledgeable enough to jump in, drive conversations to resolution, and escalate if needed to BANK Application System Managers/SMEs (e.g., "Here is the problem, here is what we think it is, here are the solutions we think we should pursue. What do you want to do?").

Top Technologies:
--- Monitoring and Debugging Tools (LogScale, Splunk, Dynatrace)
--- DevOps pipeline (Git, Jenkins, Artifactory)
--- Infrastructure (Red Hat Linux, OpenShift, Windows)
--- Networking (DNS, load balancing, network tracing, firewalls)
--- Database (Oracle, SQL)
--- APIs and web services technologies (SOAP, JSON, REST)
--- Directories (LDAP, Active Directory)
--- Java

Secondary:
--- Python/Java scripting, Ansible, and PowerShell for automation
--- Modern development technologies and tools (Agile, CI/CD, Git, Jenkins)
--- Kafka Event Streaming
--- ETL/Informatica

Nice to Have:
--- Databases (MongoDB, Cassandra, others)
--- Evolve

Responsibilities Summary:
This role is production support, NOT new development. You will troubleshoot highly technical problems that may require reviewing source code to analyze and resolve them. This requires advanced troubleshooting skills and the ability to adapt and create non-standard approaches to problem solving.

*There are 185 applications and platforms combined in this space. Expertise in all of them is not expected, but the emphasis will be on developing subject-matter expertise (SME) for the Criticality Level 0/1 mnemonics, which are reflected in the top skills.

We are looking for someone astute enough to see a problem and either fix it or escalate it to BANK SME teams and learn from how they resolve it. Runbooks should then be updated accordingly.

Key responsibilities:
--- Create and maintain documentation to ensure knowledge accessibility.
--- Liaise with other application support teams and internal/external business and technical partners.
--- Provide ad hoc and on-demand reports.
--- Perform timely escalation of critical issues and proactively identify patterns of recurring issues to improve production stability.
--- Lead problem resolution, conduct root cause analysis, and establish processes that help prevent incidents.
--- Participate in the Incident and Problem Management processes as a resolver accountable for root cause analysis, resolution and reporting.
--- Provide guidance to all involved staff and vendors to drive a coordinated approach to results.
--- Reduce escalations to Level 3 through incremental learning about the applications.

1. Technical Acumen and System Familiarity:
While the majority of the role involves management, the SRC Lead should possess a solid understanding of the systems and technical stacks they are supporting. They should be able to pull up dashboards, troubleshoot issues, and guide conversations related to system health. Additionally, they must effectively manage impact and risk.
2. System Monitoring and Health:
Lead the production environment by monitoring availability and taking a holistic view of system health.
3. Quality and Time-to-Market:
Drive improvements in reliability, quality, and time-to-market for software solutions.
4. Performance Optimization:
Continuously optimize system performance, anticipating customer needs and innovating for excellence.
5. Operational Leadership:
Provide primary operational support for large-scale distributed software applications.
6. Mentorship:
Mentor and guide engineers within your shift team, fostering growth and technical expertise.
7. Stakeholder Communication:
Manage team operations while communicating effectively with directors, CIOs, and other executive stakeholders.

Qualifications:
--- Proactive Approach:
Take a proactive approach to identifying problems, performance bottlenecks, and areas for improvement.
--- Leadership Experience:
Demonstrated leadership in technical roles, preferably within Site Reliability Engineering (SRE) or DevOps.
--- Continuous Improvement:
Foster a culture of continuous improvement and technical excellence, proactively identifying patterns of recurring issues to improve stability and processes (e.g., automation opportunities).

Thanks,

Shafique Mohammed
Recruiting Manager

210 Sixth Avenue, Suite 3100 - Pittsburgh, PA 15222
412.516.1987 (o)
systemone.com - LinkedIn

Ref: #404-IT Pittsburgh
