Site Reliability Engineer Lead - Contractor
Company: System One
Location: Pittsburgh
Posted on: April 3, 2025
Job Description:
Position Title: SRC Lead (Site Reliability Center Lead)
Location: Pittsburgh, PA; Cleveland, OH; Birmingham, AL;
Dallas, TX; or Phoenix, AZ
Hybrid - 2-3 days in office
Years of Experience: 12+ years applicable experience required
***For immediate consideration, you can reach me at
412.516.1987 / shafique.mohammed@systemone.com***
Team Dynamic: The SRC Lead will oversee a team of global
contractors (L1.5 Engineers).
Role Overview:
As an SRC Lead, you'll be at the forefront of ensuring the
reliability, availability, and performance of critical enterprise
technology and security applications. Your leadership will drive
operational excellence, foster collaboration, and elevate the
overall reliability of our systems within the Site Reliability
Center (SRC). You'll work closely with cross-functional teams,
mentor engineers, and contribute to the success of the
organization.
NOTE ON THE SKILLS/TECHNOLOGIES
Be knowledgeable enough to jump in, drive conversations to
resolution, and escalate if needed to BANK Application System
Managers/SMEs (e.g., "Here is the problem, here is what we think is
causing it, here are the solutions we recommend; what do you want
to do?").
Top Technologies:
--- Monitoring and Debugging Tools (LogScale, Splunk,
Dynatrace)
--- DevOps pipeline (Git, Jenkins, Artifactory)
--- Infrastructure (Red Hat Linux, OpenShift, Windows)
--- Networking (DNS, Load-balancing, Network tracing, Firewall)
--- Database (Oracle, SQL)
--- API understanding & Web services technologies: (SOAP, JSON,
REST)
--- Directories (LDAP, Active Directory)
--- Java
Secondary:
--- Python/Java scripting, Ansible, PowerShell for automation
purposes
--- Modern development technologies and tools: (Agile, CI/CD, Git,
Jenkins)
--- Kafka Event Streaming
--- ETL/Informatica
Nice to Have:
--- Database (Mongo, Cassandra, other databases)
--- Evolve
Responsibilities Summary:
Production Support. NOT new development. Troubleshoot highly
technical problems, which may require assessing source code to
analyze and resolve them. This requires advanced troubleshooting
skills and the ability to adapt and create non-standard approaches
to problem solving.
*There are 185 applications and platforms combined in this space.
Expertise is not expected in all of them, but the emphasis will be
on developing SME-level knowledge of the Criticality Level 0/1
mnemonics, which are reflected in the top skills.
We are looking for someone who is astute enough to see a problem
and fix it or escalate it to BANK SME teams and learn from how they
fix the problem. Runbooks should then be updated accordingly.
Key responsibilities:
--- Create and maintain documentation to ensure knowledge
accessibility.
--- Liaise with other application support teams and
internal/external business and technical partners.
--- Provide ad hoc and on-demand reports.
--- Perform timely escalation of critical issues and proactively
identify patterns of recurring issues to improve production.
--- Lead problem resolution, conduct root cause analysis, and
establish processes that help prevent incidents.
--- Participate in the Incident and Problem Management processes as
a resolver accountable for root cause analysis, resolution and
reporting.
--- Provide guidance to all staff and vendors involved, driving a
coordinated approach to results.
--- Reduce escalations to Level 3 based on incremental learning
about applications.
1. Technical Acumen and System Familiarity:
While the majority of the role involves management, the SRC Lead
should possess a solid understanding of the systems and technical
stacks they are supporting. They should be able to pull up
dashboards, troubleshoot issues, and guide conversations related to
system health. Additionally, they must effectively manage impact
and risk.
2. System Monitoring and Health:
Lead the production environment by monitoring availability and
taking a holistic view of system health.
3. Quality and Time-to-Market:
Drive improvements in reliability, quality, and time-to-market for
software solutions.
4. Performance Optimization:
Continuously optimize system performance, anticipating customer
needs and innovating for excellence.
5. Operational Leadership:
Provide primary operational support for large-scale distributed
software applications.
6. Mentorship:
Mentor and guide engineers within your shift team, fostering growth
and technical expertise.
7. Stakeholder Communication:
Manage team operations while effectively communicating with
directors and other executives/CIOs who have a stake.
Qualifications:
--- Proactive Approach:
Take a proactive approach to identifying problems, performance
bottlenecks, and areas for improvement.
--- Leadership Experience:
Demonstrated leadership in technical roles, preferably within Site
Reliability Engineering (SRE) or DevOps.
--- Continuous Improvement:
Foster a culture of continuous improvement and technical
excellence, proactively identifying patterns of recurring issues to
enhance stability and improve processes (automation opportunities,
etc.).
Thanks,
Shafique Mohammed
Recruiting Manager
210 Sixth Avenue, Suite 3100 - Pittsburgh, PA 15222
412.516.1987 (o)
systemone.com - LinkedIn
Ref: #404-IT Pittsburgh