Lead and actively participate in incident response activities, including identifying, analyzing, resolving, and preventing software system issues.
Create and implement incident response plans, playbooks, and standard operating procedures to facilitate effective incident handling and minimize system downtime.
Foster a culture of observability, emphasizing the importance of monitoring, logging, and metrics to maintain system performance and stability.
Collaborate with software development teams to design and deploy automated monitoring, alerting, and reporting systems that proactively identify and address potential issues.
Work closely with security teams to integrate security practices into the development process, conduct security assessments, and implement appropriate safeguards.
Provide technical expertise and guidance in development operations practices, such as CI/CD pipelines, version control, configuration management, and deployment strategies.
Spearhead the implementation and maintenance of infrastructure-as-code (IaC) frameworks and tools to ensure consistent and scalable infrastructure provisioning.
Mentor and lead junior engineers, offering technical guidance and fostering a collaborative and innovative work environment.
Stay updated with industry trends, emerging technologies, and best practices related to incident response and development operations.
Experience Requirements
2 - 3 years
Educational Requirements
BSc in Engineering
Skills
Strong problem-solving skills and attention to detail.
Ability to work collaboratively in a team environment.
Flexible and willing to accept a change in priorities as necessary.
Extensive experience in incident response management, including handling and mitigating security incidents, performing root cause analysis, and implementing preventive measures.