Site Reliability Engineer

Türkiye Radyo Televizyon Kurumu / İstanbul
Publish Date: 16.6.2022

As the TRT Digital Products Department, we are looking for a candidate who understands and shares our goal of creating a unique user/audience experience on each platform. The software engineering team is looking to hire talented Site Reliability Engineers to build the core digital products for TRT.

Job Description

● Provide emergency response, either by being on call or by reacting to symptoms surfaced by monitoring, escalating when needed.

● Propose ideas and solutions within the Infrastructure Department to reduce workload through automation.

● Plan, design, and execute solutions within the Infrastructure Department to reach specific goals agreed within the team.

● Plan and execute configuration change operations both at the application and the infrastructure levels.

● Actively look for opportunities to improve the availability and performance of the system by applying what is learned from monitoring and observation.

● Improve documentation across the board, whether application documentation or runbooks, explaining the why and not just the what.

Key Responsibilities

● Troubleshoot, evaluate and resolve operational challenges contributing to defined SLOs.

● Identify architectural bottlenecks in applications as observed on the site, and take part in resolving them.

● Work with other engineering stakeholders to resolve larger architectural bottlenecks, representing TRT's point of view.

● Work in close collaboration with software development teams to consult on scaling concerns.

● Contribute to the future roadmap of software development teams and establish strong operational readiness across teams.

● Scale systems through automation, improving change velocity and reliability.

● Leverage technical skills to partner with team members and be comfortable diving into a problem as needed.

● Work to enable other teams to scale through automation, knowledge-sharing, and self-service activities.

Technical Requirements

● General knowledge of 4 technical expertise areas, with deep knowledge in 1 area.

● Basic Terraform or Ansible syntax, and GitLab CI/CD configuration (pipelines, jobs)

● Cloud resources provisioning and configuration through CLI/API

● Advanced knowledge of Kubernetes, CLI, and service re-provisioning

● Provisioning and setup of metrics, alerts, and silences in Prometheus, Thanos, and Grafana

● Provisioning and setup of logs and queries to answer general questions

● Operating system (Linux) configuration, package management, startup and troubleshooting

● Block and object storage configuration

● Networking: VPCs, proxies, and CDNs

Preferred Qualifications

● Experience contributing to large digital organizations

● Experience in a media-related industry (OTT, Media Streaming, VOD) is a plus

● At least 2 years of experience in the field of Site Reliability

● A proven track record of success

Application deadline: 14.7.2022 00:00