Why we need you:
As a Senior Site Reliability Engineer, you will work as part of the team that manages and delivers monitoring and observability services across our production and pre-production systems.
Your responsibilities will include:
- System design, configuration, integration, deployment, and operation of Observability systems and tools. These systems collect metrics, logs and events from gaming services, applications (client, middleware, backend) and infrastructure (AWS, on-premises). Together, these Observability systems and tools are a critical part of PokerStars operations services.
- Design and deploy our Observability infrastructure and systems, taking them to the next level of availability and scale
- Ensure our Observability platform exceeds goals for availability, capacity, efficiency, scalability, and performance
- Develop metrics and log ingestion pipelines for high volumes of telemetry
- Create build and deployment pipelines for monitoring tools
- Deploy monitoring solutions into AWS, development and production environments
- Develop a set of alerts and metrics to keep your own services alive and performing well
- Collaborate with other SRE team members to improve the efficiency and reliability of monitoring solutions
- Collaborate with our Application Development teams to define the standards/APIs that ensure our Applications are emitting the right telemetry (metrics, logs, traces, events)
- Collect, aggregate and visualize metrics to provide visibility into, and standards for, the key indicators of the health of our most critical systems
- Develop software to analyse real-time metrics feeds and produce actionable insights, moving in the longer term towards machine learning to surface anomalies automatically
- Migrate Observability tools to Kubernetes
- Evaluate, choose, and implement the next generation of Observability tools
Who are we looking for:
As a Senior SRE Observability Engineer, you have extensive experience building, integrating and administering systems that use open-source monitoring tools at scale (e.g., InfluxDB/TICK Stack, Prometheus), the Elastic Stack (Elasticsearch, Logstash, Kibana, Beats) and Grafana. Part of your experience is focused on coding and scripting (mostly Python, Java and Bash). You have developed metrics and log ingestion pipelines for high volumes of telemetry. We use Atlassian products (Jira, Confluence, Bitbucket Server), so experience with them is a plus.
We try to follow best practices in methodology and IT operations for an always-up, always-available service, and we welcome your suggestions for improvement. Our environment is Agile, so experience working in Agile teams will be an advantage.
You are a quick learner who can rapidly absorb a lot of information about our in-house framework and systems. In this position you will need strong soft skills and the ability to liaise with both technical teams and product/business people. You can work under pressure while maintaining accuracy and attention to detail. As a team we are results-oriented and rely on good communication to achieve success.
As the ideal candidate, you will have experience with, or exposure to, the following:
- B.Sc. in Computer Science or similar
- 4+ years of experience with open-source monitoring and observability tooling/integration
- Time Series Databases (TSDB) – InfluxDB/TICK Stack, Prometheus
- Elastic Stack (Elasticsearch, Logstash, Kibana, Beats)
- Full proficiency with the Linux command-line environment
- Strong scripting in Python and Bash
- Programming experience in Java; Golang is a big plus
- Expertise in configuration and deployment automation using Salt and/or Ansible
- Monitoring protocols/frameworks – Prometheus/Influx line format, SNMP, JMX, Spring Boot Actuator
- Building software using Jenkins and JFrog Artifactory
- Git and other version control software
- AWS Cloud services
- Containerisation experience (Kubernetes and Docker)
- Middleware (Tomcat, Kafka)
- Experience with Consul, Vault, Terraform is a plus
- Some familiarity with open Observability initiatives (e.g., OpenTracing, OpenCensus, OpenMetrics)
What’s in it for you?
Our experience-based salaries are competitive, and we provide advice and dedicated assistance to those moving to Sofia.
Your package will include:
- Health and Dental Insurance for you, your partner and your children (if you all live at the same address)
- A personal interest allowance to let you learn something new or pursue a hobby
- A great yearly bonus based on performance
- A 1,000 BGN gift as congratulations if you have a baby while you work for us
- Personal e-learning courses and training to support your career development
- 22 days annual leave
- A sports card membership valid across the country
- In-house yoga, gymnastics and dance classes
- Complimentary discounts on a range of services
- Free snacks, fruits and drinks in the office
What happens next?
If you're what we're looking for, next up will be a phone interview. If that goes well, we'll meet you for a Zoom or face-to-face interview.
PokerStars is part of Flutter Entertainment Plc, a global sports betting, gaming and entertainment provider headquartered in Dublin and part of the FTSE 100 index of the London Stock Exchange. Flutter brings together exceptional brands, products and businesses and a diverse global presence in a safe, responsible and ultimately sustainable way.
We are an equal opportunity employer that values diversity. We do not discriminate on any protected characteristic as defined by applicable law.
We will look to provide reasonable accommodation for applicants with disabilities to participate in the job application or interview process. If you need assistance, please contact: email@example.com
Please note we cannot accept general applications; this inbox is just for providing support to those who need it.