VeriiPro
Information Technology & Services, Artificial Intelligence, Data Science
Site Reliability Engineer (SRE)
San Jose, California, United States · Onsite · Full Time · Posted 1 month ago
Job Description
We are seeking a hands-on Site Reliability Engineer (SRE) with expertise in Apache Spark, Kubernetes, CI/CD pipelines, Python, PL/SQL, and containerization to design, develop, and maintain real-time data processing pipelines and the supporting infrastructure. The ideal candidate will combine development skills with operational expertise to ensure high availability, scalability, and reliability of applications and infrastructure.
*Roles and Responsibilities*
- Provide technical leadership and guidance to engineering teams, participating in key technical decisions.
- Design, develop, and maintain real-time data pipelines and Data Lakehouse infrastructure.
- Write Python scripts for automation and small feature development.
- Write SQL queries and procedures for data processing.
- Build and maintain CI/CD pipelines using Jenkins and related tools.
- Manage applications in Kubernetes environments, including deployment, configuration, access control, and troubleshooting.
- Develop and maintain Kubernetes manifests, Helm charts, and deployment artifacts.
- Manage Docker containers, including image management in private registries.
- Maintain Apache Spark clusters and related workflows.
- Develop and maintain Ansible playbooks for infrastructure configuration and management.
- Implement Infrastructure as Code (IaC) and collaborate with teams to ensure consistent infrastructure management.
- Monitor services and infrastructure using observability tools for capacity management and performance tuning.
- Drive initiatives to containerize standalone applications and migrate them to Kubernetes.
- Mentor and guide other engineers in development, testing, and deployment practices.
*Technical Skills*
- Programming & Scripting: Python, PL/SQL
- Data & Processing: Apache Spark, SQL
- CI/CD & Automation: Jenkins, Ansible, CI/CD pipelines
- Containerization & Orchestration: Docker, Kubernetes (EKS/ECS experience a plus)
- Infrastructure & Deployment: Helm charts, IaC
- Monitoring & Observability: Dynatrace or similar monitoring/tracing tools
- Data Platforms: Dremio experience is a plus