DevOps Engineer
Groundup.ai
2 - 2.4K SGD
Full-time
Remote
Kubernetes, Cloud Services (AWS, Azure, GCP), Infrastructure as Code (Terraform, Ansible), Jenkins, CI/CD
- Cloud Infrastructure & Automation
- Architect and manage scalable, secure infrastructure on GCP, Azure, and occasionally OCI/AWS.
- Implement and manage Infrastructure as Code (IaC), primarily with Terraform and occasionally with Terragrunt and Helm.
- CI/CD Pipelines
- Design and optimize CI/CD workflows using GitHub Actions, Jenkins, and GitHub Enterprise (reusable workflows, OIDC federation).
- Ensure seamless deployment pipelines from code commit to production for microservices and AI workloads.
- Container Orchestration
- Manage Docker containers (using tools such as Portainer), Docker image repositories, and Kubernetes clusters, including GPU node infrastructure for AI workloads.
- Support canary releases, blue-green deployments, and auto-scaling strategies.
- Implement and manage serverless deployments on Google Cloud Platform (Cloud Functions, Cloud Run).
- Resource Planning & Hardware Estimation
- Assist in hardware estimation for both on-premise and cloud environments, based on resource requirements such as the number of sensors and storage needs.
- Ensure robust backup strategies and data redundancy for all infrastructure components.
- Assist the team in auditing cloud and on-premises resources.
- Security & Compliance
- Enforce cloud security best practices: image hardening, secret management, IAM least privilege, SBOMs, and vulnerability scanning.
- Collaborate on compliance requirements (SOC 2, ISO 27001), and respond to audits and incidents proactively.
- Configure and manage Cloudflare for enhanced security and performance.
- Monitoring & Observability
- Build and maintain observability stacks using Grafana, Prometheus, Loki, Tempo, Datadog, OpenTelemetry, and Sentry.
- Diagnose and resolve performance bottlenecks across compute, storage, and networking layers.
- Monitor and optimize cloud spending to ensure cost-efficiency.
- Develop and implement disaster recovery plans, conducting regular drills to ensure business continuity.
- Team Collaboration
- Partner with engineers to embed DevOps best practices.
- Establish and enforce documentation standards for infrastructure, processes, and troubleshooting guides.
- Use Plane for sprint planning, incident tracking, and delivery visibility.
- 5+ years of experience in Cloud/DevOps Engineering, preferably in production environments.
- Hands-on experience with GCP and Azure, ideally with exposure to AWS or OCI.
- Strong expertise in Terraform, Terragrunt, Helm, Kubernetes, and Docker.
- Proficient in scripting (e.g., Python, Bash, or PowerShell); Go/Rust is a plus.
- Experienced in CI/CD pipelines, especially using GitHub Actions.
- Strong understanding of:
- VPCs, routing, VPNs, firewalls, load balancers
- Kubernetes autoscaling and GPU/CPU resource management
- Monitoring, alerting, and log management with Datadog, Grafana OSS, and OpenTelemetry
- Familiar with DevSecOps practices and compliance controls.
- Strong ownership mindset and ability to thrive in a distributed and fast-paced environment.