Job Title: Lead Infrastructure Admin, AI Cloud Services (Azure \& AWS)
Location: California, USA (Remote; candidates must be in the CA area)
Duration: 6\+ Months Contract
Start Date: 05/15/2026
Job Summary
We are seeking an experienced Infrastructure Administrator / Cloud Engineer with strong expertise in supporting AI/ML cloud environments across both Azure and AWS. The ideal candidate will have hands\-on experience managing scalable cloud infrastructure, AI model hosting environments, Kubernetes clusters, DevOps automation, and Infrastructure as Code for enterprise AI services. This role focuses heavily on LLM infrastructure setup, GPU workloads, cloud security, CI/CD enablement, and AI platform administration. Healthcare industry exposure is highly preferred.
Mandatory Experience
- Minimum 5 years of experience in Infrastructure Administration / Cloud Infrastructure Engineering
- Strong recent experience supporting AI services infrastructure on Azure \& AWS
- Experience working in centralized support / triaging teams
- Experience supporting production\-grade AI/ML platforms, model training, and inference workloads
Must Have Technical Skills
- Microsoft Azure
- AWS
- Cloud Computing
- DevOps / CI\-CD
- Artificial Intelligence
- Infrastructure Support
- Azure Platform Services
- Cloud Services Administration
- Kubernetes Clusters
- Infrastructure as Code
- LLM Infrastructure Setup
Required Qualifications
- Bachelor's Degree in Computer Science / Information Systems / Related Field
- 5\+ years in Cloud Infrastructure / Infrastructure Administration
- Strong Linux Administration
- Scripting experience in Python, Bash, and PowerShell
- Strong experience with Terraform / Terragrunt
- Experience with Docker \& Kubernetes
- Experience with GitHub Actions
- Hands\-on experience with LLM/GenAI Infrastructure setup
- Experience in incident triaging / centralized cloud operations
Key Responsibilities
1\. Cloud Infrastructure Management
- Design, deploy, and manage cloud infrastructure supporting AI/ML workloads on AWS and Azure.
- Manage services such as EC2, Azure Virtual Machines, GPU Instances, EKS / AKS, ECS, VPC, S3, Lambda, Route 53, and Kubernetes Clusters.
- Configure networking, storage, compute, and security services for AI environments.
- Ensure high availability, reliability, scalability, and fault tolerance.
2\. AI / ML Platform Support
- Deploy and maintain enterprise AI/ML services including Amazon SageMaker, Azure Machine Learning, and Azure AI Foundry.
- Build and maintain AI model training and inference environments.
- Support Data Scientists, ML Engineers, and AI teams with optimized GPU/cloud infrastructure.
- Assist with LLM deployment environments and GenAI service administration.
3\. Automation / Infrastructure as Code
- Implement Infrastructure as Code using Terraform, Terragrunt, CloudFormation, ARM Templates / Bicep, and Dockerfiles.
- Automate provisioning, patching, configuration management, and environment scaling.
4\. Containerization \& Orchestration
- Deploy and manage containerized AI workloads using Docker, Kubernetes, Amazon EKS, Azure Kubernetes Service (AKS), and Amazon ECS.
- Manage cluster administration, scaling, pod monitoring, and deployment troubleshooting.
5\. Monitoring \& Performance Optimization
- Monitor AI infrastructure using CloudWatch, Azure Monitor, Datadog, and Prometheus.
- Optimize GPU utilization, cloud cost, AI workload performance, and resource consumption.
6\. Security \& Compliance
- Implement IAM / RBAC, Network Security Groups, encryption, and secrets management.
- Ensure enterprise compliance and secure cloud governance.
7\. DevOps \& CI/CD Integration
- Integrate AI workloads with CI/CD pipelines.
- Support automated deployment of ML models and AI services.
- Work with GitHub Actions, Terraform pipelines, and container deployment automation.
For applications and inquiries, contact: hirings@openkyber.com