Course Outline

Introduction to Scaling Ollama

  • Ollama’s architecture and scaling considerations
  • Common bottlenecks in multi-user deployments
  • Best practices for infrastructure readiness

Resource Allocation and GPU Optimization

  • Efficient CPU/GPU utilization strategies
  • Memory and bandwidth considerations
  • Container-level resource constraints

Deployment with Containers and Kubernetes

  • Containerizing Ollama with Docker
  • Running Ollama in Kubernetes clusters
  • Load balancing and service discovery
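As a taste of the containerization topics above, a minimal deployment fragment using the official ollama/ollama image and Ollama's default port 11434 (the model name llama3 is illustrative; GPU access assumes the NVIDIA Container Toolkit is installed):

```shell
# Run Ollama in a container (official ollama/ollama image).
# --gpus=all assumes the NVIDIA Container Toolkit; omit it on CPU-only hosts.
# The named volume persists downloaded models; 11434 is Ollama's default API port.
docker run -d --name ollama \
  --gpus=all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Pull a model inside the running container.
docker exec -it ollama ollama pull llama3
```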

Autoscaling and Batching

  • Designing autoscaling policies for Ollama
  • Batch inference techniques for throughput optimization
  • Latency vs. throughput trade-offs
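To illustrate the batching techniques covered, a minimal sketch that groups incoming prompts into fixed-size batches and sends the requests in each batch concurrently, assuming Ollama's documented /api/generate endpoint on localhost:11434 (the model name llama3 and batch/worker sizes are illustrative):

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def make_batches(prompts, batch_size):
    """Split incoming prompts into fixed-size batches."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]

def generate(prompt, model="llama3"):
    """Send one non-streaming generate request to the Ollama API."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = request.Request(OLLAMA_URL, data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def run_batched(prompts, batch_size=8, workers=4):
    """Process prompts batch by batch; requests within a batch run concurrently.

    Larger batches raise throughput but also raise per-request latency --
    the trade-off discussed in this module.
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in make_batches(prompts, batch_size):
            results.extend(pool.map(generate, batch))
    return results
```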

Latency Optimization

  • Profiling inference performance
  • Caching strategies and model warm-up
  • Reducing I/O and communication overhead
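One warm-up technique from this module can be sketched as follows: Ollama's /api/generate endpoint loads a model into memory when given an empty prompt, and the documented keep_alive parameter controls how long it stays resident, so the first real query skips model-load time (the model name and 30m duration below are illustrative):

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def warmup_payload(model, keep_alive="30m"):
    """Build a generate request with an empty prompt: Ollama loads the model
    into memory without generating tokens, and keep_alive pins it there."""
    return {"model": model, "prompt": "", "keep_alive": keep_alive}

def warm_up(model, keep_alive="30m"):
    """POST the warm-up request so the first real query avoids a cold start."""
    data = json.dumps(warmup_payload(model, keep_alive)).encode()
    req = request.Request(OLLAMA_URL, data=data,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req).close()
```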

Monitoring and Observability

  • Integrating Prometheus for metrics
  • Building dashboards with Grafana
  • Alerting and incident response for Ollama infrastructure
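Since Ollama does not expose a Prometheus endpoint itself, metrics are typically collected by a sidecar or proxy. A minimal sketch (metric names and port are illustrative) that records request latency and serves it in the Prometheus text exposition format:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory counters, updated by whatever code proxies requests to Ollama.
_lock = threading.Lock()
_stats = {"requests_total": 0, "latency_seconds_sum": 0.0}

def record_request(latency_seconds):
    """Call this after each Ollama request completes."""
    with _lock:
        _stats["requests_total"] += 1
        _stats["latency_seconds_sum"] += latency_seconds

def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    with _lock:
        return (
            "# TYPE ollama_requests_total counter\n"
            f"ollama_requests_total {_stats['requests_total']}\n"
            "# TYPE ollama_request_latency_seconds_sum counter\n"
            f"ollama_request_latency_seconds_sum {_stats['latency_seconds_sum']}\n"
        )

class MetricsHandler(BaseHTTPRequestHandler):
    """Serves /metrics for Prometheus to scrape."""
    def do_GET(self):
        body = render_metrics().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=9095):
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

A matching Prometheus scrape job would then point at this port, and Grafana dashboards would query the resulting series.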

Cost Management and Scaling Strategies

  • Cost-aware GPU allocation
  • Cloud vs. on-prem deployment considerations
  • Strategies for sustainable scaling
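The cost-aware allocation discussed above reduces to simple arithmetic: the price of a GPU-hour divided by the tokens that hour actually produces. A small worked sketch (all inputs are illustrative; plug in your own measured throughput and rates):

```python
def cost_per_million_tokens(gpu_hourly_cost, tokens_per_second, utilization=1.0):
    """Cost of generating one million tokens on a single GPU.

    gpu_hourly_cost:   price of one GPU-hour (cloud rate or amortized on-prem cost)
    tokens_per_second: sustained throughput measured for your model on that GPU
    utilization:       fraction of the hour spent doing useful work
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_cost / tokens_per_hour * 1_000_000
```

For example, a GPU costing 3.60 per hour sustaining 100 tokens/s at full utilization works out to 10.00 per million tokens; halving utilization doubles the effective cost, which is why batching and autoscaling feed directly into the cost picture.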

Summary and Next Steps

Requirements

  • Experience with Linux system administration
  • Understanding of containerization and orchestration
  • Familiarity with machine learning model deployment

Audience

  • DevOps engineers
  • ML infrastructure teams
  • Site reliability engineers

Duration

  21 Hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to your project's actual goals and needs.
  • Flexible Schedule: Dates and times that fit your team's agenda.
  • Format: Online (live), in-company (at your offices), or hybrid.

Investment

Price per private group for live online training, starting from 4800 € + VAT*

Contact us for an exact quote and to hear about our latest promotions.
