Course Outline

Foundations of Agentic Systems in Production

  • Agentic architectures: loops, tools, memory, and orchestration layers
  • Lifecycle of agents: development, deployment, and continuous operation
  • Challenges of production-scale agent management

Infrastructure and Deployment Models

  • Deploying agents in containerized and cloud environments
  • Scaling patterns: horizontal vs. vertical scaling, concurrency, and throttling
  • Multi-agent orchestration and workload balancing

Monitoring and Observability

  • Key metrics: latency, success rate, memory usage, and agent call depth
  • Tracing agent activity and call graphs
  • Instrumenting observability using Prometheus, OpenTelemetry, and Grafana

Logging, Auditing, and Compliance

  • Centralized logging and structured event collection
  • Compliance and auditability in agentic workflows
  • Designing audit trails and replay mechanisms for debugging

Performance Tuning and Resource Optimization

  • Reducing inference overhead and optimizing agent orchestration cycles
  • Model caching and lightweight embeddings for faster retrieval
  • Load testing and stress scenarios for AI pipelines

Cost Control and Governance

  • Understanding agent cost drivers: API calls, memory, compute, and external integrations
  • Tracking agent-level costs and implementing chargeback models
  • Automation policies to prevent agent sprawl and idle resource consumption

CI/CD and Rollout Strategies for Agents

  • Integrating agent pipelines into CI/CD systems
  • Testing, versioning, and rollback strategies for iterative agent updates
  • Progressive rollouts and safe deployment mechanisms

Failure Recovery and Reliability Engineering

  • Designing for fault tolerance and graceful degradation
  • Retry, timeout, and circuit breaker patterns for agent reliability
  • Incident response and post-mortem frameworks for AI operations

Capstone Project

  • Build and deploy an agentic AI system with full monitoring and cost tracking
  • Simulate load, measure performance, and optimize resource usage
  • Present final architecture and monitoring dashboard to peers

Summary and Next Steps

Requirements

  • Strong understanding of MLOps and production machine learning systems
  • Experience with containerized deployments (Docker/Kubernetes)
  • Familiarity with cloud cost optimization and observability tools

Audience

  • MLOps engineers
  • Site Reliability Engineers (SREs)
  • Engineering managers overseeing AI infrastructure

Duration

  • 21 hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times arranged around your team's availability.
  • Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group for live online training: from 4800 € + VAT.

Contact us for an exact quote and to hear about our latest promotions.
