Course: AI Agents for Ops – Automated Incident Response (AIOPS)


AI - Artificial Intelligence, AI for Professions

In this hands-on course (30% theory, 70% practice), you will learn to apply multi-agent AI systems to automated incident investigation and resolution in IT operations. Sessions run in a prepared Linux lab, with an emphasis on security and cost control.

This course explains agent architecture and the observe/decide/act loop, as well as the subagent pattern, which reduces blast radius and costs. You will run agent teams, design MCP safety boundaries, work through layered simulated incidents, and learn production patterns for logging and escalation.

Location, current course term

Praha + online (optional)

10/15/2026 - 10/15/2026 CZECH

2/4/2027 - 2/4/2027 CZECH

Custom

Customized Training (date, location, content, duration)

The course:


Module 1 — Agent architecture for Ops

Introduction: what LLMs do; what a chat interface can and cannot do
What is an agent: observe / decide / act loop; difference from chat (tools + autonomy)
When NOT to use an agent: scriptable tasks, deterministic processes
Subagent pattern: why not one big agent — blast radius, costs, focused context
Three roles: coordinator (strong model), syscheck + logcheck (cheaper models)
Hands-on: build a coordinator and two specialist agents; run on a simulated alert
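The observe/decide/act loop and the coordinator/specialist split can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's lab code: the hypothetical `syscheck` and `logcheck` functions stand in for cheaper-model subagents, and a fixed rule in `decide` stands in for the coordinator model's reasoning.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical specialist subagents. In the course these would be agents on a
# cheaper model; here each is a plain function returning findings for one area.
def syscheck(alert: str) -> list[str]:
    # Stand-in for system metrics: report a full disk for disk-related alerts.
    return ["disk /var at 100%"] if "disk" in alert else []


def logcheck(alert: str) -> list[str]:
    # Stand-in for a log scan: report a lock error for database-related alerts.
    return ["mysql: lock wait timeout"] if "db" in alert else []


@dataclass
class Coordinator:
    workers: dict[str, Callable[[str], list[str]]]
    max_steps: int = 5  # hard cap so a confused loop cannot run forever
    findings: dict[str, list[str]] = field(default_factory=dict)

    def decide(self) -> str | None:
        """Pick the next worker that has not reported yet; None means done."""
        for name in self.workers:
            if name not in self.findings:
                return name
        return None

    def run(self, alert: str) -> dict[str, list[str]]:
        for _ in range(self.max_steps):
            # Observe: self.findings holds everything gathered so far.
            next_worker = self.decide()  # Decide
            if next_worker is None:
                break
            # Act: delegate to the specialist and record its findings.
            self.findings[next_worker] = self.workers[next_worker](alert)
        return self.findings


team = Coordinator({"syscheck": syscheck, "logcheck": logcheck})
print(team.run("disk alert on db01"))
# {'syscheck': ['disk /var at 100%'], 'logcheck': ['mysql: lock wait timeout']}
```

Here the coordinator only delegates; neither worker sees the other's context, which is the focused-context argument for subagents. The `max_steps` cap is one simple guard against a loop that never reaches its stop condition.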

Module 2 — MCP servers as a safety boundary

Problem from Module 1: agents with full shell access are unacceptable in production
What is an MCP (Model Context Protocol) server: exposes tools over a protocol instead of giving shell access; like sudo rules, only allowed commands can run
Two tool types: read-only (investigation) vs write (action)
MCP anatomy: entry point, tool definitions, input/output schema
Hands-on — Method 1: plug an existing syscheck-MCP into a syscheck profile
Hands-on — Method 2: build a logcheck-MCP using a dev-squad (developer + tester + security subagents)
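To make the read-only vs. write distinction concrete, the following sketch models a tool server as an allowlist with a minimal input schema. It deliberately does not use the MCP SDK or wire protocol; `Tool`, `ToolServer`, and `restart_service` are hypothetical names, and the write tool only returns a message instead of touching the system.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    handler: Callable[..., Any]
    params: dict[str, type]  # minimal input schema: parameter name -> type
    writes: bool = False     # False = read-only (investigation), True = action


class ToolServer:
    """Allowlist of tools: anything not registered simply cannot be called."""

    def __init__(self, tools: list[Tool]) -> None:
        self._tools = {tool.name: tool for tool in tools}

    def list_tools(self) -> list[str]:
        return sorted(self._tools)

    def call(self, name: str, args: dict[str, Any], approved: bool = False) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise PermissionError(f"tool {name!r} is not on the allowlist")
        if tool.writes and not approved:
            raise PermissionError(f"write tool {name!r} requires human approval")
        # Check arguments against the declared schema before running anything.
        if set(args) != set(tool.params):
            raise ValueError(f"expected {sorted(tool.params)}, got {sorted(args)}")
        for key, expected in tool.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{key!r} must be {expected.__name__}")
        return tool.handler(**args)


def disk_usage_percent(path: str) -> float:
    # Read-only check built on the standard library.
    usage = shutil.disk_usage(path)
    return round(100 * usage.used / usage.total, 1)


def restart_service(unit: str) -> str:
    # Hypothetical action: only reports what it would do, touches nothing.
    return f"restart requested for {unit}"


server = ToolServer([
    Tool("disk_usage", disk_usage_percent, {"path": str}),
    Tool("restart_service", restart_service, {"unit": str}, writes=True),
])
print(server.list_tools())                       # ['disk_usage', 'restart_service']
print(server.call("disk_usage", {"path": "/"}))  # percent used on the root filesystem
```

As with sudo rules, the agent can invoke only what was registered, write tools are refused without explicit approval, and arguments are checked against the declared schema before any handler runs.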

Module 3 — Automated investigation

Investigation loop: trigger → investigate → report → (optionally) act
When to stop, when to act, when to escalate
Triggers: polling as a simple starting point (why to start simple rather than with webhooks)
Live demo: full chain on a prepared incident
Hands-on: connect a trigger and run the full team on a layered, realistic scenario (disk full → MySQL lock → HTTP 500)
Scenario design: layered checks require the coordinator to correlate syscheck and logcheck outputs
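The investigation loop for the layered scenario can be sketched as a polling trigger followed by correlation and a stop/escalate decision. This is a hypothetical, rule-based stand-in: in the course the coordinator model performs the correlation, while here a fixed `CAUSES` map (an assumption for illustration, with simplified finding labels) links each symptom to the finding that would explain it.

```python
from __future__ import annotations

import time
from typing import Callable, Iterator


def poll(fetch_alerts: Callable[[], list[str]], interval_s: float,
         max_polls: int) -> Iterator[str]:
    """Simple polling trigger: yield each new alert, checking every interval_s."""
    seen: set[str] = set()
    for _ in range(max_polls):
        for alert in fetch_alerts():
            if alert not in seen:  # de-duplicate repeated alerts
                seen.add(alert)
                yield alert
        time.sleep(interval_s)


# Hypothetical layered scenario: symptom -> the finding that explains it.
CAUSES = {
    "http 500": "mysql lock",
    "mysql lock": "disk full",
}


def correlate(symptom: str, findings: set[str]) -> list[str]:
    """Walk from the symptom toward the root cause using only confirmed findings."""
    chain = [symptom]
    while chain[-1] in CAUSES and CAUSES[chain[-1]] in findings:
        chain.append(CAUSES[chain[-1]])
    return chain


def decide(chain: list[str]) -> str:
    # Report only when a known root cause is confirmed; otherwise escalate.
    if chain[-1] == "disk full":
        return "report: root cause 'disk full', propose cleanup (needs approval)"
    return f"escalate: chain stops at {chain[-1]!r}"


alerts = list(poll(lambda: ["http 500"], interval_s=0, max_polls=3))
findings = {"mysql lock", "disk full"}  # e.g. logcheck + syscheck outputs
for alert in alerts:
    chain = correlate(alert, findings)
    print(chain, "->", decide(chain))
```

Polling with de-duplication is easy to reason about and test, which is the argument for starting there before webhooks. The decision is conservative: it reports only when the chain reaches a confirmed root cause, proposes rather than performs the write action, and escalates otherwise.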

Module 4 — Production patterns and cost control

Model tiering: coordinator vs workers; real cost numbers for pipeline runs
Failures and mitigations: infinite loops, token burn, hallucinated actions, cascading delegation, stale context
Production checklist: read-only default, human-in-the-loop for write, cost budget, logging, alerting-on-alerting, graceful degradation
When NOT to use agents: deterministic tasks, compliance-critical actions, tasks without an audit trail
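Model tiering and the cost-budget item of the checklist can be combined into a per-run budget guard. The prices in `PRICES` are placeholders, not real vendor pricing, and `RunBudget`/`BudgetExceeded` are hypothetical names; the point is the arithmetic (tokens × price per million tokens) and the hard stops on steps and spend.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Placeholder prices in USD per million tokens (input, output).
# NOT real vendor prices; substitute your provider's current price list.
PRICES = {
    "strong": (3.00, 15.00),  # coordinator tier (hypothetical)
    "cheap": (0.25, 1.25),    # worker tier (hypothetical)
}


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class RunBudget:
    max_usd: float
    max_steps: int
    spent_usd: float = 0.0
    steps: int = 0
    log: list[str] = field(default_factory=list)

    def charge(self, tier: str, tokens_in: int, tokens_out: int) -> float:
        price_in, price_out = PRICES[tier]
        cost = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
        # Record the call first: the tokens were already spent either way.
        self.steps += 1
        self.spent_usd += cost
        self.log.append(f"step={self.steps} tier={tier} cost={cost:.6f}")
        # Hard stops against infinite loops and token burn.
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"cost budget ${self.max_usd} exceeded")
        return cost


budget = RunBudget(max_usd=0.10, max_steps=10)
budget.charge("strong", 10_000, 1_000)  # coordinator call
budget.charge("cheap", 20_000, 2_000)   # worker call
print(budget.log)
```

With these placeholder prices, the coordinator call (10,000 input and 1,000 output tokens) costs $0.045, while the worker call, with twice the tokens, costs $0.0075, which illustrates why high-volume investigation work is pushed to the cheaper tier.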

Assumed knowledge: Basic Linux server skills (SSH, shell, reading logs) and prior LLM chat experience.
Recommended previous course: Linux – Basic Administration (LNX1)
Schedule: 1 day (9:00 AM - 5:00 PM)
Course price: 316.00 € (382.36 € incl. 21% VAT)
Language: Czech
