Multi-Agent DevOps Automation with CrewAI

The Problem

DevOps teams review infrastructure configs, scan for security issues, write documentation, and triage incidents as separate workflows. Each one requires context-switching and manual effort. I wanted to build a system where specialized AI agents handle these tasks collaboratively, with outputs feeding into each other automatically.

What It Does

The AI Agent Platform is a multi-agent system built with CrewAI where four specialized agents work together: an infrastructure reviewer, a security scanner, a documentation generator, and an incident responder. You point it at your codebase, and it produces a full analysis pipeline where each agent's findings inform the next.

Architecture

The system follows a task dependency DAG. The infrastructure review runs first, then security scanning uses those findings for deeper analysis. Documentation and incident response run in parallel after security completes.

  • Infra Review Agent (Senior Infrastructure Engineer): Analyzes Terraform configs and Kubernetes manifests using dedicated analysis tools
  • Security Agent (AppSec Engineer): Scans code for OWASP Top 10 vulnerabilities, reads source files, and cross-references infrastructure findings
  • Docs Agent (Technical Writer): Generates documentation from the combined analysis, writes structured output files
  • Incident Agent (SRE): Triages critical findings, analyzes logs, and produces incident response recommendations

Task Flow

Tasks execute in dependency order: infra_review → security_scan → (incident_response + docs_generation). Each agent has access to specific tools (Terraform analyzer, K8s config reader, code scanner, file writer, log analyzer) and receives the output from upstream tasks as context.
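A sketch of how that wiring might look in CrewAI. The task descriptions and agent stubs here are illustrative, not the project's actual definitions; the real agents carry full goals, backstories, and tool sets (see Agent Design below):

```python
from crewai import Agent, Crew, Process, Task

# Minimal agent stubs for illustration only.
infra_agent = Agent(role="Senior Infrastructure Engineer",
                    goal="Review infrastructure configs", backstory="...")
security_agent = Agent(role="Application Security Engineer",
                       goal="Find vulnerabilities", backstory="...")
docs_agent = Agent(role="Technical Documentation Specialist",
                   goal="Document the analysis", backstory="...")
incident_agent = Agent(role="SRE / Incident Response",
                       goal="Triage critical findings", backstory="...")

infra_review = Task(
    description="Review Terraform configs and Kubernetes manifests.",
    expected_output="A prioritized list of infrastructure findings.",
    agent=infra_agent,
)
security_scan = Task(
    description="Scan the codebase for OWASP Top 10 issues.",
    expected_output="A severity-ranked vulnerability report.",
    agent=security_agent,
    context=[infra_review],  # upstream findings flow in as context
)
incident_response = Task(
    description="Triage critical findings and draft a response plan.",
    expected_output="An incident response plan.",
    agent=incident_agent,
    context=[security_scan],
    async_execution=True,  # lets triage overlap with docs generation
)
docs_generation = Task(
    description="Write a security assessment from the combined analysis.",
    expected_output="A structured assessment document.",
    agent=docs_agent,
    context=[security_scan],
)

crew = Crew(
    agents=[infra_agent, security_agent, incident_agent, docs_agent],
    tasks=[infra_review, security_scan, incident_response, docs_generation],
    process=Process.sequential,  # order plus context edges give the DAG
)
result = crew.kickoff()
```

The `context` parameter is what turns a flat task list into a dependency graph: each downstream task receives the referenced tasks' outputs verbatim in its prompt.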

Key Design Decisions

Why CrewAI Over LangGraph?

CrewAI provides a cleaner abstraction for role-based agents. Each agent gets a role, goal, backstory, and tool set. The crew orchestrator handles task ordering and output passing. LangGraph is more flexible for complex state machines, but CrewAI maps directly to how engineering teams actually divide work: specialized roles with clear responsibilities.

Why Python-Native Over Low-Code?

I already have n8n experience for drag-and-drop automation. This project proves I can build programmatic agent systems: version-controlled, testable, and composable. Every agent, tool, and crew has unit tests. That's not typical for AI agent demos.

Why AWS Bedrock Over OpenAI?

Enterprise teams run on AWS. Using Bedrock with Claude means the entire system stays within the AWS ecosystem: IAM for auth, CloudWatch for logging, VPC endpoints for private connectivity. No API keys to rotate, no external network calls to secure.
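As a sketch of the model wiring, assuming a recent CrewAI version that exposes an LLM wrapper at the package root and accepts LiteLLM-style model strings (the model ID below is illustrative):

```python
from crewai import Agent, LLM

# Credentials come from the standard AWS chain (IAM role, env vars,
# ~/.aws/config) -- no API key is stored anywhere in the project.
bedrock_llm = LLM(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative ID
    temperature=0.1,
)

security_agent = Agent(
    role="Application Security Engineer",
    goal="Find and rank vulnerabilities",
    backstory="Pragmatic AppSec reviewer.",
    llm=bedrock_llm,
)
```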

Agent Design

Each agent is defined with a distinct persona that shapes how it approaches problems:

Agent           Role                                  Tools
Infra Review    Senior Infrastructure Engineer        Terraform Analysis, K8s Config Reader
Security        Application Security Engineer         Code Security Scanner, File Reader
Documentation   Technical Documentation Specialist    File Writer
Incident        SRE / Incident Response               Log Analyzer
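
As an illustration, a single agent definition might look like the following. The tool class is a stand-in for the project's actual Terraform analyzer, and the `crewai.tools` import path is assumed from recent CrewAI releases:

```python
from crewai import Agent
from crewai.tools import BaseTool

class TerraformAnalysisTool(BaseTool):
    """Stand-in for the project's Terraform analysis tool."""
    name: str = "terraform_analyzer"
    description: str = "Parses Terraform files and flags risky settings."

    def _run(self, path: str) -> str:
        # The real implementation parses HCL; this stub just echoes input.
        return f"Analyzed {path}: no parser wired in this sketch."

infra_agent = Agent(
    role="Senior Infrastructure Engineer",
    goal="Catch misconfigurations before they reach production",
    backstory=(
        "Fifteen years running Terraform and Kubernetes at scale; "
        "deeply skeptical of permissive defaults."
    ),
    tools=[TerraformAnalysisTool()],
    verbose=True,
)
```

The backstory is not decoration: it biases the model toward the skeptical, checklist-driven tone you want from a reviewer rather than a generic assistant.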

Testing Strategy

AI agent projects are notoriously hard to test. I took a layered approach:

  • Unit tests for tools: Each tool (Terraform analyzer, code scanner) has isolated tests with sample inputs
  • Agent instantiation tests: Verify each agent loads with the correct role, tools, and configuration (see the sketch after this list)
  • Crew integration tests: Validate task dependency ordering and output passing between agents
  • Sample configs with planted issues: Terraform files with hardcoded secrets and Python files with SQL injection, ensuring agents detect known vulnerabilities
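
A taste of the instantiation and tool layers. Module paths, the factory function, and the assertion strings here are hypothetical, not the project's actual API:

```python
# tests/test_agents.py -- hypothetical layout
from agent_platform.agents import build_infra_agent      # assumed factory
from agent_platform.tools import TerraformAnalysisTool   # assumed path

def test_infra_agent_configuration():
    agent = build_infra_agent()
    assert agent.role == "Senior Infrastructure Engineer"
    tool_names = {tool.name for tool in agent.tools}
    assert {"terraform_analyzer", "k8s_config_reader"} <= tool_names

def test_terraform_tool_flags_hardcoded_secret(tmp_path):
    # Planted issue: hardcoded AWS access key in a provider block.
    tf = tmp_path / "sample.tf"
    tf.write_text('provider "aws" { access_key = "AKIAEXAMPLEKEY" }')
    report = TerraformAnalysisTool()._run(str(tf))
    assert "hardcoded" in report.lower()
```

The tool tests are the cheap, deterministic layer; only the crew integration tests need a live model behind them.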

Sample Output

Given a Terraform file with hardcoded AWS keys and a Python file containing SQL injection and an eval() call, the system produces:

  • CRITICAL: Hardcoded AWS access keys detected in sample.tf
  • HIGH: SQL injection vulnerability in user input handling
  • MEDIUM: Use of eval() on untrusted input
  • LOW: Missing encryption configuration on S3 bucket
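
For reference, the planted Python fixture behind the HIGH and MEDIUM findings might look something like this (a hypothetical sketch, intentionally insecure, used only as test input):

```python
# tests/fixtures/vulnerable_app.py -- intentionally insecure test input.
# Never deploy code like this; it exists so the security agent has
# known issues to find.
import sqlite3

def get_user(conn: sqlite3.Connection, username: str):
    # HIGH: SQL injection -- user input interpolated into the query
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def evaluate(expression: str):
    # MEDIUM: eval() on untrusted input
    return eval(expression)
```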

The incident agent then triages the critical findings and generates a response plan, while the docs agent produces a full security assessment document.

What I'd Do Differently

  • Streaming output: The system currently blocks until the entire crew finishes; it should stream each agent's progress in real time
  • Persistent memory: Add RAG-backed memory so agents learn from previous analyses across runs
  • GitHub integration: Trigger the crew on PR creation, post agent findings as review comments
  • Custom tools: Build deeper integrations with Terraform plan output parsing and kubectl describe analysis

Takeaway

Multi-agent systems are more than a demo concept. When each agent has clear responsibilities, dedicated tools, and tested behavior, you get a system that mirrors how real engineering teams operate. The value isn't in any single agent. It's in the orchestration, the task dependency flow, and the compound output that no single agent could produce alone.