Multi-Agent DevOps Automation with CrewAI

The Problem

DevOps teams review infrastructure configs, scan for security issues, write documentation, and triage incidents as separate workflows. Each one requires context-switching and manual effort. I wanted to build a system where specialized AI agents handle these tasks collaboratively, with outputs feeding into each other automatically.

What It Does

The AI Agent Platform is a multi-agent system built with CrewAI where four specialized agents work together: an infrastructure reviewer, a security scanner, a documentation generator, and an incident responder. You point it at your codebase, and it produces a full analysis pipeline where each agent's findings inform the next.

Architecture

The system follows a task dependency DAG. The infrastructure review runs first, then security scanning uses those findings for deeper analysis. Documentation and incident response run in parallel after security completes.

  • Infra Review Agent (Senior Infrastructure Engineer): Analyzes Terraform configs and Kubernetes manifests using dedicated analysis tools
  • Security Agent (AppSec Engineer): Scans code for OWASP Top 10 vulnerabilities, reads source files, and cross-references infrastructure findings
  • Docs Agent (Technical Writer): Generates documentation from the combined analysis, writes structured output files
  • Incident Agent (SRE): Triages critical findings, analyzes logs, and produces incident response recommendations

Task Flow

Tasks execute in dependency order: infra_review → security_scan → (incident_response + docs_generation). Each agent has access to specific tools (Terraform analyzer, K8s config reader, code scanner, file writer, log analyzer) and receives the output from upstream tasks as context.
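A sketch of how that wiring might look in CrewAI. The task descriptions and agent stubs here are illustrative, not the project's actual definitions; the real agents carry full goals, backstories, and tool sets (see Agent Design below):

```python
from crewai import Agent, Crew, Process, Task

# Minimal agent stubs for illustration only.
infra_agent = Agent(role="Senior Infrastructure Engineer",
                    goal="Review infrastructure configs", backstory="...")
security_agent = Agent(role="Application Security Engineer",
                       goal="Find vulnerabilities", backstory="...")
docs_agent = Agent(role="Technical Documentation Specialist",
                   goal="Document the analysis", backstory="...")
incident_agent = Agent(role="SRE / Incident Response",
                       goal="Triage critical findings", backstory="...")

infra_review = Task(
    description="Review Terraform configs and Kubernetes manifests.",
    expected_output="A prioritized list of infrastructure findings.",
    agent=infra_agent,
)
security_scan = Task(
    description="Scan the codebase for OWASP Top 10 issues.",
    expected_output="A severity-ranked vulnerability report.",
    agent=security_agent,
    context=[infra_review],  # upstream findings flow in as context
)
incident_response = Task(
    description="Triage critical findings and draft a response plan.",
    expected_output="An incident response plan.",
    agent=incident_agent,
    context=[security_scan],
    async_execution=True,  # lets triage overlap with docs generation
)
docs_generation = Task(
    description="Write a security assessment from the combined analysis.",
    expected_output="A structured assessment document.",
    agent=docs_agent,
    context=[security_scan],
)

crew = Crew(
    agents=[infra_agent, security_agent, incident_agent, docs_agent],
    tasks=[infra_review, security_scan, incident_response, docs_generation],
    process=Process.sequential,  # order plus context edges give the DAG
)
result = crew.kickoff()
```

The `context` parameter is what turns a flat task list into a dependency graph: each downstream task receives the referenced tasks' outputs verbatim in its prompt.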

Key Design Decisions

Why CrewAI Over LangGraph?

CrewAI provides a cleaner abstraction for role-based agents. Each agent gets a role, goal, backstory, and tool set. The crew orchestrator handles task ordering and output passing. LangGraph is more flexible for complex state machines, but CrewAI maps directly to how engineering teams actually divide work: specialized roles with clear responsibilities.

Why Python-Native Over Low-Code?

I already have n8n experience for drag-and-drop automation. This project proves I can build programmatic agent systems: version-controlled, testable, and composable. Every agent, tool, and crew has unit tests. That's not typical for AI agent demos.

Why AWS Bedrock Over OpenAI?

Enterprise teams run on AWS. Using Bedrock with Claude means the entire system stays within the AWS ecosystem: IAM for auth, CloudWatch for logging, VPC endpoints for private connectivity. No API keys to rotate, no external network calls to secure.
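As a sketch of the model wiring, assuming a recent CrewAI version that exposes an LLM wrapper at the package root and accepts LiteLLM-style model strings (the model ID below is illustrative):

```python
from crewai import Agent, LLM

# Credentials come from the standard AWS chain (IAM role, env vars,
# ~/.aws/config) -- no API key is stored anywhere in the project.
bedrock_llm = LLM(
    model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",  # illustrative ID
    temperature=0.1,
)

security_agent = Agent(
    role="Application Security Engineer",
    goal="Find and rank vulnerabilities",
    backstory="Pragmatic AppSec reviewer.",
    llm=bedrock_llm,
)
```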

Agent Design

Each agent is defined with a distinct persona that shapes how it approaches problems:

Agent           Role                                  Tools
Infra Review    Senior Infrastructure Engineer        Terraform Analysis, K8s Config Reader
Security        Application Security Engineer         Code Security Scanner, File Reader
Documentation   Technical Documentation Specialist    File Writer
Incident        SRE / Incident Response               Log Analyzer
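
As an illustration, a single agent definition might look like the following. The tool class is a stand-in for the project's actual Terraform analyzer, and the `crewai.tools` import path is assumed from recent CrewAI releases:

```python
from crewai import Agent
from crewai.tools import BaseTool

class TerraformAnalysisTool(BaseTool):
    """Stand-in for the project's Terraform analysis tool."""
    name: str = "terraform_analyzer"
    description: str = "Parses Terraform files and flags risky settings."

    def _run(self, path: str) -> str:
        # The real implementation parses HCL; this stub just echoes input.
        return f"Analyzed {path}: no parser wired in this sketch."

infra_agent = Agent(
    role="Senior Infrastructure Engineer",
    goal="Catch misconfigurations before they reach production",
    backstory=(
        "Fifteen years running Terraform and Kubernetes at scale; "
        "deeply skeptical of permissive defaults."
    ),
    tools=[TerraformAnalysisTool()],
    verbose=True,
)
```

The backstory is not decoration: it biases the model toward the skeptical, checklist-driven tone you want from a reviewer rather than a generic assistant.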

Testing Strategy

AI agent projects are notoriously hard to test. I took a layered approach:

  • Unit tests for tools: Each tool (Terraform analyzer, code scanner) has isolated tests with sample inputs
  • Agent instantiation tests: Verify each agent loads with the correct role, tools, and configuration (see the sketch after this list)
  • Crew integration tests: Validate task dependency ordering and output passing between agents
  • Sample configs with planted issues: Terraform files with hardcoded secrets and Python files with SQL injection, ensuring agents detect known vulnerabilities
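
A taste of the instantiation and tool layers. Module paths, the factory function, and the assertion strings here are hypothetical, not the project's actual API:

```python
# tests/test_agents.py -- hypothetical layout
from agent_platform.agents import build_infra_agent      # assumed factory
from agent_platform.tools import TerraformAnalysisTool   # assumed path

def test_infra_agent_configuration():
    agent = build_infra_agent()
    assert agent.role == "Senior Infrastructure Engineer"
    tool_names = {tool.name for tool in agent.tools}
    assert {"terraform_analyzer", "k8s_config_reader"} <= tool_names

def test_terraform_tool_flags_hardcoded_secret(tmp_path):
    # Planted issue: hardcoded AWS access key in a provider block.
    tf = tmp_path / "sample.tf"
    tf.write_text('provider "aws" { access_key = "AKIAEXAMPLEKEY" }')
    report = TerraformAnalysisTool()._run(str(tf))
    assert "hardcoded" in report.lower()
```

The tool tests are the cheap, deterministic layer; only the crew integration tests need a live model behind them.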

Sample Output

Given a Terraform file with hardcoded AWS keys and a Python file containing SQL injection and an eval() call, the system produces:

  • CRITICAL: Hardcoded AWS access keys detected in sample.tf
  • HIGH: SQL injection vulnerability in user input handling
  • MEDIUM: Use of eval() on untrusted input
  • LOW: Missing encryption configuration on S3 bucket
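
For reference, the planted Python fixture behind the HIGH and MEDIUM findings might look something like this (a hypothetical sketch, intentionally insecure, used only as test input):

```python
# tests/fixtures/vulnerable_app.py -- intentionally insecure test input.
# Never deploy code like this; it exists so the security agent has
# known issues to find.
import sqlite3

def get_user(conn: sqlite3.Connection, username: str):
    # HIGH: SQL injection -- user input interpolated into the query
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def evaluate(expression: str):
    # MEDIUM: eval() on untrusted input
    return eval(expression)
```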

The incident agent then triages the critical findings and generates a response plan, while the docs agent produces a full security assessment document.

What I'd Do Differently

  • Streaming output: The system currently blocks until the entire crew finishes; it should stream each agent's progress in real time
  • Persistent memory: Add RAG-backed memory so agents learn from previous analyses across runs
  • GitHub integration: Trigger the crew on PR creation, post agent findings as review comments
  • Custom tools: Build deeper integrations with Terraform plan output parsing and kubectl describe analysis

Takeaway

Multi-agent systems are more than a demo concept. When each agent has clear responsibilities, dedicated tools, and tested behavior, you get a system that mirrors how real engineering teams operate. The value isn't in any single agent. It's in the orchestration, the task dependency flow, and the compound output that no single agent could produce alone.