Multi-Agent Development Workflows Through the Lens of A2A and MCP

AI Agent · 2026-05-06 · 4 min read


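Multi-agent development does not get better just because you add more agents. Adding agents without contracts for roles and deliverables only blurs responsibility, and one agent's bad call propagates to the next stage.

So the starting point for multi-agent design is not “how many agents should we run?” but who can do what, what input each agent receives, and what output it leaves behind.

Part 1 argued that a coding agent should be viewed as a runtime. Part 2 covered the Goal Runtime. This third part covers structures in which several agents work together. The key terms are A2A and MCP.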

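Key takeaways

MCP sits closer to the Agent-to-Tool axis.
A2A sits closer to the Agent-to-Agent axis.
Multi-agent development needs an Agent Card that describes “who can do what.”
Work should be managed as Tasks that carry state, not as conversation messages.
Results should be separated out as Artifacts, not left in chat responses.
A development harness can borrow the Task/Artifact model even if it does not implement all of A2A right away.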

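MCP and A2A solve different problems

In the AI agent ecosystem, MCP and A2A are often mentioned together, but they do not solve the same problem.

The [A2A documentation's comparison with MCP](https://a2a-protocol.org/dev/topics/a2a-and-mcp/) explains that MCP covers how an agent uses tools and resources such as databases and APIs, while A2A is a separate axis that enables agent-to-agent collaboration.

In short, they sit on different axes.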

Figure: Agent connection boundaries. MCP handles tool calls (tools / APIs / DB / file system); A2A handles task delegation (review / test / documentation agents).

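Aspect	MCP	A2A
Connects to	Tools, APIs, DBs, file systems	Other agents
Key question	What can this agent use?	To whom can this agent hand off work?
Main purpose	Tool/resource access	Agent collaboration
Use in a development harness	Running tests, searching files, querying DBs	Delegating code review, test analysis, and documentation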

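Agent-to-Tool and Agent-to-Agent

If you miss this distinction when building a development harness, the structure gets tangled.

The following tasks are closer to MCP:

Read files
Run tests
Look up the Git diff
Check the DB schema
Search API documentation

The following, on the other hand, are closer to A2A:

Ask a security review agent to review the changes
Ask a test agent to analyze failing tests
Ask a documentation agent to write a migration note
Ask a performance analysis agent to list bottleneck candidates

MCP extends capabilities. A2A separates responsibilities.

In a development harness, tool calls and task delegation should be handled separately, as shown below.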

Figure: The Lead Coding Agent calls tools directly (git, test, file, search), while the Review Agent, Test Agent, and Docs Agent each return an artifact (review artifact, test artifact, docs artifact).
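
As a minimal sketch of that split, assuming a Python harness, the code below keeps the two paths apart. Tool calls are serialized as MCP-style JSON-RPC 2.0 `tools/call` requests, while delegations become task records owned by another agent. The `ToolCall`, `Delegation`, and `route` names and the task fields are hypothetical illustrations, not part of either protocol's SDK.

```python
import itertools
import json
from dataclasses import dataclass, field

# Hypothetical harness types; none of these names come from the MCP or A2A SDKs.
_request_ids = itertools.count(1)


@dataclass
class ToolCall:
    """Agent-to-Tool: the lead agent uses a capability itself (MCP side)."""
    tool: str
    arguments: dict


@dataclass
class Delegation:
    """Agent-to-Agent: the lead agent hands responsibility to another agent (A2A side)."""
    agent: str
    request: str
    inputs: list = field(default_factory=list)


def to_mcp_request(call: ToolCall) -> str:
    # MCP messages are JSON-RPC 2.0; tools are invoked with the "tools/call" method.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": call.tool, "arguments": call.arguments},
    })


def to_task(d: Delegation, task_id: str) -> dict:
    # Delegation produces a stateful task owned by another agent, not a chat message.
    # These field names are illustrative only.
    return {
        "id": task_id,
        "owner": d.agent,
        "request": d.request,
        "input": list(d.inputs),
        "status": "submitted",
        "artifacts": [],
    }


def route(action, task_id=None):
    """Keep the two paths apart: tools are called, agents are delegated to."""
    if isinstance(action, ToolCall):
        return "tool", to_mcp_request(action)
    if isinstance(action, Delegation):
        if task_id is None:
            raise ValueError("a delegation needs a task id")
        return "delegate", to_task(action, task_id)
    raise TypeError(f"unknown action: {action!r}")
```

For example, `route(ToolCall("run_tests", {"path": "tests/"}))` (with `run_tests` as a hypothetical tool name) yields a JSON-RPC request string, while `route(Delegation("security-review-agent", "Review the webhook diff"), "review-payment-webhook-change")` yields a task record the review agent owns. Returning different shapes for the two paths makes it harder to blur a delegation into a tool call by accident.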

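An Agent Card is an agent's self-introduction

The [A2A Protocol v1.0 announcement](https://a2a-protocol.org/latest/announcing-1.0/) describes A2A as a stable, production-ready open standard for communication between AI agents. One of the key concepts in the protocol is the Agent Card.

An Agent Card describes what capabilities an agent has, how it communicates, and which inputs and outputs it supports.

From a development harness perspective, an Agent Card can look like this: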

```json
{
  "name": "Code Review Agent",
  "description": "Reviews PRs with a focus on security, maintainability, and missing tests.",
  "skills": [
    "review-diff",
    "detect-risky-auth-change",
    "suggest-test-cases"
  ],
  "inputModes": ["diff", "file-list", "test-result"],
  "outputModes": ["review-artifact"],
  "requiresApprovalFor": [
    "suggest-db-migration",
    "modify-auth-policy"
  ]
}
```

Without Agent Cards, the Lead Agent has no way to know what the other agents are good at. Just as human teams need role definitions, so do agent teams.
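
To make that concrete, here is a rough sketch of how a Lead Agent might pick a delegate from cards like the one above. It assumes the simplified, harness-oriented fields from the example (not the full A2A Agent Card schema), and the matching rule (the skill must be listed and at least one available input must be accepted) is a hypothetical choice.

```python
from dataclasses import dataclass, field


@dataclass
class AgentCard:
    # Mirrors the simplified, harness-oriented card shown above, not the full A2A schema.
    name: str
    description: str
    skills: list
    input_modes: list
    output_modes: list
    requires_approval_for: list = field(default_factory=list)

    @classmethod
    def from_dict(cls, d: dict) -> "AgentCard":
        return cls(
            name=d["name"],
            description=d["description"],
            skills=list(d["skills"]),
            input_modes=list(d["inputModes"]),
            output_modes=list(d["outputModes"]),
            requires_approval_for=list(d.get("requiresApprovalFor", [])),
        )


def find_delegate(cards, skill, available_inputs):
    """Return the first card that lists the skill and accepts at least one input we can supply."""
    for card in cards:
        if skill in card.skills and set(available_inputs) & set(card.input_modes):
            return card
    return None


def needs_approval(card, action):
    """True if the card says this action must go through human approval first."""
    return action in card.requires_approval_for
```

Returning `None` instead of guessing keeps the Lead Agent from silently delegating to an agent whose card never claimed the skill.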

A Task is a unit of work with state

The A2A specification describes a Task as a stateful unit of work processed by an A2A server. A Task can have fields such as id, contextId, status, history, and artifacts.

In a development harness, a Task can be interpreted as follows:

```yaml
Task:
  id: review-payment-webhook-change
  owner: security-review-agent
  input:
    - changed files
    - payment webhook diff
    - related tests
  status: input_required
  question:
    This change includes modifications to the webhook signature verification logic.
    The existing secret rotation policy cannot be confirmed, so implementation is halted and user approval is requested.
```

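The key state here is input_required. The agent needs a state in which, instead of making a risky decision on its own, it turns the question back to a human:

Can I keep going?
Can I replace this with a mock?
Can I generate a DB migration?
Can I call an external API?

If these questions disappear, automation gets faster but also riskier.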
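
A small state machine is one way to make input_required enforceable rather than a convention. The sketch below is an illustration under assumptions: it borrows the id, contextId, status, history, and artifacts fields mentioned above, but the state names, the transition table, and the `ask_human`/`answer` methods are hypothetical. The property that matters is that a task waiting on a human cannot jump straight to completed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class TaskState(Enum):
    # A subset of task states modeled loosely on A2A; the exact names are this sketch's choice.
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input_required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


# Allowed transitions (an assumption). Terminal states have no way out.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.CANCELED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED,
                        TaskState.FAILED, TaskState.CANCELED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
    TaskState.CANCELED: set(),
}


@dataclass
class Task:
    id: str
    context_id: str
    owner: str
    status: TaskState = TaskState.SUBMITTED
    question: Optional[str] = None
    history: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)

    def _move(self, new: TaskState, note: str) -> None:
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"{self.status.value} -> {new.value} is not allowed")
        self.history.append((self.status.value, new.value, note))
        self.status = new

    def start(self) -> None:
        self._move(TaskState.WORKING, "started")

    def ask_human(self, question: str) -> None:
        # Pause and hand the risky decision back to a person.
        self._move(TaskState.INPUT_REQUIRED, question)
        self.question = question

    def answer(self, approved: bool, note: str = "") -> None:
        if self.status is not TaskState.INPUT_REQUIRED:
            raise ValueError("there is no pending question")
        target = TaskState.WORKING if approved else TaskState.CANCELED
        self._move(target, ("approved: " if approved else "rejected: ") + note)
        self.question = None

    def complete(self, artifact: dict) -> None:
        # Only a working task can finish, and it finishes by attaching an artifact.
        self._move(TaskState.COMPLETED, "artifact attached")
        self.artifacts.append(artifact)
```

The history list doubles as an audit trail: every transition records who asked what and whether a human approved it.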

An Artifact is a reviewable result

An Artifact is a result produced by an agent. The A2A v0.1 specification describes an artifact as a tangible output generated during a task.

Artifacts matter especially for coding agents. Chat messages scroll past; artifacts get reviewed.

Bad result	Good Artifact
“I fixed it.”	List of changed files
“Tests passed.”	Commands run and a log summary
“Looks fine.”	Findings classified as blocker, warning, or suggestion
“I'll remember that.”	A memory candidate with source, scope, and validated_at

In practice, the artifact contract matters more than the final response.

```yaml
Review Artifact:
  Summary:
    Reviewed the payment webhook changes.

  Blockers:
    - There is no fallback path when signature verification fails.

  Warnings:
    - The retry policy differs from the existing payment API.

  Verified:
    - 12 unit tests passed
    - No changes to the webhook parser

  Not Verified:
    - No calls were made to the real PG (payment gateway) sandbox
```

Only an artifact like this gives a human something concrete to review.
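
An artifact contract can also be checked mechanically. The following sketch assumes the section names from the Review Artifact example above; the two validation rules (a non-empty summary, and an explicit statement of what was or was not verified) are illustrative assumptions, not part of any specification.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewArtifact:
    # Sections follow the Review Artifact example above; the validation rules are assumptions.
    summary: str
    blockers: list = field(default_factory=list)
    warnings: list = field(default_factory=list)
    verified: list = field(default_factory=list)
    not_verified: list = field(default_factory=list)

    def problems(self) -> list:
        """Contract violations that make the artifact unfit for human review."""
        issues = []
        if not self.summary.strip():
            issues.append("summary is empty")
        if not self.verified and not self.not_verified:
            issues.append("must state what was and was not verified")
        return issues

    def render(self) -> str:
        lines = ["Review Artifact:", "  Summary:", f"    {self.summary}"]
        sections = [("Blockers", self.blockers), ("Warnings", self.warnings),
                    ("Verified", self.verified), ("Not Verified", self.not_verified)]
        for title, items in sections:
            if items:  # omit empty sections instead of printing "none"
                lines.append(f"  {title}:")
                lines.extend(f"    - {item}" for item in items)
        return "\n".join(lines)
```

Demanding a Verified or Not Verified section is what stops “tests passed” from silently standing in for checks that were never run.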

Applying this to a development workflow

There is no need to build a complex multi-agent structure from the start.

For an MVP, the following is enough:

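Role	Description
Lead Agent	Breaks down the goal, decides execution order, writes the final summary
Implementation Agent	Modifies code
Test Agent	Writes tests and analyzes failures
Review Agent	Reviews risks and gaps
Human Owner	Approves, deploys, and makes the final merge decision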

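The point is not to split every agent into a separate process. At first, it is enough to separate roles within a single agent.

```text
In this step, act only as the Test Agent.
The goal is to analyze why the tests fail.
Do not modify code; leave only a test artifact.
```

This approach is simple but effective. When an agent tries to analyze, implement, test, and review all at once, its output gets muddled. Separating roles makes the results sharper.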
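
Separating roles inside one agent can be sketched as a role table plus a check on every action. The role names come from the table above, but the `Role` type, action names such as `edit_file`, the `change artifact` output, and the helper functions are hypothetical. The prompt tells the model what to do; the check makes the harness refuse anything outside the role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    goal: str
    allowed_actions: frozenset
    output_artifact: str


# Hypothetical role table: one agent process, several scoped roles.
ROLES = {
    "test": Role("Test Agent", "analyze why the tests fail",
                 frozenset({"read_file", "run_tests"}), "test artifact"),
    "implementation": Role("Implementation Agent", "modify the code",
                           frozenset({"read_file", "edit_file", "run_tests"}), "change artifact"),
}


def role_prompt(role: Role) -> str:
    # Same shape as the prompt above: one role, one goal, one kind of output.
    return (
        f"In this step, act only as the {role.name}.\n"
        f"The goal is to {role.goal}.\n"
        f"Allowed actions: {', '.join(sorted(role.allowed_actions))}.\n"
        f"Leave only a {role.output_artifact}."
    )


def check_action(role: Role, action: str) -> None:
    # The prompt asks for the restriction; the harness enforces it.
    if action not in role.allowed_actions:
        raise PermissionError(f"{role.name} is not allowed to {action}")
```

Because the Test Agent role lacks `edit_file`, a request to modify code while in that role fails loudly instead of quietly mixing analysis and implementation.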

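Risks of multi-agent setups

Multi-agent setups are powerful, but they also carry real risks. The [MetaGPT paper](https://arxiv.org/abs/2308.00352) points out that naively chaining multiple LLM agents can let errors propagate, and proposes enforcing SOPs (Standardized Operating Procedures) and modular outputs to reduce this. The ChatDev paper likewise models software development as collaboration among multiple communicative agents, but for such a structure to work it needs defined roles, phases, communication patterns, and verification.

The practical risks are: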

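Risk	Description	Mitigation
Unclear ownership	Nobody knows who makes the final call	Separate the Lead Agent from the Human Owner
Error propagation	One agent's incorrect analysis is passed on to the next stage	Add an artifact review step
Over-automation	Risky operations run without approval	Permission Gate
Mixed outputs	Conversation and results get mixed together	Artifact Contract
Rising cost	Agents are called repeatedly	Set a Task budget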
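
Two of the mitigations in the table, the Permission Gate and the Task budget, are simple enough to sketch. This is a minimal illustration under stated assumptions: the `RISKY_ACTIONS` set, the `TaskRunner` class, and the rule that gated requests do not count against the budget are all hypothetical choices.

```python
from dataclasses import dataclass, field

# Hypothetical set of actions that must pass a Permission Gate before running.
RISKY_ACTIONS = frozenset({"create_db_migration", "call_external_api",
                           "modify_auth_policy", "deploy"})


class BudgetExceeded(RuntimeError):
    """Raised when a task has used up its agent-call budget."""


@dataclass
class TaskRunner:
    max_agent_calls: int                          # the Task budget
    approvals: set = field(default_factory=set)   # actions a Human Owner has approved
    calls: int = 0
    log: list = field(default_factory=list)

    def call_agent(self, agent: str, action: str) -> str:
        # Task budget: stop repeated calls before cost runs away.
        if self.calls >= self.max_agent_calls:
            raise BudgetExceeded(f"used all {self.max_agent_calls} agent calls")
        # Permission Gate: risky work pauses for a human instead of running.
        # In this sketch a gated request does not consume budget.
        if action in RISKY_ACTIONS and action not in self.approvals:
            self.log.append((agent, action, "input_required"))
            return "input_required"
        self.calls += 1
        self.log.append((agent, action, "executed"))
        return "executed"
```

Returning `input_required` rather than raising lets the Lead Agent turn the gate into a question for the Human Owner, which is the same state the Task example above relies on.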

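Checklist

When designing a multi-agent structure, the contracts below come before the number of agents.

[ ] Distinguished the roles of MCP and A2A.
[ ] Defined the Lead Agent's responsibilities.
[ ] Defined each agent's inputs and outputs.
[ ] Task state is recorded.
[ ] Settled on an Artifact format.
[ ] In the input_required state, the agent asks a human.
[ ] Kept the Human Owner as the final decision-maker.
[ ] Risky operations go through a Permission Gate.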

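Takeaways from this part

MCP expands the tools an agent can use; A2A defines the responsibilities an agent can hand off to other agents. Mixing the two blurs the structure. A multi-agent MVP should start by pinning down Agent Cards, Task states, and the Artifact format.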

Next part

Part 4 moves on to memory. In particular, it distinguishes AI Memory from RAG and lays out how to handle failure logs and the Run Ledger.

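Continue reading the series

Part 1: Why coding agents become runtimes
Part 2: Goal-based development with Codex /goal
Part 3: Multi-agent development workflows through the lens of A2A and MCP
Part 4: AI Memory is not RAG
Part 5: An AI coding agent document set for your development harness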


