---
title: "Agent Loop는 모델 호출이 아니라 상태 기계다"
slug: "07-query-loop"
canonicalUrl: "https://moonshotnotes.com/posts/07-query-loop/"
sourceUrl: "https://moonshotnotes.com/posts/07-query-loop/"
markdownUrl: "https://moonshotnotes.com/agent/posts/07-query-loop.md"
language: "ko"
category: "AI Agent"
updatedAt: "2026-05-26"
agentTokenEstimate: 1806
---

# Agent Loop는 모델 호출이 아니라 상태 기계다

Claude Code CLI 분석을 기반으로 agent query loop를 streaming model call, tool request, result injection이 반복되는 상태 기계로 설명합니다.

## Agent metadata

- Source: https://moonshotnotes.com/posts/07-query-loop/
- Markdown: https://moonshotnotes.com/agent/posts/07-query-loop.md
- Language: ko
- Category: AI Agent
- Tags: Claude Code, AI Agent, Runtime, CLI
- Updated: 2026-05-26
- Estimated tokens: 1806

## 핵심 요약

- Agent loop는 단발성 모델 API 호출이 아니라 반복 가능한 상태 기계다.
- 모델은 텍스트를 생성하다가 tool request를 만들 수 있고, runtime은 결과를 다시 message로 주입한다.
- stream event, usage update, stop reason, tool call은 모두 runtime event로 처리해야 한다.
- continuation condition이 명확해야 무한 루프, 누락된 tool result, 비용 폭주를 막을 수 있다.

이번 글에서는 agent runtime의 심장인 query loop를 다룹니다.

많은 LLM 앱은 모델 호출을 “요청 하나, 응답 하나”로 생각합니다. 하지만 agent는 다릅니다. 모델이 응답 중간에 도구 사용을 요청할 수 있고, runtime은 도구를 실행한 뒤 그 결과를 다시 메시지로 넣어 모델에게 이어서 추론하게 해야 합니다.

즉 agent loop는 모델 호출 함수가 아니라 **streaming event와 tool result injection이 반복되는 상태 기계**입니다.

## 1. 모델 호출 하나로 agent가 되지 않는 이유

모델이 도구를 요청하면 응답은 끝난 것이 아닙니다. 오히려 그때부터 agent 실행이 시작됩니다.

```text
# 읽는 법: 아래 항목은 동작 흐름을 빠르게 확인하기 위한 요약 예시입니다.
messages
→ model stream
→ assistant text 일부 출력
→ tool request 발생
→ tool runtime 실행
→ tool result message 생성
→ messages에 재주입
→ model stream 재개
→ final answer
```

이 반복을 처리하지 못하면 도구 호출은 단발성 함수 실행에 머뭅니다. 모델은 외부 세계를 관찰한 결과를 보고 다음 판단을 할 수 없습니다.

## 2. Query loop의 상태

agent loop는 다음 상태를 관리합니다.

| 상태 | 설명 |
|---|---|
| messages | 현재 모델에게 보낼 conversation snapshot |
| turn_count | tool result 이후 모델을 몇 번 다시 호출했는지 |
| tool_context | 현재 turn에서 사용 가능한 도구와 권한 정보 |
| usage | streaming 중 누적되는 token/cost 정보 |
| compaction | context가 길어질 때 요약 또는 압축 상태 |
| cancellation | 사용자 중단 또는 timeout 상태 |
| pending_tool_results | 실행 완료 후 재주입할 result messages |

이 상태들은 provider 호출 하나에서 끝나지 않습니다. tool result를 넣고 다시 모델을 부를 때 이어져야 합니다.

## 3. Stream event를 runtime event로 바꾸기

provider API가 주는 event 이름을 앱 전체에 그대로 퍼뜨리면 provider를 바꾸기 어렵습니다. 따라서 query loop는 provider stream을 내부 runtime event로 변환해야 합니다.

| Provider event 성격 | Runtime event 예시 |
|---|---|
| message start | `assistant_started` |
| text delta | `assistant_text_delta` |
| tool call delta/done | `tool_request_ready` |
| usage update | `usage_updated` |
| message stop | `assistant_message_finished` |
| error | `model_stream_failed` |

이렇게 하면 session shell, ledger, cost tracker는 provider SDK 타입을 몰라도 됩니다.

## 4. Tool result 재주입

도구 결과는 단순 출력 문자열이 아닙니다. 다음 모델 호출에 들어갈 관찰값입니다.

도구가 실패해도 result message를 만들어야 합니다. 모델이 요청한 tool call에 대응되는 결과가 없으면 다음 추론이 불안정해질 수 있습니다.

```text
# 읽는 법: 아래 항목은 동작 흐름을 빠르게 확인하기 위한 요약 예시입니다.
tool request id: T-17
→ schema validation failed
→ permission denied
→ execution failed
→ success result
```

위 네 경우 모두 “결과”입니다. 성공만 결과로 보는 것이 아니라, 실패도 모델이 읽을 수 있는 관찰값으로 만들어야 합니다.

> **실무 팁**
> tool error를 Python exception으로만 처리하지 말고 model-visible error result로 바꿔보세요. agent가 스스로 다른 전략을 선택할 수 있습니다.

## 5. Continuation condition 설계

agent loop가 언제 계속 돌고 언제 멈추는지 명확해야 합니다.

| 조건 | 처리 |
|---|---|
| tool request 없음 | final answer로 종료 |
| tool request 있음 | tool runtime 실행 후 messages에 result 주입 |
| max turn 초과 | turn limit final event 생성 |
| context 초과 | compaction 후 계속 또는 사용자에게 안내 |
| cancel 요청 | 중단 result 기록 후 종료 |
| budget 초과 | 정책에 따라 중단 또는 승인 요청 |

이 조건이 흩어져 있으면 agent가 무한히 돌거나, tool result가 빠지거나, 비용이 예상보다 커질 수 있습니다.

## 6. 개념 코드로 보는 agent loop

아래 코드는 설명용으로 새로 작성한 코드입니다.

```python
# 읽는 법: 실제 구현 복제가 아니라 runtime 경계를 설명하는 개념 코드입니다.
class LoopState:
    # 객체가 이후 단계에서 참조할 runtime 의존성과 상태 저장소를 초기화합니다.
    def __init__(self, messages, tool_context):
        self.messages = list(messages)
        self.tool_context = tool_context
        self.turn_count = 0
        self.usage = UsageSnapshot()
        self.cancelled = False

# 모델 스트림을 읽고 tool request를 실행한 뒤 결과를 다시 model message로 주입합니다.
async def run_agent_loop(initial_messages, runtime):
    state = LoopState(initial_messages, runtime.tool_context)

    while True:
        if state.turn_count >= runtime.limits.max_model_rounds:
            await runtime.ledger.write("turn_limit_reached", state.turn_count)
            return FinalAnswer(reason="turn_limit")

        if runtime.abort_signal.requested:
            return FinalAnswer(reason="cancelled")

        request = runtime.provider_adapter.build_request(state.messages, state.tool_context)
        tool_requests = []

        async for event in runtime.provider_adapter.stream(request):
            if event.type == "assistant_text_delta":
                runtime.shell.publish(event)

            elif event.type == "tool_request_ready":
                tool_requests.append(event.tool_request)

            elif event.type == "usage_updated":
                state.usage.merge(event.usage)
                runtime.accounting.update(state.usage)

        if not tool_requests:
            return FinalAnswer(reason="assistant_done", usage=state.usage)

        async for result in runtime.tools.execute_batch(tool_requests, state.tool_context):
            state.messages.append(result.to_model_message())
            runtime.shell.publish(result.to_screen_event())
            await runtime.ledger.write("tool_result", result.summary())

        state.turn_count += 1
```

이 구조에서 provider adapter와 tool runtime은 서로의 내부 구현을 모릅니다. query loop가 둘 사이를 연결합니다.

## 7. AI 활용 개발자 관점

AI coding agent를 사용할 때 긴 작업이 어떻게 진행되는지 관찰해보면 좋습니다.

- 모델이 도구를 요청했는지 화면에 표시되는가?
- 도구 실행 결과가 요약되어 보이는가?
- 실패한 도구에 대해 agent가 다른 전략을 시도하는가?
- 반복 실행이 너무 길어질 때 제한이 있는가?
- 비용/토큰이 중간에 갱신되는가?

이 정보가 보이지 않는 agent는 편해 보일 수 있지만, 팀 환경에서는 감사와 디버깅이 어렵습니다.

## 8. Agent 개발자 체크리스트

```text
# 읽는 법: 아래 항목은 동작 흐름을 빠르게 확인하기 위한 요약 예시입니다.
Query Loop 체크리스트

[ ] 모델 호출은 반복 가능한 loop로 설계되어 있다.
[ ] provider stream은 내부 runtime event로 변환된다.
[ ] tool request는 수집된 뒤 tool runtime으로 넘어간다.
[ ] tool result는 model-visible message로 재주입된다.
[ ] 실패한 tool request도 result message를 만든다.
[ ] max turn, cancel, budget, context compaction 조건이 명확하다.
[ ] usage/cost는 stream 중간 event에서도 갱신된다.
```

## 실패 사례: tool result를 넣지 않고 다음 호출로 넘어간 경우

Agent loop에서 가장 위험한 버그 중 하나는 모델이 tool request를 냈는데 runtime이 그 결과를 message history에 정확히 다시 넣지 않는 경우입니다. 화면에는 도구가 실행된 것처럼 보이지만 다음 모델 호출에는 결과가 빠져 있습니다. 그러면 모델은 같은 도구를 다시 요청하거나, 결과를 상상해서 답변하거나, 이미 실패한 작업을 성공한 것처럼 이어갑니다.

이 문제는 streaming UI에서 더 자주 발생합니다. 화면은 token stream을 기준으로 움직이고, backend는 tool event를 기준으로 움직이며, ledger는 최종 message만 저장할 수 있습니다. 세 계층이 같은 turn id와 tool call id를 공유하지 않으면 재현하기 어려운 ghost state가 생깁니다. 사용자는 "방금 파일을 읽었다"고 봤는데 모델은 읽지 않은 상태로 답하는 식입니다.

## 상태 전이 예시

Query loop를 상태 기계로 두면 이 실패를 줄일 수 있습니다.

```text
idle
  -> streaming_model
  -> collecting_tool_requests
  -> executing_tools
  -> injecting_tool_results
  -> streaming_model
  -> done
```

각 상태에는 들어갈 수 있는 event와 나갈 수 있는 event가 정해져야 합니다. `executing_tools` 상태에서 새 model stream을 시작하려면 모든 required tool result가 history에 들어갔는지 확인해야 합니다. 반대로 optional telemetry event는 model history에 넣지 않아도 됩니다. 이 구분이 없으면 context는 불필요하게 커지고, 중요한 결과는 누락됩니다.

| Event | Model history | UI stream | Ledger |
|---|---|---|---|
| assistant token | yes | yes | yes |
| tool request | yes | yes | yes |
| tool progress | no | yes | optional |
| tool result | yes | yes | yes |
| usage update | no | yes | yes |

## 체크리스트 적용 결과

실제 구현을 리뷰할 때는 "모델을 몇 번 호출하는가"보다 "반복 조건이 어디에 있는가"를 봅니다. max turn, max tool calls, cancel signal, budget exceeded, provider stop reason이 모두 같은 loop controller에서 판단되어야 합니다. 각각의 adapter가 임의로 재시도하면 agent는 멈추기 어려워집니다. 특히 비용이 걸린 제품에서는 usage update가 늦게 도착해도 다음 turn 전에 예산을 다시 계산해야 합니다.

## 마무리

Agent loop를 이해하면 AI agent의 본질이 보입니다. 모델은 혼자 실행하지 않습니다. runtime이 stream을 읽고, 도구를 실행하고, 결과를 다시 넣고, 언제 멈출지 결정합니다.

다음 글에서는 provider API boundary를 보겠습니다. 제품 로직이 provider SDK 타입에 끌려가지 않게 하려면 모델 호출 경계를 어떻게 설계해야 할까요?