MemoryGraph Is Not an Automatic Store

Workflow · 2026-05-08 · 5 min read

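Attaching memory to an AI agent looks like it will solve the problem. In a development harness, though, you have to start from the opposite end. The core value of MemoryGraph lies less in what it stores than in what it refuses to store.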

Part 1 applied Ctx2Skill's context-to-skill structure to a development harness from the perspective of Repo Skill Memory. This article covers how much of that memory should go into MemoryGraph.

The conclusion is conservative. MemoryGraph is not a raw trace store, and it is not a place to push failure logs automatically. Only compact rules that have passed replay or human approval should enter MemoryGraph.

Key takeaways

MemoryGraph is a recall acceleration layer, not a source of truth.
The source of truth is code, tests, CI, contracts, and production behavior.
Putting raw traces, imported evidence, or failure logs into long-term memory as-is can contaminate the next run.
A rule derived from a failure should first be kept as a memory candidate and promoted only after replay or human approval.
A memory fact needs an applicable scope, an exclusion scope, verification evidence, and lifecycle metadata.

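Where MemoryGraph belongs

Separating the knowledge layers of a development harness makes each role clear.

Layer	Role	Safe to store	Risky to store
Source of truth	Actual basis for judgment	code, tests, CI, contract, docs	rules backed only by an inferred summary
Raw trace	Execution observations	action, observation, judge event	failure logs used directly as long-term memory
Candidate	Promotion candidate	failure attribution, scope, evidence refs	unsupported generalizations
MemoryGraph	Reusable memory	verified compact rules	entire raw traces, imported-only evidence
Prompt context	Hints for the current run	small, highly relevant recall	full memory dump

MemoryGraph does not replace the source of truth. It is closer to an index that helps the agent reach the right source of truth quickly.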

Repeated execution loop:

Truth source (code / tests / CI / docs)
-> Raw trace (action / observation / judge)
-> Memory candidate (failure attribution)
-> Promotion gate (replay or human approval)
-> MemoryGraph (verified compact rule)
-> Small recall context (next run)

On failure, the loop returns to the execution step.

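The most dangerous mistake: treating raw traces as memory

Raw traces contain many useful clues. They also contain a lot of noise.

Commands that failed because of environment problems
One-off failures of flaky tests
Exploration that started from a wrong assumption
Plans abandoned midway
Approaches the user rejected
Incomplete logs imported from an external runtime

If this content goes into long-term memory as-is, the agent will follow wrong rules on the next run.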

Bad memory:
In this project, e2e tests fail often, so skip them at closeout.

Good candidate:
In a recent run, e2e failed because the browser binary was missing.
The failure_class is environment_blocker, so this is not a memory promotion target for code changes.
Before retrying, check browser dependency availability first.

A good candidate does not hide the failure. It separates the cause from the applicable scope before the failure is promoted to a general rule.

Promotion pipeline

A MemoryGraph write is the last step. A candidate and a gate must come before it.

Memory candidate -> Replay probe -> Promotion decision
Memory candidate -> Human approval -> Promotion decision

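It is best to keep the policy for this pipeline simple:

Policy	Reason
No promotion without replay or human approval	Reduces misdiagnosis of the failure cause.
Block imported-only candidates	Keeps external traces from being mistaken for truth.
No write without an explicit write flag	Separates analysis from long-term state changes.
Auto promotion is verified-only	Puts memory reliability ahead of convenience.
Hold candidates with no scope and anti-scope	Prevents a correct rule from being applied in the wrong situation.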
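
The policies in the table above can be sketched as a single gate function. This is a minimal illustration under stated assumptions, not the harness's actual API: `Candidate`, `decide_promotion`, and the fields `replay_passed`, `human_approved`, and `imported_only` are hypothetical names, and the sketch reads "verified" as "replay passed".

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    # All field names are illustrative assumptions, not a real schema.
    candidate_id: str
    applies_to: list = field(default_factory=list)
    does_not_apply_to: list = field(default_factory=list)
    replay_passed: bool = False
    human_approved: bool = False
    imported_only: bool = False


def decide_promotion(c: Candidate, write_flag: bool, auto: bool = False) -> str:
    """Return 'promote', 'hold', or 'block' following the policy table."""
    if c.imported_only:
        return "block"  # imported-only evidence is never promoted
    if not (c.replay_passed or c.human_approved):
        return "block"  # neither replay nor human approval
    if auto and not c.replay_passed:
        return "block"  # auto promotion is verified-only (assumed: replay passed)
    if not c.applies_to or not c.does_not_apply_to:
        return "hold"  # scope or anti-scope is missing
    if not write_flag:
        return "hold"  # analysis only; no long-term state change
    return "promote"
```

Keeping the write flag as the last check means a dry run still reports every other reason a candidate would be blocked or held.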

Minimum structure of a memory candidate

A candidate should be a verifiable piece of operational knowledge, not a single sentence.

{
  "schema": "awtl.memory_candidate.v2",
  "candidate_id": "memcand-demo-a",
  "failure_class": "agent_failure",
  "failure_type": "contract_mismatch",
  "source_action_ids": ["action-12"],
  "root_cause_summary": "Public response shape changed without updating contract evidence.",
  "proposed_memory": "When the public response schema changes, update the contract artifact and verifier evidence together.",
  "scope": {
    "applies_to": ["public_api", "contract_change"],
    "does_not_apply_to": ["internal_refactor", "test_only_change"]
  },
  "evidence_refs": ["judge:contract-verifier", "artifact:contract-output"],
  "verification_probe_candidate": "public-contract-regression",
  "promotion_status": "candidate",
  "requires_human_review": true
}

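Four fields in this structure must not be omitted.

Field	Problem when missing
`failure_class`	An environment failure gets misclassified as an agent failure.
`source_action_ids`	The rule cannot be traced back to the action it came from.
`scope`	The rule is applied too broadly.
`evidence_refs`	There is no way to verify why the memory was created.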
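
A check for these four fields might look like the sketch below. The function name `missing_required_fields` is hypothetical, and treating a `scope` that lacks either `applies_to` or `does_not_apply_to` as missing is an assumption of this example rather than a rule stated by the schema.

```python
REQUIRED_FIELDS = ("failure_class", "source_action_ids", "scope", "evidence_refs")


def missing_required_fields(candidate: dict) -> list:
    """Return the required fields that are absent or empty in a candidate dict."""
    missing = [name for name in REQUIRED_FIELDS if not candidate.get(name)]
    scope = candidate.get("scope") or {}
    # Assumption: a scope without both sides counts as missing.
    if "scope" not in missing and not (
        scope.get("applies_to") and scope.get("does_not_apply_to")
    ):
        missing.append("scope")
    return missing
```

A candidate that returns a non-empty list here would stay out of the promotion gate until the gaps are filled.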

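Classify the failure first

Not every failure is a memory promotion target.

Failure class	Default handling	Example
`agent_failure`	eligible as a candidate	edited the wrong file or skipped a contract update
`verification_failure`	eligible as a candidate, together with a probe	a new verifier caught a real regression
`environment_blocker`	promotion blocked	network, permissions, missing browser binary
`flaky_blocker`	promotion blocked	one-off test failure that cannot be reproduced
`harness_blocker`	split off as a harness fix	a bug in the verifier itself

Without this separation, memory gets contaminated quickly. If a task fails because of a network outage and the stored rule says "this verification can be skipped," the next run becomes more dangerous.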
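
The table can be expressed as a small lookup. The handling labels and the function name `route_failure` are hypothetical, and blocking unknown classes is a conservative assumption added for this sketch.

```python
# Default handling per failure class, following the table above.
DEFAULT_HANDLING = {
    "agent_failure": "candidate",
    "verification_failure": "candidate_with_probe",
    "environment_blocker": "block_promotion",
    "flaky_blocker": "block_promotion",
    "harness_blocker": "route_to_harness_fix",
}


def route_failure(failure_class: str) -> str:
    """Map a failure class to its default handling."""
    # Assumption: anything unclassified is blocked rather than promoted.
    return DEFAULT_HANDLING.get(failure_class, "block_promotion")
```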

Facts need a lifecycle

A memory that was once correct can become wrong over time. MemoryGraph facts therefore need lifecycle metadata.

{
  "origin": "awtl",
  "origin_candidate_id": "memcand-demo-a",
  "validated_by": "replay",
  "valid_for": ["contract_change", "public_api"],
  "not_valid_for": ["internal_refactor"],
  "last_validated_at": "example-timestamp",
  "verification_contract_version": "v1",
  "status": "active"
}

The operational status should follow at least this flow:

Preparation: candidate
Repeated execution: active, denied, challenged (on failure, the flow returns to the execution step)
Cleanup: deprecated, archived
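
Stale detection over this metadata could be sketched as follows. The criteria are assumptions made for illustration: a changed verification contract version or an age above `max_age` marks an active fact as stale, and `last_validated_at` is assumed to hold an ISO 8601 string (the example above only shows a placeholder). The function name `is_stale` and the 90-day default are hypothetical.

```python
from datetime import datetime, timedelta


def is_stale(fact: dict, current_contract_version: str,
             now: datetime, max_age: timedelta = timedelta(days=90)) -> bool:
    """Flag an active fact for re-validation; the criteria are assumptions."""
    if fact.get("status") != "active":
        return False  # only active facts are considered here
    if fact.get("verification_contract_version") != current_contract_version:
        return True  # validated against an older verification contract
    validated_at = datetime.fromisoformat(fact["last_validated_at"])
    return now - validated_at > max_age
```

A fact flagged this way might be moved to `challenged` until a replay confirms it again, though that transition is also an assumption.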

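Memory injected into the prompt must also be small

Even if MemoryGraph holds only verified facts, injecting all of them into every run turns them back into noise. Use only a small recall that is relevant to the current task.

A good recall looks like this: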

Relevant project memory
- When the public API request/response schema changes, update the contract artifact and verifier evidence together.
- Applies to: public_api, contract_change
- Does not apply to: internal_refactor, test_only_change
- Verification: run the contract verifier before closeout.

A bad recall pastes the whole project memory as-is. The agent may not be able to judge importance and applicable scope reliably on its own.
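
One way to keep recall small is to filter facts by scope before anything reaches the prompt. The sketch below assumes facts are dicts carrying the `status`, `valid_for`, and `not_valid_for` fields from the lifecycle example; `select_recall`, `task_tags`, and the default limit of 3 are hypothetical.

```python
def select_recall(facts: list, task_tags: set, limit: int = 3) -> list:
    """Pick a small set of active facts whose scope matches the current task."""
    scored = []
    for fact in facts:
        if fact.get("status") != "active":
            continue
        if task_tags & set(fact.get("not_valid_for", [])):
            continue  # the task falls inside the fact's exclusion scope
        overlap = len(task_tags & set(fact.get("valid_for", [])))
        if overlap:
            scored.append((overlap, fact))
    # Highest scope overlap first; ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:limit]]
```

Checking the exclusion scope before scoring means a fact that matches on one tag but is excluded by another never enters the prompt.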

Practical checklist

When designing a promotion policy, check it against the following items.

MemoryGraph promotion checklist

[ ] Do not use MemoryGraph as a raw trace store.
[ ] Before a phase starts, perform read-only recall only.
[ ] Keep projectMemoryContext small and limited to highly relevant content.
[ ] Remove memory that duplicates rules, developer instructions, or hard rules.
[ ] Treat a MemoryGraph lookup failure separately from a workflow failure.
[ ] Perform MemoryGraph writes only when an explicit flag is set.
[ ] Allow auto promotion for verified-only candidates.
[ ] Block candidates that have neither replay evidence nor human approval.
[ ] Do not promote imported-only candidates to long-term memory.
[ ] Record promotion decisions in the replay scorecard.
[ ] Design the fact lifecycle and stale detection.

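Wrapping up

Designing MemoryGraph as "a place that stores a lot" can make the agent more dangerous rather than smarter, because an AI tends to follow memory as a working rule, not as a mere reference.

So the principle has to be conservative.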

Allow read-only recall broadly.
Allow promotion narrowly.
Allow writes only when verified.

The next part covers AWTL, the raw material for this structure: how to turn failure logs into hints for the next run, and how to operate them with failed turn cases and a replay scorecard.

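Operational example: criteria for rejecting a memory candidate

MemoryGraph promotion is not a game of storing as much as possible. Suppose one run produces a log saying "a certain command failed in Windows PowerShell." That must not become memory as-is. It may be a transient environment problem, and the log may contain paths and usernames. Convert it into a compact rule only when it recurs three or more times and the cause is confirmed to be a repo convention.

Candidate	Promotion decision
One-off stack trace	keep in the ledger only
Repeated order of verification commands	compact rule candidate
Log containing a user's personal paths	never promote the original text
Repo-specific closeout procedure	promote to memory after review

The failure mode is to put the whole raw trace in because it is searchable. The next agent may then treat temporary instructions or error messages inside that trace as rules. MemoryGraph entries need a fact, conditions, a reuse scope, and an expiry criterion. Only then does long-term memory raise work quality instead of becoming noise that keeps reviving old logs.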
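
The rejection criteria in this example can be sketched as a triage step that runs before a candidate is even created. Everything named here is hypothetical: `triage_observation`, its return labels, and the `PERSONAL_PATH` pattern, which is a rough heuristic for user-specific paths and not an exhaustive check.

```python
import re

# Rough pattern for user-specific paths; an assumption, not an exhaustive check.
PERSONAL_PATH = re.compile(r"(C:\\Users\\|/home/|/Users/)[^\\/\s]+")


def triage_observation(text: str, repeat_count: int,
                       confirmed_repo_convention: bool) -> str:
    """Decide what to do with a logged observation before it can become memory."""
    if repeat_count < 3 or not confirmed_repo_convention:
        return "ledger_only"  # one-off or unexplained: keep in the ledger only
    if PERSONAL_PATH.search(text):
        return "rewrite_before_candidate"  # never promote the original text
    return "compact_rule_candidate"
```

Even a `compact_rule_candidate` result only nominates the observation; it would still have to pass replay or human approval before any write.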

