Function Calling Design

AI Backend · 2026-05-12 · Last modified 2026-05-26 · 6 min read

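Where to draw the boundary between the LLM and internal APIs

This article outlines where to draw the boundary when connecting an LLM to internal APIs through Function Calling.

Function Calling lets a model interact with external systems. Beyond answering a user's question, it can query a database, create tickets, search documents, and run workflows.

That raises several important questions.

How much should be delegated to the model?
Which APIs should be exposed as tools?
Who performs the permission checks?
How do you roll back when something fails?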

Analysis date: 2026-05-12
Reference environment: OpenAI API, FastAPI, PostgreSQL
Key references: OpenAI Function Calling Docs, Structured Outputs Docs, AWS Idempotency

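Key takeaways

Function Calling should be understood not as the model executing internal APIs directly, but as the model structuring a call intent and handing it to the backend.
The actual authority to execute should belong to the backend tool executor.
Read-only tools and mutating tools should be separated.
State-changing tools may need idempotency, audit logs, and human approval.
A tool schema should not expose the whole internal API; it should give the model only the minimal interface it needs.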

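1. How to understand Function Calling

Treating Function Calling as a feature where the model "directly executes" functions is dangerous. More precisely, it is a feature in which the model proposes, in structured form, which function should be called with which arguments.

The actual execution is the backend's job.

    # Example.
    User request
    → LLM decides tool call
    → Backend validates tool call
    → Permission check
    → Execute internal API
    → Return tool result to model
    → Generate final response

The model is a decision aid; the backend is the enforcer.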
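
The flow above can be sketched as a small tool executor. This is a minimal illustration under stated assumptions, not a definitive implementation: `TOOL_REGISTRY`, `USER_TOOL_PERMISSIONS`, `get_ticket_status`, and `handle_tool_call` are hypothetical names, and the model's tool call is assumed to arrive as a plain dict with `name` and `arguments`.

```python
from typing import Any, Callable

# Hypothetical internal API wrapper; a real one would call a service or a DB.
def get_ticket_status(ticket_id: str) -> dict[str, Any]:
    return {"ticket_id": ticket_id, "status": "open"}

# Only tools registered here can ever be executed (assumed allowlist).
TOOL_REGISTRY: dict[str, Callable[..., dict[str, Any]]] = {
    "get_ticket_status": get_ticket_status,
}

# Hypothetical permission table: which user may use which tool.
USER_TOOL_PERMISSIONS = {"user-a": {"get_ticket_status"}}

def handle_tool_call(user_id: str, tool_call: dict[str, Any]) -> dict[str, Any]:
    """Treat the model's tool call as a proposal; the backend decides and executes."""
    name = tool_call.get("name")
    args = tool_call.get("arguments", {})
    # 1. Validate: the tool must be registered and the arguments must be a dict.
    if name not in TOOL_REGISTRY or not isinstance(args, dict):
        return {"ok": False, "error": "invalid_tool_call"}
    # 2. Permission check happens on the server, not in the prompt.
    if name not in USER_TOOL_PERMISSIONS.get(user_id, set()):
        return {"ok": False, "error": "permission_denied"}
    # 3. Execute the internal API; wrong argument names surface as TypeError.
    try:
        result = TOOL_REGISTRY[name](**args)
    except TypeError:
        return {"ok": False, "error": "invalid_arguments"}
    # 4. This dict is what would be returned to the model as the tool result.
    return {"ok": True, "result": result}
```

A call such as `handle_tool_call("user-a", {"name": "execute_sql", "arguments": {}})` is rejected before anything runs, because the model can only name tools the backend has registered.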

2. What is a tool boundary?

The tool boundary is the boundary of the functionality exposed to the model.

Bad example:

    # Example.
    execute_sql(query: string)

This tool is too powerful. The model can generate arbitrary SQL, which makes permissions and safety hard to control.

Good example:

    # Example.
    search_documents(query: string, scope: enum, limit: integer)
    get_document_summary(document_id: string)
    create_support_ticket(title: string, description: string, priority: enum)

Restrict the actions the model needs to small, purpose-built tools.
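
To make the contrast with `execute_sql` concrete, here is a minimal sketch of a narrow `search_documents` tool using Python's standard `sqlite3` module. The `documents` table, its columns, and the sample rows are assumptions made for illustration. The point is that the SQL text is fixed by the server and the model only supplies bound values.

```python
import sqlite3

ALLOWED_SCOPES = ("official_docs", "papers", "tech_blogs")

def search_documents(conn: sqlite3.Connection, query: str, scope: str, limit: int) -> list:
    """Narrow tool: the SQL text is fixed; the model only supplies bound values."""
    if scope not in ALLOWED_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    limit = max(1, min(limit, 10))  # clamp to the 1..10 range
    rows = conn.execute(
        "SELECT id, title FROM documents "
        "WHERE scope = ? AND title LIKE ? ORDER BY id LIMIT ?",
        (scope, f"%{query}%", limit),
    )
    return rows.fetchall()

# Demo with an in-memory database and a hypothetical `documents` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, scope TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO documents (scope, title) VALUES (?, ?)",
    [
        ("papers", "Redis cache design"),
        ("tech_blogs", "Redis cache miss"),
        ("papers", "Vector search"),
    ],
)
results = search_documents(conn, "Redis", "papers", 50)
```

Even when the model asks for 50 rows or an unknown scope, the tool can only run this one query shape within the limits the server sets.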

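3. Separating read tools from write tools

Tool type	Examples	Risk	Safeguards
Read	Document search, status lookup	Low to medium	Permission filter, rate limit
Write	Ticket creation, email sending	Medium to high	Idempotency, approval
Destructive	Deletion, refunds, permission changes	High	Not exposed by default, or strong approval

It is safest to start an initial Function Calling rollout with read-only tools.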
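
One way to keep this separation explicit is to tag every tool with its type and decide centrally which types are offered to the model. The sketch below is illustrative only: `ToolKind`, `ToolSpec`, `exposed_tools`, and the `delete_document` tool are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum

class ToolKind(Enum):
    READ = "read"
    WRITE = "write"
    DESTRUCTIVE = "destructive"

@dataclass(frozen=True)
class ToolSpec:
    name: str
    kind: ToolKind

ALL_TOOLS = [
    ToolSpec("search_documents", ToolKind.READ),
    ToolSpec("get_document_summary", ToolKind.READ),
    ToolSpec("create_support_ticket", ToolKind.WRITE),
    ToolSpec("delete_document", ToolKind.DESTRUCTIVE),  # hypothetical tool
]

def exposed_tools(allow_write: bool = False) -> list[str]:
    """Return the tool names offered to the model; destructive tools are never included."""
    allowed = {ToolKind.READ}
    if allow_write:
        allowed.add(ToolKind.WRITE)
    return [tool.name for tool in ALL_TOOLS if tool.kind in allowed]
```

With the default `allow_write=False`, an initial rollout exposes only the two read tools, and enabling write tools becomes a deliberate configuration change.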

4. Tool schema design

Example document search tool:

    // Example JSON structure.
    {
      "name": "search_documents",
      "description": "Search for relevant content in documents the user can access.",
      "parameters": {
        "type": "object",
        "properties": {
          "query": {
            "type": "string",
            "description": "Natural-language search query"
          },
          "scope": {
            "type": "string",
            "enum": ["official_docs", "papers", "tech_blogs"]
          },
          "limit": {
            "type": "integer",
            "minimum": 1,
            "maximum": 10
          }
        },
        "required": ["query", "scope", "limit"],
        "additionalProperties": false
      }
    }

The design points are:

    # Example.
    [ ] Restrict the scope with an enum.
    [ ] Set a maximum on limit.
    [ ] Let the server decide internal IDs and permission conditions.
    [ ] Do not pass the user's raw input straight to the internal API.
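
The schema constrains what the model is asked to produce, but the server should still re-check the arguments it receives. The following is a hand-written sketch of those checks for the `search_documents` schema above, using only the standard library. `validate_search_args` is a hypothetical helper, and a production system might use a JSON Schema validator instead.

```python
ALLOWED_SCOPES = ("official_docs", "papers", "tech_blogs")
REQUIRED_KEYS = {"query", "scope", "limit"}

def validate_search_args(args: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the call is acceptable."""
    errors = []
    missing = REQUIRED_KEYS - set(args)
    extra = set(args) - REQUIRED_KEYS
    if missing:
        errors.append(f"missing: {sorted(missing)}")
    if extra:  # mirrors "additionalProperties": false
        errors.append(f"unexpected: {sorted(extra)}")
    if "query" in args and not isinstance(args["query"], str):
        errors.append("query must be a string")
    if "scope" in args and args["scope"] not in ALLOWED_SCOPES:
        errors.append("scope must be one of the enum values")
    if "limit" in args:
        limit = args["limit"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 10:
            errors.append("limit must be an integer from 1 to 10")
    return errors
```

Returning all errors at once, rather than raising on the first, makes it easier to send a single corrective message back to the model when a retry is allowed.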

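5. Permissions and authentication

Permission checks must be done by the server, not the model.

    # Example.
    User A asks a question
    → Model proposes a search_documents call
    → Server checks user A's document permissions
    → Search only the permitted scopes
    → Return results

Telling the model "search only the documents this user can access" is not enough. The actual filter must be enforced in the backend.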
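
A minimal sketch of that server-side enforcement follows. The `USER_SCOPES` table, `authorize_scope`, `search_for_user`, and the in-memory `index` argument are hypothetical; in a real system the permissions would come from the authentication and authorization layer, and the arguments are assumed to have passed schema validation already.

```python
# Hypothetical permission table; in practice this comes from the auth system.
USER_SCOPES = {
    "user-a": {"official_docs", "tech_blogs"},
    "user-b": {"official_docs"},
}

def authorize_scope(user_id: str, requested_scope: str) -> str:
    """Return the scope if the user may search it; otherwise raise PermissionError."""
    allowed = USER_SCOPES.get(user_id, set())
    if requested_scope not in allowed:
        raise PermissionError(f"{user_id} may not search {requested_scope}")
    return requested_scope

def search_for_user(user_id: str, args: dict, index: list[dict]) -> list[dict]:
    """user_id comes from the authenticated session, never from the model's arguments."""
    scope = authorize_scope(user_id, args["scope"])
    needle = args["query"].lower()
    hits = [doc for doc in index if doc["scope"] == scope and needle in doc["text"].lower()]
    return hits[: args["limit"]]
```

Because the scope check runs before any search, a prompt that persuades the model to request another scope still ends in a `PermissionError` rather than a data leak.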

6. Pre-execution validation and human approval

State-changing tools need validation before they run.

    # Example.
    LLM generates tool call
    → schema validation
    → business rule validation
    → permission check
    → risk classification
    → human approval if needed
    → execute

For example, sending an email, creating a ticket, or changing a customer's status may need to be approved by a person.

    // Example JSON structure.
    {
      "tool": "create_support_ticket",
      "arguments": {
        "title": "Redis cache issue",
        "description": "A user reported a Redis cache miss problem.",
        "priority": "medium"
      },
      "requires_approval": true
    }
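
The ordered checks can be sketched as one function that returns a decision instead of executing anything. The risk levels in `TOOL_RISK`, the `Decision` enum, and the `send_email` tool are assumptions for illustration; actual approval criteria depend on the organization's policy.

```python
from enum import Enum

class Decision(Enum):
    REJECT = "reject"
    NEEDS_APPROVAL = "needs_approval"
    EXECUTE = "execute"

# Hypothetical risk levels per tool; tools not listed here are rejected.
TOOL_RISK = {"search_documents": "low", "create_support_ticket": "medium", "send_email": "high"}
VALID_PRIORITIES = ("low", "medium", "high")

def pre_execution_check(tool: str, arguments: dict, user_can_use: bool) -> Decision:
    """Run the checks in order and stop at the first failure."""
    # Schema-level check: the tool must be known.
    if tool not in TOOL_RISK:
        return Decision.REJECT
    # Business rule check for ticket creation.
    if tool == "create_support_ticket":
        if not arguments.get("title") or arguments.get("priority") not in VALID_PRIORITIES:
            return Decision.REJECT
    # Permission check, computed by the server for the current user.
    if not user_can_use:
        return Decision.REJECT
    # Risk classification: medium and high risk wait for a human.
    if TOOL_RISK[tool] in ("medium", "high"):
        return Decision.NEEDS_APPROVAL
    return Decision.EXECUTE
```

Under these assumed risk levels, the ticket example above yields `Decision.NEEDS_APPROVAL`, which corresponds to `"requires_approval": true` in the JSON.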

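7. Audit logs and reproducibility

Every Function Calling tool call should leave an audit log.

Log field	Purpose
user_id	Who made the request
tool_name	Which tool was called
tool_arguments	Which arguments were used
validation_result	Whether validation passed
execution_result	Execution result
trace_id	Incident analysis
prompt_version	Reproducibility
model	Quality analysis

Sensitive information must be masked, however.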
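
As a sketch of what such a record might look like, the function below builds one JSON log line with the fields from the table and masks sensitive arguments first. `SENSITIVE_KEYS` is a hypothetical list, the added `timestamp` field is an assumption, and the masking only covers top-level keys.

```python
import json
import uuid
from datetime import datetime, timezone

SENSITIVE_KEYS = {"email", "phone", "password"}  # hypothetical list

def mask_arguments(arguments: dict) -> dict:
    """Replace values of sensitive top-level keys so they never reach the log."""
    return {key: "***" if key in SENSITIVE_KEYS else value for key, value in arguments.items()}

def build_audit_record(user_id, tool_name, arguments, validation_result,
                       execution_result, prompt_version, model, trace_id=None) -> str:
    """Return one JSON line; values are assumed to be JSON-serializable."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "tool_name": tool_name,
        "tool_arguments": mask_arguments(arguments),
        "validation_result": validation_result,
        "execution_result": execution_result,
        "trace_id": trace_id or str(uuid.uuid4()),
        "prompt_version": prompt_version,
        "model": model,
    }
    return json.dumps(record, ensure_ascii=False)
```

Writing the record even when validation fails keeps rejected calls visible, which helps when analyzing why the model proposed them.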

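8. Failure and rollback

Tool execution can fail.

Failure type	Response
Schema validation failure	Model retry or fallback
No permission	Refuse to execute the tool
Internal API timeout	Retry or notify the user
Duplicate request	Return the existing result via the idempotency key
Incorrect state change	Rollback or compensating transaction

If a state-changing tool has no rollback plan, it is safer not to expose it.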
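
The last row of the table, a compensating transaction, can be sketched as follows. This is a simplified illustration with hypothetical step functions and an in-memory `state` dict. It assumes every step has a compensating action and that compensations themselves do not fail.

```python
def run_with_compensation(steps):
    """Run (action, compensate) pairs in order; on failure, undo completed steps in reverse."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception as exc:
        for compensate in reversed(completed):
            compensate()
        return {"ok": False, "error": str(exc), "rolled_back": len(completed)}
    return {"ok": True}

# Demo: the second step fails, so the first step's change is undone.
state = {"ticket_created": False}

def create_ticket():
    state["ticket_created"] = True

def delete_ticket():
    state["ticket_created"] = False

def send_email():
    raise TimeoutError("mail service timed out")

def no_op():
    pass

outcome = run_with_compensation([(create_ticket, delete_ticket), (send_email, no_op)])
```

If a step has no meaningful compensation, for example an email that has already been delivered, that is a signal to place it behind approval or not expose it at all.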

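9. Practical checklist

    # Example.
    [ ] Are read tools and write tools separated?
    [ ] Does the tool schema follow the principle of least privilege?
    [ ] Are there input limits such as enums and a maximum limit?
    [ ] Are permission checks enforced on the server?
    [ ] Do state-changing tools have an idempotency key?
    [ ] Is there human approval for risky operations?
    [ ] Is every tool call recorded in the audit log?
    [ ] On failure, is there a response that can be explained to the user?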

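Failure case: exposing an internal API as a tool unchanged

A common mistake with Function Calling is exposing an existing backend API as a tool almost unchanged. For example, letting the model call endpoints such as POST /tickets, PATCH /users/:id, and DELETE /documents/:id is quick to implement. But these APIs were designed on the assumption that a person or a server calls them explicitly. Handed to the model as they are, their argument ranges, permissions, approval requirements, and retry semantics are far too broad.

Write APIs are especially dangerous without idempotency. If the model sees a timeout and retries the same tool call when the first request actually succeeded, two tickets can be created. For operations that need user confirmation, writing "call with caution" in the tool description is not enough either. The server must check the approval state before the final execution.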

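Implementation example: building a command API for the model

Putting a narrow command in front of the internal API makes the boundary clear.

    // Command the backend builds from the model's proposal.
    type CreateTicketCommand = {
      kind: "create_support_ticket";
      idempotencyKey: string; // generated by the server
      title: string;
      summary: string;
      priority: "low" | "normal" | "high";
      requiresHumanApproval: true; // literal type: approval is always required
    };

This command is intentionally smaller than the internal ticket API. Values such as the assignee, internal labels, and billing fields are filled in by the server so the model cannot choose them. The model structures the proposal "a ticket should be created," and the backend verifies permissions and approval before calling the internal API.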
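
For a Python backend, the same idea might look like the sketch below, which mirrors the TypeScript type as a dataclass. `build_command` and `MODEL_FIELDS` are hypothetical names, and falling back to `"normal"` for an unknown priority is an assumed server policy rather than something the type itself prescribes.

```python
from dataclasses import dataclass

ALLOWED_PRIORITIES = ("low", "normal", "high")
MODEL_FIELDS = {"title", "summary", "priority"}  # the only fields the model may propose

@dataclass(frozen=True)
class CreateTicketCommand:
    idempotency_key: str
    title: str
    summary: str
    priority: str
    requires_human_approval: bool = True
    kind: str = "create_support_ticket"

def build_command(draft: dict, idempotency_key: str) -> CreateTicketCommand:
    """Turn the model's draft into a command; reject fields the model must not choose."""
    unexpected = set(draft) - MODEL_FIELDS
    if unexpected:
        raise ValueError(f"fields not allowed from the model: {sorted(unexpected)}")
    if not draft.get("title") or not draft.get("summary"):
        raise ValueError("title and summary are required")
    priority = draft.get("priority")
    if priority not in ALLOWED_PRIORITIES:
        priority = "normal"  # server-side correction (assumed policy)
    return CreateTicketCommand(
        idempotency_key=idempotency_key,  # supplied by the server, not the model
        title=draft["title"],
        summary=draft["summary"],
        priority=priority,
    )
```

A draft containing `assignee` fails here, so an attempt to set internal fields never reaches the ticket API.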

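Separating official facts from design interpretation

The fact the official documentation provides is roughly that "the model can produce structured tool calls." But "which internal APIs to expose as tools," "which operations to put behind approval," and "which fields the server computes" are product design interpretations. Mixing the two makes it easy to mistake tool support for operational safety.

Decision	Official capability	Product design responsibility
schema	Specify the tool input format	Field minimization and versioning
execution	Receive the tool call	Permissions, approval, idempotency
error	Relay the failure response	Whether a retry is possible, and the user-facing wording
audit	Call information is available	Store who executed what, and why

Separated this way, Function Calling can be treated as a backend boundary rather than a model feature.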

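Practical example: creating a customer support ticket

Suppose a customer support chatbot must read a conversation and create a ticket. A bad design has the model fill in every field of the internal ticket API. A good design has the model structure only the "ticket creation intent," while the server fills in the user information, organization, SLA, responsible queue, and approval state.

Field	May the model fill it in?	Reason
title	Yes	Can be extracted from the conversation summary
summary	Yes	Corresponds to the user's problem description
priority	With limits	Needs an enum and server-side correction
assignee	No	Requires organization policy and on-duty status
customerTier	No	Requires internal CRM permissions
idempotencyKey	Server-generated	Prevents duplicate creation

A typical failure is a retry after a timeout. The model calls create_support_ticket and the provider response is slow. The user asks again, and the model makes the same call a second time. Without an idempotency key, two tickets are created. Without an approval state, a real ticket may be submitted even though the user said "just make a draft."

So it is safer to split a write tool into three stages. First, the model produces a command draft. Second, the server validates it and attaches an idempotency key. Third, the internal API runs only after the user or a policy approves. In this structure, a Function Calling result is not itself a side effect; it is an intermediate artifact that turns a candidate side effect into a verifiable command.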
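
The three stages can be sketched as a small in-memory workflow. Everything here is an assumption for illustration: `TicketWorkflow`, `derive_idempotency_key`, the key format, and the `TICKET-n` identifier that stands in for the internal API call. Deriving the key from the conversation ID and the draft content is one possible policy; it only detects a retry if the model reproduces the same draft.

```python
import hashlib
import json

def derive_idempotency_key(conversation_id: str, draft: dict) -> str:
    """Same conversation and same draft content give the same key."""
    canonical = json.dumps(draft, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(f"{conversation_id}:{canonical}".encode("utf-8")).hexdigest()
    return digest[:32]

class TicketWorkflow:
    def __init__(self):
        self.pending = {}   # key -> draft awaiting approval
        self.executed = {}  # key -> ticket id

    def submit_draft(self, conversation_id: str, draft: dict) -> str:
        # Stages 1 and 2: accept the model's draft, validate it, attach the key.
        if not draft.get("title") or not draft.get("summary"):
            raise ValueError("title and summary are required")
        key = derive_idempotency_key(conversation_id, draft)
        if key not in self.executed:
            self.pending[key] = draft
        return key  # no side effect has happened yet

    def approve_and_execute(self, key: str) -> str:
        # Stage 3: only an approved pending draft reaches the internal API.
        if key in self.executed:
            return self.executed[key]  # repeated approval returns the existing ticket
        self.pending.pop(key)  # raises KeyError if the draft was never submitted
        ticket_id = f"TICKET-{len(self.executed) + 1}"  # stand-in for the internal API call
        self.executed[key] = ticket_id
        return ticket_id
```

Submitting the same draft twice in one conversation yields the same key, and approving it twice returns the same ticket, so the timeout-and-retry scenario above produces a single ticket.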

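10. Q&A

Q1. Can I expose an internal API as a tool unchanged?

This is not recommended. A tool for the model should be a smaller, more restricted interface than the internal API.

Q2. What if the model picks the wrong tool?

The server must validate the tool call and reject it. The model's judgment is only a proposal.

Q3. Is Function Calling the same thing as an agent?

It can be a building block of an agent, but the two are not the same. Function Calling is an interface that structures calls to external functionality.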

11. References and uncertainty

References

OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
OpenAI Structured Outputs: https://platform.openai.com/docs/guides/structured-outputs
AWS Builders Library, Idempotent APIs: https://aws.amazon.com/builders-library/making-retries-safe-with-idempotent-APIs/

Uncertainty

The detailed Function Calling API and the set of supported models may change over time.
Actual tool approval criteria should follow the organization's security and operations policies.
