Implementing 0ms-Latency Local Inference by Extending an Internal SDK Interface

How to Unlock Local Inference in the Google Gemini SDK (Without Forking)

Agustin Sacco · April 26, 2026 · 2 min read · advanced

AI Summary

Context

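The default ClassifierStrategy in the @google/gemini-cli-core SDK forces requests through cloud routing and makes an API key mandatory. Although the SDK is built as a modular orchestrator, it has no explicit mode for local models, which limits how it can be used.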

Technical Solution

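Triggers the OverrideStrategy by specifying an explicit model name, bypassing the default cloud router
Implements the ContentGenerator interface to intercept generateContent and streamGenerateContent calls
Adds a data-mapping layer that converts the SDK's multi-part messages into OpenAI-compatible JSON
Manually maps the response.functionCalls prototype getter so local models can take part in the tool-calling loop
Conforms to the standard interface so the adapter stays compatible as the SDK core is updated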
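As a rough illustration of the adapter idea, here is a minimal TypeScript sketch. It is not the article's code: the Content/Part shapes are simplified stand-ins for the SDK's types, and LocalResponse, LocalContentGenerator, ChatTransport and toOpenAIMessages are hypothetical names. The method names follow the summary's wording (generateContent, streamGenerateContent) and may not match the real ContentGenerator signatures. The transport is injected so the sketch runs without a server; in practice it would POST the body to an OpenAI-compatible chat endpoint.

```typescript
// Simplified, hypothetical stand-ins for the SDK's content types.
interface FunctionCall { name: string; args: Record<string, unknown>; }
interface Part {
  text?: string;
  functionCall?: FunctionCall;
  functionResponse?: { name: string; response: Record<string, unknown> };
}
interface Content { role: "user" | "model"; parts: Part[]; }

// Subset of an OpenAI-compatible chat message.
interface ToolCall { id: string; type: "function"; function: { name: string; arguments: string }; }
interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

// functionCalls is a getter on the prototype (not an own property), derived
// from the first candidate's parts, so code reading response.functionCalls works.
class LocalResponse {
  candidates: { content: Content }[];
  constructor(candidates: { content: Content }[]) {
    this.candidates = candidates;
  }
  get functionCalls(): FunctionCall[] | undefined {
    const parts = this.candidates[0]?.content.parts ?? [];
    const calls = parts.filter((p) => p.functionCall).map((p) => p.functionCall as FunctionCall);
    return calls.length > 0 ? calls : undefined;
  }
}

// Maps multi-part contents to OpenAI-style messages, pairing each
// functionResponse with the id of the earliest pending call of the same name.
function toOpenAIMessages(contents: Content[]): ChatMessage[] {
  const messages: ChatMessage[] = [];
  const pending: { id: string; name: string }[] = [];
  let counter = 0;
  for (const c of contents) {
    const text = c.parts.map((p) => p.text ?? "").join("");
    if (c.role === "model") {
      const toolCalls: ToolCall[] = [];
      for (const p of c.parts) {
        if (!p.functionCall) continue;
        const id = `call_${counter++}`;
        pending.push({ id, name: p.functionCall.name });
        toolCalls.push({ id, type: "function", function: { name: p.functionCall.name, arguments: JSON.stringify(p.functionCall.args) } });
      }
      const msg: ChatMessage = { role: "assistant", content: text || null };
      if (toolCalls.length > 0) msg.tool_calls = toolCalls;
      messages.push(msg);
    } else {
      for (const p of c.parts) {
        if (!p.functionResponse) continue;
        const fr = p.functionResponse;
        const idx = pending.findIndex((t) => t.name === fr.name);
        const id = idx >= 0 ? pending.splice(idx, 1)[0].id : `call_${counter++}`;
        messages.push({ role: "tool", tool_call_id: id, content: JSON.stringify(fr.response) });
      }
      if (text) messages.push({ role: "user", content: text });
    }
  }
  return messages;
}

// Hypothetical transport: sends an OpenAI-style request body, returns parsed JSON.
type ChatTransport = (body: { model: string; messages: ChatMessage[] }) => Promise<any>;

class LocalContentGenerator {
  private model: string;
  private transport: ChatTransport;
  constructor(model: string, transport: ChatTransport) {
    this.model = model;
    this.transport = transport;
  }

  async generateContent(request: { contents: Content[] }): Promise<LocalResponse> {
    const data = await this.transport({ model: this.model, messages: toOpenAIMessages(request.contents) });
    const message = data.choices[0].message;
    const parts: Part[] = [];
    if (message.content) parts.push({ text: message.content });
    for (const tc of message.tool_calls ?? []) {
      parts.push({ functionCall: { name: tc.function.name, args: JSON.parse(tc.function.arguments || "{}") } });
    }
    return new LocalResponse([{ content: { role: "model", parts } }]);
  }

  // Non-streaming fallback: yields the whole response as a single chunk.
  async *streamGenerateContent(request: { contents: Content[] }): AsyncGenerator<LocalResponse> {
    yield await this.generateContent(request);
  }
}
```

Defining functionCalls as a class getter, rather than attaching it to a plain object, mirrors the summary's point that the SDK reads that value through a prototype getter; raw JSON parsed from a local server would not carry it. Pairing tool results by name is a simplification that assumes results come back in the same order as calls to the same tool.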

Action Points

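1. When adopting an SDK, check whether it exposes an interface that can override the default strategy.

2. To remove external dependencies, consider an adapter built on an abstract interface (e.g., ContentGenerator).

3. When implementing tool calling, examine whether runtime responses can be mapped through a prototype getter.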
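The first point amounts to looking for a strategy chain in which an override is consulted before the default classifier. A minimal TypeScript sketch of that pattern, assuming a composite router that asks each strategy in turn and keeps the first non-null decision. The class names echo the OverrideStrategy and ClassifierStrategy mentioned above, but they are hypothetical stand-ins, not the SDK's actual routing classes, and the model names are placeholders.

```typescript
// Hypothetical routing types; not the real gemini-cli-core router API.
interface RoutingContext { requestedModel?: string; prompt: string; }
interface RoutingDecision { model: string; source: string; }
interface RoutingStrategy { route(ctx: RoutingContext): RoutingDecision | null; }

// Honors an explicitly requested model and skips classification entirely.
class OverrideStrategy implements RoutingStrategy {
  route(ctx: RoutingContext): RoutingDecision | null {
    return ctx.requestedModel ? { model: ctx.requestedModel, source: "override" } : null;
  }
}

// Stand-in for a default classifier that always picks a cloud model.
class CloudClassifierStrategy implements RoutingStrategy {
  route(ctx: RoutingContext): RoutingDecision {
    const model = ctx.prompt.length > 200 ? "cloud-pro" : "cloud-flash";
    return { model, source: "classifier" };
  }
}

// Tries strategies in order; the first non-null decision wins.
class CompositeStrategy implements RoutingStrategy {
  private strategies: RoutingStrategy[];
  constructor(strategies: RoutingStrategy[]) {
    this.strategies = strategies;
  }
  route(ctx: RoutingContext): RoutingDecision | null {
    for (const s of this.strategies) {
      const decision = s.route(ctx);
      if (decision) return decision;
    }
    return null;
  }
}

// Override first, classifier as the fallback.
const router = new CompositeStrategy([new OverrideStrategy(), new CloudClassifierStrategy()]);
```

Because the override sits first in the chain, explicitly naming a local model short-circuits the classifier, which is the behavior the summary relies on to bypass cloud routing.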

Tags

#Strategy Pattern #Tool Calling #SDK Architecture #Local Inference #Interface Extension
