Gokin implements a three-level memory system (session, project, global) and automatic context management: token counting, summarization, and tool result compaction.
Memory Types#
| Type | Scope | Lifetime | Description |
|---|---|---|---|
| Session | Current session | Until session ends | Temporary notes |
| Project | Current project | Persistent | Project knowledge |
| Global | All projects | Persistent | General preferences |
Entry Structure#
type Entry struct {
    ID        string     // SHA256: "mem_" + hash[:8]
    Key       string     // Optional key for lookup
    Content   string     // Memory content
    Type      MemoryType // session, project, global
    Tags      []string   // Auto-extracted tags
    Timestamp time.Time
    Project   string     // Project identifier (path hash)
}
Auto-Tagging#
Tags are automatically extracted from content using these patterns:
- File paths: `/[a-zA-Z0-9_.\-/]+`
- Function names: `function [a-zA-Z_][a-zA-Z0-9_]*`
- Package names: `package [a-zA-Z_][a-zA-Z0-9_]*`
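The patterns above can be tried out with Go's `regexp` package. This is a minimal sketch, not Gokin's actual implementation: the name `extractTags` is hypothetical, and it assumes that for function and package matches only the identifier (not the leading keyword) becomes the tag and that duplicates are dropped.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the list above; capture groups are added so that
// only the identifier is kept for functions and packages (an assumption).
var (
	pathRe = regexp.MustCompile(`/[a-zA-Z0-9_.\-/]+`)
	funcRe = regexp.MustCompile(`function ([a-zA-Z_][a-zA-Z0-9_]*)`)
	pkgRe  = regexp.MustCompile(`package ([a-zA-Z_][a-zA-Z0-9_]*)`)
)

// extractTags (hypothetical name) returns the unique tags found in content,
// in the order paths, functions, packages.
func extractTags(content string) []string {
	seen := map[string]bool{}
	var tags []string
	add := func(t string) {
		if !seen[t] {
			seen[t] = true
			tags = append(tags, t)
		}
	}
	for _, m := range pathRe.FindAllString(content, -1) {
		add(m)
	}
	for _, m := range funcRe.FindAllStringSubmatch(content, -1) {
		add(m[1])
	}
	for _, m := range pkgRe.FindAllStringSubmatch(content, -1) {
		add(m[1])
	}
	return tags
}

func main() {
	note := "Fixed /src/main.go: function parseConfig in package config"
	fmt.Println(extractTags(note))
}
```

Note that the file path pattern matches any run starting with `/`, so it will also pick up URL paths and similar strings.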
Memory Tools#
| Tool | Description |
|---|---|
| memory | CRUD: add, get, search, list, remove |
| memorize | Typed storage: fact, preference, convention, pattern |
| pin_context | Pin text to system prompt for the session |
| history_search | Regex search through session history |
Auto-Inject#
When memory.auto_inject is set to true, memories relevant to the current conversation are automatically added to the system prompt.
Storage#
- Persistent storage in JSON files
- Debounced writes (coalescing frequent saves)
- Separated by project (path hash)
- Entry limit (default: 1000)
- Key replacement (same key overwrites)
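Key replacement and the entry limit can be illustrated with a small in-memory store. This is a sketch under stated assumptions: `Store`, `Add`, and `newID` are hypothetical names, the ID is assumed to be the first 8 hex characters of a SHA-256 digest of the content, and dropping the oldest entry when the limit is exceeded is an assumed policy. JSON persistence and debounced writes are not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry holds a subset of the documented fields.
type Entry struct {
	ID      string
	Key     string
	Content string
}

// newID builds "mem_" plus the first 8 hex characters of a SHA-256 digest.
// Assumption: the digest is taken over the content only.
func newID(content string) string {
	sum := sha256.Sum256([]byte(content))
	return "mem_" + hex.EncodeToString(sum[:])[:8]
}

// Store is a hypothetical in-memory store with a fixed entry limit.
type Store struct {
	limit   int
	entries []Entry // oldest first
}

// Add overwrites an existing entry that has the same non-empty key.
// Otherwise it appends and, once over the limit, drops the oldest
// entries (assumed eviction policy).
func (s *Store) Add(key, content string) Entry {
	e := Entry{ID: newID(content), Key: key, Content: content}
	if key != "" {
		for i := range s.entries {
			if s.entries[i].Key == key {
				s.entries[i] = e
				return e
			}
		}
	}
	s.entries = append(s.entries, e)
	if len(s.entries) > s.limit {
		s.entries = s.entries[len(s.entries)-s.limit:]
	}
	return e
}

func main() {
	s := &Store{limit: 2}
	s.Add("editor", "prefers tabs")
	s.Add("editor", "prefers spaces") // same key: replaces the first entry
	s.Add("", "uses Go modules")
	s.Add("", "CI runs on push") // over the limit: the oldest entry is dropped
	for _, e := range s.entries {
		fmt.Println(e.ID, e.Key, e.Content)
	}
}
```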
Context Management#
Token Counting#
Token counting with caching and fallback estimation:
- LRU cache: 1000 entries
- API counting: via provider client API
- Fallback: character-based estimation (~4 chars = 1 token)
- Async: a semaphore limits counting to 3 parallel requests
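The interplay of cache, API counting, fallback, and semaphore could look roughly like the following. This is only a sketch: `Counter`, `NewCounter`, and `apiCount` are hypothetical names, the provider call is injected as a function, and a plain map stands in for the LRU cache (eviction at 1000 entries is omitted).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"unicode/utf8"
)

// errOffline simulates a failing provider call in the demo below.
var errOffline = errors.New("provider offline")

// estimateTokens applies the ~4 characters per token heuristic, rounding up.
func estimateTokens(text string) int {
	return (utf8.RuneCountInString(text) + 3) / 4
}

// Counter is a hypothetical wrapper around a provider counting call.
type Counter struct {
	apiCount func(text string) (int, error)
	sem      chan struct{} // capacity 3: at most three API calls in flight
	mu       sync.Mutex
	cache    map[string]int // plain map; LRU eviction is omitted here
}

func NewCounter(apiCount func(string) (int, error)) *Counter {
	return &Counter{
		apiCount: apiCount,
		sem:      make(chan struct{}, 3),
		cache:    map[string]int{},
	}
}

// Count returns the token count and whether it is only an estimate.
func (c *Counter) Count(text string) (tokens int, isEstimate bool) {
	c.mu.Lock()
	n, ok := c.cache[text]
	c.mu.Unlock()
	if ok {
		return n, false
	}

	c.sem <- struct{}{} // acquire one of the three slots
	n, err := c.apiCount(text)
	<-c.sem // release the slot
	if err != nil {
		return estimateTokens(text), true // fallback estimate, not cached
	}

	c.mu.Lock()
	c.cache[text] = n
	c.mu.Unlock()
	return n, false
}

func main() {
	failing := NewCounter(func(string) (int, error) { return 0, errOffline })
	fmt.Println(failing.Count("hello world")) // falls back to the estimate
}
```

Not caching estimates is a design choice of this sketch: a later call can still obtain an exact count once the API is reachable again.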
Token Limits by Model#
| Model | Input | Output |
|---|---|---|
| gemini-3-flash | 1M | 65K |
| gemini-3-pro | 1M | 65K |
| gemini-2.5-flash | 1M | 8K |
| gemini-2.5-pro | 1M | 8K |
| gemini-2.0-flash | 1M | 8K |
| glm-4.7 | 128K | 131K |
Context Warnings#
type TokenUsage struct {
    InputTokens  int
    MaxTokens    int
    PercentUsed  float64
    NearLimit    bool // usage > warning_threshold (80%)
    ExceedsLimit bool // usage > max
    IsEstimate   bool // API call failed; count is a fallback estimate
}
Auto-Summarization#
When context reaches warning_threshold (80%), automatic summarization triggers.
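The threshold check behind this trigger can be sketched as follows. The helper `newUsage` is hypothetical, and the sketch assumes `PercentUsed` is stored as a 0..1 fraction so that it compares directly with `warning_threshold`; the real field may use a 0..100 scale.

```go
package main

import "fmt"

// TokenUsage mirrors the struct documented under Context Warnings.
type TokenUsage struct {
	InputTokens  int
	MaxTokens    int
	PercentUsed  float64
	NearLimit    bool
	ExceedsLimit bool
	IsEstimate   bool
}

// newUsage (hypothetical) fills in the derived fields.
// Assumption: PercentUsed is a fraction in the same unit as the threshold.
func newUsage(input, limit int, threshold float64, isEstimate bool) TokenUsage {
	pct := 0.0
	if limit > 0 {
		pct = float64(input) / float64(limit)
	}
	return TokenUsage{
		InputTokens:  input,
		MaxTokens:    limit,
		PercentUsed:  pct,
		NearLimit:    pct > threshold,
		ExceedsLimit: input > limit,
		IsEstimate:   isEstimate,
	}
}

func main() {
	u := newUsage(850_000, 1_000_000, 0.8, false)
	fmt.Printf("%.0f%% used, near limit: %v, exceeds: %v\n",
		u.PercentUsed*100, u.NearLimit, u.ExceedsLimit)
}
```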
Summarization priorities (what is preserved):
- File paths and modified functions/methods
- Error messages and their solutions
- Component dependencies
- Configuration changes
- Architectural decisions
- Unresolved issues and next steps
Exclusions (what is removed):
- Verbose tool output and raw logs
- Intermediate failed attempts
- UI confirmations
- Repeated file reads
Hierarchical Summarization#
For large conversations (>100 messages):
- Split into chunks
- Intermediate summaries for each chunk
- Merge summaries into final
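The three steps above can be sketched as a chunk-then-merge function. The names `Summarizer` and `summarizeHierarchically` are hypothetical, the chunk size is a free parameter (the document does not state one), and the summarizer is injected so a model call can be swapped for a toy function.

```go
package main

import (
	"fmt"
	"strings"
)

// Summarizer stands in for the model call that condenses text (hypothetical).
type Summarizer func(text string) string

// summarizeHierarchically splits msgs into chunks of chunkSize, summarizes
// each chunk, then summarizes the joined chunk summaries into a final result.
func summarizeHierarchically(msgs []string, chunkSize int, summarize Summarizer) string {
	if len(msgs) == 0 {
		return ""
	}
	if chunkSize <= 0 {
		chunkSize = len(msgs) // treat everything as a single chunk
	}
	var partials []string
	for start := 0; start < len(msgs); start += chunkSize {
		end := start + chunkSize
		if end > len(msgs) {
			end = len(msgs)
		}
		partials = append(partials, summarize(strings.Join(msgs[start:end], "\n")))
	}
	if len(partials) == 1 {
		return partials[0] // a single chunk needs no merge step
	}
	return summarize(strings.Join(partials, "\n"))
}

func main() {
	// Toy summarizer: keep only the first line of the input.
	firstLine := func(text string) string {
		return strings.SplitN(text, "\n", 2)[0]
	}
	msgs := []string{"m0", "m1", "m2", "m3", "m4"}
	fmt.Println(summarizeHierarchically(msgs, 2, firstLine))
}
```

With n chunks this makes n + 1 summarizer calls, which keeps each individual call within a bounded input size.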
Result Compaction#
ResultCompactor compresses long tool results:
- Errors: higher limit (10K chars), fully preserved
- Successful results: trimmed to tool_result_max_chars
- Logs: preserve beginning and end (head/tail)
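A rough sketch of these rules, using the error indicators listed in this section. It is not the actual `ResultCompactor`: the function names are hypothetical, matching is assumed to be case-insensitive, and head/tail truncation is applied to every over-long result because the document does not say how logs are told apart from other output.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	errorMaxChars = 10000 // the higher 10K limit for errors
	truncMarker   = "\n...[truncated]...\n"
)

// errorIndicators is the list given in this section.
var errorIndicators = []string{
	"error:", "panic:", "fatal:", "stack trace:", "exception:",
	"traceback:", "undefined:", "cannot use", "--- fail",
	"permission denied", "not found", "syntax error",
	"compilation failed", "build failed",
}

// looksLikeError reports whether out contains any indicator.
// Assumption: matching ignores case.
func looksLikeError(out string) bool {
	lower := strings.ToLower(out)
	for _, ind := range errorIndicators {
		if strings.Contains(lower, ind) {
			return true
		}
	}
	return false
}

// headTail keeps the first and last limit/2 bytes and marks the cut.
// It slices bytes, so a multi-byte character at the cut may be split.
func headTail(s string, limit int) string {
	if len(s) <= limit {
		return s
	}
	half := limit / 2
	return s[:half] + truncMarker + s[len(s)-half:]
}

// compact applies the higher limit to errors and maxChars otherwise.
func compact(out string, maxChars int) string {
	if looksLikeError(out) {
		return headTail(out, errorMaxChars)
	}
	return headTail(out, maxChars)
}

func main() {
	log := strings.Repeat("line of build output\n", 50)
	fmt.Println(len(log), len(compact(log, 200)))
	fmt.Println(looksLikeError("--- FAIL: TestParse"))
}
```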
Recognized error indicators:
"error:", "panic:", "fatal:", "stack trace:", "exception:",
"traceback:", "undefined:", "cannot use", "--- fail",
"permission denied", "not found", "syntax error",
"compilation failed", "build failed"
Context Configuration#
context:
  warning_threshold: 0.8        # Warning at 80%
  summarization_ratio: 0.5      # Summarize to 50%
  tool_result_max_chars: 10000  # Max chars per tool result
  enable_auto_summary: true     # Auto-summarization
  auto_compact_threshold: 0.75  # Auto-compact at 75%
Summary Cache#
Summaries are cached for reuse:
- Hash-based key from content
- Prevents re-summarizing identical context
- Invalidated on conversation changes
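A content-hash cache along these lines might look like the sketch below. `SummaryCache`, `cacheKey`, and `GetOrCompute` are hypothetical names, and SHA-256 is an assumed choice of hash. Because the key is derived from the messages themselves, a changed conversation produces a different key and the stale summary is simply never hit again; explicit eviction is not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey hashes the messages in order. Each message is length-prefixed
// so that ["ab", "c"] and ["a", "bc"] produce different keys.
func cacheKey(msgs []string) string {
	h := sha256.New()
	for _, m := range msgs {
		fmt.Fprintf(h, "%d:%s", len(m), m)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// SummaryCache is a hypothetical cache keyed by conversation content.
type SummaryCache struct {
	entries map[string]string
}

func NewSummaryCache() *SummaryCache {
	return &SummaryCache{entries: map[string]string{}}
}

// GetOrCompute returns the cached summary for msgs, or calls summarize
// and stores the result when the content has not been seen before.
func (c *SummaryCache) GetOrCompute(msgs []string, summarize func([]string) string) string {
	key := cacheKey(msgs)
	if s, ok := c.entries[key]; ok {
		return s
	}
	s := summarize(msgs)
	c.entries[key] = s
	return s
}

func main() {
	cache := NewSummaryCache()
	calls := 0
	summarize := func(msgs []string) string {
		calls++
		return fmt.Sprintf("summary of %d messages", len(msgs))
	}
	conv := []string{"read config.yaml", "fixed parse error"}
	cache.GetOrCompute(conv, summarize)
	cache.GetOrCompute(conv, summarize)                         // identical content: cache hit
	cache.GetOrCompute(append(conv, "added tests"), summarize) // changed: recomputed
	fmt.Println("summarizer calls:", calls)
}
```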