Gokin implements a three-level memory system and intelligent context management.


Memory Types#

Type     Scope            Lifetime            Description
Session  Current session  Until session ends  Temporary notes
Project  Current project  Persistent          Project knowledge
Global   All projects     Persistent          General preferences

Entry Structure#

type Entry struct {
    ID        string     // "mem_" + hash[:8], where hash is a SHA-256 digest
    Key       string     // Optional key for lookup
    Content   string     // Memory content
    Type      MemoryType // session, project, or global
    Tags      []string   // Auto-extracted tags
    Timestamp time.Time
    Project   string     // Project identifier (path hash)
}
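
The ID comment can be illustrated with a minimal sketch. It assumes that the hash is computed over the entry content and that hash[:8] means the first 8 hexadecimal characters of the digest; the source confirms neither, and entryID is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// entryID returns "mem_" followed by the first 8 hex characters of a
// SHA-256 digest. Hashing only the content is an assumption; the real
// hash input may include other fields such as the key or project.
func entryID(content string) string {
	sum := sha256.Sum256([]byte(content))
	return "mem_" + hex.EncodeToString(sum[:])[:8]
}

func main() {
	fmt.Println(entryID("use tabs for indentation in this repo"))
}
```

Under these assumptions the same content always yields the same 12-character ID.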

Auto-Tagging#

Tags are automatically extracted from content:

  • File paths: /[a-zA-Z0-9_.\-/]+
  • Function names: function [a-zA-Z_][a-zA-Z0-9_]*
  • Package names: package [a-zA-Z_][a-zA-Z0-9_]*
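
A rough sketch of how these three patterns could be applied with Go's regexp package. The function name extractTags, the de-duplication, and the choice to store only the captured identifier (rather than the whole match) for functions and packages are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns copied from the list above; the capture groups are added here
// so that only the identifier becomes the tag (an assumption).
var (
	pathRe = regexp.MustCompile(`/[a-zA-Z0-9_.\-/]+`)
	funcRe = regexp.MustCompile(`function ([a-zA-Z_][a-zA-Z0-9_]*)`)
	pkgRe  = regexp.MustCompile(`package ([a-zA-Z_][a-zA-Z0-9_]*)`)
)

// extractTags returns file paths, function names, and package names found
// in content, in that order, without duplicates.
func extractTags(content string) []string {
	seen := make(map[string]bool)
	var tags []string
	add := func(tag string) {
		if !seen[tag] {
			seen[tag] = true
			tags = append(tags, tag)
		}
	}
	for _, m := range pathRe.FindAllString(content, -1) {
		add(m)
	}
	for _, re := range []*regexp.Regexp{funcRe, pkgRe} {
		for _, m := range re.FindAllStringSubmatch(content, -1) {
			add(m[1])
		}
	}
	return tags
}

func main() {
	fmt.Println(extractTags("fix function parseConfig in /internal/config/loader.go of package config"))
}
```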

Memory Tools#

Tool            Description
memory          CRUD: add, get, search, list, remove
memorize        Typed storage: fact, preference, convention, pattern
pin_context     Pin text to system prompt for the session
history_search  Regex search through session history

Auto-Inject#

When memory.auto_inject is set to true, relevant memories are automatically added to the system prompt based on the conversation context.

Storage#

  • Persistent storage in JSON files
  • Debounced writes (coalescing frequent saves)
  • Separated by project (path hash)
  • Entry limit (default: 1000)
  • Key replacement (same key overwrites)
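
Key replacement and the entry limit can be sketched as follows. The types are hypothetical, and dropping the oldest entries when the limit is exceeded is an assumed policy; the source states only that a limit exists.

```go
package main

import "fmt"

// entry is a reduced, hypothetical stand-in for the Entry structure.
type entry struct {
	Key     string
	Content string
}

// store keeps entries in insertion order up to a fixed limit.
type store struct {
	limit   int
	entries []entry
}

// put overwrites an existing entry that has the same non-empty key;
// otherwise it appends. When the limit is exceeded, the oldest entries
// are dropped (assumed eviction policy).
func (s *store) put(e entry) {
	if e.Key != "" {
		for i := range s.entries {
			if s.entries[i].Key == e.Key {
				s.entries[i] = e
				return
			}
		}
	}
	s.entries = append(s.entries, e)
	if len(s.entries) > s.limit {
		s.entries = s.entries[len(s.entries)-s.limit:]
	}
}

func main() {
	s := &store{limit: 1000}
	s.put(entry{Key: "indent", Content: "tabs"})
	s.put(entry{Key: "indent", Content: "spaces"}) // same key: overwrites
	fmt.Println(len(s.entries), s.entries[0].Content)
}
```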

Context Management#

Token Counting#

Token counting with caching and fallback estimation:

  • LRU cache: 1000 entries
  • API counting: via provider client API
  • Fallback: character-based estimation (~4 chars = 1 token)
  • Async: semaphore for 3 parallel requests
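
A simplified sketch of how caching, API counting, the fallback estimate, and the semaphore might fit together. The countFunc type stands in for the provider call and is hypothetical, and the cache here is a plain map rather than a 1000-entry LRU.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"unicode/utf8"
)

// countFunc stands in for the provider's token-counting call (hypothetical).
type countFunc func(text string) (int, error)

var errUnavailable = errors.New("provider unavailable")

// estimateTokens applies the ~4 characters per token heuristic, rounding up.
func estimateTokens(text string) int {
	return (utf8.RuneCountInString(text) + 3) / 4
}

type counter struct {
	api   countFunc
	mu    sync.Mutex
	cache map[string]int // unbounded here; an LRU would cap it at 1000 entries
	sem   chan struct{}  // buffered channel used as a semaphore
}

func newCounter(api countFunc) *counter {
	return &counter{api: api, cache: make(map[string]int), sem: make(chan struct{}, 3)}
}

// count returns the token count and whether it is only an estimate.
func (c *counter) count(text string) (int, bool) {
	c.mu.Lock()
	n, ok := c.cache[text]
	c.mu.Unlock()
	if ok {
		return n, false
	}
	c.sem <- struct{}{} // at most 3 API calls in flight
	n, err := c.api(text)
	<-c.sem
	if err != nil {
		return estimateTokens(text), true
	}
	c.mu.Lock()
	c.cache[text] = n
	c.mu.Unlock()
	return n, false
}

func main() {
	offline := newCounter(func(string) (int, error) { return 0, errUnavailable })
	n, estimated := offline.count("func main() {}")
	fmt.Println(n, estimated)
}
```

Estimates are deliberately not cached in this sketch, so a later call can still obtain an exact count once the API is reachable again.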

Token Limits by Model#

Model             Input tokens  Output tokens
gemini-3-flash    1M            65K
gemini-3-pro      1M            65K
gemini-2.5-flash  1M            8K
gemini-2.5-pro    1M            8K
gemini-2.0-flash  1M            8K
glm-4.7           128K          131K

Context Warnings#

type TokenUsage struct {
    InputTokens  int
    MaxTokens    int
    PercentUsed  float64
    NearLimit    bool // usage above warning_threshold (80%)
    ExceedsLimit bool // usage above MaxTokens
    IsEstimate   bool // API call failed; the count is an estimate
}
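
One way such a structure could be populated is sketched below. The constructor newTokenUsage is hypothetical, and expressing PercentUsed on a 0 to 100 scale while the threshold is a 0 to 1 fraction is an assumption.

```go
package main

import "fmt"

type TokenUsage struct {
	InputTokens  int
	MaxTokens    int
	PercentUsed  float64
	NearLimit    bool
	ExceedsLimit bool
	IsEstimate   bool
}

// newTokenUsage derives the warning flags from a token count, the model's
// input limit, and a threshold given as a fraction (for example 0.8).
func newTokenUsage(input, max int, warningThreshold float64, isEstimate bool) TokenUsage {
	if max <= 0 {
		// No known limit: report the count without raising any warning.
		return TokenUsage{InputTokens: input, IsEstimate: isEstimate}
	}
	ratio := float64(input) / float64(max)
	return TokenUsage{
		InputTokens:  input,
		MaxTokens:    max,
		PercentUsed:  float64(input) * 100 / float64(max),
		NearLimit:    ratio > warningThreshold,
		ExceedsLimit: input > max,
		IsEstimate:   isEstimate,
	}
}

func main() {
	u := newTokenUsage(850_000, 1_000_000, 0.8, false)
	fmt.Printf("%.1f%% used, near limit: %v, exceeds: %v\n", u.PercentUsed, u.NearLimit, u.ExceedsLimit)
}
```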

Auto-Summarization#

When context usage reaches warning_threshold (80%), automatic summarization is triggered.

Summarization priorities (what is preserved):

  1. File paths and modified functions/methods
  2. Error messages and their solutions
  3. Component dependencies
  4. Configuration changes
  5. Architectural decisions
  6. Unresolved issues and next steps

Exclusions (what is removed):

  • Verbose tool output and raw logs
  • Intermediate failed attempts
  • UI confirmations
  • Repeated file reads

Hierarchical Summarization#

For large conversations (>100 messages):

  1. Split into chunks
  2. Intermediate summaries for each chunk
  3. Merge summaries into final
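
The three steps can be sketched as a single merge pass. The summarizer type is a stand-in for the model call, and the chunk size is a hypothetical parameter; the source gives only the 100-message threshold.

```go
package main

import "fmt"

// summarizer stands in for the model call that condenses a list of texts.
type summarizer func(messages []string) string

// chunk splits messages into consecutive slices of at most size elements.
func chunk(messages []string, size int) [][]string {
	if size <= 0 {
		return [][]string{messages}
	}
	var chunks [][]string
	for start := 0; start < len(messages); start += size {
		end := start + size
		if end > len(messages) {
			end = len(messages)
		}
		chunks = append(chunks, messages[start:end])
	}
	return chunks
}

// summarizeConversation summarizes directly up to threshold messages;
// above it, it summarizes each chunk and then merges the partial summaries.
func summarizeConversation(messages []string, threshold, chunkSize int, summarize summarizer) string {
	if len(messages) <= threshold {
		return summarize(messages)
	}
	var partials []string
	for _, c := range chunk(messages, chunkSize) {
		partials = append(partials, summarize(c))
	}
	return summarize(partials)
}

// countingSummarizer is a toy summarizer used only for demonstration.
func countingSummarizer(messages []string) string {
	return fmt.Sprintf("[summary of %d items]", len(messages))
}

func main() {
	messages := make([]string, 250)
	fmt.Println(summarizeConversation(messages, 100, 50, countingSummarizer))
}
```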

Result Compaction#

ResultCompactor compresses long tool results:

  • Errors — higher limit (10K chars), fully preserved
  • Successful results — trimmed to tool_result_max_chars
  • Logs — preserve beginning and end (head/tail)
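
A minimal sketch of these rules, using the error indicators this section lists. Several details are assumptions: matching is case-insensitive, head/tail trimming is applied to every oversized result rather than only to logs, and the truncation marker text is invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// errorMaxChars is the higher limit applied to error output (10K chars).
const errorMaxChars = 10000

// errorIndicators holds the substrings that mark a result as an error.
var errorIndicators = []string{
	"error:", "panic:", "fatal:", "stack trace:", "exception:",
	"traceback:", "undefined:", "cannot use", "--- fail",
	"permission denied", "not found", "syntax error",
	"compilation failed", "build failed",
}

// looksLikeError reports whether result contains any indicator,
// compared in lower case (assumed).
func looksLikeError(result string) bool {
	lower := strings.ToLower(result)
	for _, indicator := range errorIndicators {
		if strings.Contains(lower, indicator) {
			return true
		}
	}
	return false
}

// headTail keeps the first and last parts of s so that the kept text is
// maxChars characters long, with a marker in between.
func headTail(s string, maxChars int) string {
	r := []rune(s)
	if len(r) <= maxChars {
		return s
	}
	head := maxChars / 2
	tail := maxChars - head
	return string(r[:head]) + "\n...[truncated]...\n" + string(r[len(r)-tail:])
}

// compact applies the higher limit to errors and maxChars to everything else.
func compact(result string, maxChars int) string {
	if looksLikeError(result) {
		return headTail(result, errorMaxChars)
	}
	return headTail(result, maxChars)
}

func main() {
	fmt.Println(compact(strings.Repeat("line of output\n", 2000), 200))
}
```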

Recognized error indicators:

"error:", "panic:", "fatal:", "stack trace:", "exception:",
"traceback:", "undefined:", "cannot use", "--- fail",
"permission denied", "not found", "syntax error",
"compilation failed", "build failed"

Context Configuration#

context:
  warning_threshold: 0.8           # Warning at 80%
  summarization_ratio: 0.5         # Summarize to 50%
  tool_result_max_chars: 10000     # Max chars per tool result
  enable_auto_summary: true        # Auto-summarization
  auto_compact_threshold: 0.75     # Auto-compact at 75%

Summary Cache#

Summaries are cached for reuse:

  • Hash-based key from content
  • Prevents re-summarizing identical context
  • Invalidated on conversation changes
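
A small sketch of a content-hashed cache along these lines. SHA-256 as the hash function and all names are assumptions. Because the key is derived from the content, a changed conversation also produces a different key, so invalidate mainly frees stale entries.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// summaryCache maps a hash of the summarized content to its summary.
type summaryCache struct {
	entries map[string]string
}

func newSummaryCache() *summaryCache {
	return &summaryCache{entries: make(map[string]string)}
}

// cacheKey hashes the messages with a separator byte so that
// ["ab"] and ["a", "b"] produce different keys.
func cacheKey(messages []string) string {
	h := sha256.New()
	for _, m := range messages {
		h.Write([]byte(m))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// getOrCompute returns the cached summary for identical content and
// calls summarize only on a cache miss.
func (c *summaryCache) getOrCompute(messages []string, summarize func([]string) string) string {
	key := cacheKey(messages)
	if s, ok := c.entries[key]; ok {
		return s
	}
	s := summarize(messages)
	c.entries[key] = s
	return s
}

// invalidate drops all cached summaries, e.g. when the conversation changes.
func (c *summaryCache) invalidate() {
	c.entries = make(map[string]string)
}

func main() {
	cache := newSummaryCache()
	summarize := func(m []string) string { return fmt.Sprintf("%d messages", len(m)) }
	fmt.Println(cache.getOrCompute([]string{"hello", "world"}, summarize))
}
```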