Documentation

System Design

GIM connects your development environment to a shared knowledge base of community-verified fixes. Here's how the pieces fit together.

Architecture Overview

The request path runs from the developer's IDE through the MCP client to the GIM server and its knowledge base:

  • Developer IDE: Claude Code CLI
  • MCP Client: tool invocation layer
  • GIM Server: matching & deduplication
  • Knowledge Base: embeddings & verified fixes

Data Flow

When your AI assistant encounters an error, the following sequence occurs:

  1. AI assistant encounters an error during coding
  2. Calls an MCP tool (e.g., gim_search_issues)
  3. GIM Server receives the request and sanitizes input
  4. Generates semantic embedding via Gemini
  5. Performs vector search in Qdrant
  6. Returns ranked results to the AI assistant
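
The sequence above can be sketched as a simple pipeline. Everything here is illustrative: the helper names (sanitize, embed, vector_search) are stand-ins for GIM's internal sanitization, Gemini embedding, and Qdrant search steps, not its actual API.

```python
# Illustrative sketch of GIM's search data flow. Only the overall
# shape of the pipeline is meaningful; the helpers are placeholders.

import re

def sanitize(error_message: str) -> str:
    # Step 3: strip obvious secrets before anything is processed further.
    return re.sub(r"(api[_-]?key\s*=\s*)\S+", r"\1<REDACTED>", error_message)

def embed(text: str) -> list[float]:
    # Step 4: placeholder for the Gemini embedding call.
    return [float(ord(c) % 7) for c in text[:8]]

def vector_search(vector: list[float]) -> list[dict]:
    # Step 5: placeholder for the Qdrant similarity search.
    return [{"issue_id": "abc123", "similarity": 0.91, "confidence": 0.8}]

def gim_search_issues(error_message: str) -> list[dict]:
    clean = sanitize(error_message)   # step 3
    vector = embed(clean)             # step 4
    hits = vector_search(vector)      # step 5
    # Step 6: rank and return results to the assistant.
    return sorted(hits, key=lambda h: h["similarity"] * h["confidence"],
                  reverse=True)
```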

MCP Protocol

GIM uses the Model Context Protocol (MCP), an open standard that allows AI assistants to interact with external tools securely. Your AI assistant calls GIM tools like regular functions—no manual configuration required.

Core Components

MCP Server

The GIM MCP server acts as the bridge between your IDE and the knowledge base. It exposes tools that your AI assistant calls automatically when handling errors.

Available Tools

Tool                 Purpose
gim_search_issues    Find existing solutions for an error
gim_get_fix_bundle   Get detailed fix for a matched issue
gim_submit_issue     Submit a new resolved issue
gim_confirm_fix      Report fix outcome (success/failure)
gim_report_usage     Manual analytics events

Example Tool Call

gim_search_issues({
  error_message: "ModuleNotFoundError: No module named 'numpy'",
  language: "python",
  framework: "fastapi"
})

Knowledge Base

Issues and their fixes are stored with semantic embeddings, enabling fuzzy matching even when error messages differ slightly between environments. Each entry includes the error context, the fix, and community verification data.

Dual Storage Architecture

Storage    Type                      Purpose
Supabase   Relational (PostgreSQL)   Issue metadata, fix bundles, user data
Qdrant     Vector database           Semantic embeddings for search

This dual-storage approach separates concerns: relational storage handles CRUD operations, relationships, and structured queries, while vector storage enables fast semantic similarity matching.
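
The separation of concerns can be sketched with two in-memory stand-ins: one dict plays the role of the relational store (metadata, fix bundles), the other the vector store (embeddings). Field names here are illustrative, not GIM's actual schema.

```python
# Dual-storage separation of concerns, sketched with in-memory
# stand-ins for Supabase (relational) and Qdrant (vector).

from math import sqrt

relational: dict[str, dict] = {}          # metadata keyed by issue id
vectors: dict[str, list[float]] = {}      # embeddings keyed by issue id

def store_issue(issue_id: str, metadata: dict, embedding: list[float]):
    relational[issue_id] = metadata       # CRUD / structured queries
    vectors[issue_id] = embedding         # semantic similarity search

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def search(query_embedding: list[float], top_k: int = 5) -> list[dict]:
    # The vector store finds the nearest neighbours...
    scored = sorted(vectors, reverse=True,
                    key=lambda i: cosine(vectors[i], query_embedding))
    # ...and the relational store supplies the full record for each hit.
    return [relational[i] | {"issue_id": i} for i in scored[:top_k]]
```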

Embedding Engine

GIM uses Google's gemini-embedding-001 model to generate 3072-dimensional semantic embeddings. Rather than embedding just the error message, GIM combines multiple fields into a single embedding:

  • Error message
  • Root cause analysis
  • Fix summary

Why Combined Embeddings

Combining error message, root cause, and fix summary into a single embedding captures semantic relationships between what went wrong and how to fix it. This improves match quality compared to embedding the error message alone.
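
A minimal sketch of combining the three fields into a single text for embedding. The labels and field set mirror the bullets above; the exact format GIM uses internally is an assumption.

```python
# Combine error message, root cause, and fix summary into one text
# that is then embedded as a single vector. Labels are illustrative.

def embedding_text(error_message: str, root_cause: str, fix_summary: str) -> str:
    return "\n".join([
        f"Error: {error_message}",
        f"Root cause: {root_cause}",
        f"Fix: {fix_summary}",
    ])
```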

Matching Engine

When you encounter an error, GIM uses embedding-based semantic search to find similar issues in the knowledge base. Results are ranked by relevance and community confidence score, so the most reliable fixes surface first.

Technical Details

  • Algorithm: Cosine similarity
  • Search threshold: 0.2 (permissive for broad matching)
  • Quantization: INT8 scalar for performance
  • Ranking: Similarity score × confidence score

The low threshold (0.2) is intentional—it's better to return potentially relevant results for the AI to evaluate than to miss good matches. The confidence score helps surface verified fixes over unverified ones.
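
The ranking described above can be sketched as follows: cosine similarity against the query embedding, the permissive 0.2 threshold, and a final score of similarity × confidence. The candidate structure is illustrative.

```python
# Matching sketch: cosine similarity, a 0.2 search threshold, and
# ranking by similarity * confidence, as described above.

from math import sqrt

SEARCH_THRESHOLD = 0.2

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def rank(query: list[float], candidates: list[dict]) -> list[dict]:
    scored = []
    for c in candidates:
        sim = cosine(query, c["embedding"])
        if sim >= SEARCH_THRESHOLD:        # permissive: keep broad matches
            scored.append({**c, "score": sim * c["confidence"]})
    return sorted(scored, key=lambda c: c["score"], reverse=True)
```

Note how the confidence factor lets a verified fix with a slightly lower similarity outrank an unverified one with a perfect match.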

Deduplication Engine

Before a new issue is added to the knowledge base, GIM checks for existing duplicates using semantic similarity. This keeps the knowledge base clean and ensures fixes are consolidated rather than fragmented.

Deduplication Logic

  • Threshold: 0.85 similarity
  • If similarity ≥ 0.85: Create child issue linked to existing master
  • If similarity < 0.85: Create new master issue

Child issues add environment diversity (different OS, package versions, frameworks) without fragmenting the knowledge base. The master issue remains the single source of truth.
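
The decision rule reduces to a small function. The return shape is illustrative; only the 0.85 threshold comes from the text above.

```python
# Deduplication decision: at or above 0.85 similarity, link a child
# issue to the existing master; below it, create a new master issue.

DEDUP_THRESHOLD = 0.85

def dedup_decision(best_match_similarity: float, best_match_id: str) -> dict:
    if best_match_similarity >= DEDUP_THRESHOLD:
        return {"action": "create_child", "linked_to": best_match_id}
    return {"action": "create_master", "linked_to": None}
```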

Security Model

GIM automatically sanitizes all content before storage to protect sensitive information. The sanitization pipeline has two layers:

Two-Layer Sanitization Pipeline

Layer     Method                  What It Catches
Layer 1   Deterministic (regex)   API keys, URLs, file paths, emails, IPs
Layer 2   LLM-based (Gemini)      Context-aware secrets, domain-specific PII

Layer 1 uses pattern matching for known secret formats (AWS keys, JWT tokens, etc.). Layer 2 uses an LLM to identify context-dependent sensitive data that regex can't catch, like custom variable names containing passwords or internal service endpoints.
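
A Layer 1 sanitizer might look like the sketch below. The patterns cover a few of the formats named above (AWS keys, JWTs, emails, IPs); GIM's real pattern set is certainly broader, and these regexes are illustrative rather than exhaustive.

```python
# Sketch of a deterministic (Layer 1) sanitizer: known secret formats
# are matched by regex and replaced with placeholders before storage.

import re

PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<AWS_KEY>"),       # AWS access key id
    (re.compile(r"eyJ[\w-]+\.[\w-]+\.[\w-]+"), "<JWT>"),  # JWT: three base64url parts
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"), # email address
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"), # IPv4 address
]

def sanitize_layer1(text: str) -> str:
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```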

What Gets Sanitized

API keys, passwords, file paths, email addresses, IP addresses, database connection strings, and domain-specific identifiers are automatically removed before storage. Your code snippets are safe to share.

Rate Limiting

GIM implements rate limiting to ensure fair usage and system stability. Limits are applied per-user and reset daily.

Operation            Rate Limited   Default Limit
gim_search_issues    Yes            100/day
gim_get_fix_bundle   Yes            100/day
gim_submit_issue     No             Unlimited
gim_confirm_fix      No             Unlimited

Limits reset daily at midnight UTC. Submissions and confirmations are unlimited to encourage knowledge sharing and feedback.