Documentation

System Design

GIM connects your development environment to a shared knowledge base of community-verified fixes. Here's how the pieces fit together.

Architecture Overview

The request path runs from the developer's IDE through the MCP client to the GIM server and its knowledge base:

  • Developer IDE: Claude Code CLI
  • MCP Client: tool invocation layer
  • GIM Server: matching & deduplication
  • Knowledge Base: embeddings & verified fixes

Data Flow

When your AI assistant encounters an error, the following sequence occurs:

  1. AI assistant encounters an error during coding
  2. Calls an MCP tool (e.g., gim_search_issues)
  3. GIM Server receives the request and sanitizes input
  4. Generates semantic embedding via Gemini
  5. Performs vector search in Qdrant
  6. Returns ranked results to the AI assistant
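
The sequence above can be sketched as a simple pipeline. Everything here is illustrative: the helper names (sanitize, embed, vector_search) are stand-ins for GIM's internal sanitization, Gemini embedding, and Qdrant search steps, not its actual API.

```python
# Illustrative sketch of GIM's search data flow. Only the overall
# shape of the pipeline is meaningful; the helpers are placeholders.

import re

def sanitize(error_message: str) -> str:
    # Step 3: strip obvious secrets before anything is processed further.
    return re.sub(r"(api[_-]?key\s*=\s*)\S+", r"\1<REDACTED>", error_message)

def embed(text: str) -> list[float]:
    # Step 4: placeholder for the Gemini embedding call.
    return [float(ord(c) % 7) for c in text[:8]]

def vector_search(vector: list[float]) -> list[dict]:
    # Step 5: placeholder for the Qdrant similarity search.
    return [{"issue_id": "abc123", "similarity": 0.91, "confidence": 0.8}]

def gim_search_issues(error_message: str) -> list[dict]:
    clean = sanitize(error_message)   # step 3
    vector = embed(clean)             # step 4
    hits = vector_search(vector)      # step 5
    # Step 6: rank and return results to the assistant.
    return sorted(hits, key=lambda h: h["similarity"] * h["confidence"],
                  reverse=True)
```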

MCP Protocol

GIM uses the Model Context Protocol (MCP), an open standard that allows AI assistants to interact with external tools securely. Your AI assistant calls GIM tools like regular functions—no manual configuration required.

Core Components

MCP Server

The GIM MCP server acts as the bridge between your IDE and the knowledge base. It exposes tools that your AI assistant calls automatically when handling errors.

Available Tools

Tool                 Purpose
gim_search_issues    Find existing solutions for an error
gim_get_fix_bundle   Get detailed fix for a matched issue
gim_submit_issue     Submit a new resolved issue
gim_confirm_fix      Report fix outcome (success/failure)
gim_report_usage     Manual analytics events

Example Tool Call

gim_search_issues({
  error_message: "ModuleNotFoundError: No module named 'numpy'",
  language: "python",
  framework: "fastapi"
})

Knowledge Base

Issues and their fixes are stored with semantic embeddings, enabling fuzzy matching even when error messages differ slightly between environments. Each entry includes the error context, the fix, and community verification data.

Dual Storage Architecture

Storage    Type                      Purpose
Supabase   Relational (PostgreSQL)   Issue metadata, fix bundles, user data
Qdrant     Vector database           Semantic embeddings for search

This dual-storage approach separates concerns: relational storage handles CRUD operations, relationships, and structured queries, while vector storage enables fast semantic similarity matching.
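
The separation of concerns can be sketched with two in-memory stand-ins: one dict plays the role of the relational store (metadata, fix bundles), the other the vector store (embeddings). Field names here are illustrative, not GIM's actual schema.

```python
# Dual-storage separation of concerns, sketched with in-memory
# stand-ins for Supabase (relational) and Qdrant (vector).

from math import sqrt

relational: dict[str, dict] = {}          # metadata keyed by issue id
vectors: dict[str, list[float]] = {}      # embeddings keyed by issue id

def store_issue(issue_id: str, metadata: dict, embedding: list[float]):
    relational[issue_id] = metadata       # CRUD / structured queries
    vectors[issue_id] = embedding         # semantic similarity search

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def search(query_embedding: list[float], top_k: int = 5) -> list[dict]:
    # The vector store finds the nearest neighbours...
    scored = sorted(vectors, reverse=True,
                    key=lambda i: cosine(vectors[i], query_embedding))
    # ...and the relational store supplies the full record for each hit.
    return [relational[i] | {"issue_id": i} for i in scored[:top_k]]
```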

Embedding Engine

GIM uses Google's gemini-embedding-001 model to generate 3072-dimensional semantic embeddings. Rather than embedding just the error message, GIM combines multiple fields into a single embedding:

  • Error message
  • Root cause analysis
  • Fix summary

Why Combined Embeddings

Combining error message, root cause, and fix summary into a single embedding captures semantic relationships between what went wrong and how to fix it. This improves match quality compared to embedding the error message alone.
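
A minimal sketch of combining the three fields into a single text for embedding. The labels and field set mirror the bullets above; the exact format GIM uses internally is an assumption.

```python
# Combine error message, root cause, and fix summary into one text
# that is then embedded as a single vector. Labels are illustrative.

def embedding_text(error_message: str, root_cause: str, fix_summary: str) -> str:
    return "\n".join([
        f"Error: {error_message}",
        f"Root cause: {root_cause}",
        f"Fix: {fix_summary}",
    ])
```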

Matching Engine

When you encounter an error, GIM uses embedding-based semantic search to find similar issues in the knowledge base. Results are ranked by relevance and community confidence score, so the most reliable fixes surface first.

Technical Details

  • Algorithm: Cosine similarity
  • Search threshold: 0.2 (permissive for broad matching)
  • Quantization: INT8 scalar for performance
  • Ranking: Similarity score × confidence score

The low threshold (0.2) is intentional—it's better to return potentially relevant results for the AI to evaluate than to miss good matches. The confidence score helps surface verified fixes over unverified ones.
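
The ranking described above can be sketched as follows: cosine similarity against the query embedding, the permissive 0.2 threshold, and a final score of similarity × confidence. The candidate structure is illustrative.

```python
# Matching sketch: cosine similarity, a 0.2 search threshold, and
# ranking by similarity * confidence, as described above.

from math import sqrt

SEARCH_THRESHOLD = 0.2

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def rank(query: list[float], candidates: list[dict]) -> list[dict]:
    scored = []
    for c in candidates:
        sim = cosine(query, c["embedding"])
        if sim >= SEARCH_THRESHOLD:        # permissive: keep broad matches
            scored.append({**c, "score": sim * c["confidence"]})
    return sorted(scored, key=lambda c: c["score"], reverse=True)
```

Note how the confidence factor lets a verified fix with a slightly lower similarity outrank an unverified one with a perfect match.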

Deduplication Engine

Before a new issue is added to the knowledge base, GIM checks for existing duplicates using semantic similarity. This keeps the knowledge base clean and ensures fixes are consolidated rather than fragmented.

Deduplication Logic

  • Threshold: 0.85 similarity
  • If similarity ≥ 0.85: Create child issue linked to existing master
  • If similarity < 0.85: Create new master issue

Child issues add environment diversity (different OS, package versions, frameworks) without fragmenting the knowledge base. The master issue remains the single source of truth.
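
The decision rule reduces to a small function. The return shape is illustrative; only the 0.85 threshold comes from the text above.

```python
# Deduplication decision: at or above 0.85 similarity, link a child
# issue to the existing master; below it, create a new master issue.

DEDUP_THRESHOLD = 0.85

def dedup_decision(best_match_similarity: float, best_match_id: str) -> dict:
    if best_match_similarity >= DEDUP_THRESHOLD:
        return {"action": "create_child", "linked_to": best_match_id}
    return {"action": "create_master", "linked_to": None}
```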

Security Model

GIM automatically sanitizes all content before storage to protect sensitive information. The sanitization pipeline has two layers:

Two-Layer Sanitization Pipeline

Layer     Method                  What It Catches
Layer 1   Deterministic (regex)   API keys, URLs, file paths, emails, IPs
Layer 2   LLM-based (Gemini)      Context-aware secrets, domain-specific PII

Layer 1 uses pattern matching for known secret formats (AWS keys, JWT tokens, etc.). Layer 2 uses an LLM to identify context-dependent sensitive data that regex can't catch, like custom variable names containing passwords or internal service endpoints.
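
A Layer 1 sanitizer might look like the sketch below. The patterns cover a few of the formats named above (AWS keys, JWTs, emails, IPs); GIM's real pattern set is certainly broader, and these regexes are illustrative rather than exhaustive.

```python
# Sketch of a deterministic (Layer 1) sanitizer: known secret formats
# are matched by regex and replaced with placeholders before storage.

import re

PATTERNS = [
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<AWS_KEY>"),       # AWS access key id
    (re.compile(r"eyJ[\w-]+\.[\w-]+\.[\w-]+"), "<JWT>"),  # JWT: three base64url parts
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"), # email address
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<IP>"), # IPv4 address
]

def sanitize_layer1(text: str) -> str:
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```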

What Gets Sanitized

API keys, passwords, file paths, email addresses, IP addresses, database connection strings, and domain-specific identifiers are automatically removed before storage. Your code snippets are safe to share.

Rate Limiting

GIM implements rate limiting to ensure fair usage and system stability. Limits are applied per-user and reset daily.

Operation            Rate Limited   Default Limit
gim_search_issues    Yes            100/day
gim_get_fix_bundle   Yes            100/day
gim_submit_issue     No             Unlimited
gim_confirm_fix      No             Unlimited

Limits reset daily at midnight UTC. Submissions and confirmations are unlimited to encourage knowledge sharing and feedback.