
Core Concepts


The Semantic Cache

Traditional caches work on exact key matches. A semantic cache works on meaning. Two questions that are phrased differently but mean the same thing — "How many active users do we have?" and "Count of active users" — should return the same cached SQL query.

Medha solves a specific problem: LLM-backed Text-to-Query systems spend most of their inference budget regenerating structurally identical queries. Once a question has been translated to SQL (or Cypher or GraphQL), repeating that translation is pure waste. Medha stores the question embedding and its resulting query, and short-circuits the LLM for any semantically equivalent future question.
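The core idea fits in a few lines of Python. This is a toy illustration, not Medha's implementation. `embed` is a hypothetical bag-of-words stand-in for a real sentence-embedding model, and `ToySemanticCache` is an invented name. The sketch stores (embedding, query) pairs and returns a stored query only when the best cosine score clears the 0.85 semantic threshold.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a sentence-embedding model: bag-of-words counts."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical: stores (embedding, query) pairs and looks up by similarity."""

    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def store(self, question: str, query: str) -> None:
        self.entries.append((embed(question), query))

    def lookup(self, question: str) -> Optional[str]:
        q = embed(question)
        best_score, best_query = 0.0, None
        for emb, query in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_query = score, query
        # Short-circuit the LLM only when the best match clears the threshold.
        return best_query if best_score >= self.threshold else None


cache = ToySemanticCache()
cache.store("How many active users do we have?",
            "SELECT COUNT(*) FROM users WHERE active = true")
print(cache.lookup("how many active users do we have"))  # hit: same words
print(cache.lookup("Count of active users"))             # None: about 0.38 < 0.85
```

The second lookup misses because word counts cannot tell that "Count of" and "How many" mean the same thing. Closing that gap is the job of a real embedding model.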

The value proposition is direct: lower latency, lower cost, and deterministic query output for common patterns.


Medha evaluates every incoming question through five tiers in sequence, stopping at the first match:

```mermaid
graph TD
    Q[Incoming Question] --> L1[Tier 0: L1 Cache\nexact hash match]
    L1 -->|HIT| R[Return CacheHit]
    L1 -->|MISS| TM[Tier 1: Template Match\nintent + NER parameters]
    TM -->|HIT| R
    TM -->|MISS| EV[Tier 2: Exact Vector\ncosine ≥ 0.99]
    EV -->|HIT| R
    EV -->|MISS| SM[Tier 3: Semantic Match\ncosine ≥ 0.85]
    SM -->|HIT| R
    SM -->|MISS| FZ[Tier 4: Fuzzy Match\nLevenshtein ≥ 85]
    FZ -->|HIT| R
    FZ -->|MISS| MISS[Cache Miss]
```
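The first-match-wins control flow can be sketched as an ordered list of tier functions. The tier bodies here are hypothetical stubs (`stub`, `Hit`, and `waterfall_search` are invented names). Only the ordering, the early return, and the fuzzy tier being off by default mirror the documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hit:
    """Hypothetical result type for this sketch."""
    strategy: str
    query: str
    confidence: float


# A tier is any function that returns a Hit, or None on a miss.
Tier = Callable[[str], Optional[Hit]]


def waterfall_search(question: str, tiers: list[tuple[str, Tier]],
                     enable_fuzzy: bool = False) -> Optional[Hit]:
    """Try tiers in order and return the first hit; None is a cache miss."""
    for name, tier in tiers:
        if name == "FUZZY_MATCH" and not enable_fuzzy:
            continue  # Tier 4 is skipped unless explicitly enabled
        hit = tier(question)
        if hit is not None:
            return hit  # later, slower tiers never run
    return None


calls: list[str] = []


def stub(name: str, result: Optional[Hit] = None) -> Tier:
    """Hypothetical tier that records that it ran and returns a fixed result."""
    def tier(question: str) -> Optional[Hit]:
        calls.append(name)
        return result
    return tier


tiers = [
    ("L1_CACHE", stub("L1_CACHE")),
    ("TEMPLATE_MATCH", stub("TEMPLATE_MATCH")),
    ("EXACT_VECTOR_MATCH", stub("EXACT_VECTOR_MATCH")),
    ("SEMANTIC_MATCH", stub("SEMANTIC_MATCH", Hit("SEMANTIC_MATCH", "SELECT ...", 0.828))),
    ("FUZZY_MATCH", stub("FUZZY_MATCH")),
]
hit = waterfall_search("How many active users?", tiers)
print(hit.strategy, calls)
```

Ordering the tiers from cheapest to most expensive means the common case (a repeated or near-identical question) never pays for the slower lookups.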
| Tier | Strategy | Trigger | Typical latency | Confidence |
|---|---|---|---|---|
| 0 | L1 Cache | Exact hash match on normalized text | < 0.1 ms | 1.0 |
| 1 | Template Match | Intent match + NER parameter extraction | 1–5 ms | 0.9–1.0 |
| 2 | Exact Vector | Cosine similarity ≥ 0.99 | 5–20 ms | 0.99 |
| 3 | Semantic | Cosine similarity ≥ 0.85 | 5–20 ms | score × 0.9 |
| 4 | Fuzzy | Levenshtein similarity ratio ≥ 85 (0–100 scale) | 20–50 ms | varies |
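Tier 0 fires only on an exact hash of the normalized question. The normalization rules Medha applies are not specified here. The sketch below assumes a plausible set (Unicode NFKC, casefolding, collapsed whitespace, trailing punctuation stripped) and uses SHA-256 as the hash, purely for illustration.

```python
import hashlib
import re
import unicodedata


def normalize(question: str) -> str:
    """Assumed normalization: NFKC, casefold, collapse whitespace, drop trailing ?!."""
    text = unicodedata.normalize("NFKC", question).casefold()
    text = re.sub(r"\s+", " ", text).strip()
    return text.rstrip("?!. ")


def l1_key(question: str) -> str:
    """Hex digest used as the exact-match key for the in-memory L1 tier (assumed SHA-256)."""
    return hashlib.sha256(normalize(question).encode("utf-8")).hexdigest()


l1_cache: dict[str, str] = {}
l1_cache[l1_key("How many active users?")] = "SELECT COUNT(*) FROM users WHERE active = true"

# Case and spacing differences still hit; a reworded question does not.
print(l1_key("  how many ACTIVE users ") in l1_cache)  # True
print(l1_key("Count of active users") in l1_cache)     # False
```

A dictionary lookup on a precomputed hash is why Tier 0 is so cheap. It is also why it only catches trivially different inputs, leaving everything else to the later tiers.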

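Tier 4 (Fuzzy) is optional and disabled by default. Enable it in Settings when your users' questions often contain typos, inconsistent spelling, or small wording variations. Because it compares characters rather than meaning, it does not help with questions that have been substantially rephrased.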
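The fuzzy tier compares raw strings rather than embeddings. The exact ratio Medha computes is not documented in this section. The sketch below assumes one common definition, which scales edit distance to a 0–100 similarity: 100 × (1 − distance / length of the longer string). Fuzzy-matching libraries use related but slightly different normalizations, so scores near the threshold may differ. All function names here are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed 0-100 similarity: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def fuzzy_match(question: str, stored: str, threshold: float = 85.0) -> bool:
    """Hypothetical Tier 4 check against one stored question (case-insensitive)."""
    return similarity_ratio(question.lower(), stored.lower()) >= threshold


print(round(similarity_ratio("how many actve users", "how many active users"), 1))  # 95.2
print(fuzzy_match("Count of active users", "How many active users"))                # False
```

A single typo barely moves the ratio, while a genuine rewording falls far below 85. That is why this tier complements the semantic tier rather than replacing it.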


Scoring Model

Cosine Similarity

All vector tiers use cosine similarity to compare the incoming question embedding against stored embeddings:

\[\text{cosine}(\vec{q}, \vec{e}) = \frac{\vec{q} \cdot \vec{e}}{\|\vec{q}\| \|\vec{e}\|}\]

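The result lies in the range [−1, 1], where 1.0 means the two vectors point in the same direction (semantically equivalent questions). Sentence-embedding components can be negative, so negative scores are possible in principle. In practice, only scores at or above the 0.85 semantic threshold affect matching, so the sign of low scores rarely matters.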
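The formula translates directly into code. This is a plain-Python sketch on a toy 3-dimensional vector. It also shows that nothing in the definition forces scores to be non-negative.

```python
import math


def cosine(q: list[float], e: list[float]) -> float:
    """cosine(q, e) = (q . e) / (||q|| * ||e||); assumes non-zero, equal-length vectors."""
    if len(q) != len(e):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(q, e))
    return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in e)))


q = [0.2, -0.5, 0.8]  # toy "embedding"; real ones have hundreds of dimensions
print(round(cosine(q, q), 6))                     # 1.0  (same direction)
print(round(cosine(q, [2.5 * x for x in q]), 6))  # 1.0  (length is ignored)
print(round(cosine(q, [-x for x in q]), 6))       # -1.0 (opposite direction)
```

If the embedding model already returns unit-length vectors, as many do, the denominator is 1 and cosine similarity reduces to a dot product.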

Semantic Confidence Penalty

When Tier 3 (Semantic Match) fires, the raw cosine score is penalized to reflect that the match is not exact:

\[\text{confidence}_{\text{semantic}} = \text{cosine}(\vec{q}, \vec{e}) \times 0.9\]

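This ensures that CacheHit.confidence from a semantic match is always strictly below the confidence of an exact vector match, giving downstream consumers a clear quality signal. Tier 3 only sees scores that passed the 0.85 threshold but failed the Tier 2 check (cosine < 0.99), so semantic confidence falls in [0.765, 0.891). That range sits below the 0.99 assigned to exact vector hits.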
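The penalty can be checked numerically. `semantic_confidence` is a hypothetical helper, not a Medha function. The thresholds and the 0.9 factor are the documented values.

```python
# Hypothetical helper; thresholds and penalty are the documented values.
SEMANTIC_THRESHOLD = 0.85      # Tier 3 lower bound (cosine)
EXACT_VECTOR_THRESHOLD = 0.99  # Tier 2 lower bound (cosine)
SEMANTIC_PENALTY = 0.9


def semantic_confidence(score: float) -> float:
    """Confidence for a Tier 3 hit: raw cosine score times the 0.9 penalty."""
    # Tier 3 only sees scores that passed 0.85 but failed Tier 2's 0.99 check.
    if not SEMANTIC_THRESHOLD <= score < EXACT_VECTOR_THRESHOLD:
        raise ValueError(f"score {score} is outside the semantic band")
    return score * SEMANTIC_PENALTY


print(round(semantic_confidence(0.92), 3))  # 0.828, as in the data-flow example
print(round(semantic_confidence(0.85), 3))  # 0.765, the lowest semantic confidence
# Even a score just under 0.99 yields about 0.891, well below Tier 2's 0.99.
print(semantic_confidence(0.9899) < 0.99)   # True
```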


Data Flow

```mermaid
sequenceDiagram
    participant App as Application
    participant M as Medha
    participant L1 as L1 Cache
    participant VB as Vector Backend
    participant LLM as LLM (external)

    App->>M: search("How many active users?")
    M->>L1: lookup(hash)
    L1-->>M: MISS
    M->>VB: query(embedding, top_k=1)
    VB-->>M: [(entry, score=0.92)]
    M-->>App: CacheHit(strategy=SEMANTIC_MATCH, confidence=0.828)

    Note over App,LLM: Separate request where search() returned a cache miss
    App->>LLM: generate query (cache miss path)
    LLM-->>App: "SELECT COUNT(*) FROM users WHERE active=true"
    App->>M: store(question, query)
    M->>L1: set(hash, entry)
    M->>VB: upsert(embedding, entry)
    M-->>App: ok
```
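From the application's side, this is the classic cache-aside pattern. The sketch below uses a hypothetical in-memory `FakeMedha` whose `search` and `store` methods are named after the calls in the diagram. Medha's real signatures, sync/async behaviour, and return types may differ, so check the API reference before reusing this. `call_llm` is likewise a placeholder for your own LLM client.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheHit:
    """Simplified stand-in; the real CacheHit has more fields."""
    generated_query: str
    strategy: str
    confidence: float


class FakeMedha:
    """Hypothetical stand-in: exact-text lookups only, no embeddings."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def search(self, question: str) -> Optional[CacheHit]:
        query = self._entries.get(question)
        return CacheHit(query, "L1_CACHE", 1.0) if query is not None else None

    def store(self, question: str, query: str) -> None:
        self._entries[question] = query


llm_calls = 0


def call_llm(question: str) -> str:
    """Placeholder for a real Text-to-SQL model call."""
    global llm_calls
    llm_calls += 1
    return "SELECT COUNT(*) FROM users WHERE active = true"


def answer(cache: FakeMedha, question: str) -> str:
    hit = cache.search(question)
    if hit is not None:
        return hit.generated_query  # cache hit: skip the LLM entirely
    query = call_llm(question)      # cache miss: pay for one generation...
    cache.store(question, query)    # ...and store it for future questions
    return query


cache = FakeMedha()
answer(cache, "How many active users?")
answer(cache, "How many active users?")
print(llm_calls)  # 1
```

The second call never reaches the LLM. As in the diagram, storing the generated query after a miss is the application's responsibility here.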

Key Types

| Type | Description |
|---|---|
| CacheHit | Returned on a successful search; contains generated_query, strategy, confidence, response_summary |
| SearchStrategy | Enum: L1_CACHE, TEMPLATE_MATCH, EXACT_VECTOR_MATCH, SEMANTIC_MATCH, FUZZY_MATCH |
| QueryTemplate | A parameterised question pattern with named slots (e.g. {city}, {date_range}) |
| CacheEntry | The stored record: question text, embedding, query, optional TTL |
| CacheStats | Aggregate statistics: total_hits, total_misses, hit_rate, per-strategy breakdown |
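For orientation, these types can be approximated with standard-library dataclasses. The field names come from the table above. The field types, defaults, the ttl_seconds name and unit, the by_strategy name, and the hit_rate formula are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


# Approximation only; field types and defaults are assumptions.
class SearchStrategy(Enum):
    L1_CACHE = auto()
    TEMPLATE_MATCH = auto()
    EXACT_VECTOR_MATCH = auto()
    SEMANTIC_MATCH = auto()
    FUZZY_MATCH = auto()


@dataclass
class CacheHit:
    generated_query: str
    strategy: SearchStrategy
    confidence: float
    response_summary: Optional[str] = None


@dataclass
class CacheEntry:
    question: str
    embedding: list[float]
    query: str
    ttl_seconds: Optional[int] = None  # hypothetical name/unit; None = no expiry


@dataclass
class CacheStats:
    total_hits: int = 0
    total_misses: int = 0
    by_strategy: dict[SearchStrategy, int] = field(default_factory=dict)  # hypothetical name

    @property
    def hit_rate(self) -> float:
        """Assumed definition: hits / (hits + misses), 0.0 before any search."""
        total = self.total_hits + self.total_misses
        return self.total_hits / total if total else 0.0

    def record(self, hit: Optional[CacheHit]) -> None:
        """Update counters after one search (None means a miss)."""
        if hit is None:
            self.total_misses += 1
        else:
            self.total_hits += 1
            self.by_strategy[hit.strategy] = self.by_strategy.get(hit.strategy, 0) + 1


stats = CacheStats()
stats.record(CacheHit("SELECT 1", SearchStrategy.SEMANTIC_MATCH, 0.828))
stats.record(None)
print(stats.hit_rate)  # 0.5
```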

See the Types API reference for full field definitions.
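Tier 1 fills a QueryTemplate's named slots from the incoming question. Medha does this with intent matching plus NER. The sketch below swaps in a plain regular expression per template as a much simpler stand-in, and the `SimpleTemplate` class, its fields, and the SQL are invented for illustration. Slot values come back as bind parameters rather than being formatted into the SQL string.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleTemplate:
    """Hypothetical template: question pattern with {slot}s plus a parameterized query."""
    pattern: str  # e.g. "how many users signed up in {city}"
    query: str    # uses named bind parameters, e.g. ":city"

    def _regex(self) -> "re.Pattern":
        # Escape the literal text, then turn each escaped {slot} into a named group.
        escaped = re.escape(self.pattern)
        body = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+?)", escaped)
        return re.compile(r"^" + body + r"\??$", re.IGNORECASE)

    def match(self, question: str) -> Optional[tuple[str, dict[str, str]]]:
        """Return (query, slot values) if the question fits, else None."""
        m = self._regex().match(question.strip())
        if m is None:
            return None
        # Return values as bind parameters; never format user text into SQL.
        return self.query, m.groupdict()


signups = SimpleTemplate(
    "how many users signed up in {city}",
    "SELECT COUNT(*) FROM users WHERE city = :city",
)
print(signups.match("How many users signed up in Berlin?"))
print(signups.match("Count of active users"))
```

A regex only accepts the exact template wording. Intent matching is what lets the real tier accept paraphrases of a template, and NER is what lets it recognize slot values such as cities or date ranges.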