Core Concepts
The Semantic Cache
Traditional caches work on exact key matches. A semantic cache works on meaning. Two questions that are phrased differently but mean the same thing — "How many active users do we have?" and "Count of active users" — should return the same cached SQL query.
Medha solves a specific problem: LLM-backed Text-to-Query systems spend most of their inference budget regenerating structurally identical queries. Once a question has been translated to SQL (or Cypher or GraphQL), repeating that translation is pure waste. Medha stores the question embedding and its resulting query, and short-circuits the LLM for any semantically equivalent future question.
The value proposition is direct: lower latency, lower cost, and deterministic query output for common patterns.
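As a rough illustration of the idea (not Medha's actual implementation), the sketch below stores each question's embedding next to its query and returns the stored query when a new question's embedding is close enough. The class `ToySemanticCache`, the `embed` callable, and the hand-picked vectors are all hypothetical; a real deployment would compute embeddings with a sentence-embedding model.

```python
import math
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToySemanticCache:
    """Hypothetical in-memory cache keyed by embedding similarity."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.85):
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[Vector, str]] = []  # (embedding, query)

    def store(self, question: str, query: str) -> None:
        self.entries.append((self.embed(question), query))

    def search(self, question: str) -> Optional[str]:
        """Return the closest stored query at or above the threshold, else None."""
        vec = self.embed(question)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None


# Hand-picked stand-in vectors; a real embedder would compute these.
FAKE_EMBEDDINGS = {
    "How many active users do we have?": [0.9, 0.1, 0.0],
    "Count of active users": [0.88, 0.15, 0.02],
    "List all invoices from March": [0.0, 0.2, 0.95],
}

cache = ToySemanticCache(embed=FAKE_EMBEDDINGS.__getitem__)
cache.store(
    "How many active users do we have?",
    "SELECT COUNT(*) FROM users WHERE active = true",
)
```

With these stand-in vectors, `cache.search("Count of active users")` returns the stored SQL, while `cache.search("List all invoices from March")` returns `None` because its vector points in a very different direction.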
Waterfall Search
Medha evaluates every incoming question through five tiers in sequence, stopping at the first match:
```mermaid
graph TD
    Q[Incoming Question] --> L1[Tier 0: L1 Cache\nexact hash match]
    L1 -->|HIT| R[Return CacheHit]
    L1 -->|MISS| TM[Tier 1: Template Match\nintent + NER parameters]
    TM -->|HIT| R
    TM -->|MISS| EV[Tier 2: Exact Vector\ncosine ≥ 0.99]
    EV -->|HIT| R
    EV -->|MISS| SM[Tier 3: Semantic Match\ncosine ≥ 0.85]
    SM -->|HIT| R
    SM -->|MISS| FZ[Tier 4: Fuzzy Match\nLevenshtein ≥ 85]
    FZ -->|HIT| R
    FZ -->|MISS| MISS[Cache Miss]
```
| Tier | Strategy | Trigger | Typical Latency | Confidence |
|---|---|---|---|---|
| 0 | L1 Cache | Exact hash match on normalized text | < 0.1 ms | 1.0 |
| 1 | Template Match | Intent match + NER parameter extraction | 1–5 ms | 0.9–1.0 |
| 2 | Exact Vector | Cosine similarity ≥ 0.99 | 5–20 ms | 0.99 |
| 3 | Semantic | Cosine similarity ≥ 0.85 | 5–20 ms | score × 0.9 |
| 4 | Fuzzy | Levenshtein similarity ratio ≥ 85 (on a 0–100 scale) | 20–50 ms | varies |
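The stop-at-first-match behaviour can be sketched as a loop over tier functions. This is a minimal illustration under assumed names: `ToyHit`, `waterfall_search`, and the stub tiers are hypothetical and only mirror the strategy names and thresholds in the table above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ToyHit:
    """Hypothetical result record with a few CacheHit-like fields."""
    strategy: str
    generated_query: str
    confidence: float


# A tier is any callable that returns a hit or None (assumed shape).
Tier = Callable[[str], Optional[ToyHit]]


def waterfall_search(question: str, tiers: Sequence[Tier]) -> Optional[ToyHit]:
    """Try each tier in order and return the first hit; None means cache miss."""
    for tier in tiers:
        hit = tier(question)
        if hit is not None:
            return hit  # later, slower tiers are never consulted
    return None


# Stub tiers standing in for the real lookups.
def l1_cache(question: str) -> Optional[ToyHit]:
    return None  # pretend the exact hash was not found


def semantic_match(question: str) -> Optional[ToyHit]:
    score = 0.92  # pretend the vector backend returned this cosine score
    if score >= 0.85:
        sql = "SELECT COUNT(*) FROM users WHERE active=true"
        return ToyHit("SEMANTIC_MATCH", sql, score * 0.9)
    return None


result = waterfall_search("How many active users?", [l1_cache, semantic_match])
print(result.strategy, round(result.confidence, 3))  # SEMANTIC_MATCH 0.828
```

Ordering the tiers from cheapest to most expensive means a question answered by an early tier never pays for the later ones.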
Tier 4 (Fuzzy) is optional and disabled by default. Enable it in Settings when your users' spelling or phrasing is highly inconsistent.
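To give a feel for what a threshold of 85 means, here is a small sketch of Levenshtein-based matching. The exact scoring function Medha uses is not specified here, so the normalization below (100 × (1 − distance / longer length)) is an assumption, as are the function names; fuzzy-matching libraries may normalize differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via a rolling DP row."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match)
            ))
        prev = curr
    return prev[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed normalization: 100 * (1 - distance / longer length)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def fuzzy_hit(question: str, stored: str, threshold: float = 85.0) -> bool:
    """True when the case-insensitive similarity ratio reaches the threshold."""
    return similarity_ratio(question.lower(), stored.lower()) >= threshold


print(fuzzy_hit("how many active usres", "how many active users"))  # True
```

Under this normalization the misspelled question differs by two edits over 21 characters, a ratio of roughly 90, so it clears the threshold.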
Scoring Model
Cosine Similarity
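All vector tiers use cosine similarity to compare the incoming question embedding $\mathbf{q}$ against each stored embedding $\mathbf{e}$:

$$
\text{sim}(\mathbf{q}, \mathbf{e}) = \frac{\mathbf{q} \cdot \mathbf{e}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{e} \rVert}
$$

The result is in the range [−1, 1], where 1.0 means identical direction (semantically equivalent questions). In practice, scores for pairs of natural-language questions typically fall in [0, 1], but this is not guaranteed: embedding components can be negative, so negative similarities are possible.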
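For reference, cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms. A dependency-free sketch follows; the function name and error handling are illustrative choices, and a vector backend would normally compute this internally.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # same direction: close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))    # orthogonal: 0
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # opposite direction: close to -1
```

Because the score depends only on direction, scaling a vector (as in the first call) does not change the result beyond floating-point rounding.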
Semantic Confidence Penalty
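When Tier 3 (Semantic Match) fires, the raw cosine score is penalized to reflect that the match is not exact:

$$
\text{confidence} = \text{score} \times 0.9
$$

A Tier 3 score is always below 0.99 (otherwise Tier 2 would have matched first), so the penalized confidence is at most about 0.89. CacheHit.confidence from a semantic match is therefore always strictly below the 0.99 reported for an exact vector match, giving downstream consumers a clear quality signal.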
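The arithmetic can be checked with a few lines of Python, using the 0.9 multiplier and the 0.85 and 0.99 thresholds from the tier table. The constant and function names are hypothetical, and the range check simply encodes the observation that Tier 3 only sees scores that Tier 2 rejected.

```python
SEMANTIC_PENALTY = 0.9         # multiplier from the tier table
EXACT_VECTOR_THRESHOLD = 0.99  # Tier 2 cut-off
SEMANTIC_THRESHOLD = 0.85      # Tier 3 cut-off


def semantic_confidence(score: float) -> float:
    """Confidence for a Tier 3 hit: raw cosine score scaled by the penalty."""
    if not SEMANTIC_THRESHOLD <= score < EXACT_VECTOR_THRESHOLD:
        raise ValueError("score is outside the Tier 3 range [0.85, 0.99)")
    return score * SEMANTIC_PENALTY


for score in (0.85, 0.92, 0.989):
    print(score, "->", round(semantic_confidence(score), 4))
```

For these sample scores the rounded confidences are 0.765, 0.828, and 0.8901, all comfortably below 0.99.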
Data Flow
```mermaid
sequenceDiagram
    participant App as Application
    participant M as Medha
    participant L1 as L1 Cache
    participant VB as Vector Backend
    participant LLM as LLM (external)
    App->>M: search("How many active users?")
    M->>L1: lookup(hash)
    L1-->>M: MISS
    M->>VB: query(embedding, top_k=1)
    VB-->>M: [(entry, score=0.92)]
    M-->>App: CacheHit(strategy=SEMANTIC_MATCH, confidence=0.828)
    App->>LLM: (cache miss path)
    LLM-->>App: "SELECT COUNT(*) FROM users WHERE active=true"
    App->>M: store(question, query)
    M->>L1: set(hash, entry)
    M->>VB: upsert(embedding, entry)
    M-->>App: ok
```
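From the application's side, the flow above is a cache-aside pattern: search first, call the LLM only on a miss, then store the result. The sketch below uses a stub in place of Medha; `StubCache`, `question_key`, `answer`, and `fake_llm` are hypothetical names, and the normalization (lowercasing and collapsing whitespace before hashing) is an assumption about what "normalized text" might mean.

```python
import hashlib
from typing import Callable, Dict, List, Optional


def question_key(question: str) -> str:
    """Normalize (lowercase, collapse whitespace) and hash the question text."""
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class StubCache:
    """Hypothetical stand-in exposing only search() and store()."""

    def __init__(self) -> None:
        self._l1: Dict[str, str] = {}

    def search(self, question: str) -> Optional[str]:
        return self._l1.get(question_key(question))

    def store(self, question: str, query: str) -> None:
        self._l1[question_key(question)] = query


def answer(question: str, cache: StubCache, generate: Callable[[str], str]) -> str:
    """Return a cached query if one exists; otherwise generate and store it."""
    cached = cache.search(question)
    if cached is not None:
        return cached             # hit: the LLM is never called
    query = generate(question)    # miss: fall back to the LLM
    cache.store(question, query)  # so the next equivalent question hits
    return query


calls: List[str] = []


def fake_llm(question: str) -> str:
    calls.append(question)  # record each call so we can count them
    return "SELECT COUNT(*) FROM users WHERE active=true"


cache = StubCache()
first = answer("How many active users?", cache, fake_llm)
second = answer("how many  active users?", cache, fake_llm)  # same key after normalization
print(len(calls))  # 1
```

The second call differs only in case and spacing, so it resolves to the same hash and is served without touching the LLM stub.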
Key Types
| Type | Description |
|---|---|
| CacheHit | Returned on a successful search; contains generated_query, strategy, confidence, response_summary |
| SearchStrategy | Enum: L1_CACHE, TEMPLATE_MATCH, EXACT_VECTOR_MATCH, SEMANTIC_MATCH, FUZZY_MATCH |
| QueryTemplate | A parameterised question pattern with named slots (e.g. {city}, {date_range}) |
| CacheEntry | The stored record: question text, embedding, query, optional TTL |
| CacheStats | Aggregate statistics: total_hits, total_misses, hit_rate, per-strategy breakdown |
See the Types API reference for full field definitions.
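To make the idea of named slots concrete, here is a toy pattern matcher. It is not the QueryTemplate API: `ToyTemplate` and its fields are hypothetical, and it swaps the intent and NER machinery of Tier 1 for a plain regular expression built from the pattern text.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyTemplate:
    """Hypothetical stand-in for a parameterised question pattern."""
    pattern: str         # question pattern with {slot} placeholders
    query_template: str  # query with bound parameters such as :city

    def match(self, question: str) -> Optional[Dict[str, str]]:
        """Return slot values if the question fits the pattern, else None."""
        # Escape the literal text, then turn each escaped {name} into a named group.
        regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>.+?)", re.escape(self.pattern))
        found = re.fullmatch(regex, question.strip(), flags=re.IGNORECASE)
        return found.groupdict() if found else None


template = ToyTemplate(
    pattern="how many orders in {city}?",
    query_template="SELECT COUNT(*) FROM orders WHERE city = :city",
)
slots = template.match("How many orders in Berlin?")
print(slots)  # {'city': 'Berlin'}
```

Keeping the extracted values as bound parameters, rather than formatting them into the query string, avoids injecting user text directly into the generated query.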