Observability¶
Medha exposes hit/miss counters, per-strategy breakdowns, and latency percentiles through the CacheStats object and standard Python logging.
CacheStats Fields¶
Retrieve statistics with get_stats():
async with Medha("demo", embedder=embedder, settings=settings) as cache:
    # ... store and search calls ...
    stats = await cache.get_stats()
| Field | Type | Description |
|---|---|---|
| total_hits | int | Number of successful cache lookups |
| total_misses | int | Number of cache misses |
| hit_rate | float | Fraction of requests that hit the cache |
| avg_latency_ms | float | Mean search latency across all requests |
| p50_latency_ms | float | Median search latency |
| p95_latency_ms | float | 95th-percentile search latency |
| p99_latency_ms | float | 99th-percentile search latency |
| by_strategy | dict[SearchStrategy, StrategyStats] | Per-tier breakdown |
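The fields above lend themselves to a simple health check. The sketch below uses a hypothetical `StatsSnapshot` dataclass as a stand-in that mirrors a few of the field names from the table (it is not Medha's own class), and the thresholds are illustrative assumptions rather than recommended values.

```python
from dataclasses import dataclass


@dataclass
class StatsSnapshot:
    """Hypothetical stand-in mirroring some CacheStats fields (not Medha's class)."""
    total_hits: int
    total_misses: int
    hit_rate: float
    p95_latency_ms: float


def check_health(stats, min_hit_rate=0.5, max_p95_ms=50.0):
    """Return a list of human-readable warnings; an empty list means healthy."""
    warnings = []
    requests = stats.total_hits + stats.total_misses
    # Only judge the hit rate once there is traffic to judge.
    if requests > 0 and stats.hit_rate < min_hit_rate:
        warnings.append(f"hit rate {stats.hit_rate:.0%} below {min_hit_rate:.0%}")
    if stats.p95_latency_ms > max_p95_ms:
        warnings.append(
            f"p95 latency {stats.p95_latency_ms:.1f} ms above {max_p95_ms:.1f} ms"
        )
    return warnings


snapshot = StatsSnapshot(total_hits=30, total_misses=70, hit_rate=0.3, p95_latency_ms=12.0)
print(check_health(snapshot))  # ['hit rate 30% below 50%']
```

Because the function only reads attributes, the same check should work on any object exposing these field names.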
Hit Rate¶
\[\text{hit\_rate} = \frac{\text{total\_hits}}{\text{total\_hits} + \text{total\_misses}}\]
A hit rate of 0.8 means 80% of lookups were served from the cache, so the corresponding LLM calls were avoided.
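As a worked instance of the formula, the sketch below computes the hit rate from raw counters. The helper name `hit_rate` and the choice to return 0.0 when no requests have been recorded are assumptions made for illustration; Medha's behaviour in the zero-request case is not specified here.

```python
def hit_rate(total_hits: int, total_misses: int) -> float:
    """Fraction of lookups served from the cache."""
    total = total_hits + total_misses
    # Assumption: report 0.0 before any request has been recorded,
    # which also avoids dividing by zero.
    return total_hits / total if total else 0.0


print(hit_rate(80, 20))  # 0.8
```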
Latency Percentiles¶
Search latency is tracked per request. Percentiles are computed over a rolling window:
stats = await cache.get_stats()
print(f"P50: {stats.p50_latency_ms:.1f} ms")
print(f"P95: {stats.p95_latency_ms:.1f} ms")
print(f"P99: {stats.p99_latency_ms:.1f} ms")
Expected ranges (in-memory backend, FastEmbed):
| Tier | P50 | P95 |
|---|---|---|
| L1 Cache | < 0.1 ms | < 0.5 ms |
| Template Match | 1–3 ms | 5 ms |
| Exact / Semantic Vector | 5–15 ms | 20 ms |
| Fuzzy | 20–40 ms | 50 ms |
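To illustrate what computing percentiles over a rolling window involves, here is a minimal sketch using a bounded `deque` and the nearest-rank method. The `RollingLatency` class, the window size, and the choice of nearest-rank are all assumptions for illustration; Medha's actual window length and percentile method may differ.

```python
import math
from collections import deque


class RollingLatency:
    """Keeps the most recent `window` latency samples (hypothetical helper)."""

    def __init__(self, window: int = 1000):
        # A deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile over the current window; 0.0 if empty."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]


tracker = RollingLatency(window=5)
for ms in [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]:
    tracker.record(ms)  # the sixth sample evicts the first (1.0)

print(tracker.percentile(50), tracker.percentile(95))  # 4.0 100.0
```

With a small window, a single slow request dominates the upper percentiles, which is one reason P95 and P99 can look noisy shortly after startup.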
Per-Strategy Breakdown¶
stats.by_strategy maps each SearchStrategy to a StrategyStats object:
from medha.types import SearchStrategy
stats = await cache.get_stats()
for strategy, s in stats.by_strategy.items():
    print(f"{strategy.name}: hits={s.hits}, avg={s.avg_latency_ms:.1f} ms")
Example output:
L1_CACHE: hits=142, avg=0.08 ms
TEMPLATE_MATCH: hits=38, avg=2.1 ms
EXACT_VECTOR_MATCH: hits=21, avg=8.3 ms
SEMANTIC_MATCH: hits=64, avg=11.2 ms
FUZZY_MATCH: hits=5, avg=31.4 ms
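Raw hit counts are easier to compare as shares of all hits. The sketch below works on a plain dictionary holding the counts from the example output above, so it runs without a cache instance; the `hit_shares` helper is hypothetical and not part of Medha.

```python
# Hit counts copied from the example output above.
hits_by_strategy = {
    "L1_CACHE": 142,
    "TEMPLATE_MATCH": 38,
    "EXACT_VECTOR_MATCH": 21,
    "SEMANTIC_MATCH": 64,
    "FUZZY_MATCH": 5,
}


def hit_shares(hits):
    """Map each strategy name to its fraction of all hits."""
    total = sum(hits.values())
    if total == 0:
        return {name: 0.0 for name in hits}
    return {name: count / total for name, count in hits.items()}


shares = hit_shares(hits_by_strategy)
# Print strategies from the largest share to the smallest.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

In this example the L1 cache accounts for just over half of the 270 hits, while the fuzzy tier contributes under 2%.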
Logging¶
Use setup_logging() to configure the medha logger:
from medha.logging import setup_logging
# Human-readable text format
setup_logging(level="INFO", format="text")
# Structured JSON for log aggregation (Datadog, CloudWatch, etc.)
setup_logging(level="INFO", format="json")
Or configure the medha logger directly:
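A minimal sketch using only the standard library `logging` module, assuming the library logs under the logger name `medha` as the text above suggests. The handler and format string are illustrative choices, not Medha defaults.

```python
import logging

# Logger name assumed from the text above.
logger = logging.getLogger("medha")
logger.setLevel(logging.DEBUG)  # include DEBUG-level events

handler = logging.StreamHandler()  # writes to stderr by default
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)
logger.addHandler(handler)

# Avoid duplicate lines if the root logger also has handlers attached.
logger.propagate = False
```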
Key log events:
| Event | Level | Description |
|---|---|---|
| cache.hit | INFO | A search returned a cache hit |
| cache.miss | INFO | A search returned no result |
| backend.init | INFO | Backend connected successfully |
| backend.error | ERROR | Backend connection or query failed |
| cleanup.run | DEBUG | Background cleanup sweep started |
| cleanup.deleted | DEBUG | Number of expired entries removed |
Prometheus Integration¶
Medha does not ship a Prometheus exporter, but search results are easy to bridge to prometheus_client by wrapping search():
import asyncio
import time

from prometheus_client import Counter, Histogram, start_http_server

from medha import Medha, Settings
from medha.embeddings.fastembed_adapter import FastEmbedAdapter

hits_counter = Counter("medha_hits_total", "Cache hits", ["strategy"])
misses_counter = Counter("medha_misses_total", "Cache misses")
latency_hist = Histogram(
    "medha_search_latency_seconds",
    "Search latency",
    buckets=[0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0],
)


async def search_with_metrics(cache, question: str):
    t0 = time.perf_counter()
    hit = await cache.search(question)
    elapsed = time.perf_counter() - t0
    latency_hist.observe(elapsed)
    if hit:
        hits_counter.labels(strategy=hit.strategy.name).inc()
    else:
        misses_counter.inc()
    return hit


async def main():
    start_http_server(8000)  # expose /metrics on :8000
    settings = Settings(backend_type="memory")
    async with Medha("demo", embedder=FastEmbedAdapter(), settings=settings) as cache:
        while True:
            await search_with_metrics(cache, "How many users?")
            await asyncio.sleep(1)


asyncio.run(main())
Access metrics at http://localhost:8000/metrics.