Batch Operations¶
Medha provides several methods for efficiently loading large numbers of entries into the cache without calling store() in a loop.
store_batch¶
Store a list of (question, query) tuples in a single call. Embeddings are computed concurrently, up to batch_embed_concurrency requests at a time:
pairs = [
    ("How many users?", "SELECT COUNT(*) FROM users"),
    ("List all products", "SELECT * FROM products"),
    ("Active sessions count", "SELECT COUNT(*) FROM sessions WHERE active = true"),
]

async with Medha("demo", embedder=embedder, settings=settings) as cache:
    await cache.store_batch(pairs)
store_many¶
Like store_batch, but accepts a list of dicts with full metadata (TTL, tags, response summary):
entries = [
    {
        "question": "How many users?",
        "generated_query": "SELECT COUNT(*) FROM users",
        "tags": ["users", "aggregation"],
        "ttl_seconds": 3600,
    },
    {
        "question": "List all products",
        "generated_query": "SELECT * FROM products",
        "tags": ["products"],
    },
]

async with Medha("demo", embedder=embedder, settings=settings) as cache:
    await cache.store_many(entries)
warm_from_file¶
Load entries from a JSON or JSONL file. Useful for seeding a new deployment from a curated query library.
async with Medha("demo", embedder=embedder, settings=settings) as cache:
    count = await cache.warm_from_file("queries.json")
    print(f"Loaded {count} entries")
Note
File reads are restricted to Settings.allowed_file_dir when set, and files exceeding Settings.max_file_size_mb are rejected.
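Both limits live on Settings. A minimal sketch, assuming allowed_file_dir takes a directory path and max_file_size_mb a size in megabytes:

settings = Settings(
    allowed_file_dir="/srv/medha/seed",  # assumed: only files under this directory may be read
    max_file_size_mb=10,                 # assumed: files larger than 10 MB are rejected
)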
warm_from_dataframe¶
Load entries from a pandas DataFrame with configurable column mapping:
import pandas as pd

df = pd.DataFrame({
    "nl_question": ["How many users?", "List all products"],
    "sql_query": ["SELECT COUNT(*) FROM users", "SELECT * FROM products"],
    "category": ["users", "products"],
})

async with Medha("demo", embedder=embedder, settings=settings) as cache:
    count = await cache.warm_from_dataframe(
        df,
        question_col="nl_question",
        query_col="sql_query",
        tags_col="category",  # optional
    )
    print(f"Loaded {count} entries")
export_to_dataframe¶
Export the entire cache collection to a pandas DataFrame for analysis or migration:
async with Medha("demo", embedder=embedder, settings=settings) as cache:
    df = await cache.export_to_dataframe()
    print(df.columns)
    # ['id', 'question', 'generated_query', 'strategy', 'confidence', 'tags', 'created_at']
    df.to_csv("cache_export.csv", index=False)
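One way to use this for migration is to round-trip the export through warm_from_dataframe. A sketch, assuming the exported question, generated_query, and tags columns map cleanly onto the warm-load parameters:

# Export from one collection, then warm another with the result.
async with Medha("prod", embedder=embedder, settings=settings) as source:
    df = await source.export_to_dataframe()

async with Medha("staging", embedder=embedder, settings=settings) as target:
    count = await target.warm_from_dataframe(
        df,
        question_col="question",
        query_col="generated_query",
        tags_col="tags",  # assumed: list-valued tags columns are accepted
    )
    print(f"Migrated {count} entries")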
dedup_collection¶
Remove duplicate entries from the collection. Two entries are considered duplicates when their stored questions have cosine similarity above a configurable threshold:
async with Medha("demo", embedder=embedder, settings=settings) as cache:
    removed = await cache.dedup_collection(similarity_threshold=0.99)
    print(f"Removed {removed} duplicate entries")
Tuning batch_embed_concurrency
The batch_embed_concurrency setting (default 8) controls how many embedding requests run in parallel during batch ingestion. Increase it for cloud embedders with high rate limits; decrease it if you see rate-limit errors or memory pressure from large ONNX models.
settings = Settings(
    batch_embed_concurrency=16,  # double the default
    batch_upsert_size=200,       # larger upsert batches
)
For FastEmbedAdapter (ONNX in-process), batch_embed_concurrency governs thread-pool parallelism. For API-based embedders, it governs concurrent HTTP requests.
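As a rough illustration (the specific values here are assumptions, not recommendations from this page), that suggests keeping concurrency low for an in-process ONNX embedder and raising it for a hosted API:

# In-process ONNX embedding: limit thread-pool fan-out to avoid memory pressure.
local_settings = Settings(batch_embed_concurrency=4)

# Hosted embedding API with generous rate limits: allow more concurrent HTTP requests.
api_settings = Settings(batch_embed_concurrency=32)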