# Contributing
Contributions are welcome — bug fixes, new backends, new embedders, documentation improvements, and test coverage expansions.
## Development Setup
Clone the repo and install all extras with uv:
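For example (the clone URL is a placeholder for your fork, and `uv sync --all-extras` assumes the project defines its extras in `pyproject.toml`):

```shell
# Clone your fork and install the package with every optional extra
git clone https://github.com/<your-fork>/medha.git
cd medha
uv sync --all-extras
```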
Install pre-commit hooks:
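With the environment in place, installing the hooks is one command:

```shell
# Register the git hooks defined in .pre-commit-config.yaml
uv run pre-commit install
```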
The hooks run ruff (lint + format), mypy (type check), and pytest (fast unit tests) on every commit.
## Running Tests
Tests are organized with pytest markers:
```shell
# Fast unit tests only (no external services)
pytest -m unit

# Integration tests against Qdrant (requires Docker)
docker run -d -p 6333:6333 qdrant/qdrant
pytest -m integration

# All tests
pytest

# With coverage report
pytest --cov=medha --cov-report=term-missing
```
Marker definitions are in `pyproject.toml` under `[tool.pytest.ini_options]`.
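The marker setup typically looks like this (the exact descriptions are assumptions, not copied from the project):

```toml
[tool.pytest.ini_options]
markers = [
    "unit: fast tests with no external services",
    "integration: tests that require a running backend (e.g. Qdrant)",
]
```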
## Adding a New Backend
1. Create `src/medha/backends/my_backend.py`
2. Implement all abstract methods from `VectorStorageBackend`:

    ```python
    from medha.interfaces.storage import VectorStorageBackend
    from medha.types import CacheEntry


    class MyBackend(VectorStorageBackend):
        async def initialize(self, collection: str, dimension: int) -> None:
            # Connect and create/verify the collection
            ...

        async def upsert(self, entries: list[CacheEntry]) -> None:
            # Insert or update entries
            ...

        async def query(
            self, vector: list[float], top_k: int
        ) -> list[tuple[CacheEntry, float]]:
            # Return (entry, cosine_score) pairs sorted by score descending
            ...

        async def delete(self, entry_ids: list[str]) -> None:
            ...

        async def count(self) -> int:
            ...

        async def close(self) -> None:
            # Release connections / cleanup resources
            ...
    ```

3. Register the backend in `src/medha/backends/__init__.py`
4. Add the optional dependency in `pyproject.toml`
5. Add `!!! info "Install"` and a config snippet to `backends.md`
6. Write integration tests in `tests/backends/test_my_backend.py` with `@pytest.mark.integration`
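The registration step could be as simple as adding the class to a name-to-class mapping; the registry name and decorator below are illustrative assumptions, not the project's actual API:

```python
# Sketch of src/medha/backends/__init__.py -- registry shape is an assumption
from typing import Callable

# Map a config-friendly name to a backend class so the cache can look
# backends up by string without importing optional dependencies eagerly.
BACKEND_REGISTRY: dict[str, Callable[..., object]] = {}


def register_backend(name: str):
    """Decorator that adds a backend class to the registry under `name`."""
    def decorator(cls):
        BACKEND_REGISTRY[name] = cls
        return cls
    return decorator


@register_backend("my_backend")
class MyBackend:
    # Would subclass VectorStorageBackend in the real module
    ...
```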
## Adding a New Embedder
1. Create `src/medha/embeddings/my_embedder_adapter.py`
2. Subclass `BaseEmbedder`:

    ```python
    from medha.interfaces.embedder import BaseEmbedder


    class MyEmbedderAdapter(BaseEmbedder):
        def __init__(self, api_key: str, model: str = "default-model"):
            self._client = MyEmbedderClient(api_key=api_key)
            self._model = model

        async def embed(self, text: str) -> list[float]:
            response = await self._client.embed(text, model=self._model)
            return response.vector

        async def embed_batch(self, texts: list[str]) -> list[list[float]]:
            responses = await self._client.embed_batch(texts, model=self._model)
            return [r.vector for r in responses]
    ```

3. Add the optional dependency to `pyproject.toml`
4. Document it in `embedders.md`
5. Write unit tests in `tests/embeddings/test_my_embedder_adapter.py`
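Unit tests for an embedder can stub the underlying client so no network call is made. The sketch below injects the client instead of an API key purely for testability; both the stub and the simplified adapter are illustrative stand-ins:

```python
import asyncio


class FakeResponse:
    def __init__(self, vector):
        self.vector = vector


class FakeClient:
    """Stub client that returns a deterministic vector per input text."""

    async def embed(self, text, model):
        return FakeResponse([float(len(text))])

    async def embed_batch(self, texts, model):
        return [FakeResponse([float(len(t))]) for t in texts]


class MyEmbedderAdapter:
    """Simplified mirror of the adapter above, with the client injected."""

    def __init__(self, client, model="default-model"):
        self._client = client
        self._model = model

    async def embed(self, text):
        response = await self._client.embed(text, model=self._model)
        return response.vector

    async def embed_batch(self, texts):
        responses = await self._client.embed_batch(texts, model=self._model)
        return [r.vector for r in responses]


def test_embed_returns_vector():
    adapter = MyEmbedderAdapter(FakeClient())
    assert asyncio.run(adapter.embed("hello")) == [5.0]
```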
## Code Style
| Rule | Value |
|---|---|
| Linter | ruff (configured in pyproject.toml) |
| Type checker | mypy --strict |
| Line length | 120 characters |
| Docstrings | Google style |
| Import order | ruff isort-compatible |
Run checks locally:
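A typical local pass mirrors what the pre-commit hooks and CI run; the `uv run` prefix and exact paths below are assumptions:

```shell
# Lint, format check, type check, and fast tests
uv run ruff check .
uv run ruff format --check .
uv run mypy src/
uv run pytest -m unit
```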
## Submitting a Pull Request
1. Branch from `main`: `git checkout -b feat/my-feature`
2. Commits follow Conventional Commits:
    - `feat: add LanceDB backend`
    - `fix: handle Qdrant timeout on large collections`
    - `docs: add pgvector configuration example`
    - `test: add integration tests for Chroma backend`
3. Changelog: add an entry to `CHANGELOG.md` under `[Unreleased]`
4. Open the PR against `main` with a description that explains *why*, not just *what*
5. All CI checks must pass before merge: `ruff`, `mypy`, unit tests, and `mkdocs build --strict`