Embedders
Embedder classes implement the BaseEmbedder ABC. Pass any embedder to Medha at construction time.
See the Embedders guide for comparison, install instructions, and usage examples.
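As a rough illustration of the shared interface, the sketch below assumes the ABC exposes the two async methods documented on this page, `aembed` and `aembed_batch`. `SketchEmbedder` and `HashEmbedder` are hypothetical names used only here; the real `BaseEmbedder` may define more members or different signatures.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod


class SketchEmbedder(ABC):
    """Hypothetical stand-in for BaseEmbedder; the real ABC may differ."""

    @abstractmethod
    async def aembed(self, text: str) -> list[float]:
        """Return one embedding vector for a single text."""

    async def aembed_batch(self, texts: list[str], **kwargs) -> list[list[float]]:
        """Default batch: embed each text concurrently, preserving input order."""
        return list(await asyncio.gather(*(self.aembed(t) for t in texts)))


class HashEmbedder(SketchEmbedder):
    """Toy embedder: derives a deterministic vector from a SHA-256 digest."""

    def __init__(self, dimension: int = 8) -> None:
        # A SHA-256 digest has 32 bytes, so this toy supports up to 32 dimensions.
        self.dimension = dimension

    async def aembed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dimension` bytes into the range [0, 1].
        return [b / 255 for b in digest[: self.dimension]]


async def demo() -> list[list[float]]:
    embedder = HashEmbedder(dimension=4)
    return await embedder.aembed_batch(["hello", "world"])


vectors = asyncio.run(demo())
```

A toy like this is only useful for wiring tests, since hash-derived vectors carry no semantic similarity.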
FastEmbedAdapter
Bases: BaseEmbedder
Embedding adapter using ONNX Runtime via FastEmbed.
Supports any model available in the fastembed registry or custom HuggingFace models via ONNX export.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | Model identifier. Defaults to "BAAI/bge-small-en-v1.5". | `'BAAI/bge-small-en-v1.5'` |
| `max_length` | `int` | Maximum token length. Defaults to 512. | `512` |
| `cache_dir` | `str \| None` | Optional directory for model cache. | `None` |
Raises:
| Type | Description |
|---|---|
| `EmbeddingError` | If the model cannot be loaded. |
Source code in src/medha/embeddings/fastembed_adapter.py
aembed(text) async
Generate embedding for a single text.
FastEmbed is synchronous under the hood; we run it in a thread to avoid blocking the event loop.
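The thread offloading described above can be sketched with the standard library alone. This assumes `asyncio.to_thread` as the mechanism; the adapter may use a different one, such as `loop.run_in_executor`. `embed_sync` is a hypothetical placeholder for the blocking model call, not a FastEmbed function.

```python
import asyncio
import threading


def embed_sync(text: str) -> list[float]:
    """Hypothetical blocking embed call standing in for a synchronous model."""
    # Record which thread ran the work so the offloading is observable.
    embed_sync.last_thread = threading.current_thread().name
    return [float(len(text)), float(text.count(" "))]


async def aembed(text: str) -> list[float]:
    # asyncio.to_thread runs the blocking call in a worker thread (Python 3.9+),
    # so the event loop stays free to service other coroutines meanwhile.
    return await asyncio.to_thread(embed_sync, text)


async def main() -> list[float]:
    return await aembed("hello world")


vector = asyncio.run(main())
```

Offloading keeps the loop responsive, but it does not make CPU-bound inference faster; concurrency is still bounded by the worker thread pool.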
aembed_batch(texts, **kwargs) async
Generate embeddings for multiple texts.
Uses fastembed's native batching for efficiency.
OpenAIAdapter
Bases: BaseEmbedder
Embedding adapter using the OpenAI Embeddings API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_name` | `str` | OpenAI model identifier. Defaults to "text-embedding-3-small". | `'text-embedding-3-small'` |
| `api_key` | `str \| None` | OpenAI API key. If None, reads from the `OPENAI_API_KEY` environment variable. | `None` |
| `dimensions` | `int \| None` | Optional dimension override (for models that support it). | `None` |
Raises:
| Type | Description |
|---|---|
| `EmbeddingError` | If the OpenAI client cannot be initialized. |
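The `api_key` fallback described in the parameters can be sketched as follows. `resolve_api_key` is a hypothetical helper written for illustration, not part of the adapter, and it raises `ValueError` only to stay self-contained.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None, env_var: str = "OPENAI_API_KEY") -> str:
    """Return the explicit key if given, else the value of the environment variable."""
    key = api_key if api_key is not None else os.environ.get(env_var)
    if not key:
        # Hypothetical error type for this sketch; the documented adapter
        # reports client initialization failures as EmbeddingError.
        raise ValueError(f"No API key passed and {env_var} is not set")
    return key


os.environ["OPENAI_API_KEY"] = "sk-from-env"  # placeholder value for the demo
explicit = resolve_api_key("sk-explicit")  # explicit argument wins
from_env = resolve_api_key()  # falls back to the environment
```

Reading the key from the environment keeps secrets out of source code, which is usually preferable outside of quick experiments.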
Source code in src/medha/embeddings/openai_adapter.py
aembed(text) async
Generate an embedding for a single text via the OpenAI API.
Uses the async client for non-blocking operation.
aembed_batch(texts, **kwargs) async
Generate embeddings for a batch of texts via the OpenAI API.
OpenAI's API natively supports batched input. Respects rate limits via the client's built-in retry logic.
CohereAdapter
Bases: BaseEmbedder
Embedding adapter using the Cohere Embed API v2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str` | Cohere API key. | required |
| `model` | `str` | Cohere embedding model. Defaults to "embed-multilingual-v3.0". | `'embed-multilingual-v3.0'` |
| `input_type_query` | `str` | Input type used for query embeddings. | `'search_query'` |
| `input_type_document` | `str` | Input type used for document embeddings. | `'search_document'` |
| `embedding_types` | `list[str] \| None` | Optional list of embedding types to request. | `None` |
Raises:
| Type | Description |
|---|---|
| `ConfigurationError` | If the cohere package is not installed. |
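To illustrate why the adapter carries two input types, the sketch below routes single texts through the query type and batches through the document type. This is an assumption for illustration: the page confirms only that `aembed` produces a query embedding, not which type `aembed_batch` uses by default. `FakeEmbedClient` and `InputTypeSketch` are hypothetical and make no network calls.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEmbedClient:
    """Hypothetical stand-in for an embedding API client; records each request."""

    calls: list = field(default_factory=list)

    def embed(self, texts: list[str], input_type: str) -> list[list[float]]:
        self.calls.append((input_type, len(texts)))
        return [[float(len(t))] for t in texts]


@dataclass
class InputTypeSketch:
    client: FakeEmbedClient
    input_type_query: str = "search_query"
    input_type_document: str = "search_document"

    def embed_query(self, text: str) -> list[float]:
        # Single texts are treated as queries, matching the aembed description.
        return self.client.embed([text], self.input_type_query)[0]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Assumption: batches are treated as documents.
        return self.client.embed(texts, self.input_type_document)


client = FakeEmbedClient()
sketch = InputTypeSketch(client)
sketch.embed_query("what is a cache?")
sketch.embed_documents(["first doc", "second doc"])
```

After these two calls, `client.calls` holds one `search_query` request with one text and one `search_document` request with two texts.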
Source code in src/medha/embeddings/cohere_adapter.py
aembed(text) async
Generate a query embedding via the Cohere API.
aembed_batch(texts, **kwargs) async
Generate embeddings for multiple texts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `texts` | `list[str]` | List of texts to embed. | required |
| `**kwargs` | `Any` | Pass | `{}` |
GeminiAdapter
Bases: BaseEmbedder
Embedding adapter using the Google Gemini Embedding API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `api_key` | `str` | Google AI API key. | required |
| `model` | `str` | Gemini embedding model. Defaults to "models/text-embedding-004". | `'models/text-embedding-004'` |
| `task_type_query` | `str` | Task type for query embeddings. | `'RETRIEVAL_QUERY'` |
| `task_type_document` | `str` | Task type for document embeddings. | `'RETRIEVAL_DOCUMENT'` |
| `output_dimensionality` | `int \| None` | Optional dimension truncation. | `None` |
Raises:
| Type | Description |
|---|---|
| `ConfigurationError` | If the google-genai package is not installed. |
Source code in src/medha/embeddings/gemini_adapter.py
aembed(text) async
Generate a query embedding via the Gemini API. The call runs in a thread to avoid blocking the event loop.
aembed_batch(texts, **kwargs) async
Generate embeddings for multiple texts in chunks of 100.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `texts` | `list[str]` | List of texts to embed. | required |
| `**kwargs` | `Any` | Pass | `{}` |
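The chunking of batches into groups of 100 can be sketched as below. Only the chunk size comes from the method description; `chunked` and `fake_embed_chunk` are hypothetical helpers, and the real adapter's request handling may differ (for example, it may run each request in a thread).

```python
import asyncio
from typing import Iterator

CHUNK_SIZE = 100  # chunk size stated in the method description


def chunked(items: list[str], size: int = CHUNK_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


async def fake_embed_chunk(chunk: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for one API request; returns one vector per text."""
    return [[float(len(text))] for text in chunk]


async def aembed_batch(texts: list[str]) -> list[list[float]]:
    vectors: list[list[float]] = []
    for chunk in chunked(texts):
        # One request per chunk; results are appended in input order.
        vectors.extend(await fake_embed_chunk(chunk))
    return vectors


texts = [f"doc {i}" for i in range(250)]
chunk_sizes = [len(c) for c in chunked(texts)]
vectors = asyncio.run(aembed_batch(texts))
```

With 250 inputs this produces chunks of 100, 100, and 50 texts, and the output keeps one vector per input in the original order.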