Advanced

Unlike Ahnlich DB, which is concerned with similarity algorithms and indexing, Ahnlich AI focuses on embedding generation. The service introduces model-aware stores, where you define the embedding models used for both data insertion (indexing) and querying. This abstraction lets developers work directly with raw inputs (text or images) while the AI proxy handles embedding generation.

Supported Models

Ahnlich AI includes several pre-trained models that can be configured depending on your workload. These cover both text embeddings and image embeddings:

| Model Name | String Name | Type | Max Input | Embedding Dim | Description |
| --- | --- | --- | --- | --- | --- |
| ALL_MINI_LM_L6_V2 | all-minilm-l6-v2 | Text | 256 tokens | 384 | Lightweight sentence transformer. Fast and memory-efficient, ideal for semantic similarity in applications like FAQ search or chatbots. |
| ALL_MINI_LM_L12_V2 | all-minilm-l12-v2 | Text | 256 tokens | 384 | Larger variant of MiniLM. Higher accuracy for nuanced text similarity tasks, but with increased compute requirements. |
| BGE_BASE_EN_V15 | bge-base-en-v1.5 | Text | 512 tokens | 768 | Base version of the BGE (English v1.5) model. Balanced performance and speed, suitable for production-scale applications. |
| BGE_LARGE_EN_V15 | bge-large-en-v1.5 | Text | 512 tokens | 1024 | High-accuracy embedding model for semantic search and retrieval. Best choice when precision is more important than latency. |
| RESNET50 | resnet-50 | Image | 224x224 px | 2048 | Convolutional Neural Network (CNN) for extracting embeddings from images. Useful for content-based image retrieval and clustering. |
| CLIP_VIT_B32_IMAGE | clip-vit-b32-image | Image | 224x224 px | 512 | Vision Transformer encoder from the CLIP model. Produces embeddings aligned with its paired text encoder for multimodal tasks. |
| CLIP_VIT_B32_TEXT | clip-vit-b32-text | Text | 77 tokens | 512 | Text encoder from CLIP. Designed to map textual inputs into the same space as CLIP image embeddings for text-to-image or image-to-text search. |
| BUFFALO_L | buffalo-l | Image (Face) | 640x640 px | 512 | Face detection and recognition model. Detects faces in images and generates embeddings for each detected face. Non-commercial use only. |
| SFACE_YUNET | sface-yunet | Image (Face) | 640x640 px | 128 | Lightweight face detection (YuNet) + recognition (SFace) pipeline. Apache 2.0 / MIT licensed - commercially usable. |
| CLAP_AUDIO | clap-audio | Audio | 10 sec max | 512 | Audio encoder from the CLAP model. Produces embeddings from audio inputs for audio similarity search and audio-to-text retrieval. |
| CLAP_TEXT | clap-text | Text | 512 tokens | 512 | Text encoder from the CLAP model. Maps textual descriptions into the same embedding space as CLAP audio embeddings for text-to-audio search. |

Model Constraints

Audio Models (CLAP)

| Constraint | Value | Notes |
| --- | --- | --- |
| Max duration | 10 seconds | Longer clips will error with AudioTooLongError |
| Sample rate | 48 kHz | Audio is automatically resampled |
| Max samples | 480,000 | 48,000 Hz × 10 seconds |
| Preprocessing | Required | NoPreprocessing not supported - always use ModelPreprocessing |
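These limits can be enforced client-side before a clip is ever sent. A minimal Python sketch, using the values from the table above; the helper name `check_clip` is illustrative and not part of any Ahnlich client:

```python
# Client-side validation of CLAP audio constraints (values from the table above).
SAMPLE_RATE = 48_000       # Hz; Ahnlich resamples audio to this rate
MAX_DURATION_S = 10        # seconds
MAX_SAMPLES = SAMPLE_RATE * MAX_DURATION_S  # 480,000

def check_clip(num_samples: int, sample_rate: int = SAMPLE_RATE) -> None:
    """Raise ValueError for clips that would trigger AudioTooLongError server-side."""
    duration = num_samples / sample_rate
    if duration > MAX_DURATION_S:
        raise ValueError(
            f"clip is {duration:.1f}s; CLAP accepts at most {MAX_DURATION_S}s"
        )

check_clip(240_000)  # a 5-second clip at 48 kHz passes silently
```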

Face Models (Buffalo_L, SFace+YuNet)

| Constraint | Value | Notes |
| --- | --- | --- |
| Input size | 640x640 px | Images are resized internally |
| Face alignment | 112x112 px | Standard ArcFace alignment |
| Embedding mode | OneToMany | Returns one embedding per detected face |
| Preprocessing | Required | NoPreprocessing not supported |
| Query constraint | Single face | Query images must contain exactly 1 face |

Cross-Modal Compatibility

| Model Pair | Shared Dim | Use Case |
| --- | --- | --- |
| clip-vit-b32-text + clip-vit-b32-image | 512 | Text-to-image / image-to-text search |
| clap-text + clap-audio | 512 | Text-to-audio / audio-to-text search |

Supported Input Types

| Input Type | Description |
| --- | --- |
| RAW_STRING | Accepts natural text (sentences, paragraphs). Transformed into embeddings via a selected text-based model. |
| IMAGE | Accepts image files as input. Converted into embeddings via a selected image-based model (e.g., ResNet or CLIP). |
| AUDIO | Accepts audio data as input. Converted into embeddings via an audio-based model (e.g., CLAP Audio). |

Example – Creating a Model-Aware Store

CREATESTORE my_store QUERYMODEL all-minilm-l6-v2 INDEXMODEL all-minilm-l6-v2
  • index_model - defines how inserted data is embedded before being stored in Ahnlich DB.

  • query_model - defines how queries are embedded at search time.

  • Both models must output embeddings of the same dimensionality to ensure compatibility.
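The same-dimensionality rule also permits cross-modal pairings. For example, the two CLIP encoders share a 512-dimensional space (see the compatibility table above), so a store can index raw images while accepting raw text queries; the store name here is illustrative:

```
CREATESTORE photo_search QUERYMODEL clip-vit-b32-text INDEXMODEL clip-vit-b32-image
```

With this configuration, inserted images are embedded by clip-vit-b32-image, and text queries are embedded by clip-vit-b32-text into the same space, enabling text-to-image search.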

Choosing the Right Model

| Model | Best Use Case |
| --- | --- |
| MiniLM (L6/L12) | Fast, efficient semantic similarity (FAQs, chatbots). |
| BGE (Base/Large) | High semantic accuracy for production-scale applications. |
| ResNet50 | Image-to-image similarity and clustering. |
| CLIP (Text+Image) | Multimodal retrieval (text-to-image / image-to-text search). |
| Buffalo_L | Face detection and recognition in images (e.g., group photos, ID verification). |
| SFace+YuNet | Lightweight face detection and recognition (e.g., real-time face matching). |
| CLAP (Audio+Text) | Audio similarity search and text-to-audio retrieval. |

Model Parameters (model_params)

Some AI models accept optional runtime parameters via model_params, a map<string, string> field available on Set, GetSimN, and ConvertStoreInputToEmbeddings requests. These parameters let you tune model behavior at inference time without changing store configuration.

When model_params is empty (or omitted), models use their built-in defaults. Models that don't support any parameters simply ignore the field.

Supported Parameters by Model

| Model | Parameter | Type | Default | Description |
| --- | --- | --- | --- | --- |
| Buffalo_L | confidence_threshold | float (0.0–1.0) | 0.5 | Minimum detection confidence for a face to be included. Higher values = fewer but more confident detections. |
| SFace+YuNet | confidence_threshold | float (0.0–1.0) | 0.6 | Minimum detection confidence for a face to be included. Higher values = fewer but more confident detections. |

Text embedding models (MiniLM, BGE), image models (ResNet, CLIP), and audio models (CLAP) do not currently use model_params.

Usage Examples

Rust - setting a high confidence threshold for face detection:

use std::collections::HashMap;

let mut model_params = HashMap::new();
model_params.insert("confidence_threshold".to_string(), "0.9".to_string());

let set_params = Set {
    store: "faces_store".to_string(),
    inputs: vec![/* ... */],
    preprocess_action: PreprocessAction::NoPreprocessing as i32,
    execution_provider: None,
    model_params,
};

Python - using default parameters (empty dict):

await client.set(
    ai_query.Set(
        store="faces_store",
        inputs=[...],
        preprocess_action=preprocess.PreprocessAction.NoPreprocessing,
        model_params={}  # uses model defaults
    )
)

Python - custom confidence threshold:

await client.set(
    ai_query.Set(
        store="faces_store",
        inputs=[...],
        preprocess_action=preprocess.PreprocessAction.NoPreprocessing,
        model_params={"confidence_threshold": "0.9"}
    )
)

When to Tune model_params

  • Inclusive detection (e.g., group photos where you want all faces): Use a lower threshold like 0.3
  • Standard detection (balanced): Use the model default (0.5 for Buffalo_L, 0.6 for SFace+YuNet)
  • Strict detection (e.g., ID verification where only clear faces matter): Use a higher threshold like 0.9

Embedding Metadata

Starting from version 0.2.2, face detection models (Buffalo_L and SFace+YuNet) return bounding box metadata alongside embeddings. This allows you to access face location and confidence information without re-running detection.

Metadata Fields (Face Detection Models)

For each detected face, the following metadata is automatically included:

| Field | Type | Range | Description |
| --- | --- | --- | --- |
| bbox_x1 | float | 0.0–1.0 | Normalized x-coordinate of top-left corner |
| bbox_y1 | float | 0.0–1.0 | Normalized y-coordinate of top-left corner |
| bbox_x2 | float | 0.0–1.0 | Normalized x-coordinate of bottom-right corner |
| bbox_y2 | float | 0.0–1.0 | Normalized y-coordinate of bottom-right corner |
| confidence | float | 0.0–1.0 | Detection confidence score |

Coordinates are normalized to the 0-1 range, making them independent of the original image resolution. To convert to pixel coordinates, multiply by the image width/height:

pixel_x1 = bbox_x1 * image_width
pixel_y1 = bbox_y1 * image_height
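The conversion above can be wrapped in a small helper. A Python sketch, taking the metadata fields as plain floats; `bbox_to_pixels` is an illustrative name, not part of the client libraries:

```python
# Convert normalized bounding-box metadata to pixel coordinates, following the
# formula above. Field names match the metadata table; the function itself is
# an illustrative helper.
def bbox_to_pixels(metadata: dict, image_width: int, image_height: int) -> tuple:
    return (
        metadata["bbox_x1"] * image_width,
        metadata["bbox_y1"] * image_height,
        metadata["bbox_x2"] * image_width,
        metadata["bbox_y2"] * image_height,
    )

# A face spanning the lower-right quadrant of a 640x480 image:
bbox_to_pixels(
    {"bbox_x1": 0.5, "bbox_y1": 0.5, "bbox_x2": 1.0, "bbox_y2": 1.0}, 640, 480
)  # (320.0, 240.0, 640.0, 480.0)
```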

Metadata Storage

When you insert images using face detection models:

  • Embeddings are stored in Ahnlich DB as usual
  • Metadata (bounding boxes, confidence) is merged into the StoreValue for each face
  • Metadata is returned in GetSimN, GetPred, and ConvertStoreInputToEmbeddings responses

API Response Structure

The ConvertStoreInputToEmbeddings API returns EmbeddingWithMetadata for face models:

message EmbeddingWithMetadata {
  keyval.StoreKey embedding = 1;            // The face embedding vector
  optional keyval.StoreValue metadata = 2;  // Bounding box + confidence
}

For OneToMany models (face detection), multiple EmbeddingWithMetadata objects are returned, one per detected face.

Usage Examples

Rust - accessing bounding box metadata:

use ahnlich_client_rs::prelude::*;

let response = client.convert_to_embeddings(
    store_name,
    vec![StoreInput::Image(image_bytes)],
    PreprocessAction::ModelPreprocessing,
    None,
    HashMap::new(),
).await?;

// For face detection models, the variant is OneToMany
if let Some(Variant::Multiple(multi)) = &response.values[0].variant {
    for face in &multi.embeddings {
        if let Some(embedding) = &face.embedding {
            println!("Embedding dimensions: {}", embedding.key.len());
        }

        if let Some(metadata) = &face.metadata {
            let bbox_x1 = metadata.value.get("bbox_x1").unwrap();
            let bbox_y1 = metadata.value.get("bbox_y1").unwrap();
            let confidence = metadata.value.get("confidence").unwrap();

            println!(
                "Face at ({}, {}) with confidence {}",
                bbox_x1, bbox_y1, confidence
            );
        }
    }
}

Python - accessing bounding box metadata:

from ahnlich_client_py import AhnlichAIClient

response = await client.convert_store_input_to_embeddings(
    store="faces_store",
    inputs=[image_bytes],
    preprocess_action=PreprocessAction.ModelPreprocessing,
)

# Each face has embedding + metadata
for face_data in response.values[0].multiple.embeddings:
    embedding = face_data.embedding.key  # 512-dim vector for Buffalo_L
    metadata = face_data.metadata.value

    bbox_x1 = float(metadata["bbox_x1"].value)
    bbox_y1 = float(metadata["bbox_y1"].value)
    confidence = float(metadata["confidence"].value)

    print(f"Face at ({bbox_x1}, {bbox_y1}) with confidence {confidence}")

TypeScript - accessing bounding box metadata:

import { AhnlichAIClient } from '@deven96/ahnlich-client-node';

const response = await client.convertStoreInputToEmbeddings({
  store: "faces_store",
  inputs: [{ image: imageBytes }],
  preprocessAction: PreprocessAction.MODEL_PREPROCESSING,
});

// Each detected face has embedding + metadata
for (const faceData of response.values[0].multiple.embeddings) {
  const embedding = faceData.embedding.key; // Float32Array
  const metadata = faceData.metadata.value;

  const bboxX1 = parseFloat(metadata.bbox_x1.value);
  const bboxY1 = parseFloat(metadata.bbox_y1.value);
  const confidence = parseFloat(metadata.confidence.value);

  console.log(`Face at (${bboxX1}, ${bboxY1}) with confidence ${confidence}`);
}

Use Cases for Metadata

  • Face cropping: Use bounding boxes to extract face regions from original images
  • Visualization: Draw bounding boxes on images to show detected faces
  • Quality filtering: Filter results by confidence score (e.g., only faces with confidence > 0.8)
  • Spatial queries: Find faces in specific image regions (e.g., "faces in the top-left quadrant")
  • Deduplication: Identify overlapping detections using bounding box coordinates
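Two of the use cases above, quality filtering and deduplication, can be sketched in a few lines of Python. Faces here are plain dicts mirroring the metadata fields; the helper names and thresholds are illustrative, not part of the client libraries:

```python
# Sketch: filter detections by confidence, then drop near-duplicate boxes
# using intersection-over-union (IoU) on the normalized coordinates.

def area(f: dict) -> float:
    return (f["bbox_x2"] - f["bbox_x1"]) * (f["bbox_y2"] - f["bbox_y1"])

def iou(a: dict, b: dict) -> float:
    # Overlap of two normalized boxes: 0.0 (disjoint) to 1.0 (identical)
    ix = max(0.0, min(a["bbox_x2"], b["bbox_x2"]) - max(a["bbox_x1"], b["bbox_x1"]))
    iy = max(0.0, min(a["bbox_y2"], b["bbox_y2"]) - max(a["bbox_y1"], b["bbox_y1"]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def dedupe(faces: list, min_conf: float = 0.8, iou_thresh: float = 0.5) -> list:
    # Keep high-confidence faces, dropping boxes that mostly overlap a kept one
    kept = []
    for f in sorted(faces, key=lambda f: f["confidence"], reverse=True):
        if f["confidence"] < min_conf:
            continue
        if all(iou(f, k) < iou_thresh for k in kept):
            kept.append(f)
    return kept
```

Because coordinates are normalized, the same thresholds work regardless of the original image resolution.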

Models Without Metadata

Text and image embedding models (MiniLM, BGE, ResNet, CLIP) do not return metadata. The metadata field will be None or empty for these models.