Commit 9cccb79

Merge pull request #4 from dev-diaries41/docs
Add documentation
2 parents 79c60f7 + a5066a5 commit 9cccb79

8 files changed

+1046
-0
lines changed

docs/README.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
# SmartScanSdk Docs
2+
3+
## Table of Contents
4+
5+
### Core
6+
- [ML / Embeddings](core/ml/embeddings.md)
7+
- [ML / Models](core/ml/models.md)
8+
- [Processors](core/processors.md)
9+
- [Utils](core/utils.md)
10+
11+
### Extensions
12+
- [Embeddings](extensions/embeddings.md)
13+
- [Indexers](extensions/indexers.md)
14+
- [Organisers](extensions/organiser.md)

docs/core/ml/embeddings.md

Lines changed: 275 additions & 0 deletions
@@ -0,0 +1,275 @@
1+
# **Embeddings**
2+
3+
## Overview
4+
5+
Provides unified interfaces and base data types for generating and managing vector embeddings across different media types.
6+
Implements both generic contracts and reference CLIP-based providers for image and text, plus few-shot classification utilities.
7+
8+
---
9+
10+
## **Core Data Types**
11+
12+
### `Embedding`
13+
14+
Represents a raw embedding vector for a single media item.
15+
16+
| Property     | Type         | Description                                  |
|--------------|--------------|----------------------------------------------|
| `id`         | `Long`       | Unique MediaStore or item ID                 |
| `date`       | `Long`       | Timestamp associated with embedding creation |
| `embeddings` | `FloatArray` | Vector representation                        |
21+
22+
### `PrototypeEmbedding`
23+
24+
Represents a class-level prototype vector for few-shot classification.
25+
26+
| Property     | Type         | Description                                  |
|--------------|--------------|----------------------------------------------|
| `id`         | `String`     | Class identifier                             |
| `date`       | `Long`       | Timestamp associated with prototype creation |
| `embeddings` | `FloatArray` | Averaged vector representation               |
31+
32+
---
33+
34+
## **Interfaces**
35+
36+
### `IEmbeddingStore`
37+
38+
Defines a persistence interface for managing embedding data.
39+
**Responsibilities:**
40+
41+
* Add or remove stored embeddings
42+
* Retrieve all embeddings for in-memory indexing
43+
* Clear cache or local data
44+
45+
| Member     | Type                                              | Description                     |
|------------|---------------------------------------------------|---------------------------------|
| `isCached` | `Boolean`                                         | Indicates if results are cached |
| `exists`   | `Boolean`                                         | Checks if store data exists     |
| `add()`    | `suspend fun add(newEmbeddings: List<Embedding>)` | Inserts embeddings              |
| `remove()` | `suspend fun remove(ids: List<Long>)`             | Removes embeddings by ID        |
| `getAll()` | `suspend fun getAll(): List<Embedding>`           | Loads full embedding index      |
| `clear()`  | `fun clear()`                                     | Clears local data               |
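
As a rough illustration of the contract above, the following sketch keeps embeddings in a map held in memory. The `Embedding` class and the interface are re-declared from the tables in this document so the snippet compiles on its own. Treating `isCached` and `exists` as properties is an assumption, and `InMemoryEmbeddingStore` and its `initial` parameter are hypothetical names, not part of the SDK.

```kotlin
// Hypothetical mirrors of the documented types; the SDK's real declarations may differ.
class Embedding(val id: Long, val date: Long, val embeddings: FloatArray)

interface IEmbeddingStore {
    val isCached: Boolean
    val exists: Boolean
    suspend fun add(newEmbeddings: List<Embedding>)
    suspend fun remove(ids: List<Long>)
    suspend fun getAll(): List<Embedding>
    fun clear()
}

// Keeps everything in a map keyed by ID; nothing is written to disk.
class InMemoryEmbeddingStore(initial: List<Embedding> = emptyList()) : IEmbeddingStore {
    private val items = LinkedHashMap<Long, Embedding>()

    init {
        initial.forEach { items[it.id] = it }
    }

    // The data always lives in memory in this sketch.
    override val isCached: Boolean get() = true
    override val exists: Boolean get() = items.isNotEmpty()

    override suspend fun add(newEmbeddings: List<Embedding>) {
        // Re-adding an existing ID overwrites the previous entry.
        newEmbeddings.forEach { items[it.id] = it }
    }

    override suspend fun remove(ids: List<Long>) {
        ids.forEach { items.remove(it) }
    }

    override suspend fun getAll(): List<Embedding> = items.values.toList()

    override fun clear() = items.clear()
}
```

The sketch is not thread-safe; a store shared between coroutines would need synchronization, and a persistent one would replace the map with file or database access.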
53+
54+
---
55+
56+
### `IRetriever`
57+
58+
Defines nearest-neighbor or similarity-based retrieval over stored embeddings.
59+
60+
| Method                              | Description                                 |
|-------------------------------------|---------------------------------------------|
| `query(embedding, topK, threshold)` | Returns a ranked list of similar embeddings |
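
One simple way to satisfy a `query(embedding, topK, threshold)` contract is a brute-force scan. The sketch below is an assumption about how such a retriever could work, not the SDK's implementation: `bruteForceQuery` is a hypothetical name, the `Embedding` class is re-declared from this document, and the threshold is treated as inclusive.

```kotlin
// Hypothetical stand-in for the SDK's Embedding type, mirroring the documented fields.
class Embedding(val id: Long, val date: Long, val embeddings: FloatArray)

// Dot product; equals cosine similarity when both vectors are L2-normalized.
fun dot(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var sum = 0f
    for (i in a.indices) sum += a[i] * b[i]
    return sum
}

// Scores every stored embedding, drops scores below the threshold,
// and returns at most topK matches, most similar first.
fun bruteForceQuery(
    query: FloatArray,
    stored: List<Embedding>,
    topK: Int,
    threshold: Float
): List<Embedding> =
    stored
        .map { it to dot(query, it.embeddings) }
        .filter { (_, score) -> score >= threshold }
        .sortedByDescending { (_, score) -> score }
        .take(topK)
        .map { (item, _) -> item }
```

A linear scan costs one dot product per stored item, which is usually acceptable for on-device collections of modest size.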
63+
64+
---
65+
66+
### `IEmbeddingProvider<T>`
67+
68+
Defines the contract for embedding generators (text, image, etc.).
69+
70+
| Member           | Type          | Description                          |
|------------------|---------------|--------------------------------------|
| `embeddingDim`   | `Int?`        | Embedding vector dimension           |
| `embed(data: T)` | `suspend fun` | Generates embedding for input        |
| `closeSession()` | `fun`         | Releases underlying model or session |
75+
76+
**Type aliases:**
77+
78+
* `TextEmbeddingProvider = IEmbeddingProvider<String>`
79+
* `ImageEmbeddingProvider = IEmbeddingProvider<Bitmap>`
80+
81+
---
82+
83+
## **Implementations**
84+
85+
### `ClipImageEmbedder`
86+
87+
Reference implementation of `ImageEmbeddingProvider` using a CLIP ONNX model.
88+
Supports on-device embedding generation for bitmaps.
89+
90+
**Key points:**
91+
92+
* Accepts `FilePath` or `ResourceId` model source.
93+
* Requires explicit `initialize()` before embedding.
94+
* Returns 512-D normalized vectors.
95+
* Supports batch processing via `BatchProcessor`.
96+
97+
| Method                         | Description                       |
|--------------------------------|-----------------------------------|
| `initialize()`                 | Loads the ONNX model into memory  |
| `isInitialized()`              | Checks model state                |
| `embed(bitmap)`                | Generates embedding from a bitmap |
| `embedBatch(context, bitmaps)` | Batch embedding                   |
| `closeSession()`               | Frees model resources             |
104+
105+
**Usage Example:**
106+
107+
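```kotlin
// embed() is a suspend function: call this from a coroutine or another suspend function.
val imageEmbedder = ClipImageEmbedder(resources, ModelSource.FilePath("/models/clip_image.onnx"))
imageEmbedder.initialize()                   // load the ONNX model before the first embed()
val embedding = imageEmbedder.embed(bitmap)  // `bitmap` is a Bitmap supplied by the caller
imageEmbedder.closeSession()                 // free model resources when finished
```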
113+
114+
---
115+
116+
### `ClipTextEmbedder`
117+
118+
Reference implementation of `TextEmbeddingProvider` using a CLIP ONNX model and built-in tokenizer.
119+
120+
**Key points:**
121+
122+
* Tokenizes text using CLIP’s BPE vocabulary and merge rules.
123+
* Accepts bundled (`ResourceId`) or local (`FilePath`) models.
124+
* Produces normalized 512-D vectors.
125+
* Includes batch processing support.
126+
127+
| Method                       | Description                   |
|------------------------------|-------------------------------|
| `initialize()`               | Loads model weights           |
| `isInitialized()`            | Checks model state            |
| `embed(text)`                | Encodes and embeds input text |
| `embedBatch(context, texts)` | Batch text embedding          |
| `closeSession()`             | Releases resources            |
134+
135+
**Usage Example:**
136+
137+
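```kotlin
// embed() is a suspend function: call this from a coroutine or another suspend function.
val textEmbedder = ClipTextEmbedder(resources, ModelSource.FilePath("/models/clip_text.onnx"))
textEmbedder.initialize()                  // load the model before the first embed()
val embedding = textEmbedder.embed(text)   // `text` is a String supplied by the caller
textEmbedder.closeSession()                // release resources when finished
```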
143+
144+
---
145+
146+
## **Few-Shot Classification**
147+
148+
### `ClassificationResult`
149+
150+
Represents the outcome of a classification attempt.
151+
152+
| Type      | Description                                                           |
|-----------|-----------------------------------------------------------------------|
| `Success` | Contains `classId` of the closest match and similarity score          |
| `Failure` | Contains a `ClassificationError` indicating why classification failed |
156+
157+
### `ClassificationError`
158+
159+
Enumerates possible failure reasons:
160+
161+
| Error                | Description                                                         |
|----------------------|---------------------------------------------------------------------|
| `MINIMUM_CLASS_SIZE` | Not enough class prototypes to perform classification (requires ≥2) |
| `THRESHOLD`          | Top similarity below minimum threshold                              |
| `CONFIDENCE_MARGIN`  | Gap between top 2 similarities too small to be conclusive           |
| `LABELLED_BAD`       | Optional: indicates invalid or corrupted class prototype            |
167+
168+
### `classify`
169+
170+
Performs few-shot classification of a single embedding.
171+
172+
```kotlin
fun classify(
    embedding: FloatArray,
    classPrototypes: List<PrototypeEmbedding>,
    threshold: Float = 0.4f,
    confidenceMargin: Float = 0.05f
): ClassificationResult
```
180+
181+
**Behavior:**
182+
183+
1. Returns `Failure(MINIMUM_CLASS_SIZE)` if fewer than 2 prototypes.
184+
2. Computes similarities between the embedding and all class prototypes.
185+
3. Finds top 2 most similar prototypes.
186+
4. Returns `Failure(THRESHOLD)` if best similarity < threshold.
187+
5. Returns `Failure(CONFIDENCE_MARGIN)` if top-2 similarity gap < confidenceMargin.
188+
6. Returns `Success(classId, similarity)` if criteria are met.
189+
190+
**Usage Example:**
191+
192+
```kotlin
val result = classify(embedding, classPrototypes, threshold = 0.5f)
when (result) {
    is ClassificationResult.Success -> println("Matched class: ${result.classId}, similarity=${result.similarity}")
    is ClassificationResult.Failure -> println("Classification failed: ${result.error}")
}
```
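
The six behavior steps listed for `classify` can be sketched as follows. This is an illustration written from the description in this document, not the SDK source: the type declarations are re-created from the tables above (their exact shape in the SDK is an assumption), and the plain `dot` helper is a local stand-in for the SDK's similarity utilities.

```kotlin
// Hypothetical mirrors of the documented types; the SDK's real declarations may differ.
class PrototypeEmbedding(val id: String, val date: Long, val embeddings: FloatArray)

enum class ClassificationError { MINIMUM_CLASS_SIZE, THRESHOLD, CONFIDENCE_MARGIN, LABELLED_BAD }

sealed class ClassificationResult {
    data class Success(val classId: String, val similarity: Float) : ClassificationResult()
    data class Failure(val error: ClassificationError) : ClassificationResult()
}

fun dot(a: FloatArray, b: FloatArray): Float {
    var sum = 0f
    for (i in a.indices) sum += a[i] * b[i]
    return sum
}

fun classify(
    embedding: FloatArray,
    classPrototypes: List<PrototypeEmbedding>,
    threshold: Float = 0.4f,
    confidenceMargin: Float = 0.05f
): ClassificationResult {
    // Step 1: at least two classes are needed to measure a margin.
    if (classPrototypes.size < 2) {
        return ClassificationResult.Failure(ClassificationError.MINIMUM_CLASS_SIZE)
    }
    // Steps 2-3: score every prototype and rank them, best first.
    val ranked = classPrototypes
        .map { it.id to dot(embedding, it.embeddings) }
        .sortedByDescending { it.second }
    val (bestId, bestScore) = ranked[0]
    val secondScore = ranked[1].second
    // Step 4: reject weak matches.
    if (bestScore < threshold) {
        return ClassificationResult.Failure(ClassificationError.THRESHOLD)
    }
    // Step 5: reject matches that are too close to the runner-up.
    if (bestScore - secondScore < confidenceMargin) {
        return ClassificationResult.Failure(ClassificationError.CONFIDENCE_MARGIN)
    }
    // Step 6: confident match.
    return ClassificationResult.Success(bestId, bestScore)
}
```

Checking the threshold before the margin means an input that is both weak and ambiguous is reported as `THRESHOLD`, which matches the order of the documented steps.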
199+
200+
---
201+
202+
## **Utilities**
203+
204+
Provides helper functions for embedding operations such as similarity calculation, normalization, and prototype generation.
205+
206+
### `FloatArray.dot(other: FloatArray)`
207+
208+
Computes the dot product between two vectors.
209+
210+
```kotlin
infix fun FloatArray.dot(other: FloatArray): Float
```
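
A minimal sketch of a function with this signature is shown below. The dimension check is an assumption added for safety; the SDK may handle mismatched sizes differently.

```kotlin
// Sketch of an infix dot product matching the documented signature.
infix fun FloatArray.dot(other: FloatArray): Float {
    // Assumption: mismatched dimensions are a caller error.
    require(size == other.size) { "Vectors must have the same dimension" }
    var sum = 0f
    for (i in indices) sum += this[i] * other[i]
    return sum
}
```

Because the function is `infix`, it reads like an operator: `floatArrayOf(1f, 2f, 3f) dot floatArrayOf(4f, 5f, 6f)` evaluates to `32.0`.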
213+
214+
---
215+
216+
### `normalizeL2(inputArray: FloatArray)`
217+
218+
Performs L2 normalization on a vector.
219+
220+
```kotlin
fun normalizeL2(inputArray: FloatArray): FloatArray
```
223+
224+
**Behavior:** Returns a normalized vector with Euclidean norm = 1.
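
L2 normalization divides each component by the vector's Euclidean norm, the square root of the sum of squared components. The sketch below illustrates that calculation; returning a zero vector unchanged is an assumption, since this document does not say how the SDK treats that case.

```kotlin
import kotlin.math.sqrt

fun normalizeL2(inputArray: FloatArray): FloatArray {
    var sumOfSquares = 0f
    for (v in inputArray) sumOfSquares += v * v
    val norm = sqrt(sumOfSquares)
    // Assumption: a zero vector has no direction, so return a copy unchanged
    // instead of dividing by zero.
    if (norm == 0f) return inputArray.copyOf()
    return FloatArray(inputArray.size) { i -> inputArray[i] / norm }
}
```

For example, the vector (3, 4) has norm 5, so it normalizes to approximately (0.6, 0.8).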
225+
226+
---
227+
228+
### `getSimilarities(embedding: FloatArray, comparisonEmbeddings: List<FloatArray>)`
229+
230+
Computes similarity scores between a single embedding and a list of embeddings.
231+
232+
```kotlin
fun getSimilarities(embedding: FloatArray, comparisonEmbeddings: List<FloatArray>): List<Float>
```
235+
236+
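**Behavior:** Returns one dot-product similarity per comparison embedding, in input order. For L2-normalized vectors the dot product equals cosine similarity.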
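
A sketch of this computation, assuming every comparison vector has the same dimension as `embedding`, might look like this (an illustration, not the SDK source):

```kotlin
fun getSimilarities(embedding: FloatArray, comparisonEmbeddings: List<FloatArray>): List<Float> =
    comparisonEmbeddings.map { candidate ->
        // One dot product per candidate; result positions match input positions.
        var sum = 0f
        for (i in embedding.indices) sum += embedding[i] * candidate[i]
        sum
    }
```

Keeping the output aligned with the input order lets the indices of the result be used to look up the corresponding items.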
237+
238+
---
239+
240+
### `getTopN(similarities: List<Float>, n: Int, threshold: Float = 0f)`
241+
242+
Selects the indices of the top `n` similarities above a given threshold.
243+
244+
```kotlin
fun getTopN(similarities: List<Float>, n: Int, threshold: Float = 0f): List<Int>
```
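
One way to implement this selection is to pair each score with its index, filter, sort, and truncate. The sketch below treats the threshold as inclusive and returns indices in descending order of similarity; both choices are assumptions, since the description above does not specify them.

```kotlin
fun getTopN(similarities: List<Float>, n: Int, threshold: Float = 0f): List<Int> =
    similarities
        .withIndex()
        .filter { it.value >= threshold }   // assumption: the threshold is inclusive
        .sortedByDescending { it.value }    // best score first
        .take(n)
        .map { it.index }
```

With scores `[0.2, 0.9, 0.5, 0.7]`, `n = 2`, and `threshold = 0.3`, this returns the indices `[1, 3]`.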
247+
248+
---
249+
250+
### `generatePrototypeEmbedding(rawEmbeddings: List<FloatArray>)`
251+
252+
Generates a class-level prototype embedding by averaging multiple embeddings and normalizing.
253+
254+
```kotlin
suspend fun generatePrototypeEmbedding(rawEmbeddings: List<FloatArray>): FloatArray
```
257+
258+
**Behavior:**
259+
260+
* Computes the element-wise average of input embeddings.
261+
* Returns the L2-normalized prototype vector.
262+
* Throws `IllegalStateException` if input is empty.
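
The averaging and normalization described above can be sketched as a plain function. `averageAndNormalize` is a hypothetical, non-suspending name used only for illustration; the SDK function is declared `suspend`, and its internals may differ. The dimension check and the handling of a zero mean are assumptions.

```kotlin
import kotlin.math.sqrt

fun averageAndNormalize(rawEmbeddings: List<FloatArray>): FloatArray {
    // check() throws IllegalStateException, matching the documented failure mode.
    check(rawEmbeddings.isNotEmpty()) { "Cannot build a prototype from an empty list" }
    val dim = rawEmbeddings[0].size
    val mean = FloatArray(dim)
    for (vector in rawEmbeddings) {
        // Assumption: all inputs share one dimension.
        require(vector.size == dim) { "All embeddings must share one dimension" }
        for (i in 0 until dim) mean[i] += vector[i]
    }
    for (i in 0 until dim) mean[i] /= rawEmbeddings.size
    // L2-normalize the mean so it is comparable with other unit vectors.
    var sumOfSquares = 0f
    for (v in mean) sumOfSquares += v * v
    val norm = sqrt(sumOfSquares)
    if (norm == 0f) return mean   // assumption: leave a zero mean unchanged
    return FloatArray(dim) { i -> mean[i] / norm }
}
```

Normalizing after averaging matters because the mean of unit vectors generally has a norm below 1, which would otherwise lower every dot-product score against that prototype.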
263+
264+
---
265+
266+
## **Extending**
267+
268+
To implement a custom provider:
269+
270+
1. Implement `IEmbeddingProvider<T>` for your data type.
271+
2. Ensure consistent output dimension (`embeddingDim`).
272+
3. Return L2-normalized vectors for compatibility with retrievers.
273+
4. For few-shot classification, wrap averaged provider outputs in `PrototypeEmbedding` instances and pass them to `classify`.
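
To make the steps above concrete, here is a toy provider for `String` input. It is a wiring example only: the interface is re-declared from the table in this document, the `FloatArray` return type of `embed` is an assumption, and `CharBucketEmbedder` and `embedSync` are hypothetical names. The vectors it produces carry no semantic meaning.

```kotlin
import kotlin.math.sqrt

// Hypothetical mirror of the documented interface; the SDK's real declaration may differ.
interface IEmbeddingProvider<T> {
    val embeddingDim: Int?
    suspend fun embed(data: T): FloatArray
    fun closeSession()
}

// Toy provider: counts characters into `dim` buckets, then L2-normalizes the counts.
class CharBucketEmbedder(private val dim: Int = 8) : IEmbeddingProvider<String> {
    // Step 2: a fixed, consistent output dimension.
    override val embeddingDim: Int? get() = dim

    override suspend fun embed(data: String): FloatArray = embedSync(data)

    // Plain function holding the logic so it can also be called outside a coroutine.
    fun embedSync(data: String): FloatArray {
        val counts = FloatArray(dim)
        for (ch in data) counts[ch.code % dim] += 1f
        // Step 3: return an L2-normalized vector.
        var sumOfSquares = 0f
        for (v in counts) sumOfSquares += v * v
        val norm = sqrt(sumOfSquares)
        if (norm == 0f) return counts   // empty input: leave as zeros
        return FloatArray(dim) { i -> counts[i] / norm }
    }

    override fun closeSession() {
        // No model or native session to release in this toy example.
    }
}
```

A real provider would load a model during construction or in an explicit initialization step and release it in `closeSession()`.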
274+
275+
---
