Documents the index→cluster workflow and SQL reset procedures. Adds exponential backoff (5s/10s/20s, up to 3 retries) for connection and 5xx errors from the ML API; exhausted retries skip the image gracefully so it is retried on the next index run. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rbv
Image gallery indexer and facial recognition tool.
Subcommands
migrate
Create or update the database schema.
rbv migrate --database <CONNSTR>
Run this once before first use, and again after pulling schema changes.
index
Extract CLIP embeddings and face detections from image galleries and store them in the database.
rbv index \
--target <PATH>... \
--database <CONNSTR> \
--ml-uri <URL> \
[--concurrency <N>] # default 4
[--include <GLOB>...]
[--exclude <GLOB>...]
--target can be any of:
- A single gallery directory (contains index.json and tn/)
- A chunk directory (immediate children are galleries)
- A root directory (immediate children are chunks)
- Any arbitrary directory — galleries are discovered recursively
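The discovery rules above can be sketched in a few lines of Python. This is only an illustration, not rbv's actual implementation: the helper names is_gallery and discover_galleries are hypothetical, and the choice to stop descending once a gallery is found is an assumption.

```python
from pathlib import Path


def is_gallery(path: Path) -> bool:
    """A directory counts as a gallery if it holds index.json and a tn/ subdirectory."""
    return (path / "index.json").is_file() and (path / "tn").is_dir()


def discover_galleries(target: Path) -> list[Path]:
    """Return every gallery at or below target (assumption: galleries are not nested)."""
    if is_gallery(target):
        return [target]
    found: list[Path] = []
    # Visit subdirectories in a stable order so repeated runs list galleries identically.
    for child in sorted(p for p in target.iterdir() if p.is_dir()):
        found.extend(discover_galleries(child))
    return found
```

Under these assumptions a gallery, a chunk, a root, and an arbitrary parent directory all go through the same function, which is why --target does not need to be told which kind of directory it was given.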
Images already present in the database are skipped, so re-running against the same target is safe and cheap. Failed images (e.g. due to a transient ML API error) are not written to the database and will be retried on the next run.
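The change note at the top of this file describes the retry policy for the ML API: exponential backoff of 5s/10s/20s, up to 3 retries, on connection and 5xx errors, after which the image is skipped. A minimal Python sketch of that policy follows. TransientError and call_with_backoff are hypothetical names, and returning None to signal "skip this image" is an assumption about how the skip is represented.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

BACKOFF_SECONDS = (5, 10, 20)  # one delay before each of the 3 retries


class TransientError(Exception):
    """Hypothetical stand-in for a connection failure or a 5xx response."""


def call_with_backoff(
    request: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Try once, then retry up to 3 times; return None if every attempt fails."""
    for delay in (*BACKOFF_SECONDS, None):
        try:
            return request()
        except TransientError:
            if delay is None:
                return None  # retries exhausted: skip the image, write nothing
            sleep(delay)
    return None
```

Because a skipped image is never written to the database, the next index run sees it as new and tries it again, with no separate retry queue needed.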
Quality note: Indexing one gallery or the whole tree produces identical
embeddings. Recognition quality is determined entirely by cluster (below),
which always operates over the full database regardless of how many index
runs contributed to it.
cluster
Group all indexed face embeddings into person identities using cosine similarity and connected-components clustering.
rbv cluster \
--database <CONNSTR> \
[--threshold <FLOAT>] # default 0.65 (range 0.0–1.0)
- Only faces without an assigned person_id are clustered; existing assignments are preserved.
- Re-run after each index pass to assign identities to newly indexed faces.
- Raise --threshold (e.g. 0.75) for stricter grouping (fewer false merges, more splits). Lower it for looser grouping.
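To make the effect of --threshold concrete, here is a small Python sketch of connected-components clustering over cosine similarity. It is not rbv's code. The function names, the use of union-find, and the choice of >= rather than > at the threshold are all assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(embeddings: list[list[float]], threshold: float = 0.65) -> list[int]:
    """Label each embedding with a cluster id via connected components."""
    parent = list(range(len(embeddings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Link every pair whose similarity reaches the threshold.
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)

    return [find(i) for i in range(len(embeddings))]
```

Connected components are transitive: if face A is similar to B and B to C, all three share an identity even when A and C are not similar to each other. That chaining is the source of false merges at low thresholds, and raising the threshold breaks the weakest links first.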
Typical workflow
# 1. One-time setup
rbv migrate --database "$DATABASE_URL"
# 2. Index all galleries (incremental — safe to re-run)
rbv index --target /mnt/galleries --database "$DATABASE_URL" --ml-uri http://ml:3003
# 3. Cluster faces into persons
rbv cluster --database "$DATABASE_URL"
# 4. As new galleries are added, repeat steps 2–3
rbv index --target /mnt/galleries/new-chunk --database "$DATABASE_URL" --ml-uri http://ml:3003
rbv cluster --database "$DATABASE_URL"
Resetting face assignments
To discard all clustering results and start fresh (e.g. after tuning
--threshold or after a large new batch is indexed):
-- Unassign all faces
UPDATE face_detections SET person_id = NULL;
-- Remove all person records (cascade-clears person_names too)
DELETE FROM persons;
Then re-run rbv cluster. The embeddings themselves are not touched, so
re-clustering is fast.
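If you script the reset, it may be worth running both statements in a single transaction so a failure cannot leave faces unassigned while person records remain. The Python sketch below shows the idea against an in-memory SQLite database. The schema is a hypothetical minimal stand-in (the real rbv schema is not shown in this README), and SQLite is used here only so the example is self-contained.

```python
import sqlite3

# Hypothetical minimal schema, for illustration only.
SCHEMA = """
CREATE TABLE persons (id TEXT PRIMARY KEY);
CREATE TABLE person_names (
    person_id TEXT REFERENCES persons(id) ON DELETE CASCADE,
    name TEXT
);
CREATE TABLE face_detections (
    id INTEGER PRIMARY KEY,
    person_id TEXT REFERENCES persons(id),
    embedding BLOB
);
"""


def reset_assignments(conn: sqlite3.Connection) -> None:
    """Run both reset statements atomically: commit on success, roll back on error."""
    with conn:
        conn.execute("UPDATE face_detections SET person_id = NULL")
        conn.execute("DELETE FROM persons")


def demo() -> dict[str, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only cascades when this is on
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO persons VALUES ('p1')")
    conn.execute("INSERT INTO person_names VALUES ('p1', 'Alice')")
    conn.execute("INSERT INTO face_detections VALUES (1, 'p1', x'00')")
    conn.commit()
    reset_assignments(conn)
    queries = {
        "faces": "SELECT COUNT(*) FROM face_detections",
        "assigned": "SELECT COUNT(*) FROM face_detections WHERE person_id IS NOT NULL",
        "persons": "SELECT COUNT(*) FROM persons",
        "names": "SELECT COUNT(*) FROM person_names",
    }
    return {key: conn.execute(sql).fetchone()[0] for key, sql in queries.items()}
```

In this sketch the face row and its embedding survive the reset while the person and name rows are gone, which matches the behavior described above.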
To reset a single person's faces without affecting others:
UPDATE face_detections SET person_id = NULL WHERE person_id = '<uuid>';
DELETE FROM persons WHERE id = '<uuid>';