rbv/crates/rbv-cli/README.md
rob thijssen 3afded695b Add rbv-cli README and ML retry/backoff on transient errors
Documents the index→cluster workflow and SQL reset procedures.
Adds exponential backoff (5s/10s/20s, up to 3 retries) for connection
and 5xx errors from the ML API; exhausted retries skip the image
gracefully so it is retried on the next index run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 16:57:22 +02:00


rbv

Image gallery indexer and facial recognition tool.

Subcommands

migrate

Create or update the database schema.

rbv migrate --database <CONNSTR>

Run this once before first use, and again after pulling schema changes.


index

Extract CLIP embeddings and face detections from image galleries and store them in the database.

rbv index \
  --target <PATH>... \
  --database <CONNSTR> \
  --ml-uri <URL> \
  [--concurrency <N>]    # default 4
  [--include <GLOB>...]
  [--exclude <GLOB>...]

--target can be any of:

  • A single gallery directory (contains index.json and tn/)
  • A chunk directory (immediate children are galleries)
  • A root directory (immediate children are chunks)
  • Any arbitrary directory — galleries are discovered recursively
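The four target shapes above reduce to a single rule: walk the tree and stop at any directory that looks like a gallery. The following is a minimal sketch of that rule, not rbv's actual implementation. The names `is_gallery` and `discover_galleries` are hypothetical, and it assumes galleries are never nested inside other galleries.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// A directory counts as a gallery if it holds `index.json` and a `tn/` subdirectory.
fn is_gallery(dir: &Path) -> bool {
    dir.join("index.json").is_file() && dir.join("tn").is_dir()
}

/// Walk `target` and collect every gallery directory beneath it (or `target`
/// itself, if it is a gallery). Assumption: galleries are leaves, so the walk
/// does not descend into a directory once it has been identified as one.
fn discover_galleries(target: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    let mut stack = vec![target.to_path_buf()];
    while let Some(dir) = stack.pop() {
        if is_gallery(&dir) {
            found.push(dir);
            continue;
        }
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                stack.push(path);
            }
        }
    }
    // Sort so the result does not depend on directory iteration order.
    found.sort();
    Ok(found)
}
```

Under this rule a single gallery, a chunk, a root, and an arbitrary directory all go through the same code path, which is why `--target` does not need a flag to say which kind of directory it was given.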

Images already present in the database are skipped, so re-running against the same target is safe and cheap. Failed images (e.g. due to a transient ML API error) are not written to the database and will be retried on the next run.
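The commit note for this README describes the retry policy for ML API calls: exponential backoff of 5s, 10s, and 20s, with up to 3 retries on connection and 5xx errors. The sketch below illustrates that schedule under stated assumptions; it is not the code in rbv. `backoff_delay`, `with_retries`, and the `is_transient` predicate are hypothetical names, and how rbv treats non-transient errors is not documented here, so the sketch simply returns them to the caller.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Delay before retry number `retry` (1-based): base, 2*base, 4*base, ...
/// With a 5-second base this yields 5s, 10s, 20s.
fn backoff_delay(retry: u32, base: Duration) -> Duration {
    base * 2u32.pow(retry - 1)
}

/// Run `op` once, then retry up to `max_retries` times while the error is
/// transient. The last error is returned when retries are exhausted, so the
/// caller can skip the image and leave it for the next index run.
fn with_retries<T, E>(
    max_retries: u32,
    base: Duration,
    is_transient: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut retry = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if retry < max_retries && is_transient(&err) => {
                retry += 1;
                sleep(backoff_delay(retry, base));
            }
            // Non-transient error, or retries exhausted.
            Err(err) => return Err(err),
        }
    }
}
```

Because a failed image is never written to the database, the caller only needs to log the final `Err` and move on; the next `rbv index` run picks the image up again.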

Quality note: Indexing one gallery or the whole tree produces identical embeddings. Recognition quality is determined entirely by cluster (below), which always operates over the full database regardless of how many index runs contributed to it.


cluster

Group all indexed face embeddings into person identities using cosine similarity and connected-components clustering.

rbv cluster \
  --database <CONNSTR> \
  [--threshold <FLOAT>]  # default 0.65 (range 0.0-1.0)

  • Only faces without an assigned person_id are clustered; existing assignments are preserved.
  • Re-run after each index pass to assign identities to newly indexed faces.
  • Raise --threshold (e.g. 0.75) for stricter grouping (fewer false merges, more splits). Lower it for looser grouping.
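To make the effect of `--threshold` concrete, here is a minimal sketch of cosine-similarity clustering with connected components, built on a union-find structure. It is illustrative only and makes several assumptions: the function names are hypothetical, pairs are compared exhaustively (O(n²)), and a pair is linked when its similarity is greater than or equal to the threshold. rbv's actual comparison strategy and boundary handling may differ.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut i: usize) -> usize {
    while parent[i] != i {
        let grandparent = parent[parent[i]];
        parent[i] = grandparent;
        i = grandparent;
    }
    i
}

/// Return one label per embedding. Two faces share a label when a chain of
/// pairs, each with similarity >= `threshold`, connects them.
fn cluster(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    let n = embeddings.len();
    let mut parent: Vec<usize> = (0..n).collect();
    for i in 0..n {
        for j in (i + 1)..n {
            if cosine(&embeddings[i], &embeddings[j]) >= threshold {
                let (root_i, root_j) = (find(&mut parent, i), find(&mut parent, j));
                if root_i != root_j {
                    parent[root_j] = root_i;
                }
            }
        }
    }
    (0..n).map(|i| find(&mut parent, i)).collect()
}
```

Connected components are transitive: if A matches B and B matches C, all three land in one identity even when A and C fall below the threshold. That is why a lower threshold can merge distinct people through a chain of look-alikes, and why raising it trades false merges for extra splits.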

Typical workflow

# 1. One-time setup
rbv migrate --database "$DATABASE_URL"

# 2. Index all galleries (incremental — safe to re-run)
rbv index --target /mnt/galleries --database "$DATABASE_URL" --ml-uri http://ml:3003

# 3. Cluster faces into persons
rbv cluster --database "$DATABASE_URL"

# 4. As new galleries are added, repeat steps 2-3
rbv index --target /mnt/galleries/new-chunk --database "$DATABASE_URL" --ml-uri http://ml:3003
rbv cluster --database "$DATABASE_URL"

Resetting face assignments

To discard all clustering results and start fresh (e.g. after tuning --threshold or after a large new batch is indexed):

-- Unassign all faces
UPDATE face_detections SET person_id = NULL;

-- Remove all person records (cascade-clears person_names too)
DELETE FROM persons;

Then re-run rbv cluster. The embeddings themselves are not touched, so re-clustering is fast.

To reset a single person's faces without affecting others:

UPDATE face_detections SET person_id = NULL WHERE person_id = '<uuid>';
DELETE FROM persons WHERE id = '<uuid>';