Initial workspace scaffold

Cargo workspace with 5 crates: buh-entity (pure data structs),
buh-data (Turso/libsql data access), buh-util (scraper, rules,
processor, sync modules), buh-cli (binary "buh" with client/daemon
subcommands), and buh-ws (axum WebSocket server).

commit b11a0b7c56
Date: 2026-03-20 09:46:15 +02:00
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
26 changed files with 4131 additions and 0 deletions

.gitignore

@@ -0,0 +1 @@
/target

CLAUDE.md

@@ -0,0 +1,53 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## What is buh
buh is a media automation engine that scrapes torrent indexers, indexes torrents and metadata, selects downloads via operator-defined ingestion/discard rules, queues downloads, processes downloaded files (renaming), and syncs them to LAN targets (e.g., Jellyfin media folders). Media types: shows, movies, music, books, audio-books.
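The pipeline above implies a per-torrent lifecycle. A std-only sketch of one plausible ordering, using the `TorrentState` variants defined in `buh-entity` (the `advance` function is illustrative; this commit defines the states but no transition logic):

```rust
// The TorrentState variants from buh-entity, re-declared here so the
// sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TorrentState {
    Discovered,
    Queued,
    Downloading,
    Downloaded,
    Processed,
    Synced,
    Discarded,
}

// One plausible happy-path ordering. Discarded is terminal; a discard
// rule could enter it from any pre-Synced state.
fn advance(state: TorrentState) -> TorrentState {
    use TorrentState::*;
    match state {
        Discovered => Queued,
        Queued => Downloading,
        Downloading => Downloaded,
        Downloaded => Processed,
        Processed => Synced,
        Synced => Synced,       // terminal
        Discarded => Discarded, // terminal
    }
}

fn main() {
    // Walk a torrent from discovery to sync.
    let mut s = TorrentState::Discovered;
    while s != TorrentState::Synced {
        s = advance(s);
    }
    assert_eq!(s, TorrentState::Synced);
}
```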
## Build & Test Commands
```bash
cargo check --workspace # fast type-check all crates
cargo build --workspace # full build
cargo test --workspace # run all tests
cargo test -p buh-entity # test a single crate
cargo run -p buh-cli -- --help # run the CLI
cargo run -p buh-cli -- daemon --config daemon.toml # run daemon mode
cargo run -p buh-ws # start the WebSocket server
```
The CLI binary is named `buh` (not `buh-cli`). When no subcommand is given, it defaults to `client` mode.
## Architecture
Cargo workspace with edition 2024, resolver 3. All shared dependencies are declared in the root `Cargo.toml` under `[workspace.dependencies]` and inherited by crates.
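For example, mirroring the manifests in this commit, the root declares a version once and member crates inherit it:

```toml
# root Cargo.toml
[workspace.dependencies]
serde = { version = "1", features = ["derive"] }

# a member crate's Cargo.toml
[dependencies]
serde = { workspace = true }
```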
### Dependency graph (arrows = "depends on")
```
buh-entity                                leaf crate, no internal deps, serde-only
buh-data  → buh-entity                    Turso/libsql data access layer
buh-util  → buh-data → buh-entity         domain logic (scraper, rules, processor, sync modules)
buh-cli   → buh-util, buh-data, buh-entity    binary
buh-ws    → buh-util, buh-data, buh-entity    binary
```
### Crate responsibilities
- **buh-entity** — Pure data structs. No business logic, no database awareness. Only depends on `serde`. Every other crate imports this.
- **buh-data** — All database access goes through `Db` struct wrapping a libsql `Connection`. Uses `thiserror` for typed errors. Schema migrations live in `Db::migrate()`.
- **buh-util** — Domain logic organized as modules: `scraper` (indexer scraping), `rules` (ingestion/discard evaluation), `processor` (file renaming), `sync` (LAN file sync).
- **buh-cli** — Binary `buh` with two clap subcommands: `client` (default, interactive) and `daemon` (reads TOML config, runs routines). Config path defaults to `/etc/buh/daemon.toml`.
- **buh-ws** — Axum-based WebSocket server on port 9000, endpoint `/ws`.
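The `rules` module is only a stub in this commit. A minimal std-only sketch of how first-match evaluation could look, assuming higher `priority` wins and using a plain substring check as a stand-in for whatever pattern syntax is eventually chosen:

```rust
// Sketch only: mirrors the Rule/RuleAction shapes from buh-entity, with
// substring matching standing in for the eventual pattern syntax.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RuleAction {
    Ingest,
    Discard,
}

struct Rule {
    pattern: String,
    action: RuleAction,
    priority: i32,
}

// Returns the action of the highest-priority rule whose pattern matches,
// or None if no rule matches (the caller decides the default).
fn evaluate(rules: &[Rule], torrent_name: &str) -> Option<RuleAction> {
    let mut sorted: Vec<&Rule> = rules.iter().collect();
    sorted.sort_by(|a, b| b.priority.cmp(&a.priority));
    sorted
        .iter()
        .find(|r| torrent_name.contains(&r.pattern))
        .map(|r| r.action)
}

fn main() {
    let rules = vec![
        Rule { pattern: "1080p".into(), action: RuleAction::Ingest, priority: 10 },
        Rule { pattern: "CAM".into(), action: RuleAction::Discard, priority: 20 },
    ];
    // The discard rule outranks the ingest rule when both match.
    assert_eq!(evaluate(&rules, "Some.Movie.1080p"), Some(RuleAction::Ingest));
    assert_eq!(evaluate(&rules, "Some.Movie.1080p.CAM"), Some(RuleAction::Discard));
    assert_eq!(evaluate(&rules, "Some.Book.epub"), None);
}
```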
### Error handling convention
Libraries (`buh-data`, `buh-util`, `buh-entity`) use `thiserror` for typed errors. Binaries (`buh-cli`, `buh-ws`) use `anyhow` for error erasure.
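A std-only illustration of this split (the hand-written impls approximate what `#[derive(thiserror::Error)]` generates, and `Box<dyn Error>` stands in for `anyhow::Error`):

```rust
use std::error::Error;
use std::fmt;

// Library side: a typed error, written out by hand to show roughly what
// thiserror derives for buh-data's DataError.
#[derive(Debug)]
enum DataError {
    NotFound(String),
}

impl fmt::Display for DataError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DataError::NotFound(what) => write!(f, "not found: {what}"),
        }
    }
}

impl Error for DataError {}

fn lookup(id: i64) -> Result<String, DataError> {
    Err(DataError::NotFound(format!("torrent {id}")))
}

// Binary side: `?` erases the typed error into a trait object, the
// std-only analogue of bubbling into anyhow::Result.
fn run() -> Result<(), Box<dyn Error>> {
    let name = lookup(42)?;
    println!("{name}");
    Ok(())
}

fn main() {
    let err = run().unwrap_err();
    assert_eq!(err.to_string(), "not found: torrent 42");
}
```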
### Config format
Daemon configuration is TOML. The schema is defined in `buh-entity::config`. See `daemon.toml` at the repo root for an example.

Cargo.lock (generated; diff suppressed because it is too large)

Cargo.toml

@@ -0,0 +1,40 @@
[workspace]
resolver = "3"
members = ["crates/*"]

[workspace.package]
edition = "2024"
version = "0.1.0"
license = "MIT"

[workspace.dependencies]
# internal
buh-entity = { path = "crates/buh-entity" }
buh-data = { path = "crates/buh-data" }
buh-util = { path = "crates/buh-util" }
# async runtime
tokio = { version = "1", features = ["full"] }
# serialization
serde = { version = "1", features = ["derive"] }
serde_json = "1"
toml = "0.8"
# http / websocket
reqwest = { version = "0.12", features = ["json"] }
axum = { version = "0.8", features = ["ws"] }
# database
libsql = "0.9"
# cli
clap = { version = "4", features = ["derive"] }
# observability
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
# errors
thiserror = "2"
anyhow = "1"

README.md

@@ -0,0 +1,79 @@
# buh
A media automation engine that scrapes torrent indexers, indexes torrents and their metadata, selects interesting downloads using operator-defined rules, queues downloads, processes downloaded files using renaming rules, and syncs renamed files to configured LAN targets.
The end goal is to populate a Jellyfin (or similar) media server with content — shows, movies, music, books, and audio-books — based on ingestion rules, and to later discard content based on operator discard rules.
## Architecture
```
buh-entity pure data structs (serde only, no logic)
buh-data data access layer (Turso/libsql)
buh-util domain logic (scraper, rules, processor, sync)
buh-cli CLI binary ("buh")
buh-ws WebSocket server (axum)
```
All crates live under `crates/` in a single Cargo workspace.
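The resulting layout:

```
crates/
├── buh-cli/
├── buh-data/
├── buh-entity/
├── buh-util/
└── buh-ws/
```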
## Usage
### CLI
```bash
# interactive client mode (default)
buh
# run daemon routines from config
buh daemon --config /etc/buh/daemon.toml
```
The `client` subcommand is assumed when no subcommand is provided.
### WebSocket server
```bash
cargo run -p buh-ws
# listens on 0.0.0.0:9000, endpoint /ws
```
## Configuration
The daemon reads a TOML configuration file. See [`daemon.toml`](daemon.toml) for an example.
```toml
[database]
url = "libsql://your-db.turso.io"
auth_token = "your-token"

[[indexers]]
name = "example-indexer"
url = "https://example.com/api"
media_types = ["show", "movie"]

[[sync]]
name = "jellyfin-shows"
media_type = "show"
path = "/mnt/media/shows"

[routines]
scrape = true
process = true
sync = true
```
## Building
```bash
cargo build --workspace
cargo test --workspace
```
Requires Rust edition 2024 (stable 1.85+).
## License
MIT

crates/buh-cli/Cargo.toml

@@ -0,0 +1,20 @@
[package]
name = "buh-cli"
edition.workspace = true
version.workspace = true
[[bin]]
name = "buh"
path = "src/main.rs"
[dependencies]
buh-entity = { workspace = true }
buh-data = { workspace = true }
buh-util = { workspace = true }
clap = { workspace = true }
tokio = { workspace = true }
toml = { workspace = true }
serde = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { workspace = true }

crates/buh-cli/src/client.rs

@@ -0,0 +1,4 @@
pub async fn run() -> anyhow::Result<()> {
    tracing::info!("client mode not yet implemented");
    Ok(())
}

crates/buh-cli/src/daemon.rs

@@ -0,0 +1,24 @@
use std::path::Path;
use buh_entity::config::DaemonConfig;
pub async fn run(config_path: &Path) -> anyhow::Result<()> {
    let raw = std::fs::read_to_string(config_path)?;
    let config: DaemonConfig = toml::from_str(&raw)?;
    let db = buh_data::Db::connect(&config.database.url, config.database.auth_token.as_deref())
        .await?;
    db.migrate().await?;
    if config.routines.scrape {
        tracing::info!("scrape routine: not yet implemented");
    }
    if config.routines.process {
        tracing::info!("process routine: not yet implemented");
    }
    if config.routines.sync {
        tracing::info!("sync routine: not yet implemented");
    }
    Ok(())
}

crates/buh-cli/src/main.rs

@@ -0,0 +1,37 @@
mod client;
mod daemon;
use clap::{Parser, Subcommand};
#[derive(Parser)]
#[command(name = "buh", version, about = "Media automation engine")]
struct Cli {
    #[command(subcommand)]
    command: Option<Command>,
}

#[derive(Subcommand)]
enum Command {
    /// Interactive client (default when no subcommand is given)
    Client,
    /// Run daemon routines from a configuration file
    Daemon {
        /// Path to daemon TOML configuration file
        #[arg(long, default_value = "/etc/buh/daemon.toml")]
        config: std::path::PathBuf,
    },
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .init();
    let cli = Cli::parse();
    match cli.command.unwrap_or(Command::Client) {
        Command::Client => client::run().await,
        Command::Daemon { config } => daemon::run(&config).await,
    }
}

crates/buh-data/Cargo.toml

@@ -0,0 +1,11 @@
[package]
name = "buh-data"
edition.workspace = true
version.workspace = true
[dependencies]
buh-entity = { workspace = true }
libsql = { workspace = true }
tokio = { workspace = true }
thiserror = { workspace = true }
tracing = { workspace = true }

crates/buh-data/src/lib.rs

@@ -0,0 +1,77 @@
use libsql::Connection;
#[derive(Debug, thiserror::Error)]
pub enum DataError {
    #[error("database error: {0}")]
    Database(#[from] libsql::Error),
    #[error("not found: {0}")]
    NotFound(String),
}

pub struct Db {
    conn: Connection,
}

impl Db {
    pub async fn connect(url: &str, auth_token: Option<&str>) -> Result<Self, DataError> {
        let db = match auth_token {
            Some(token) => {
                libsql::Builder::new_remote(url.to_string(), token.to_string())
                    .build()
                    .await?
            }
            None => libsql::Builder::new_local(url).build().await?,
        };
        let conn = db.connect()?;
        Ok(Self { conn })
    }

    pub async fn migrate(&self) -> Result<(), DataError> {
        self.conn
            .execute_batch(
                "
                CREATE TABLE IF NOT EXISTS torrents (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    info_hash TEXT NOT NULL UNIQUE,
                    name TEXT NOT NULL,
                    media_type TEXT,
                    state TEXT NOT NULL DEFAULT 'discovered',
                    source_url TEXT NOT NULL,
                    created_at TEXT NOT NULL DEFAULT (datetime('now')),
                    updated_at TEXT NOT NULL DEFAULT (datetime('now'))
                );
                CREATE TABLE IF NOT EXISTS rules (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    name TEXT NOT NULL,
                    media_type TEXT NOT NULL,
                    action TEXT NOT NULL,
                    pattern TEXT NOT NULL,
                    priority INTEGER NOT NULL DEFAULT 0,
                    created_at TEXT NOT NULL DEFAULT (datetime('now'))
                );
                CREATE TABLE IF NOT EXISTS media_items (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    media_type TEXT NOT NULL,
                    title TEXT NOT NULL,
                    year INTEGER,
                    torrent_id INTEGER REFERENCES torrents(id),
                    created_at TEXT NOT NULL DEFAULT (datetime('now'))
                );
                ",
            )
            .await?;
        tracing::info!("database migrations applied");
        Ok(())
    }

    pub fn conn(&self) -> &Connection {
        &self.conn
    }
}

crates/buh-entity/Cargo.toml

@@ -0,0 +1,7 @@
[package]
name = "buh-entity"
edition.workspace = true
version.workspace = true
[dependencies]
serde = { workspace = true }

crates/buh-entity/src/config.rs

@@ -0,0 +1,39 @@
use serde::{Deserialize, Serialize};
use crate::media::MediaType;
#[derive(Debug, Deserialize, Serialize)]
pub struct DaemonConfig {
    pub database: DatabaseConfig,
    pub indexers: Vec<IndexerConfig>,
    pub sync: Vec<SyncTarget>,
    pub routines: RoutineConfig,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct DatabaseConfig {
    pub url: String,
    pub auth_token: Option<String>,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct IndexerConfig {
    pub name: String,
    pub url: String,
    pub api_key: Option<String>,
    pub media_types: Vec<MediaType>,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct SyncTarget {
    pub name: String,
    pub media_type: MediaType,
    pub path: String,
}

#[derive(Debug, Deserialize, Serialize)]
pub struct RoutineConfig {
    pub scrape: bool,
    pub process: bool,
    pub sync: bool,
}

crates/buh-entity/src/lib.rs

@@ -0,0 +1,4 @@
pub mod config;
pub mod media;
pub mod rule;
pub mod torrent;

crates/buh-entity/src/media.rs

@@ -0,0 +1,19 @@
use serde::{Deserialize, Serialize};
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum MediaType {
    Show,
    Movie,
    Music,
    Book,
    AudioBook,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MediaItem {
    pub id: Option<i64>,
    pub media_type: MediaType,
    pub title: String,
    pub year: Option<u16>,
}

crates/buh-entity/src/rule.rs

@@ -0,0 +1,20 @@
use serde::{Deserialize, Serialize};
use crate::media::MediaType;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum RuleAction {
    Ingest,
    Discard,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Rule {
    pub id: Option<i64>,
    pub name: String,
    pub media_type: MediaType,
    pub action: RuleAction,
    pub pattern: String,
    pub priority: i32,
}

crates/buh-entity/src/torrent.rs

@@ -0,0 +1,25 @@
use serde::{Deserialize, Serialize};
use crate::media::MediaType;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "kebab-case")]
pub enum TorrentState {
    Discovered,
    Queued,
    Downloading,
    Downloaded,
    Processed,
    Synced,
    Discarded,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Torrent {
    pub id: Option<i64>,
    pub info_hash: String,
    pub name: String,
    pub media_type: Option<MediaType>,
    pub state: TorrentState,
    pub source_url: String,
}

crates/buh-util/Cargo.toml

@@ -0,0 +1,14 @@
[package]
name = "buh-util"
edition.workspace = true
version.workspace = true
[dependencies]
buh-entity = { workspace = true }
buh-data = { workspace = true }
reqwest = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
tracing = { workspace = true }

crates/buh-util/src/lib.rs

@@ -0,0 +1,4 @@
pub mod processor;
pub mod rules;
pub mod scraper;
pub mod sync;

crates/buh-util/src/processor.rs

@@ -0,0 +1 @@
//! Post-download file processing and renaming.

crates/buh-util/src/rules.rs

@@ -0,0 +1 @@
//! Rule evaluation engine for torrent selection and discard.

crates/buh-util/src/scraper.rs

@@ -0,0 +1 @@
//! Scraper routines for torrent indexers.

crates/buh-util/src/sync.rs

@@ -0,0 +1 @@
//! File synchronization to LAN targets.

crates/buh-ws/Cargo.toml

@@ -0,0 +1,16 @@
[package]
name = "buh-ws"
edition.workspace = true
version.workspace = true
[dependencies]
buh-entity = { workspace = true }
buh-data = { workspace = true }
buh-util = { workspace = true }
axum = { workspace = true }
tokio = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
anyhow = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { workspace = true }

crates/buh-ws/src/main.rs

@@ -0,0 +1,34 @@
use axum::{
    Router,
    extract::ws::{WebSocket, WebSocketUpgrade},
    response::IntoResponse,
    routing::get,
};

async fn ws_handler(ws: WebSocketUpgrade) -> impl IntoResponse {
    ws.on_upgrade(handle_socket)
}

async fn handle_socket(mut socket: WebSocket) {
    while let Some(Ok(msg)) = socket.recv().await {
        tracing::debug!(?msg, "received");
        if socket.send(msg).await.is_err() {
            break;
        }
    }
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    tracing_subscriber::fmt()
        .with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
        .init();
    let app = Router::new().route("/ws", get(ws_handler));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:9000").await?;
    tracing::info!("buh-ws listening on {}", listener.local_addr()?);
    axum::serve(listener, app).await?;
    Ok(())
}

daemon.toml

@@ -0,0 +1,38 @@
[database]
url = "libsql://your-db.turso.io"
auth_token = "your-token"

[[indexers]]
name = "example-indexer"
url = "https://example.com/api"
media_types = ["show", "movie"]

[[sync]]
name = "jellyfin-shows"
media_type = "show"
path = "/mnt/media/shows"

[[sync]]
name = "jellyfin-movies"
media_type = "movie"
path = "/mnt/media/movies"

[[sync]]
name = "jellyfin-music"
media_type = "music"
path = "/mnt/media/music"

[[sync]]
name = "jellyfin-books"
media_type = "book"
path = "/mnt/media/books"

[[sync]]
name = "jellyfin-audiobooks"
media_type = "audio-book"
path = "/mnt/media/audiobooks"

[routines]
scrape = true
process = true
sync = true