feat: add analytics subcommand for mbox sender analysis
Adds a new `analytics` subcommand that analyzes Google Takeout mbox files to identify top senders by message count. Designed for efficient processing of large files (60GB+) with minimal memory usage.

Features:
- Streams files line-by-line with 1MB buffer (never loads entire file)
- Extracts sender email addresses from From: headers
- Counts messages per sender and displays top N (default 10)
- Shows progress output every 10,000 messages
- No Gmail API access needed

Usage: cull-gmail analytics <MBOX_FILE> [-n TOP]

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
70
README.md
@@ -53,8 +53,9 @@ Get started with cull-gmail in minutes using the built-in setup command:
- **Flexible configuration**: Support for file-based config, environment variables, and ephemeral tokens
- **Safety first**: Dry-run mode by default, interactive confirmations, and timestamped backups
- **Label management**: List and inspect Gmail labels for rule planning
- **Message operations**: Query, filter, and perform batch operations on Gmail messages
- **Rule-based automation**: Configure retention rules with time-based filtering and automated actions
- **Mbox analysis**: Analyze Google Takeout exports to identify top senders (efficient streaming, no API needed)
- **Token portability**: Export/import OAuth2 tokens for containerized and CI/CD environments

### Running the optional Gmail integration test
@@ -201,9 +202,12 @@ cull-gmail [OPTIONS] [COMMAND]

### Commands

- `init`: Initialize configuration and OAuth2 credentials
- `labels`: List available Gmail labels
- `messages`: Query and operate on messages
- `rules`: Configure and run retention rules
- `analytics`: Analyze mbox files for sender statistics
- `token`: Export and import OAuth2 tokens

## Command Reference

@@ -370,6 +374,70 @@ cull-gmail rules run --execute --skip-trash
cull-gmail rules run --execute --skip-delete
```

### Analytics Command

Analyze Google Takeout mbox files to identify top senders by message count.

**Note**: This command does NOT require Gmail API access. It efficiently streams local mbox files with minimal memory usage, making it suitable for analyzing large exports (60GB+).
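
The streaming approach described in the note above can be sketched in a few lines of Python. This is an illustration only, not the `analytics` implementation: the function name `count_senders`, its parameters, the progress message format, and the lowercasing of addresses are assumptions made for the example. It also assumes every message starts with a `From ` separator line (with matching body lines escaped as `>From `) and ignores folded `From:` headers.

```python
import sys
from collections import Counter
from email.utils import parseaddr


def count_senders(path, buffer_size=1024 * 1024, progress_every=10_000):
    """Return (message_count, Counter mapping sender address -> messages).

    Hypothetical helper for illustration; not part of cull-gmail.
    """
    counts = Counter()
    messages = 0
    in_headers = False
    # A 1 MB read buffer; iterating the file yields one line at a time,
    # so the whole mbox is never held in memory.
    with open(path, "rb", buffering=buffer_size) as mbox:
        for raw in mbox:
            if raw.startswith(b"From "):
                # Assumed mbox separator line: a new message begins here.
                messages += 1
                in_headers = True
                if messages % progress_every == 0:
                    print(f"Scanned {messages} messages", file=sys.stderr)
            elif in_headers:
                if raw in (b"\n", b"\r\n"):
                    # A blank line ends the header block; skip the body.
                    in_headers = False
                elif raw[:5].lower() == b"from:":
                    value = raw[5:].decode("utf-8", errors="replace")
                    address = parseaddr(value.strip())[1].lower()
                    if address:
                        counts[address] += 1
                    # Only the first From: header of a message is counted.
                    in_headers = False
    return messages, counts
```

Calling `count_senders(path)[1].most_common(10)` yields the ten most frequent senders as `(address, count)` pairs. Because the file is read through a fixed-size buffer, memory use in this sketch grows with the number of distinct senders rather than with the size of the file.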

#### Syntax

```bash
cull-gmail analytics [OPTIONS] <MBOX_FILE>
```

#### Arguments

- `<MBOX_FILE>`: Path to mbox file to analyze (typically from Google Takeout)

#### Options

- `-n, --top <TOP>`: Number of top senders to display [default: 10]

#### Examples

**Show top 10 senders from a Google Takeout mbox**:
```bash
cull-gmail analytics ~/takeout/All\ mail\ Including\ Spam\ and\ Trash.mbox
```

**Show top 20 senders**:
```bash
cull-gmail analytics -n 20 ~/takeout/All\ mail.mbox
```

**Example Output**:
```
[INFO] Scanned 1234567 messages total.
Top 10 senders:
45678 newsletter@example.com
23456 promotions@example.com
18901 notifications@example.com
12345 support@example.com
9876 marketing@example.com
8765 updates@example.com
7654 alerts@example.com
6543 digests@example.com
5432 reports@example.com
4321 announcements@example.com
```

#### Use Cases

- Identify top email senders in your mailbox before configuring rules
- Analyze historical email patterns from a full account export
- Find unexpected high-volume senders for further investigation
- Plan email retention policies based on actual sender frequency

#### Getting a Google Takeout mbox File

1. Visit [Google Takeout](https://takeout.google.com)
2. Select "Gmail" and choose the desired email account
3. Select export format "Standard" (generates .mbox files)
4. Download the export (can be very large - multiple parts possible)
5. Extract/combine the mbox files if needed
6. Use `cull-gmail analytics` on the mbox file
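
If the export produces several `.mbox` files, step 5 can be done by concatenating them, since an mbox file is a plain sequence of messages. The following is a minimal sketch, assuming each part is a complete mbox file on local disk; `combine_mbox_parts` and its parameters are hypothetical names, not part of `cull-gmail`.

```python
def combine_mbox_parts(parts, destination, chunk_size=1024 * 1024):
    """Concatenate mbox files into one, copying in fixed-size chunks.

    Hypothetical helper for illustration; not part of cull-gmail.
    """
    with open(destination, "wb") as out:
        for part in parts:
            last_byte = b"\n"
            with open(part, "rb") as src:
                # Copy chunk by chunk so large parts never sit in memory.
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                # Make sure the next part's "From " separator line
                # starts on a line of its own.
                out.write(b"\n")
```

The combined file can then be passed to `cull-gmail analytics` in the same way as a single-file export.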

## Gmail Query Syntax

The `-Q, --query` option supports Gmail's powerful search syntax:
