Architecture Overview
msgvault syncs your Gmail to a local SQLite database. Sync, along with explicitly requested deletions, is the only step that touches the network. Everything else (search, analytics, the TUI, and the MCP server) runs entirely offline against SQLite, Parquet, and local attachment files.
Package Structure

    msgvault/
    ├── cmd/msgvault/       # CLI entrypoint
    │   └── cmd/            # Cobra commands
    ├── internal/           # Core packages
    │   ├── tui/            # Bubble Tea TUI
    │   ├── query/          # DuckDB query engine over Parquet
    │   ├── store/          # SQLite database access
    │   ├── deletion/       # Deletion staging and manifest
    │   ├── gmail/          # Gmail API client
    │   ├── sync/           # Sync orchestration
    │   ├── oauth/          # OAuth2 flows (browser + device)
    │   └── mime/           # MIME parsing
    ├── go.mod
    └── Makefile

Key Packages
| Package | Responsibility |
|---|---|
| cmd/ | Cobra CLI commands, config loading |
| internal/store | SQLite database operations, schema management |
| internal/sync | Sync orchestration, MIME parsing, checkpoint management |
| internal/gmail | Gmail API client with token bucket rate limiting |
| internal/oauth | OAuth2 browser and device authorization flows |
| internal/query | DuckDB engine over Parquet files, SQLite fallback |
| internal/tui | Bubble Tea model, lipgloss-styled views |
| internal/deletion | Deletion staging, manifest generation |
| internal/mime | MIME message parsing, charset detection |
Design Decisions
- Offline by design: The Gmail API is contacted only during explicit sync-full, sync, and deletion commands. Every other operation (search, analytics, TUI, MCP) runs entirely against local data with no network access required. This means no background OAuth sessions, no persistent API connections, and no possibility of an external tool or AI assistant reaching your live mailbox.
- SQLite as system of record: All message data lives in SQLite. Parquet is a derived cache.
- DuckDB + Parquet for analytics: The TUI runs an embedded DuckDB engine over Parquet metadata exports, delivering aggregate queries hundreds of times faster than SQLite JOINs. The entire analytics cache for hundreds of thousands of messages fits in a few megabytes, making drill-down and re-aggregation feel instant.
- Content-addressed attachments: Attachment files are stored on disk under their SHA-256 hash, so identical attachments are stored only once.
- Resumable sync: Checkpoints allow interrupted syncs to resume without re-downloading.
- Token bucket rate limiting: Respects Gmail API quotas without manual throttling.