This document describes the current TorresVault Proxmox cluster: hardware, networking, storage, workloads, and backup/restore. It is intended as the authoritative reference for how virtualization is implemented in TorresVault 2.0 (current state).
Future redesigns (new NAS, X570D4U, Mini PC cluster, etc.) will be documented separately on the roadmap page.
The Proxmox environment is a two-node cluster plus a QDevice for quorum, running on older but solid Intel desktop platforms with expanded SATA and NIC capacity.
High-level logical view:
The design intentionally does not use shared storage for HA; instead, VMs are pinned to nodes and protected via image-based backups to PBS.
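On the cluster side, the PBS target typically appears as a storage entry in `/etc/pve/storage.cfg`. A hypothetical sketch (the server address, username, fingerprint, and retention values are placeholders, not the actual configuration):

```
pbs: pbs-main
        datastore pbs-main
        server 10.0.0.105
        content backup
        username backup@pbs
        fingerprint aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99
        prune-backups keep-last=3,keep-weekly=2
```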
Role: General compute node, many core services
Primary role summary:
—
Role: General compute node, media & application workloads
Primary role summary:
—
—
The Proxmox cluster uses:
UniFi VLANs exist on the network side (stark_user, stark_IOT, guest, IOT+, Torres Family Lights); for now, Proxmox sees mostly the flat LAN plus specific lab networks for testing.
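On the Proxmox side, the flat LAN plus future VLANs are usually served by a single VLAN-aware bridge in `/etc/network/interfaces`. A hypothetical sketch (the NIC name and addresses are placeholders):

```
auto eno1
iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 10.0.0.11/24
        gateway 10.0.0.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
```

With `bridge-vlan-aware yes`, individual VMs can later be tagged onto specific UniFi VLANs via the `tag=` option on their virtual NICs without rewiring the bridge.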
—
From the Proxmox UI:
Design notes:
—
Design notes:
—
This separation keeps Corosync and cluster traffic off the main LAN and avoids cluster instability if the LAN becomes noisy.
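Corosync binds to the dedicated link via the node addresses in `/etc/pve/corosync.conf`. A hypothetical sketch of the relevant sections (the `10.10.0.x` addresses are placeholders for the cluster network):

```
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.0.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.0.2
  }
}

totem {
  cluster_name: torres-cluster
  config_version: 4
  interface {
    linknumber: 0
  }
  version: 2
}
```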
There are three main storage layers:
—
Typical Proxmox storages (names as shown in the UI):
Backed by:
—
Also backed by multiple 1 TB Seagate ST91000640NS disks via the same combination of controllers.
—
the system.
Over time, this VM may be migrated to a dedicated physical NAS, but for now it is virtualized.
—
Important backup rule:
This prevents:
PBS instead focuses on backing up critical application VMs only.
The current cluster runs a mix of core services and lab workloads. VM IDs/names:
—
These assignments are not HA-managed; VMs are pinned to nodes and protected by PBS backups.
Backups are handled by the PBS VM (ID 105), writing into datastore `pbs-main` hosted on TrueNAS.
Key points:
—
existing datastore
Instead, PBS backup jobs focus on stateless or easily rebuilt VMs whose important data is stored externally (e.g., on TrueNAS, in Nextcloud data, or other locations).
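Such a job can be expressed declaratively. A hypothetical sketch of the corresponding entry in `/etc/pve/jobs.cfg` (the job ID, VM IDs, and schedule are placeholders, not the actual job):

```
vzdump: backup-app-vms
        schedule 02:00
        storage pbs-main
        vmid 101,102,103
        mode snapshot
        enabled 1
```

Keeping the excluded VMs (TrueNAS, PBS itself) out of the `vmid` list is what enforces the backup rule above.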
—
Scenario: single VM failure
1. Identify the affected VM in the Proxmox UI.
2. In the PBS UI:
   * Go to **Datastore → pbs-main → Content → VM group**.
   * Select the latest successful backup.
3. Choose **Restore**:
   * Target node: original host (or an alternate host if needed)
   * Disk storage: appropriate local storage (`local-lvm`, `apps-pool`, etc.)
4. Start the VM in Proxmox and validate:
   * Application health checks (web UI, API, etc.)
   * Network connectivity (LAN, DNS, etc.)
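The same restore can be driven from the shell on the target node with `qmrestore`. A minimal dry-run sketch (the VM ID, snapshot timestamp, and storage are hypothetical; it prints the commands for review rather than executing them):

```shell
#!/bin/sh
# Hypothetical values; list real snapshots first with: pvesm list pbs-main
VMID=101
SNAP="vm/${VMID}/2024-06-01T02:00:00Z"
STORAGE=local-lvm

# Print the restore commands to run on the target PVE node:
echo "qmrestore pbs-main:backup/${SNAP} ${VMID} --storage ${STORAGE}"
echo "qm start ${VMID}"
```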
Scenario: node loss (pve1 or pve2)
1. Replace/fix hardware and reinstall Proxmox VE.
2. Rejoin the node to `torres-cluster`.
3. Recreate the necessary storages pointing at local disks.
4. From PBS, restore VMs to the rebuilt node using the procedure above.
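Step 2 maps to `pvecm add`, run on the rebuilt node against a surviving cluster member. A dry-run sketch (the addresses are placeholders; it prints the commands rather than executing them):

```shell
#!/bin/sh
# Hypothetical addresses for a surviving member and the cluster link.
MEMBER=10.0.0.11          # healthy node already in torres-cluster
CLUSTER_LINK=10.10.0.2    # rebuilt node's address on the dedicated corosync network

# Print the join/verify commands to run on the rebuilt node:
echo "pvecm add ${MEMBER} --link0 ${CLUSTER_LINK}"
echo "pvecm status"
```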
Scenario: PBS VM lost but TrueNAS datastore intact
1. Recreate the PBS VM from the Proxmox template.
2. Reattach the existing `pbs-main` datastore on TrueNAS.
3. PBS will rediscover the existing backups.
4. Resume normal operations.
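On the PBS side, the datastore definition lives in `/etc/proxmox-backup/datastore.cfg`. Pointing a rebuilt PBS VM at the existing directory is enough for it to rediscover the backup groups; a hypothetical sketch (the mount path is a placeholder):

```
datastore: pbs-main
        path /mnt/truenas/pbs-main
        comment Primary datastore on TrueNAS
        gc-schedule daily
```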
Monitoring in TorresVault is layered:
Operational practice:
1. Decide which node (pve1 vs pve2) based on workload:
   * Storage-heavy → whichever has more free disk
   * Media- or GPU-heavy (later) → pve2
2. Create the VM in Proxmox:
   * Attach to `vmbr0` for LAN access
   * Store disks on `local-lvm`, `apps-pool`, or `VM-pool`
3. Install the OS and configure networking.
4. In **PBS**, add the VM to an existing or new backup group.
5. Verify that the first backup completes successfully.
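Step 2 has a CLI equivalent via `qm create`. A dry-run sketch (the VM ID, name, and disk size are hypothetical; it prints the command for review rather than running it):

```shell
#!/bin/sh
# Hypothetical new-VM parameters; adjust per the checklist above.
VMID=120
NAME=new-app
STORAGE=apps-pool

echo "qm create ${VMID} --name ${NAME} --memory 4096 --cores 2 \
  --net0 virtio,bridge=vmbr0 --scsi0 ${STORAGE}:32 --ostype l26"
```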
—
Proxmox nodes:
1. Live-migrate or gracefully shut down VMs on the target node if needed.
2. Run `apt update && apt full-upgrade` on the node (via console or SSH).
3. Reboot the node.
4. Verify:
   * Corosync quorum is healthy
   * VMs auto-started where expected
PBS & TrueNAS:
—
Planned maintenance that requires full stack shutdown:
Shutdown order:
1. Application VMs (web, Nextcloud, Immich, Jellyfin, etc.)
2. Monitoring VMs (Kuma, Prometheus)
3. PBS VM
4. TrueNAS VM
5. pve2 node
6. pve1 node (last Proxmox node)
7. Network gear / UPS if necessary
Power-up order:
1. Network gear & UPS
2. pve1 and pve2 nodes
3. TrueNAS VM
4. PBS VM
5. Core apps (web, Nextcloud, Immich, Jellyfin, n8n, NPM, wiki)
6. Monitoring stack (Kuma, Prometheus/Grafana)
This order ensures that storage is ready before PBS, and that PBS is ready before any dependent VMs (if any use backup features such as guest-initiated restore).
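The shutdown order can also be scripted. A sketch with hypothetical VM IDs per tier (except 105, which is PBS per the backup section); it prints the commands in dependency order, apps first and storage last, rather than running them:

```shell
#!/bin/sh
# Hypothetical VM IDs mapped to the shutdown tiers above.
APP_VMS="110 111 112"   # web, Nextcloud, Immich, Jellyfin, ...
MON_VMS="120 121"       # Kuma, Prometheus
PBS_VM=105              # PBS (ID 105 per the backup section)
NAS_VM=100              # TrueNAS

# Print graceful shutdowns in dependency order for review:
for id in $APP_VMS $MON_VMS $PBS_VM $NAS_VM; do
    echo "qm shutdown ${id} --timeout 300"
done
echo "# then: shutdown -h now on pve2, and last on pve1"
```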
These are acceptable for a home lab / prosumer environment but are captured here explicitly for future planning.
The following items are out of scope for this document but are tracked on the roadmap:
See: roadmap (to be created).