# VO Booth Prototype — Full Session Log

**Project:** VO Booth — Browser-based DAW-like recording platform for dubbing/localization
**Location:** `~/Documents/CLAUDE_ENVIRONMENTS/SANDBOX/VO-BOOTH-PROTOTYPE/`
**Main file:** `vo-booth.html` (single-file prototype, all HTML/CSS/JS)
**Server:** `serve.py` (Range-aware HTTP server on port 8765)
**Launch:** `cd ~/Documents/CLAUDE_ENVIRONMENTS/SANDBOX/VO-BOOTH-PROTOTYPE && python3 serve.py` → open `http://localhost:8765/vo-booth.html`

---

## All Features Built (Cumulative)

### 1. Core Structure
- Two-panel layout: sidebar (episode/cue list) + main viewer (video, waveform, controls)
- JSON-driven cue sheet loading (`script_batch1.json`)
- Waveform visualization from `waveform.json` rendered on Canvas 2D with DPR-aware sizing
- Video player with `<video>` element, currentTime seeking, Range request support

### 2. Green Focus Section + Fade-Out
- The active recording region is highlighted green on the waveform so the user knows exactly where they're focused
- Customizable fade-out at end of focus section (replaces post-roll entirely)
- Fade-out gradient drawn forward from `group.end`
- `GLOBAL_FADE_MS = 1500` default, adjustable via a UI slider (0 ms = hard stop, up to the full fade duration)
- `getPostroll()` returns `GLOBAL_FADE_MS` — fade IS the tail, no separate post-roll
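
The fade-as-tail behavior can be sketched as a pure gain curve. This is a minimal illustration, not the prototype's actual code: `fadeGainAt` and `groupEndMs` are hypothetical names, assuming a linear ramp from 1 to 0 over `GLOBAL_FADE_MS` after the focus section ends.

```javascript
// Hypothetical helper: dub gain at playback time tMs, given the focus
// section's end. Inside the section the gain is 1; after group.end it
// ramps linearly to 0 over GLOBAL_FADE_MS (0 ms = hard stop).
let GLOBAL_FADE_MS = 1500;

function fadeGainAt(tMs, groupEndMs) {
  if (tMs <= groupEndMs) return 1;      // still inside the focus section
  if (GLOBAL_FADE_MS === 0) return 0;   // hard stop: no tail at all
  const elapsed = tMs - groupEndMs;
  return Math.max(0, 1 - elapsed / GLOBAL_FADE_MS);
}
```

With the default 1500 ms fade, a point 750 ms past `group.end` sits at half volume, which is why no separate post-roll value is needed.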

### 3. Auto-Join Cues
- Inverse punctuation logic: cues that DON'T end with sentence-ending punctuation (. ! ?) auto-join with the next cue
- Ellipsis (... or …) also triggers auto-join
- `SENTENCE_END_RE = /[.!?]["'\u201D\u2019)]*$/` and `ELLIPSIS_RE = /\.{2,}$|…$/`
- Toggle button: "Auto-join: ON/OFF"
- Users can manually unjoin any auto-joined pair; manual unjoins are tracked in `userUnjoined` Set
- `joinedWithNext` Set tracks all active joins
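
The inverse punctuation rule above can be condensed into one predicate. A sketch only: `shouldAutoJoin` is a hypothetical name, but the two regexes are the ones quoted above. The ellipsis test must run first, since `...` would otherwise match the sentence-end regex and block the join.

```javascript
// Hypothetical predicate for the auto-join decision: a cue joins with the
// next one if it trails off (ellipsis) or lacks sentence-ending punctuation.
const SENTENCE_END_RE = /[.!?]["'\u201D\u2019)]*$/;
const ELLIPSIS_RE = /\.{2,}$|…$/;

function shouldAutoJoin(cueText) {
  const text = cueText.trim();
  if (ELLIPSIS_RE.test(text)) return true;  // trailing ... or … always joins
  return !SENTENCE_END_RE.test(text);       // no terminal . ! ? → join
}
```

A real call site would still consult `userUnjoined` before applying the result, since manual unjoins override this predicate.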

### 4. Live Mic Recording Waveform
- When recording, a second waveform canvas ("YOUR RECORDING") appears beneath the reference waveform
- Web Audio API pipeline: `MediaStreamSource → recGainNode → recDestNode` for recording
- `recAnalyser` with `getByteTimeDomainData` feeds live waveform visualization
- Peaks stored keyed to `video.currentTime` (not `performance.now()`) to stay in sync
- Canvas optimized: only resizes when dimensions actually change, uses `setTransform` + `clearRect`
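
The peak extraction step can be shown as a pure function. This is an illustrative sketch (`extractPeak` is a hypothetical name): `getByteTimeDomainData` fills a `Uint8Array` with samples centered at 128, so the peak is the largest deviation from center, normalized to 0..1.

```javascript
// Hypothetical peak extraction from byte time-domain samples.
// 128 = silence; 0 and 255 = full-scale negative/positive excursion.
function extractPeak(byteSamples) {
  let max = 0;
  for (const v of byteSamples) {
    const dev = Math.abs(v - 128) / 128;
    if (dev > max) max = dev;
  }
  return max;
}
```

In the rAF loop, each peak would then be pushed keyed to `video.currentTime` rather than `performance.now()`, which is what keeps the recorded waveform aligned with the reference.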

### 5. Audio Device Management
- Mic selection via `navigator.mediaDevices.enumerateDevices()`
- Output routing via `setSinkId()` on the video element
- Monitor toggle (hear yourself while recording) via `monitorNode` GainNode
- Mic level meter using AnalyserNode
- AudioContext suspended state handling (browser autoplay policy) with click-to-resume fallback

### 6. Time-Aligned Subtitles with Word-by-Word Highlighting
- Subtitle text positioned in `#phraseOverlay` directly beneath waveform, aligned to waveform zones
- Per-cue boxes positioned absolutely using timecode-to-pixel math
- Word-by-word yellow highlight tracks playback via `updateActiveSyllable()`
- Each `<span class="word">` has `data-t0` and `data-t1` attributes for timing
- CSS: active words get white color, golden text-shadow, subtle background glow
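
The timecode-to-pixel math and the per-word active test are both small pure functions. Names here (`timeToX`, `wordIsActive`) are illustrative, but the predicate is the same one used in the word-highlight fix described below under "Bugs Fixed".

```javascript
// Hypothetical mapping from a cue time to an x position within the
// waveform zone, used to place the absolutely positioned cue boxes.
function timeToX(tMs, zoneStartMs, zoneEndMs, canvasWidthPx) {
  const frac = (tMs - zoneStartMs) / (zoneEndMs - zoneStartMs);
  return frac * canvasWidthPx;
}

// Same half-open interval test each word span applies independently:
// el.classList.toggle("active", wordIsActive(t, +el.dataset.t0, +el.dataset.t1));
function wordIsActive(tMs, t0, t1) {
  return tMs >= t0 && tMs < t1;
}
```

The half-open interval (`>= t0`, `< t1`) means adjacent words never highlight simultaneously at a shared boundary.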

### 7. Persistent Mini Mixer
- Always-visible mixer bar with DUB and TAKE volume sliders
- Independent volume control for reference dub track and recorded take
- Mute buttons for each channel
- A/B playback button to hear both together
- Mixer respects fade logic — during A/B playback, dub volume is direct (no fade applied)

### 8. Auto-Save Takes + Channel Ping-Pong
- Clicking "Next" auto-saves the current take to session storage
- Take metadata includes: `group_in_ms`, `record_out_ms`, `fade_in_ms`, `fade_out_ms`, `channel`
- Channel ping-pong: when consecutive takes overlap within 200ms (`CROSSFADE_MARGIN_MS`), the new take goes on the alternate channel (Ch1 ↔ Ch2)
- `assignChannel()` checks the last take's `record_out_ms` vs current group's start time
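
The ping-pong rule can be sketched as follows. This is an assumed reading of `assignChannel()`, not the prototype's actual body: in particular, keeping the previous channel when there is no overlap is an assumption here.

```javascript
// Hypothetical sketch of channel ping-pong: if the previous take's tail
// reaches into the new group's head within CROSSFADE_MARGIN_MS, the new
// take is placed on the alternate channel so the two can crossfade.
const CROSSFADE_MARGIN_MS = 200;

function assignChannel(lastTake, groupInMs) {
  if (!lastTake) return 1;  // first take defaults to Ch1
  const overlaps = lastTake.record_out_ms > groupInMs - CROSSFADE_MARGIN_MS;
  return overlaps
    ? (lastTake.channel === 1 ? 2 : 1)  // overlap → alternate channel
    : lastTake.channel;                 // no overlap → stay (assumption)
}
```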

### 9. Scrollable Working Area
- `.viewer` section uses `overflow-y: auto` so all content (video, waveform, subtitles, rec waveform, mixer) is accessible via scroll
- Video wrap uses a viewport-relative height (`36vh`) instead of flex-grow to prevent content compression

---

## Bugs Fixed

| Bug | Root Cause | Fix |
|-----|-----------|-----|
| `getTimeDomainData is not a function` | Wrong Web Audio API method name | Changed to `getByteTimeDomainData` everywhere |
| Canvas lag during recording | Canvas `.width`/`.height` reset every rAF frame causing GPU flush | Only resize when dimensions change, use `setTransform` + `clearRect` |
| Rec waveform not showing | Peaks keyed to `performance.now()` drifted from video time | Store `video.currentTime` directly |
| AudioContext suspended — mic meter dead | Browser autoplay policy blocks AudioContext | Call `audioCtx.resume()` in `buildAudioGraph()` + click-to-resume fallback |
| `postRollVal` null reference | Two leftover JS references to the post-roll DOM element after the post-roll UI was removed | Removed both references |
| Fade gradient wrong position | Gradient was drawn backward from the zone end instead of forward from `group.end` | Now draws forward from `postX` |
| Word highlight not working | `updateActiveSyllable` used single `foundActive` var across duplicate word sets from `#phraseOverlay` and `#cuePhrase`; last match (hidden cuePhrase) stole active state | Changed to time-based toggle on every word independently: `el.classList.toggle("active", t >= t0 && t < t1)` |
| Viewer not scrollable | `.viewer` had `overflow: hidden` | Changed to `overflow-y: auto; overflow-x: hidden` |

---

## Key State Variables

```js
let GLOBAL_PREROLL_MS = 2000;        // Pre-roll before cue
let GLOBAL_FADE_MS = 1500;           // Fade-out duration (IS the post-roll)
let joinedWithNext = new Set();       // Cue keys that auto-join with next
let autoJoinCommas = true;            // Auto-join feature toggle
let userUnjoined = new Set();         // Manual unjoins override auto-join
let phraseMode = "subtitle";          // Display mode for phrases
let audioCtx, recSourceNode, recGainNode, recDestNode, recAnalyser, recTimeDomain;
let monitorNode, monitorEnabled = false;
let recPeakHistory = [];              // Peaks keyed to video.currentTime
let mixDubVol = 1, mixTakeVol = 1;   // Mixer volumes
const CROSSFADE_MARGIN_MS = 200;     // Overlap threshold for channel ping-pong
```

---

## Architecture Discussion (Production)

### Current Setup (Dev Only)
- `serve.py` — local Python Range-aware HTTP server on port 8765
- Everything runs in browser, single HTML file

### Production Architecture (Planned)
- **Storage:** S3, Google Cloud Storage, or Cloudflare R2 (R2 = zero egress fees, ideal for large media)
- **Delivery:** CDN (CloudFront or Cloudflare) with Range request support for video seeking
- **Upload flow:** Actor records → browser encodes → uploads directly to cloud via presigned URL → server stores metadata only
- **Playback flow:** Browser requests signed URL from server → streams video/audio directly from CDN
- **Backend role:** Auth, session management, cue metadata, take tracking, presigned URL generation — never touches media files
- **Tech stack suggestion:** Node/Express or Python/FastAPI + PostgreSQL
- **Scale:** Designed for hundreds of actors uploading large files simultaneously

---

## Files in Project

| File | Purpose |
|------|---------|
| `vo-booth.html` | Complete prototype (HTML + CSS + JS, single file) |
| `serve.py` | Range-aware HTTP dev server (port 8765) |
| `launch.command` | macOS double-click launcher for server |
| `script_batch1.json` | Cue sheet data (timecodes, phrases, characters) |
| `waveform.json` | Pre-computed waveform data for visualization |
| `videos/batch1.mp4` | Reference video/audio for dubbing |

---

## How to Resume Development

1. Open Terminal
2. `cd ~/Documents/CLAUDE_ENVIRONMENTS/SANDBOX/VO-BOOTH-PROTOTYPE && python3 serve.py`
3. Open `http://localhost:8765/vo-booth.html` in Chrome
4. All state is in `vo-booth.html` — single file, self-contained
