Engineering a Deterministic Audio Boundary Marker with Python and Curses


Category: Programming » Python

📅 June 09, 2026

Author:   mosaid

I needed to split a long Quran recitation MP3 into one file per verse – precisely, with boundaries that matched the reciter’s pauses, not just fixed durations. Audacity could do it manually, but the process was tedious and error‑prone. I wanted a deterministic, keyboard‑driven workflow that I could pick up and resume anytime, with export pipelines that never left me guessing which segment went where.

That’s how verse_marker.py came to be: a terminal application built with Python, curses, VLC, and FFmpeg. It’s not a general‑purpose audio editor; it’s a focused tool for interactively placing markers, editing them, and splitting audio according to a predictable naming scheme. The whole thing fits in about 10 source files and has no external database – everything is JSON on disk.

Architecture Overview

The codebase is organised into layers that each own a single responsibility:

Player layer (player.py) – abstracts audio playback behind a common interface. The only implementation uses VLC because its Python bindings give precise seek and reliable duration reporting.

Marker persistence (markers.py) – manages marker state as a @dataclass, serialised to a JSON file with backup rotation. The same module provides undo/redo and a VerseSession that normalises boundaries and handles segment queries.

UI layer (ui.py) – a single CursesUI class that owns the event loop, renders the interface, and orchestrates user actions. It’s the largest file (≈36 kB), but it still has only one responsibility: translating key presses into state mutations.

Splitter (splitter.py) – wraps FFmpeg with progress callbacks, using -ss/-to so that each segment ends exactly where the next one begins, with no gaps or overlaps between files.

Exporter (exporter.py) – produces auxiliary output: a segments.json array and an FFmetadata chapters file, both useful for further scripting.

Config (config.py) – persists user preferences (prefix, verse count, output directory) in a simple JSON file.

The layers are composable: the entrypoint (verse_marker.py) wires them together, and a headless mode (--split or --export) skips the UI entirely, reusing the same marker and splitter components.
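To make the splitter layer more concrete, here is a minimal sketch of how one FFmpeg command per segment could be assembled. The function name build_split_commands and the choice of stream copy (-c copy) are assumptions for illustration; the real splitter.py may be structured differently and may re-encode.

```python
from pathlib import Path
from typing import List, Sequence, Tuple


def build_split_commands(
    source: Path,
    segments: Sequence[Tuple[float, float]],
    names: Sequence[str],
    out_dir: Path,
) -> List[List[str]]:
    """Return one FFmpeg argument list per (start, end) segment.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    if len(segments) != len(names):
        raise ValueError("segments and names must have the same length")
    commands = []
    for (start, end), name in zip(segments, names):
        commands.append([
            "ffmpeg", "-y",
            "-i", str(source),
            # Both options follow -i, so they are output options and both
            # positions are measured on the source timeline.
            "-ss", f"{start:.3f}",
            "-to", f"{end:.3f}",
            "-c", "copy",  # assumption: copy the stream instead of re-encoding
            str(out_dir / name),
        ])
    return commands
```

Each returned list can be handed to subprocess.run(cmd, check=True). Keeping command construction separate from execution makes a headless mode easy to test without touching any audio.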

Marker State and Persistence

Markers are just a sorted list of floating‑point seconds, always starting with 0.0. The MarkerState dataclass also stores the expected number of verses, the label rule for the first segment (verse 000, basmalah, or verse 001), and a resume position. Saving uses a write‑to‑temp‑then‑replace strategy, so a terminal that crashes mid‑write cannot leave a half‑written marker file behind. Backup rotation keeps up to 5 previous copies, which saved me more than once when I accidentally deleted a boundary near the end of a 45‑minute recitation.
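The save strategy described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from markers.py: the field names expected_verses, first_label and resume_position, the .bakN file naming, and the helpers rotate_backups and save_state are all hypothetical.

```python
import json
import os
import shutil
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import List


@dataclass
class MarkerState:
    # Field names other than the boundary list itself are assumptions.
    boundaries: List[float] = field(default_factory=lambda: [0.0])
    expected_verses: int = 0
    first_label: str = "verse_001"
    resume_position: float = 0.0


def rotate_backups(path: Path, keep: int = 5) -> None:
    """Shift .bak1 -> .bak2 and so on, then copy the current file to .bak1."""
    if not path.exists():
        return
    for n in range(keep - 1, 0, -1):
        older = path.with_name(f"{path.name}.bak{n}")
        if older.exists():
            # os.replace overwrites the destination, so the oldest copy drops off.
            os.replace(older, path.with_name(f"{path.name}.bak{n + 1}"))
    shutil.copy2(path, path.with_name(f"{path.name}.bak1"))


def save_state(state: MarkerState, path: Path, keep: int = 5) -> None:
    """Write JSON to a temp file in the same directory, then swap it in."""
    rotate_backups(path, keep)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(asdict(state), fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Creating the temp file next to the target matters: os.replace is only atomic when source and destination live on the same filesystem.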

UI Design: Curses That Respects Muscle Memory

The interface is a single‑screen dashboard. Every action is a single keypress, with no modal dialogs that trap the cursor. The bottom of the screen always shows the full keymap – a deliberate choice to make the tool self‑documenting even for infrequent use.

Key design decisions:

Guided marking mode (default): when Enter is pressed near an existing marker (within a configurable window, here 1 s), the tool asks whether to replace the existing boundary or insert a new one. This removes the need for manual correction passes – you can refine boundaries while listening, and the tool keeps the marker list clean.

Undo/redo with a stack of 50 snapshots. I implemented it as a simple list of boundary arrays, which is cheap because the data is tiny. Each destructive operation pushes a snapshot first.

Loop and preview: you can set A/B loop points and preview a single segment without leaving the main view. This is invaluable when verifying the transition between two verses.

Segment list view (v key) that shows every segment’s start, end, duration, and filename, with scrolling. It’s a sanity‑check table before you hit x to split.
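The replace‑or‑insert decision in guided marking mode can be illustrated with a small, UI‑free sketch. The names nearest_within and guided_mark are hypothetical, and the replace flag stands in for the user's answer to the prompt; the real tool handles this inside its curses event loop.

```python
from bisect import insort
from typing import List, Optional


def nearest_within(boundaries: List[float], t: float,
                   window: float = 1.0) -> Optional[int]:
    """Index of the closest boundary within `window` seconds of t, else None.

    Assumes boundaries[0] is the fixed 0.0 marker, which is never a candidate.
    """
    best: Optional[int] = None
    for i, b in enumerate(boundaries):
        if i == 0:
            continue
        distance = abs(b - t)
        if distance <= window and (best is None or distance < abs(boundaries[best] - t)):
            best = i
    return best


def guided_mark(boundaries: List[float], t: float, replace: bool,
                window: float = 1.0) -> List[float]:
    """Return a new sorted boundary list after marking time t."""
    result = list(boundaries)
    hit = nearest_within(result, t, window)
    if hit is not None and replace:
        del result[hit]   # the nearby marker is superseded by the new one
    insort(result, t)     # keep the list sorted
    return result


marks = [0.0, 12.3, 24.7]
refined = guided_mark(marks, 12.8, replace=True)    # 12.3 is replaced by 12.8
extra = guided_mark(marks, 12.8, replace=False)     # 12.8 is added next to 12.3
```

Returning a new list instead of mutating in place keeps the function easy to combine with a snapshot‑based undo stack.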

Export and Reproducibility

Markers aren’t just for splitting. The exporter produces segments.json and an FFmetadata chapters file, which means you can use the same marker data to:

  • Inject chapter metadata into the original MP3 without re‑encoding: ffmpeg -i in.mp3 -i chapters.ffmeta -map_metadata 1 -map_chapters 1 -codec copy out.mp3
  • Drive scripted workflows (e.g., generating timestamps for a web player)
  • Share marker sets so others can reproduce your exact splits

The JSON output looks like this:



[
  {"start": 0.0, "end": 12.345, "duration": 12.345, "name": "022001.mp3"},
  {"start": 12.345, "end": 24.678, "duration": 12.333, "name": "022002.mp3"}
]

Code Snippets That Illustrate the Core

Adding a boundary in guided mode (inside VerseSession):



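def add_boundary(self, t: float) -> int:
    # Clamp negative timestamps to the start of the file.
    t = max(0.0, float(t))
    self.state.boundaries.append(t)
    # Normalise the list so boundaries stay sorted after the append.
    self._normalize()
    # Return the index where the new boundary ended up (0 if the lookup finds none).
    i, _ = self.nearest_boundary(t, exclude_zero=False)
    return int(i or 0)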
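The snippet relies on two helpers that are not shown here. As a rough idea of what they might do, the following standalone functions sort and de‑duplicate a boundary list and look up the closest marker. Their behaviour, the min_gap parameter, and the (index, time) return shape are assumptions inferred from how add_boundary calls them, not the actual methods from markers.py.

```python
from typing import List, Optional, Sequence, Tuple


def normalize(boundaries: Sequence[float], min_gap: float = 0.001) -> List[float]:
    """Sort, clamp negatives to 0.0, drop near-duplicates, and start at 0.0."""
    cleaned = sorted(max(0.0, float(b)) for b in boundaries)
    result = [0.0]
    for b in cleaned:
        if b - result[-1] >= min_gap:   # skip markers closer than min_gap
            result.append(b)
    return result


def nearest_boundary(boundaries: Sequence[float], t: float,
                     exclude_zero: bool = True
                     ) -> Tuple[Optional[int], Optional[float]]:
    """Return (index, time) of the boundary closest to t, or (None, None)."""
    candidates = [
        (abs(b - t), i)
        for i, b in enumerate(boundaries)
        if not (exclude_zero and i == 0)
    ]
    if not candidates:
        return None, None
    _, i = min(candidates)   # smallest distance wins, lowest index on ties
    return i, boundaries[i]


clean = normalize([5.0, 0.0, 5.0, -1.0, 2.5])
```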

The undo/redo stack is remarkably simple because boundaries are just a list:



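from typing import List, Optional


class UndoRedo:
    MAX_STACK = 50  # oldest snapshots are dropped beyond this depth

    def __init__(self):
        self.undo_stack: List[List[float]] = []
        self.redo_stack: List[List[float]] = []

    def push(self, boundaries: List[float]) -> None:
        # Store a copy so later edits to the live list don't alter the snapshot.
        self.undo_stack.append(list(boundaries))
        if len(self.undo_stack) > self.MAX_STACK:
            self.undo_stack.pop(0)
        # A new edit invalidates anything that could have been redone.
        self.redo_stack.clear()

    def undo(self, current: List[float]) -> Optional[List[float]]:
        if not self.undo_stack:
            return None
        # Keep the current state so the undo itself can be reversed.
        self.redo_stack.append(list(current))
        return self.undo_stack.pop()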
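Only the undo half is shown above. A redo counterpart would plausibly mirror it; the sketch below restates the class under a hypothetical name, SnapshotHistory, so that it runs on its own, and adds a redo() method that is my assumption about the symmetric operation rather than a copy of the original code.

```python
from typing import List, Optional


class SnapshotHistory:
    """Snapshot-based undo/redo over a list of boundary times (illustrative)."""

    MAX_STACK = 50

    def __init__(self) -> None:
        self.undo_stack: List[List[float]] = []
        self.redo_stack: List[List[float]] = []

    def push(self, boundaries: List[float]) -> None:
        self.undo_stack.append(list(boundaries))
        if len(self.undo_stack) > self.MAX_STACK:
            self.undo_stack.pop(0)
        self.redo_stack.clear()

    def undo(self, current: List[float]) -> Optional[List[float]]:
        if not self.undo_stack:
            return None
        self.redo_stack.append(list(current))
        return self.undo_stack.pop()

    def redo(self, current: List[float]) -> Optional[List[float]]:
        # Assumed mirror image of undo(): swap the roles of the two stacks.
        if not self.redo_stack:
            return None
        self.undo_stack.append(list(current))
        return self.redo_stack.pop()


history = SnapshotHistory()
marks = [0.0, 12.3]
history.push(marks)                 # snapshot before the destructive edit
marks = [0.0]                       # the user deletes the 12.3 boundary
after_undo = history.undo(marks)    # the deleted boundary comes back
after_redo = history.redo(after_undo if after_undo is not None else marks)
```

Callers should compare the result with None rather than rely on truthiness, since None here means "nothing to undo" and is distinct from any valid snapshot.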

And the export pipeline that ties it all together (from exporter.py):



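import json
from pathlib import Path
from typing import List, Optional, Tuple


def export_segments_json(self, out_path: Path, segments: List[Tuple[float, float]],
                         names: Optional[List[str]] = None) -> None:
    # One row per segment; "name" is added only when filenames are supplied.
    # When given, names is expected to have one entry per segment.
    data = []
    for i, (ss, ee) in enumerate(segments):
        row = {"start": ss, "end": ee, "duration": ee - ss}
        if names:
            row["name"] = names[i]
        data.append(row)
    out_path.write_text(json.dumps(data, ensure_ascii=False, indent=2),
                        encoding="utf-8")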
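The exporter receives segments as (start, end) pairs plus an optional list of filenames. One plausible way to derive both from the marker list is sketched below. The helper names, the duration argument, and the naming pattern (a prefix followed by a zero‑padded three‑digit verse number, inferred from names like 022001.mp3) are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def segments_from_boundaries(boundaries: Sequence[float],
                             duration: float) -> List[Tuple[float, float]]:
    """Pair each boundary with the next one; the last segment ends at `duration`."""
    points = [b for b in sorted(boundaries) if b < duration]
    ends = points[1:] + [duration]
    return list(zip(points, ends))


def verse_names(prefix: str, count: int, first_index: int = 1) -> List[str]:
    """Build names such as 022001.mp3 from a prefix and a running verse number.

    first_index=0 would model a first segment labelled as verse 000.
    """
    return [f"{prefix}{first_index + k:03d}.mp3" for k in range(count)]


segments = segments_from_boundaries([0.0, 12.345], duration=24.678)
names = verse_names("022", len(segments))
```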

Why Not Use Existing Tools?

When I started this project, I evaluated the alternatives honestly:

Audacity – powerful but GUI‑only. I wanted a keyboard‑only workflow that I could run over SSH or in a tmux session on a headless machine. Exporting exactly 78 labelled MP3s from Audacity still required manual clicks for each label.

Sonic Visualiser – great for analysis, not for splitting. Its label export is flexible, but the workflow of marking boundaries while playback continues is clunky.

pydub / librosa – I could script splitting with them, but I’d lose the interactive, auditory feedback that’s essential for catching the exact pause between two verses. Purely visual waveform inspection misses recitation‑style nuances.

Other purpose‑built Quran splitters – most are either Windows‑only, abandonware, or tie the user to a specific reciter’s timing.

This tool fills a precise gap: a deterministic, terminal‑first, keyboard‑driven workflow that never touches a mouse and leaves behind a reusable marker file. The core logic (marker session, undo, splitter) is completely decoupled from the UI, so it could be repurposed for any audio segmentation task where you need to mark boundaries by ear.

Is It Worth Putting on GitHub?

Yes – with realistic expectations. The tool is production‑grade for my personal needs, but its audience is niche: people who need to split Quranic recitations precisely and are comfortable on a Linux terminal. There are certainly “better” tools if you need a full‑blown audio workstation, but I found none that combine an interactive curses session with automatic backup rotation, guided correction, and scriptable export formats.

On GitHub, the project can serve as:

  • A reference implementation for anyone building a curses‑based audio tool.
  • A reproducible specification: given the same MP3 and the marker JSON, anyone can re‑split and verify the result.
  • A seed for contributors who might add visual waveform display (e.g., via asciimatics), support for other audio formats, or integration with metadata providers.

I don’t expect thousands of stars. But a tool that solves a real, well‑defined problem with clean architecture and zero‑friction installation belongs in the open. If one other person uses it to split a single recitation without losing their patience, it’s worth the push.

Future Directions

A few things I might add if the need arises:

Waveform thumbnail in the terminal – just a rough amplitude bar, enough to see gaps visually.

Multiple recitation profiles – sometimes the same surah is recited with different timing; marker sets could be versioned.

Validated export – after splitting, the tool could decode every segment to verify that it plays correctly and that its duration matches the marker data, then report any mismatches.

Final Thoughts

This project is not an attempt to replace Audacity or build a general‑purpose editor. It’s an engineer’s answer to a repetitive task: turn a long MP3 and a set of listening‑derived timestamps into 78 well‑named files, quickly, accurately, and with a complete audit trail. The architecture reflects that single‑minded goal: layered, deterministic, and trivially scriptable.

The source code is available on GitHub. If you do any work with audio segmentation that benefits from a keyboard‑first interface, give it a try – or take the patterns that make sense for your own tools.

