Tags: AI, video, architecture, state-first, blender, provenance, vibe-coding

Semantic Video Studio: State Is the Product, Pixels Are Just Output

A control-first video pipeline where four JSON planes compile to a deterministic Blender render, every frame is provenance-stamped, and 'edit the video' means 'patch the state.'

Vario via Mnehmos
The Thesis

"Video should be compiled from durable cinematic state, not hallucinated as disposable pixels."

The Video Is Not the Artifact

Most AI video generation treats the MP4 as the goal. You write a prompt, you get pixels, you are done. If you want a different camera angle, you prompt again. If you want to change the lighting, you prompt again. If you want to reuse an asset from a previous scene, you prompt again and hope.

This is not a workflow problem. It is an architecture problem. When the output is the only artifact, there is nothing to edit. There is only regeneration.

Semantic Video Studio exists to fix the architecture.

The Four-Plane State Pack

Every SVS project is defined by four JSON files, each validated against a JSON Schema before the render begins.

1. Scene graph (scene_graph.json): Object IDs, transforms, cameras, lights, environment
2. Asset manifest (asset_manifest.json): Reusable typed assets with interfaces, capabilities, and anchors
3. Timeline (timeline.json): Typed beats and actions over object, camera, and light IDs
4. Render plan (render_plan.json): Engine, resolution, FPS, samples, output path, render settings

A render is a pure function of these four planes plus the Blender renderer. Same inputs, same output. Every time.
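One way to make the "same inputs, same output" claim concrete is a cache key derived from the four planes plus the renderer version. The sketch below is illustrative only: `render_key`, `canonical_bytes`, and the example field names are assumptions, not part of SVS.

```python
import hashlib
import json

# The four planes named in the state pack, hashed in a fixed order.
PLANES = ("scene_graph", "asset_manifest", "timeline", "render_plan")


def canonical_bytes(plane: dict) -> bytes:
    """Serialize a plane so key order and whitespace cannot change the hash."""
    return json.dumps(plane, sort_keys=True, separators=(",", ":")).encode("utf-8")


def render_key(state_pack: dict, renderer: str) -> str:
    """Hypothetical cache key: SHA-256 over the four planes plus the renderer id."""
    digest = hashlib.sha256()
    for name in PLANES:
        digest.update(name.encode("utf-8"))
        digest.update(canonical_bytes(state_pack[name]))
    digest.update(renderer.encode("utf-8"))
    return digest.hexdigest()


# Example planes with made-up contents.
pack = {
    "scene_graph": {"objects": [{"id": "rover"}]},
    "asset_manifest": {"assets": []},
    "timeline": {"beats": []},
    "render_plan": {"fps": 24, "samples": 64},
}

# Same content, different key order: the key must not change.
reordered_pack = {
    "render_plan": {"samples": 64, "fps": 24},
    "timeline": {"beats": []},
    "asset_manifest": {"assets": []},
    "scene_graph": {"objects": [{"id": "rover"}]},
}

print(render_key(pack, "blender-5.1") == render_key(reordered_pack, "blender-5.1"))
```

If two state packs produce the same key under the same renderer, a build step could in principle skip the render and reuse the earlier output.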

The Pipeline

Natural-language prompt
  → prompt_to_brief.py       (NL → typed ProductionBrief)
  → brief_to_state_pack.py   (Brief → 4-plane JSON)
  → validate_scene.py        (schemas + cross-refs + writability check)
  → build_video.py           (validate → render → manifest)
  → blender/render_scene.py  (bpy: state → scene → keyframes → PNG sequence)
  → ffmpeg                   (PNG sequence → MP4)
  → outputs/manifests/       (SHA-256 provenance record)

The validator runs before Blender opens. Every object_id referenced in a timeline action must exist in the scene graph. Every asset_id must exist in the manifest. If any check fails, the render does not start.
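The cross-reference rule can be sketched in a few lines of Python. This is not the code in validate_scene.py; the field names (`objects`, `assets`, `beats`, `actions`, `id`) are assumptions about the plane layout, chosen only to show the shape of the check.

```python
def check_cross_refs(scene_graph: dict, asset_manifest: dict, timeline: dict) -> list[str]:
    """Return human-readable errors; an empty list means the pack may render."""
    object_ids = {obj["id"] for obj in scene_graph.get("objects", [])}
    asset_ids = {asset["id"] for asset in asset_manifest.get("assets", [])}
    errors = []

    # Every asset_id used by a scene object must exist in the asset manifest.
    for obj in scene_graph.get("objects", []):
        asset_id = obj.get("asset_id")
        if asset_id is not None and asset_id not in asset_ids:
            errors.append(f"object {obj['id']!r} references unknown asset_id {asset_id!r}")

    # Every object_id used by a timeline action must exist in the scene graph.
    for b, beat in enumerate(timeline.get("beats", [])):
        for a, action in enumerate(beat.get("actions", [])):
            object_id = action.get("object_id")
            if object_id is not None and object_id not in object_ids:
                errors.append(
                    f"beats[{b}].actions[{a}] references unknown object_id {object_id!r}"
                )
    return errors


# Example planes with made-up contents.
scene_graph = {"objects": [{"id": "rover", "asset_id": "rover_v1"}, {"id": "cam_main"}]}
asset_manifest = {"assets": [{"id": "rover_v1"}]}
timeline = {"beats": [{"actions": [{"object_id": "rover"}, {"object_id": "cam_main"}]}]}

problems = check_cross_refs(scene_graph, asset_manifest, timeline)
if problems:
    raise SystemExit("\n".join(problems))  # the render does not start
```

Returning a list instead of raising on the first failure lets a validator report every broken reference in one pass.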

The Basics Gate

Seven invariants the system must keep green at all times.

00. test_00_environment.py: Python version, Blender available, ffmpeg available
01. test_01_state_render.py: A valid state pack renders to a non-empty MP4
02. test_02_plane_regen.py: Modifying one plane re-renders only affected frames
03. test_03_mcp_roundtrip.py: MCP tool calls produce valid state mutations
04. test_04_semantic_patch.py: NL edit → patch → apply produces valid output state
05. test_05_prompt_to_state.py: NL prompt → ProductionBrief → state pack is schema-valid
06. test_06_negative_fixtures.py: Invalid state packs are rejected before render

These are not unit tests for internal functions. They are integration gates for the pipeline's core claims. If test_01 fails, renders are broken. If test_06 fails, the validation layer has regressed.

The Import Gate

External 3D assets (from Poly Haven, Sketchfab, or AI generators) pass through four stages before they can appear in a production.

1. Import (import_external_asset.py): Fetch and verify
2. Normalize (normalize_imported_asset.py): Blender headless: center, scale, clean
3. Validate (validate_imported_asset.py): Schema check against asset_record.schema.json
4. Preview (preview_imported_asset.py): Headless render preview

An asset that fails normalization does not enter the registry. An asset that fails validation does not enter the registry. An asset that produces a broken preview does not enter the registry.
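The all-or-nothing behavior can be sketched as a chain of stages where only a full pass writes to the registry. Everything here is hypothetical: the stage functions are stand-ins for the four scripts, and the record fields (`source`, `scale`, `preview`) are invented for the example.

```python
from typing import Callable

Stage = Callable[[dict], dict]


class GateError(Exception):
    """Raised when any stage rejects the asset."""


def import_stage(asset: dict) -> dict:
    if not asset.get("source"):
        raise ValueError("asset has no source to fetch")
    return {**asset, "imported": True}


def normalize_stage(asset: dict) -> dict:
    if asset.get("scale", 1.0) <= 0:
        raise ValueError("non-positive scale")
    return {**asset, "scale": 1.0, "normalized": True}


def validate_stage(asset: dict) -> dict:
    missing = [key for key in ("id", "type") if key not in asset]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return asset


def preview_stage(asset: dict) -> dict:
    # Stand-in: a real stage would render a preview headlessly.
    return {**asset, "preview": f"previews/{asset['id']}.png"}


STAGES: list[tuple[str, Stage]] = [
    ("import", import_stage),
    ("normalize", normalize_stage),
    ("validate", validate_stage),
    ("preview", preview_stage),
]


def run_import_gate(asset: dict, stages: list[tuple[str, Stage]], registry: dict) -> dict:
    """Run every stage in order; register the asset only if all of them pass."""
    for name, stage in stages:
        try:
            asset = stage(asset)
        except Exception as exc:
            raise GateError(f"{name} failed: {exc}") from exc
    registry[asset["id"]] = asset
    return asset
```

Because the registry write is the last statement, a failure at any stage leaves the registry exactly as it was.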

Semantic Editing Without Regeneration

Describe the change → typed patch → validate against base hash → apply → re-render only what changed.

// semantic_edit_to_patch.py output
{
  "patch_id": "edit_001",
  "base_hash": "a3f8c2...",
  "target_plane": "timeline",
  "target_path": "$.beats[2].camera.position",
  "from_value": [0, 5, 10],
  "to_value": [0, 8, 12],
  "rationale": "pull camera back for wider establishing shot"
}

The base hash check is the key mechanism. If the state has changed since the patch was generated, the patch is rejected. You cannot accidentally apply an edit designed for a different version of the scene.
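A minimal sketch of that check, under stated assumptions: `base_hash` is taken to be the SHA-256 of the target plane's canonical JSON, and `target_path` is limited to a small JSONPath subset (`$`, `.key`, `[index]`). The helper names are hypothetical and the real patch applier may work differently.

```python
import copy
import hashlib
import json
import re


class StalePatchError(Exception):
    """Raised when the plane no longer matches the patch's base hash."""


def plane_hash(plane: dict) -> str:
    canonical = json.dumps(plane, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def parse_path(path: str) -> list:
    """Parse a small JSONPath subset: '$', '.key', and '[index]' only."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    tokens, pos = [], 1
    while pos < len(path):
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported path syntax at offset {pos}: {path!r}")
        key, index = match.groups()
        tokens.append(key if key is not None else int(index))
        pos = match.end()
    return tokens


def apply_patch(plane: dict, patch: dict) -> dict:
    """Return a patched copy of the plane, or raise if the patch is stale."""
    if plane_hash(plane) != patch["base_hash"]:
        raise StalePatchError("state changed since the patch was generated")
    tokens = parse_path(patch["target_path"])
    if not tokens:
        raise ValueError("patch must target a value inside the plane")
    patched = copy.deepcopy(plane)
    parent = patched
    for token in tokens[:-1]:
        parent = parent[token]
    if parent[tokens[-1]] != patch["from_value"]:
        raise ValueError("from_value does not match the current state")
    parent[tokens[-1]] = patch["to_value"]
    return patched


# Example timeline with made-up contents.
timeline = {"beats": [{}, {}, {"camera": {"position": [0, 5, 10]}}]}
patch = {
    "patch_id": "edit_001",
    "base_hash": plane_hash(timeline),
    "target_plane": "timeline",
    "target_path": "$.beats[2].camera.position",
    "from_value": [0, 5, 10],
    "to_value": [0, 8, 12],
}
edited = apply_patch(timeline, patch)
```

Applying the same patch to `edited` raises `StalePatchError`, because the plane's hash no longer matches the hash the patch was generated against.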

The Provenance Manifest

Every render writes a manifest to outputs/manifests/<render_id>.json.

{
  "render_id": "r_20260517_alien_rover",
  "timestamp": "2026-05-17T14:23:11Z",
  "inputs": {
    "scene_graph":    { "path": "...", "sha256": "a3f8c2..." },
    "asset_manifest": { "path": "...", "sha256": "b7d1e4..." },
    "timeline":       { "path": "...", "sha256": "c9a2f8..." },
    "render_plan":    { "path": "...", "sha256": "d5b3e1..." }
  },
  "output": {
    "path": "outputs/alien_rover_r001.mp4",
    "sha256": "e2c7a9...",
    "frame_count": 240,
    "duration_s": 10.0
  },
  "renderer": "blender-5.1",
  "partial_regen": false
}
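A record like this needs little more than the standard library. The sketch below is an assumption about how such a manifest could be assembled, not the build_video.py implementation; it omits frame_count, duration_s, and partial_regen, and the function names are hypothetical. `changed_planes` shows how two manifests could be compared to find which planes differ between renders.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large videos do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(render_id: str, plane_paths: dict, output_path: Path,
                   renderer: str, timestamp: str) -> dict:
    """Assemble a provenance record from the input planes and the rendered file."""
    return {
        "render_id": render_id,
        "timestamp": timestamp,
        "inputs": {
            name: {"path": str(path), "sha256": sha256_file(Path(path))}
            for name, path in plane_paths.items()
        },
        "output": {"path": str(output_path), "sha256": sha256_file(Path(output_path))},
        "renderer": renderer,
    }


def write_manifest(manifest: dict, manifest_dir: Path) -> Path:
    """Write the record to <manifest_dir>/<render_id>.json and return the path."""
    manifest_dir.mkdir(parents=True, exist_ok=True)
    target = manifest_dir / f"{manifest['render_id']}.json"
    target.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return target


def changed_planes(previous: dict, current: dict) -> list[str]:
    """List the planes whose hashes differ between two manifests."""
    return sorted(
        name
        for name, entry in current["inputs"].items()
        if previous["inputs"].get(name, {}).get("sha256") != entry["sha256"]
    )
```

Comparing input hashes across manifests gives a cheap answer to "what changed since the last render?" without diffing the JSON itself.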

The Deeper Pattern

SVS is a case study in a principle that shows up throughout the Mnehmos ecosystem: the model describes, the engine validates.

RPG-MCP: The model narrates combat; the engine enforces hit points and legal moves.
Clio: The model proposes entity updates; the citation auditor enforces source provenance.
LLM Chess: The model proposes moves; the chess engine enforces legality.
SVS: The model proposes scene state; the validator enforces schema and cross-plane consistency.

Neither can do the other's job well.

The Lesson

"Generated media becomes editable when state is the primary artifact. Pixels are output. State is the product."

A prompt is not a source file. A four-plane JSON pack is. The repo is the memory, the state is the source of truth, and the output is a build artifact.