Intro to WorldJen

WorldJen is a unified platform for evaluating video and world models. It helps researchers and builders run accurate, comparable benchmarks that show where a model excels and where it needs improvement.

What it does

  • Connect your GPU — Install a lightweight runner on your own machine or cloud instance. Your hardware stays under your control.
  • Pick models and dimensions — Connect models via Hugging Face (and other sources). Choose exactly which evaluation dimensions matter for your use case.
  • Get actionable results — See per-dimension scores, video-level breakdowns, and exports so you can act on the data.

Who it's for

WorldJen is built for teams working on video generation and world models who want:

  • Consistent, repeatable evaluations across runs and models
  • Clear metrics (motion quality, physics, prompt adherence, etc.) instead of a single opaque score
  • The ability to run evaluations on their own infrastructure

How it fits together

WorldJen splits evaluation into three concepts so you can pick the right level of effort:

  • Score — upload one clip, get raw dimension scores. The fastest sanity check.
  • Rank — upload several clips for one prompt; see which variant wins on each dimension.
  • Bench — full benchmark runs across many prompts and one or more models, executed on a GPU runner. The dashboard's main flow.

See Concepts for when to use which.
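
To make the Rank idea concrete, here is a minimal Python sketch of picking a per-dimension winner among several clips generated for one prompt. It is not the WorldJen SDK: the function name `winners_per_dimension`, the clip labels, the dimension keys (borrowed from the metric names mentioned above), and the score values are all hypothetical placeholders, and it assumes a higher score is better.

```python
from typing import Dict


def winners_per_dimension(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Return, for each dimension, the clip with the highest score.

    `scores` maps a clip label to its {dimension: score} mapping.
    Assumption for this sketch: higher scores are better.
    """
    # Collect every dimension that appears for at least one clip.
    dimensions = set()
    for clip_scores in scores.values():
        dimensions.update(clip_scores)

    result = {}
    for dim in sorted(dimensions):
        # Only compare clips that actually have a score for this dimension.
        candidates = {clip: s[dim] for clip, s in scores.items() if dim in s}
        result[dim] = max(candidates, key=candidates.get)
    return result


# Placeholder values for illustration only, not real evaluation results.
example_scores = {
    "variant_a": {"motion_quality": 0.82, "physics": 0.61, "prompt_adherence": 0.74},
    "variant_b": {"motion_quality": 0.77, "physics": 0.70, "prompt_adherence": 0.79},
}

winners = winners_per_dimension(example_scores)
for dimension, clip in winners.items():
    print(f"{dimension}: {clip}")
```

Keeping the result per dimension, rather than averaging into one number, mirrors the point made above: one variant can lead on motion quality while another leads on physics, and a single combined score would hide that.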

Docs map

Need                                    Start here
Score vs Rank vs Bench                  Concepts
First run in the dashboard              How to Use
End-to-end Python example               Tutorial: LTX-2
Programmatic Python usage               Python SDK
Terminal and runner commands            CLI
Side-by-side curl/SDK/CLI examples      SDK, CLI, and REST
Direct HTTP integration                 REST API
Agent-driven automation                 AI Agent Integration