Intro to WorldJen
WorldJen is a unified platform for evaluating video and world models. It lets researchers and builders run accurate, comparable benchmarks that show where a model excels and where it needs improvement.
What it does
- Connect your GPU — Install a lightweight runner on your own machine or cloud instance. Your hardware stays under your control.
- Pick models and dimensions — Connect models via Hugging Face (and other sources). Choose exactly which evaluation dimensions matter for your use case.
- Get actionable results — See per-dimension scores, video-level breakdowns, and exports so you can act on the data.
Who it's for
WorldJen is built for teams working on video generation and world models who want:
- Consistent, repeatable evaluations across runs and models
- Clear metrics (motion quality, physics, prompt adherence, etc.) instead of a single opaque score
- The ability to run evaluations on their own infrastructure
How it fits together
WorldJen splits evaluation into three concepts so you can pick the right level of effort:
- Score — upload one clip, get raw dimension scores. The fastest sanity check.
- Rank — upload several clips for one prompt; see which variant wins on each dimension.
- Bench — full benchmark runs across many prompts and one or more models, executed on a GPU runner. The dashboard's main flow.
See Concepts for when to use which.
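To make the Rank idea concrete, here is a minimal sketch in plain Python of picking a per-dimension winner from several clips' scores. It does not use the WorldJen SDK: the function `rank_by_dimension`, the clip names, the dimension keys, and the score values are all hypothetical, and it assumes higher scores are better.

```python
# Illustrative only: plain Python, not the WorldJen SDK.
# Clip names, dimension names, and score values are made up for this example.
from typing import Dict

Scores = Dict[str, float]  # dimension name -> score (higher assumed better)


def rank_by_dimension(clips: Dict[str, Scores]) -> Dict[str, str]:
    """Return, for each dimension, the name of the clip with the highest score."""
    # Collect every dimension that appears in at least one clip.
    dimensions = set()
    for scores in clips.values():
        dimensions.update(scores)

    winners = {}
    for dim in sorted(dimensions):
        # Clips that lack a dimension are skipped for that dimension.
        candidates = {name: s[dim] for name, s in clips.items() if dim in s}
        winners[dim] = max(candidates, key=candidates.get)
    return winners


# Two hypothetical variants generated from the same prompt.
clips = {
    "variant_a": {"motion_quality": 0.81, "physics": 0.64, "prompt_adherence": 0.90},
    "variant_b": {"motion_quality": 0.77, "physics": 0.72, "prompt_adherence": 0.88},
}

winners = rank_by_dimension(clips)
for dim, name in winners.items():
    print(f"{dim}: {name}")
```

In this sketch no single variant wins everywhere, which is the kind of trade-off a per-dimension view surfaces and a single aggregate score would hide.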
Docs map
| Need | Start here |
|---|---|
| Score vs Rank vs Bench | Concepts |
| First run in the dashboard | How to Use |
| End-to-end Python example | Tutorial: LTX-2 |
| Programmatic Python usage | Python SDK |
| Terminal and runner commands | CLI |
| Side-by-side curl/SDK/CLI examples | SDK, CLI, and REST |
| Direct HTTP integration | REST API |
| Agent-driven automation | AI Agent Integration |
