Human evaluation of AI image models for YouTube thumbnail generation. Models are tested on prompt-following using production thumbnail templates.
Text-to-image model rankings based on AI thumbnail generation quality.
Each model creates thumbnails from production TubeSalt templates using identical prompts and default API settings. Each thumbnail is scored against 10-15 criteria, including anatomical accuracy (hands, face, body), skin quality, text and graphics quality, spelling, legibility, composition, framing, and prompt-matching. The leaderboard shows each model's average score across 10 template generations (AVG@10).
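The aggregation described above can be sketched in a few lines. This is a minimal illustration, assuming each criterion is graded on a 0-1 scale, each thumbnail's score is the mean of its criterion scores, and AVG@10 is the mean over 10 generations; the criterion names and function names are hypothetical, not the actual evaluation harness.

```python
from statistics import mean

# Hypothetical criterion names drawn from the rubric described above.
CRITERIA = [
    "anatomy", "skin_quality", "text_quality", "spelling",
    "legibility", "composition", "framing", "prompt_match",
]

def thumbnail_score(criterion_scores: dict) -> float:
    """Average the 0-1 criterion scores for one generated thumbnail."""
    return mean(criterion_scores[c] for c in CRITERIA)

def avg_at_10(generations: list) -> float:
    """AVG@10: mean thumbnail score over 10 generations (assumed meaning)."""
    assert len(generations) == 10, "AVG@10 expects exactly 10 generations"
    return mean(thumbnail_score(g) for g in generations)

# Example: a model scoring 1.0 on every criterion in every generation.
perfect = [{c: 1.0 for c in CRITERIA} for _ in range(10)]
print(f"{avg_at_10(perfect):.1%}")  # 100.0%
```

A real rubric would likely weight criteria unequally (e.g. spelling errors in rendered text are more damaging for thumbnails than minor framing issues), but an unweighted mean keeps the sketch simple.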
| Rank | Model | Score (AVG@10) | Organization |
|---|---|---|---|
| 1 | Imagen 4 Preview | 90.7% | Google |
| 2 | Flux Pro Kontext Max | 88.9% | Black Forest Labs |
| 3 | Hunyuan Image V3 | 88.2% | Tencent |
| 4 | Flux Pro Kontext | 86.0% | Black Forest Labs |
| 5 | Seedream V4 | 81.8% | ByteDance |
| 6 | Ideogram V3 | 80.2% | Ideogram |
| 7 | Qwen Image | 75.9% | Alibaba |
| 8 | Flux Krea | 75.0% | Black Forest Labs |
| 9 | HiDream Fast | 74.3% | HiDream AI |
| 10 | Flux Dev | 67.5% | Black Forest Labs |
| 11 | HiDream I1 Full | 65.3% | HiDream AI |