Human evaluation of AI image models for YouTube thumbnail generation. Models are tested on prompt-following using production thumbnail templates.
Text-to-image model rankings based on AI thumbnail generation quality.
Each model creates thumbnails from production TubeSalt templates using identical prompts and default API settings. Each thumbnail is scored against 10-15 criteria, including anatomical accuracy (hands, face, body), skin quality, text and graphics quality, spelling, legibility, composition, framing, and prompt-matching. The leaderboard shows each model's average score across 10 template generations (AVG@10).
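The aggregation described above can be sketched in a few lines. This is a minimal illustration, assuming each criterion is graded on a 0-1 scale, each thumbnail's score is the mean of its criterion scores, and AVG@10 is the mean over 10 generations; the criterion names and function names are hypothetical, not the actual evaluation harness.

```python
from statistics import mean

# Hypothetical criterion names drawn from the rubric described above.
CRITERIA = [
    "anatomy", "skin_quality", "text_quality", "spelling",
    "legibility", "composition", "framing", "prompt_match",
]

def thumbnail_score(criterion_scores: dict) -> float:
    """Average the 0-1 criterion scores for one generated thumbnail."""
    return mean(criterion_scores[c] for c in CRITERIA)

def avg_at_10(generations: list) -> float:
    """AVG@10: mean thumbnail score over 10 generations (assumed meaning)."""
    assert len(generations) == 10, "AVG@10 expects exactly 10 generations"
    return mean(thumbnail_score(g) for g in generations)

# Example: a model scoring 1.0 on every criterion in every generation.
perfect = [{c: 1.0 for c in CRITERIA} for _ in range(10)]
print(f"{avg_at_10(perfect):.1%}")  # 100.0%
```

A real rubric would likely weight criteria unequally (e.g. spelling errors in rendered text are more damaging for thumbnails than minor framing issues), but an unweighted mean keeps the sketch simple.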
| Rank | Model | Score (AVG@10) | Organization |
|---|---|---|---|
| 1 | Imagen 4 Preview | 90.7% | Google |
| 2 | Flux Pro Kontext Max | 88.9% | Black Forest Labs |
| 3 | Hunyuan Image V3 | 88.2% | Tencent |
| 4 | Flux Pro Kontext | 86.0% | Black Forest Labs |
| 5 | Seedream V4 | 81.8% | ByteDance |
| 6 | Ideogram V3 | 80.2% | Ideogram |
| 7 | Qwen Image | 75.9% | Alibaba |
| 8 | Flux Krea | 75.0% | Black Forest Labs |
| 9 | HiDream Fast | 74.3% | HiDream AI |
| 10 | Flux Dev | 67.5% | Black Forest Labs |
| 11 | HiDream I1 Full | 65.3% | HiDream AI |