Human evaluation of AI image models for YouTube thumbnail generation. Models are tested on prompt-following using production thumbnail templates.
Text-to-image model rankings based on AI thumbnail generation.
Each model generates thumbnails from production TubeSalt templates using identical prompts and default API settings. Each thumbnail is typically scored against 10 to 15 criteria, including anatomical accuracy (hands, face, body), skin quality, text and graphics quality, spelling, legibility, composition, framing, and prompt matching. The leaderboard reports each model's average score across multiple template generations.
| Rank | Model | Score (AVG@10) | Organization |
|---|---|---|---|
| 1 | Imagen 4 Preview | 90.7% | Google |
| 2 | Flux Pro Kontext Max | 88.0% | Black Forest Labs |
| 3 | Flux Pro Kontext | 86.0% | Black Forest Labs |
| 4 | Ideogram V3 | 80.2% | Ideogram |
| 5 | Seedream V4 | 79.9% | ByteDance |
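
To make the scoring arithmetic concrete, here is a minimal Python sketch of how a score like AVG@10 could be computed. It rests on assumptions not stated above: that each criterion is rated pass/fail, that a thumbnail's score is the fraction of criteria passed, and that AVG@10 is the mean of those scores over 10 generations. The function names and the sample ratings are hypothetical, not TubeSalt's actual pipeline or data.

```python
from statistics import mean


def thumbnail_score(criteria: dict[str, bool]) -> float:
    """Fraction of criteria passed for one thumbnail (assumed pass/fail rating)."""
    if not criteria:
        raise ValueError("at least one criterion is required")
    return sum(criteria.values()) / len(criteria)


def avg_at_k(generations: list[dict[str, bool]], k: int = 10) -> float:
    """Mean thumbnail score over the first k generations, as a percentage."""
    if len(generations) < k:
        raise ValueError(f"need at least {k} generations, got {len(generations)}")
    return 100 * mean(thumbnail_score(g) for g in generations[:k])


# Hypothetical ratings: 8 thumbnails pass every criterion, 2 misspell the title.
perfect = {
    "hands": True,
    "face": True,
    "spelling": True,
    "legibility": True,
    "composition": True,
}
flawed = {**perfect, "spelling": False}
ratings = [perfect] * 8 + [flawed] * 2

example_score = avg_at_k(ratings)
print(f"{example_score:.1f}%")
```

Under these assumptions, a single failed criterion costs a thumbnail 1/N of its score, so a model that fails often on one criterion such as spelling loses only a few points on the leaderboard average.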