The Generative Burrito Test

A CRITICAL benchmark for image generation models

This was originally inspired by the horse riding astronaut meme way back in 2023. But I think Simon's pelican benchmark is what keeps the idea alive for me, even though the two test different modalities. Burritos are obviously more important than both pelicans and equestrian absurdism.

Also, I was initially surprised that the models couldn't replicate the image well, because I assumed there would be plenty of similar examples in the training data (unlike said equestrian absurdity). But I think it's a bit of a weird concept, because all the ingredients get smushed and smashed and congealed.

All images were generated using fal defaults. You could obviously prompt it better, but that's human-in-the-loop (HIL) effort, and it feels like cheating.

The Prompt

A partially eaten burrito with cheese, sour cream, guacamole, lettuce, salsa, pinto beans, and chicken.