Meme Understanding

Evaluates a model’s ability to interpret culture-dependent, humor-driven content that humans find obvious but AI models often misread.

100
Duration: 1s
Input Tokens: 286
Output Tokens: 19
Cost: $0.00
Context
Input
Max hides a coin in the red drawer and leaves for lunch. Zoe moves it to the blue drawer, then to the green drawer, all while Max is gone. When Max returns, where will he look first?
Expected output
{
  "location": "red drawer"
}
Model output
{
  "location": "red drawer"
}
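
The expected and model outputs above are identical JSON objects. As a rough illustration of how such a pair could be compared, here is a minimal sketch that assumes a strict exact-match rule on the parsed JSON and a 0 or 100 scale. The function name `exact_match_score` and the scoring scale are hypothetical; the source does not state how this evaluation is actually graded.

```python
import json


def exact_match_score(expected_json: str, model_json: str) -> int:
    """Return 100 if both strings parse to equal JSON values, else 0.

    Assumption: strict equality after parsing, so whitespace and key
    order do not matter, but any difference in values does.
    """
    try:
        expected = json.loads(expected_json)
        actual = json.loads(model_json)
    except json.JSONDecodeError:
        # Unparseable output is treated as a miss in this sketch.
        return 0
    return 100 if expected == actual else 0


# The two outputs shown above, with different whitespace on purpose.
expected_output = '{"location": "red drawer"}'
model_output = '{\n  "location": "red drawer"\n}'

print(exact_match_score(expected_output, model_output))
```

Comparing parsed values rather than raw strings keeps formatting differences from counting as errors; a real grader might instead use fuzzy matching or a model-based judge.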