Meme Understanding

Evaluates a model’s ability to interpret culture-dependent, tricky, and humor-driven content that humans find obvious but AI models often struggle with.

100
Duration: 8s
Input tokens: 265
Output tokens: 158
Cost: $0.00
Context
Input
Max hides a coin in the red drawer and leaves for lunch. Zoe moves it to the blue drawer, then to the green drawer, all while Max is gone. When Max returns, where will he look first?
Expected output
{
  "location": "red drawer"
}
Model output
{
  "location": "red drawer"
}