Meme Understanding

Evaluates a model’s ability to interpret culture-dependent, tricky, or humor-driven content that humans find obvious but AI models often get wrong.

0
Duration: 1s
Input Tokens: 326
Output Tokens: 204
Cost: $0.00
Context
Input
Sally takes the bunny from the box and puts it in the hat. She then turns the hat upside down on top of the open cage, afterwards she carefully puts the hat on. Where is the bunny?
Expected output
{
  "location": "in the cage"
}
Model output
{
  "location": "on top of the cage"
}
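
The comparison between the expected and model outputs above can be sketched as a strict exact-match check on the parsed JSON. This is only an illustration: the record does not say how it is actually graded, and the function name `exact_match_score` and the 0/1 scoring convention are assumptions. Under that assumption, this example would score 0, because "on top of the cage" differs from "in the cage".

```python
import json


def exact_match_score(expected_json: str, model_json: str) -> int:
    """Hypothetical grader: 1 if both JSON documents parse to equal values, else 0."""
    try:
        expected = json.loads(expected_json)
        actual = json.loads(model_json)
    except json.JSONDecodeError:
        # Unparseable output is treated as a miss (an assumption, not a documented rule).
        return 0
    return int(expected == actual)


# The two payloads shown in the record above.
expected = '{"location": "in the cage"}'
model = '{"location": "on top of the cage"}'

score = exact_match_score(expected, model)
print(score)
```

Parsing before comparing means differences in whitespace or key order do not count as mismatches; only the values themselves do. A more lenient grader (for example, one that judges meaning rather than exact strings) could score this case differently.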