Meme Understanding

Evaluates a model’s ability to interpret culture-dependent, tricky, humor-driven content that humans find obvious but AI models find hard.

0
Duration: 1s
Input Tokens: 263
Output Tokens: 11
Cost: $0.00
Context
Input
What is the 14th word in: Please record: 'alpha beta gamma', then add 42%, 7.2kg, and re-check the pre-flight list — twice. Now.
Expected output
{
  "word": "list"
}
Model output
{
  "word": "pre-flight"
}
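
The expected answer can be reproduced with a short word-counting sketch. It assumes a convention the page does not state: words are whitespace-separated tokens, hyphenated compounds such as re-check and pre-flight count as one word, and surrounding punctuation is stripped, so a standalone dash is not a word. The names nth_word and SENTENCE are hypothetical, chosen only for illustration.

```python
import string

# The quoted sentence from the Input field; "\u2014" is the long dash before "twice".
SENTENCE = (
    "Please record: 'alpha beta gamma', then add 42%, 7.2kg, "
    "and re-check the pre-flight list \u2014 twice. Now."
)

# Characters stripped from the edges of each token (assumed convention).
EDGE_CHARS = string.punctuation + "\u2014"


def nth_word(text: str, n: int) -> str:
    """Return the n-th word (1-indexed) under the assumed counting convention."""
    words = []
    for token in text.split():
        # Remove leading/trailing punctuation; internal hyphens and dots stay.
        word = token.strip(EDGE_CHARS)
        if word:  # tokens that were pure punctuation are skipped
            words.append(word)
    if not 1 <= n <= len(words):
        raise ValueError(f"text has {len(words)} words, cannot return word {n}")
    return words[n - 1]


if __name__ == "__main__":
    print(nth_word(SENTENCE, 14))
    print(nth_word(SENTENCE, 13))
```

Under these assumptions the 14th word is "list" and "pre-flight" is the 13th, so the model output is off by one position.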