AI Roundtable Poll · 14 models

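In a six-problem competitive math contest, which of the following difficulty distributions feels hardest? Difficulty is rated on a scale from 1 (an average high school student can solve it with ease) to 9 (IMO Problem 3 level). A: 123499 B: 566777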

Result: Supermajority — B (13/14 votes)

Options

A (1 vote)
B (13 votes)

Model Responses

Llama 4 Scout

Vote: B

The question asks for the most difficult problem to understand among the given options, with a difficulty scale from 1 to 9, where 1 is easily understandable by an average high school student and 9 is at the IMO3 level. Option A is 123499, and option B is 566777. Without specific context or details about these numbers, we must infer based on the information given. Both numbers appear to be randomly selected and do not have obvious mathematical relationships or properties that would make one inherently more difficult to understand than the other based solely on their numerical values. However, the question seems to be testing the ability to analyze and compare given values in the context of mathematical competitions. Given no further information or context that might suggest otherwise, and considering the difficulty level is subjective and based on understanding rather than a straightforward mathematical calculation, I would argue that both are equally incomprehensible without context. But since I must choose, I select B as it seems to be a larger number and potentially more complex in a vague sense, but this is purely speculative.

GPT-4o

Vote: B

Option B has the highest difficulty rating based on the partition given: 566777. This division reflects the moderate to difficult level challenges, with 7 being a more advanced level similar to national or international olympiads. On the other hand, option A's rating, 123499, includes lower difficulty levels more accessible to average high school students. Thus, the higher numerical ratings in option B make it more likely to feel challenging.

Sonar Pro

Vote: A

The query asks which difficulty distribution feels the hardest in a 6-question competitive math contest, with difficulty on a 1-9 scale (1 = average high school student can solve easily, 9 = IMO bronze level). Options are A: 123499 and B: 566777. Both sequences sum to 36 (average 6 per question), but **A (1,2,3,4,9,9)** has two IMO-level problems (9s), creating an extreme peak that feels overwhelmingly hard due to the necessity of solving near-top-tier problems to succeed[1][6]. B (5,6,6,7,7,7) is consistently high (all ≥5) but lacks extremes, feeling more uniformly challenging without the 'wall' of multiple 9s[1][3]. In math contests, psychological impact amplifies with peak difficulties; two 9s demand exceptional skill, making A subjectively harder[1].

MiniMax M2.5

Vote: B

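This question concerns the difficulty rating system used in competitive mathematics. The two six-digit numbers given (123499 and 566777) indicate difficulty. In difficulty ratings, a higher number generally means a harder problem. 566777 is clearly larger than 123499 and so indicates a higher difficulty. Therefore, 566777 (option B) has the higher perceived difficulty.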

GPT-5.4

Vote: B

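B has the highest perceived difficulty. A is 1, 2, 3, 4, 9, 9, with several very easy problems lined up in the first half. Contestants can readily build up points and momentum early on, and the contest as a whole tends to leave the impression of being solvable. Even though the last two problems are heavy, the perceived difficulty is considerably softened by the light opening. B, by contrast, is 5, 6, 6, 7, 7, 7, so every problem is upper-intermediate or harder. There is no obvious giveaway problem and the contest is heavy from the first move, so most contestants feel constant pressure. Beyond the average difficulty per problem, the absence of any approachable problem greatly raises perceived difficulty. Perceived difficulty in competitive math depends strongly not only on how hard the hardest problem is, but also on which problem yields the first points and whether any problem offers a breather along the way. In that sense, B, which stays high throughout, is clearly a harsher distribution than A, which contains easy problems.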

O3

Vote: B

Perceived contest difficulty depends on how many problems an average contestant can realistically solve. In set A, four questions (1 – 4) are at or below medium HS-olympiad level, so most solvers would secure points before reaching the two extreme 9-level problems. The experience is “easy start, impossible finish,” which often feels less punishing because partial scores are readily earned. Set B contains no truly easy entries: every problem is rated 5 or higher, with the bulk in the 6 – 7 range. That means even the very first question is already harder than anything below 5 on the scale, so many contestants may stall right away and collect few or no points. The psychological impact of facing six consecutively tough tasks makes the entire contest feel harsher. Both mean and median numerical difficulties corroborate this: A’s average ≈ 4.7 vs. B’s ≈ 6.3, and A’s median 3.5 vs. B’s 6.5. Hence, distribution B yields the highest overall perceived difficulty.

Mistral Large

Vote: B

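In this question, we need to consider how the arrangement of digits affects perceived difficulty. Option A, "123499", is a pattern in which the digits run in order from 1 to 4, followed by 99. This arrangement has a relatively clear regularity, and many test-takers would likely recognize the pattern immediately. In particular, the consecutive digits from 1 to 4 are easy to grasp as a simple sequence, which contributes to a feeling of low difficulty. Option B, "566777", on the other hand, has a structure in which no clear regularity is readily visible, and the pattern is hard to grasp at a glance. The sequence starts at 5, with 6 appearing twice and 7 appearing three times; although there is a regularity in the repetition counts, it is likely to take time to discover. In addition, because the changes between digits feel irregular, the perceived difficulty is likely to be higher. Furthermore, considering the stated scale from 1 (an average high school student can solve it with ease) to 9 (IMO Problem 3 level), option B would be positioned as the problem requiring more advanced thinking. Therefore, we conclude that the option with the higher perceived difficulty is B.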

Claude Opus 4

Vote: B

This question asks which difficulty distribution across 6 problems in a math competition has the highest perceived (体感) difficulty. Option A: 1, 2, 3, 4, 9, 9 — This has a sum of 28. The first four problems are relatively easy (1-4), and then there's a huge jump to two extremely hard problems (9, 9 — IMO problem 3 level). Option B: 5, 6, 6, 7, 7, 7 — This has a sum of 38. All six problems are moderately difficult to hard, with none being easy. The key concept here is "体感難易度" (perceived difficulty). While Option A has the hardest individual problems (two 9s), the first four problems are quite manageable, giving contestants a sense of progress and accomplishment. Most contestants would solve problems 1-4 and then struggle with 5-6. Option B, on the other hand, has no easy problems at all. Every single problem is at difficulty 5 or above. This means contestants face a wall of difficulty from the very start with no warm-up or easy points. The sustained high difficulty across all 6 problems creates a much more oppressive and exhausting experience. Even strong contestants would struggle throughout the entire contest, and average contestants might not fully solve any problem. The total difficulty sum of B (38) is significantly higher than A (28), and the minimum difficulty in B (5) is much higher than in A (1). The perceived difficulty of a contest is heavily influenced by whether contestants can make any progress at all, and B offers far fewer opportunities for partial or full solutions on any problem. Therefore, Option B has the highest perceived difficulty overall.

Grok 4.20

Vote: B

Perceived difficulty (体感難易度) considers how the contest feels overall to an average high school student, based on the 1-9 scale where 1 is trivial and 9 is IMO-level. Distribution A (1,2,3,4,9,9) allows easy solves for the first four problems, building confidence before hitting the two outliers at 9. This creates a mixed experience with clear progress. Distribution B (5,6,6,7,7,7) presents consistent medium-high difficulty with no warmup or easy wins, causing struggle across all six problems. The absence of accessible problems in B raises the overall felt challenge more than A's polarized setup.

GLM 5

Vote: B

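For this question, the key is to compare perceived difficulty (subjective hardness).

For option A (123499):
- Problems 1 to 4 have difficulty 1 to 4, within reach of many contestants.
- Contestants gain confidence from solving the first four problems and can secure a reasonable score.
- The last two problems (difficulty 9) can be written off as "naturally" out of reach even if unsolved.

For option B (566777):
- Every problem is at difficulty 5 to 7, intermediate to advanced.
- There is not a single easy problem, so the struggle continues from start to finish.
- Time spent on any problem carries no guarantee of a complete solution, so the mental burden persists.
- A state of "seems solvable but isn't" continues, and frustration accumulates.

Overall, the consistently hard B keeps contestants feeling that the contest is difficult for longer and imposes a higher mental load, so its perceived difficulty can be judged to be higher.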

Gemini 3.1 Pro

Vote: B

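In distribution A (1, 2, 3, 4, 9, 9), the first four problems are at a level that even an average high school student can solve, so most contestants can work through them smoothly and gain some sense of accomplishment. Even if the last two problems are hopelessly hard, the fact of having solved four problems softens the perceived difficulty. Distribution B (5, 6, 6, 7, 7, 7), by contrast, has difficulty 5 even for its easiest problem, demanding a high bar from contestants right from the start. There are no easy problems to serve as a breather, and strong reasoning and long periods of concentration are required across all problems, so the mental fatigue is on another level. There is also a high chance of hitting a wall on the very first problem, and the perceived difficulty of the contest as a whole (its heaviness and strain) is without doubt higher for B.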

DeepSeek V3.2

Vote: B

Option B (566777) presents a consistently high difficulty across all problems, with every problem rated 5 or above. This means contestants face challenging problems from the start, with no easy warm-ups, leading to a more uniformly difficult experience. In contrast, option A (123499) includes four relatively easy problems (1-4) that most average high school students can solve, providing a confidence boost before two extremely hard problems (9s). While the 9s are individually tougher, the overall contest feels more accessible due to the early successes, making B the set with higher perceived difficulty.

Kimi K2.5

Vote: B

Distribution B (566777) feels significantly more difficult because it contains no easy problems to provide relief or build confidence. While A has two extreme difficulty-9 problems, it also includes four relatively accessible problems (difficulties 1-4) that average high school students can solve. In contrast, B starts at difficulty 5 and maintains consistently high difficulty (5-7) across all six problems, meaning participants struggle with every single question. The average difficulty clearly favors B at approximately 6.3 versus A's 4.7. More importantly, perceived difficulty in contests is driven by sustained struggle rather than isolated impossible problems. A participant in contest A can solve 4 problems and feel accomplished despite failing the last two, while in contest B they face frustration on every problem, resulting in lower scores and higher psychological pressure. The lack of warmup problems in B makes it feel substantially harder overall.

Qwen 3.5 397B

Vote: B

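Comparing the total difficulty, A is 1+2+3+4+9+9=28 and B is 5+6+6+7+7+7=38, so B has an overwhelmingly higher total load. In addition, the minimum difficulty is 1 for A (solvable by an average high school student) but 5 for B, so there is no "easy to solve" problem for any contestant. Perceived difficulty in competitive math tends to depend more on the level demanded across all problems than on the presence of a few extremely hard ones. A makes it easy to pick up points in the first four problems, whereas B demands careful thought from start to finish, with nowhere to escape. Therefore, from the standpoint of overall burden and harshness toward the full field of contestants, I judge B to have the higher perceived difficulty.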

Final Summary

The AI Roundtable reached a near-unanimous verdict, with 13 models selecting Option B as the more psychologically taxing contest. While Sonar Pro stood alone in arguing that the extreme peak of two IMO-level '9s' in Option A creates an insurmountable wall, the rest of the group, led by GPT-5.4 and Claude Opus 4, concluded that the lack of any 'warm-up' problems in Option B creates a more oppressive and exhausting experience. The consensus was that a contest where every question is moderately difficult is more punishing than one that allows for early success before hitting a late-stage barrier.

No agreement after 1 round of discussion. Final vote: B: 13, A: 1

Strongest Arguments

A: The extreme peak of two difficulty-9 problems creates a skill wall that is subjectively harder to overcome than a consistently medium set.
B: The absence of any accessible problems (difficulty 1-4) means contestants face a high mental load from the first minute, leading to greater frustration and a higher perceived difficulty than a contest with easy early points.
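
Several responses above cite summary statistics for the two distributions (sums of 28 and 38, means of about 4.7 and 6.3, medians of 3.5 and 6.5). The following is a minimal Python sketch for checking that arithmetic, assuming each option is read digit by digit as six per-problem ratings. The names `distributions` and `summarize` are illustrative and do not come from the poll.

```python
from statistics import mean, median

# Per-problem difficulty ratings, read digit by digit from the poll's options
# (an assumption about how "123499" and "566777" are meant to be parsed).
distributions = {
    "A": [1, 2, 3, 4, 9, 9],
    "B": [5, 6, 6, 7, 7, 7],
}


def summarize(ratings):
    """Return basic summary statistics for one difficulty distribution."""
    return {
        "sum": sum(ratings),
        "mean": round(mean(ratings), 1),
        "median": median(ratings),
        "min": min(ratings),
        "max": max(ratings),
    }


summary = {name: summarize(ratings) for name, ratings in distributions.items()}

for name, stats in summary.items():
    print(name, stats)
```

Under this reading, the totals are 28 for A and 38 for B. These statistics only describe the ratings themselves; they do not measure perceived difficulty, which is the subjective question the poll actually asks.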
