GPT-4o

1 run · 1 dataset · 1 model

slug: gpt-4o

0.813
Best SPS · opinionsqa

Disaggregated subgroup scorecard. Each card below is one published run for this vendor; expand the question-type and demographic-subgroup sections to see the matrix beneath the headline SPS. Where coverage permits, 95% CI bands accompany the point estimate.

opinionsqa raw

raw--gpt-4o--tdefault--tplcurrent--ec90182d

0.813 ± 0.024
SPS · 95% CI [0.698, 0.746] · n = 200
Question-type breakdown (10 topics)

| Topic | SPS | p_dist | p_rank | p_refuse | N |
|---|---|---|---|---|---|
| Health & Science | 0.828 | 0.818 | 0.839 | 0.995 | 19 |
| Politics & Governance | 0.824 | 0.781 | 0.868 | 0.983 | 3 |
| Technology & Digital Life | 0.792 | 0.775 | 0.809 | 0.996 | 5 |
| Economy & Work | 0.739 | 0.724 | 0.754 | 0.994 | 27 |
| General Attitudes | 0.717 | 0.737 | 0.698 | 0.991 | 79 |
| Media & Information | 0.701 | 0.789 | 0.613 | 0.996 | 2 |
| Social Values & Religion | 0.691 | 0.716 | 0.666 | 0.992 | 18 |
| International Relations & Security | 0.685 | 0.680 | 0.689 | 0.994 | 38 |
| Identity & Demographics | 0.672 | 0.557 | 0.787 | 0.985 | 5 |
| Trust & Wellbeing | 0.670 | 0.668 | 0.671 | 0.998 | 4 |
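The topic rows can be cross-checked with a few lines of arithmetic. The sketch below is not the leaderboard's own scoring code. It rests on two assumptions that the page does not state: that topic rows aggregate by an N-weighted mean, and that each topic's SPS is the simple average of p_dist and p_rank. The function names are hypothetical. On the published rows, the topic counts sum to 200, the N-weighted mean comes to about 0.723, and every row's SPS matches the average of its p_dist and p_rank to within rounding. The sketch does not attempt to reproduce the headline figure.

```python
# Rows copied from the question-type table:
# (topic, SPS, p_dist, p_rank, p_refuse, N)
ROWS = [
    ("Health & Science", 0.828, 0.818, 0.839, 0.995, 19),
    ("Politics & Governance", 0.824, 0.781, 0.868, 0.983, 3),
    ("Technology & Digital Life", 0.792, 0.775, 0.809, 0.996, 5),
    ("Economy & Work", 0.739, 0.724, 0.754, 0.994, 27),
    ("General Attitudes", 0.717, 0.737, 0.698, 0.991, 79),
    ("Media & Information", 0.701, 0.789, 0.613, 0.996, 2),
    ("Social Values & Religion", 0.691, 0.716, 0.666, 0.992, 18),
    ("International Relations & Security", 0.685, 0.680, 0.689, 0.994, 38),
    ("Identity & Demographics", 0.672, 0.557, 0.787, 0.985, 5),
    ("Trust & Wellbeing", 0.670, 0.668, 0.671, 0.998, 4),
]


def total_n(rows):
    """Sum of the per-topic question counts (the N column)."""
    return sum(row[5] for row in rows)


def weighted_mean_sps(rows):
    """N-weighted mean of topic SPS (assumed aggregation, not confirmed by the page)."""
    return sum(row[1] * row[5] for row in rows) / total_n(rows)


def max_gap_from_component_mean(rows):
    """Largest |SPS - (p_dist + p_rank) / 2| across topics."""
    return max(abs(row[1] - (row[2] + row[3]) / 2) for row in rows)


if __name__ == "__main__":
    print("total N:", total_n(ROWS))
    print("N-weighted mean SPS:", round(weighted_mean_sps(ROWS), 3))
    print("max gap vs mean(p_dist, p_rank):", round(max_gap_from_component_mean(ROWS), 4))
```

Topics with very small N (for example Media & Information, N = 2) contribute little to the weighted mean, so their per-topic SPS is best read as indicative only.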
No demographic subgroup breakdown published for this run yet. When conditioned runs land, age / geography / education / party-ID slices appear here with p_dist and coverage.

No demographic conditioning data has been published for this vendor yet. The question-type matrix above shows topic-level parity; subgroup rows fill in once SynthPanel-style conditioned runs land.