
Claude Sonnet 4

1 run · 1 dataset · 1 model

slug: claude-sonnet-4

0.782
Best SPS · opinionsqa

Disaggregated subgroup scorecard. Each card below is one published run for this vendor; expand the question-type and demographic-subgroup sections to see the matrix beneath the headline SPS. Where coverage permits, 95% CI bands accompany the point estimate.

opinionsqa raw

raw--claude-sonnet-4--tdefault--tplcurrent--482c42c7

0.782 ± 0.016
SPS · 95% CI [0.663, 0.694] · n = 684
Question-type breakdown (10 topics)

| Topic | SPS | p_dist | p_rank | p_refuse | N |
|---|---|---|---|---|---|
| Media & Information | 0.778 | 0.760 | 0.795 | 0.993 | 63 |
| Technology & Digital Life | 0.729 | 0.694 | 0.764 | 0.993 | 26 |
| Health & Science | 0.712 | 0.688 | 0.735 | 0.987 | 47 |
| General Attitudes | 0.690 | 0.664 | 0.716 | 0.989 | 190 |
| International Relations & Security | 0.689 | 0.645 | 0.733 | 0.988 | 149 |
| Trust & Wellbeing | 0.649 | 0.626 | 0.672 | 0.995 | 25 |
| Identity & Demographics | 0.642 | 0.620 | 0.663 | 0.985 | 39 |
| Politics & Governance | 0.627 | 0.580 | 0.673 | 0.989 | 40 |
| Economy & Work | 0.617 | 0.589 | 0.644 | 0.991 | 68 |
| Social Values & Religion | 0.569 | 0.536 | 0.602 | 0.990 | 37 |
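The topic rows can be cross-checked with a little arithmetic. The sketch below is only a reader-side sanity check, not the leaderboard's published scoring code. It rests on two assumptions: that each topic SPS is the plain average of p_dist and p_rank (the rows agree with this to rounding), and that topics are pooled by weighting each one by its N. The variable names are illustrative. Under those assumptions the pooled value comes out near 0.679, which falls inside the listed 95% CI [0.663, 0.694]. This page does not state how the 0.782 headline figure is derived.

```python
# Rows copied from the question-type breakdown: (topic, SPS, p_dist, p_rank, N).
rows = [
    ("Media & Information", 0.778, 0.760, 0.795, 63),
    ("Technology & Digital Life", 0.729, 0.694, 0.764, 26),
    ("Health & Science", 0.712, 0.688, 0.735, 47),
    ("General Attitudes", 0.690, 0.664, 0.716, 190),
    ("International Relations & Security", 0.689, 0.645, 0.733, 149),
    ("Trust & Wellbeing", 0.649, 0.626, 0.672, 25),
    ("Identity & Demographics", 0.642, 0.620, 0.663, 39),
    ("Politics & Governance", 0.627, 0.580, 0.673, 40),
    ("Economy & Work", 0.617, 0.589, 0.644, 68),
    ("Social Values & Religion", 0.569, 0.536, 0.602, 37),
]

# Assumption: topic SPS = mean(p_dist, p_rank). Record the largest deviation
# between that average and the published topic SPS.
max_gap = max(abs((p_dist + p_rank) / 2 - sps) for _, sps, p_dist, p_rank, _ in rows)

# Assumption: topics are pooled by question count N.
total_n = sum(n for *_, n in rows)
weighted_sps = sum(sps * n for _, sps, _, _, n in rows) / total_n

# Half-width of the listed 95% CI, for comparison with the "± 0.016" figure.
ci_low, ci_high = 0.663, 0.694
half_width = (ci_high - ci_low) / 2

print(f"total N            = {total_n}")
print(f"N-weighted SPS     = {weighted_sps:.3f}")
print(f"max rounding gap   = {max_gap:.4f}")
print(f"CI half-width      = {half_width:.4f}")
print(f"inside listed CI   = {ci_low <= weighted_sps <= ci_high}")
```

The per-topic N values sum to the reported n = 684, so the table covers the full question set for this run.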
No demographic subgroup breakdown has been published yet for this run or for any other run from this vendor. The question-type matrix above shows topic-level parity only. Once SynthPanel-style conditioned runs land, age, geography, education, and party-ID slices will appear here with p_dist and coverage.
