
Llama 3.3 70B

3 runs · 3 datasets · 1 model

slug: llama-3-3-70b

0.819
Best SPS · opinionsqa

Disaggregated subgroup scorecard. Each card below is one published run for this vendor. Beneath the headline SPS, each card gives a question-type breakdown and, once published, a demographic-subgroup breakdown. Where coverage permits, a 95% CI accompanies the point estimate.
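To make the relationship between the "±" value and the 95% CI on each card concrete, here is a minimal sketch that recomputes the interval midpoint and half-width from the published bounds. The numbers are copied from the three run cards on this page; the `runs` dictionary and the `ci_summary` helper are illustrative names, not part of the leaderboard.

```python
# Headline figures copied from the three run cards on this page.
# "pm" is the published "±" value; "ci" is the published 95% CI.
runs = {
    "globalopinionqa": {"sps": 0.762, "pm": 0.044, "ci": (0.607, 0.695)},
    "opinionsqa": {"sps": 0.819, "pm": 0.010, "ci": (0.723, 0.743)},
    "subpop": {"sps": 0.796, "pm": 0.021, "ci": (0.683, 0.726)},
}


def ci_summary(low, high):
    """Return (midpoint, half_width) of an interval [low, high]."""
    return (low + high) / 2, (high - low) / 2


for name, run in runs.items():
    mid, half = ci_summary(*run["ci"])
    print(
        f"{name}: midpoint={mid:.4f} half-width={half:.4f} "
        f"published ±{run['pm']:.3f} headline={run['sps']:.3f}"
    )
```

On the published figures, each "±" value equals the interval half-width to within rounding. The interval midpoints (0.651, 0.733, 0.7045) sit below the headline values, so the headline and the interval may not summarize the same aggregate; the page does not say.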

globalopinionqa raw

raw--llama-3.3-70b-instruct--tdefault--tplcurrent--f2748030

0.762 ± 0.044
SPS · 95% CI [0.607, 0.695] · n = 100
Question-type breakdown (7 topics)
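| Topic                              | SPS   | p_dist | p_rank | p_refuse | N  |
|------------------------------------|-------|--------|--------|----------|----|
| Health & Science                   | 0.859 | 0.791  | 0.927  | 1.000    | 2  |
| Technology & Digital Life          | 0.853 | 0.803  | 0.903  | 0.968    | 3  |
| Politics & Governance              | 0.698 | 0.643  | 0.754  | 0.972    | 22 |
| General Attitudes                  | 0.644 | 0.666  | 0.623  | 1.000    | 11 |
| International Relations & Security | 0.640 | 0.636  | 0.644  | 0.980    | 50 |
| Economy & Work                     | 0.626 | 0.661  | 0.592  | 0.967    | 5  |
| Trust & Wellbeing                  | 0.497 | 0.422  | 0.572  | 0.980    | 7  |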
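The page does not define how the SPS column relates to p_dist, p_rank, and p_refuse. As a hedged reading of the table above, the sketch below checks one assumption: that each topic SPS is the unweighted mean of p_dist and p_rank. It also computes an N-weighted mean of the topic scores. The row values are copied from the globalopinionqa table; the helper names are hypothetical.

```python
# Rows copied from the globalopinionqa table above:
# (topic, SPS, p_dist, p_rank, p_refuse, N)
rows = [
    ("Health & Science", 0.859, 0.791, 0.927, 1.000, 2),
    ("Technology & Digital Life", 0.853, 0.803, 0.903, 0.968, 3),
    ("Politics & Governance", 0.698, 0.643, 0.754, 0.972, 22),
    ("General Attitudes", 0.644, 0.666, 0.623, 1.000, 11),
    ("International Relations & Security", 0.640, 0.636, 0.644, 0.980, 50),
    ("Economy & Work", 0.626, 0.661, 0.592, 0.967, 5),
    ("Trust & Wellbeing", 0.497, 0.422, 0.572, 0.980, 7),
]


def mean_of_dist_and_rank(p_dist, p_rank):
    # Assumption (not stated on the page): topic SPS = (p_dist + p_rank) / 2.
    return (p_dist + p_rank) / 2


def weighted_mean_sps(table):
    """N-weighted mean of the topic-level SPS column."""
    total_n = sum(row[5] for row in table)
    return sum(row[1] * row[5] for row in table) / total_n


for topic, sps, p_dist, p_rank, _p_refuse, n in rows:
    gap = abs(mean_of_dist_and_rank(p_dist, p_rank) - sps)
    print(f"{topic}: published SPS={sps:.3f} gap={gap:.4f} N={n}")

print(f"total N = {sum(row[5] for row in rows)}")
print(f"N-weighted topic SPS = {weighted_mean_sps(rows):.3f}")
```

On these rows every topic SPS matches (p_dist + p_rank) / 2 to within rounding, the topic counts sum to the card's n = 100, and the N-weighted mean comes to about 0.653, inside the published interval [0.607, 0.695]. This is consistent with the assumption but does not confirm how the leaderboard computes SPS.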
No demographic subgroup breakdown published for this run yet. When conditioned runs land, age / geography / education / party-ID slices appear here with p_dist and coverage.

opinionsqa raw

raw--llama-3.3-70b-instruct--tdefault--tplcurrent--6480fa3c

0.819 ± 0.010
SPS · 95% CI [0.723, 0.743] · n = 684
Question-type breakdown (10 topics)
| Topic                              | SPS   | p_dist | p_rank | p_refuse | N   |
|------------------------------------|-------|--------|--------|----------|-----|
| Health & Science                   | 0.794 | 0.753  | 0.835  | 0.987    | 47  |
| Trust & Wellbeing                  | 0.752 | 0.700  | 0.803  | 0.995    | 25  |
| Media & Information                | 0.742 | 0.710  | 0.775  | 0.993    | 63  |
| Economy & Work                     | 0.740 | 0.705  | 0.775  | 0.991    | 68  |
| Identity & Demographics            | 0.734 | 0.659  | 0.809  | 0.985    | 39  |
| Social Values & Religion           | 0.731 | 0.686  | 0.777  | 0.990    | 37  |
| General Attitudes                  | 0.726 | 0.705  | 0.747  | 0.989    | 190 |
| Politics & Governance              | 0.724 | 0.661  | 0.787  | 0.989    | 40  |
| International Relations & Security | 0.722 | 0.667  | 0.777  | 0.988    | 149 |
| Technology & Digital Life          | 0.712 | 0.681  | 0.743  | 0.993    | 26  |

subpop raw

raw--llama-3.3-70b-instruct--tdefault--tplcurrent--670d96d8

0.796 ± 0.021
SPS · 95% CI [0.683, 0.726] · n = 200
Question-type breakdown (9 topics)
| Topic                              | SPS   | p_dist | p_rank | p_refuse | N  |
|------------------------------------|-------|--------|--------|----------|----|
| Trust & Wellbeing                  | 0.875 | 0.795  | 0.954  | 0.988    | 2  |
| Health & Science                   | 0.807 | 0.813  | 0.800  | 0.994    | 5  |
| Economy & Work                     | 0.717 | 0.658  | 0.776  | 0.991    | 17 |
| Social Values & Religion           | 0.715 | 0.648  | 0.783  | 0.987    | 36 |
| International Relations & Security | 0.711 | 0.662  | 0.761  | 0.991    | 33 |
| General Attitudes                  | 0.704 | 0.667  | 0.741  | 0.913    | 37 |
| Politics & Governance              | 0.704 | 0.644  | 0.763  | 0.983    | 22 |
| Technology & Digital Life          | 0.675 | 0.631  | 0.720  | 0.993    | 47 |
| Identity & Demographics            | 0.631 | 0.487  | 0.774  | 0.995    | 1  |

No demographic conditioning data has been published for this vendor yet. The question-type matrix above shows topic-level parity; subgroup rows fill in once SynthPanel-style conditioned runs land.
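The page does not say how topic-level parity is quantified. One simple summary, offered here only as an illustrative sketch, is the spread between the best and worst topic SPS within each run. The values are the SPS columns from the three question-type tables above; `topic_sps` and `spread` are hypothetical names.

```python
# Topic-level SPS values copied from the three question-type tables above.
topic_sps = {
    "globalopinionqa": [0.859, 0.853, 0.698, 0.644, 0.640, 0.626, 0.497],
    "opinionsqa": [0.794, 0.752, 0.742, 0.740, 0.734,
                   0.731, 0.726, 0.724, 0.722, 0.712],
    "subpop": [0.875, 0.807, 0.717, 0.715, 0.711,
               0.704, 0.704, 0.675, 0.631],
}


def spread(values):
    """Gap between the highest and lowest topic score (an assumed parity summary)."""
    return max(values) - min(values)


for dataset, values in topic_sps.items():
    print(f"{dataset}: topics={len(values)} spread={spread(values):.3f}")
```

By this measure opinionsqa is the most even run (spread 0.082), while globalopinionqa (0.362) and subpop (0.244) vary more. The extremes in those two runs come from topics with very small N (between 1 and 7 questions), so their spreads should be read with caution.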
