leetcode: by models


[Figure: SE predicted by accuracy] Typical standard errors between pairs of models on this dataset, as a function of absolute accuracy.
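As a rough illustration of why a standard error would depend on absolute accuracy, the sketch below uses the textbook binomial formula, SE = sqrt(p(1 - p) / n), for a single model scored on n independent questions. This is an assumption made for illustration: the page does not state which formula produces its curve, and both the function name `binomial_se` and the question count `N_QUESTIONS = 180` are hypothetical.

```python
import math


def binomial_se(accuracy: float, n_questions: int) -> float:
    """Standard error of the mean of n independent Bernoulli(accuracy) outcomes."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction in [0, 1]")
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


# Hypothetical dataset size; the source does not state the number of questions.
N_QUESTIONS = 180

# The standard error is zero at 0% accuracy and largest at 50%.
for acc in (0.0, 0.05, 0.2, 0.4, 0.5):
    se_points = 100 * binomial_se(acc, N_QUESTIONS)
    print(f"accuracy={acc:.2f}  SE={se_points:.2f} percentage points")
```

Under this formula a model at 0% accuracy has zero standard error, which matches the all-zero rows in the results table, although the source does not confirm that its SE columns are computed this way.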

[Figure: CDF of question-level accuracy]

Results table by model

model	pass1	win_rate	count	SE(A)	SE_x(A)	SE_pred(A)
google_gemma_3_27b_it	40.2	38.8	3	3.7	3	2.2
google_gemma_2_27b_it	22.2	21.1	2	3.1	1.9	2.4
google_gemma_3_4b_it	18	17	5	2.9	2.3	1.6
google_gemma_2_9b_it	15	14.1	4	2.7	1.8	2
llama-3.2-3B-instruct	8.94	8.24	10	2.1	2.1	0.51
llama-3.1-8B-instruct	7.78	7.09	7	2	1.9	0.5
google_gemma_3_12b_it	7.5	7.09	4	2	1.4	1.4
google_codegemma_1.1_7b_it	5	4.62	5	1.6	0.71	1.5
google_gemma_7b_it	2.92	2.64	4	1.3	0.82	0.95
llama-3.2-1B-instruct	2.65	2.48	13	1.2	1.2	0.24
mistralai_mixtral_8x22b_instruct_v0.1	2.59	2.4	3	1.2	0.41	1.1
google_gemma_3_1b_it	0.694	0.636	4	0.62	0.22	0.58
qwen2.5-coder-32b-instruct	0.278	0.278	2	0.39	0	0.39
qwen3-32b	0.185	0.176	3	0.32	0	0.32
google_gemma_2b_it	0.139	0.129	4	0.28	0	0.28
deepseek_r1_distill_qwen_14b	0	0	4	0	0	0
deepseek_r1_distill_qwen_1.5b	0	0	4	0	0	0
deepseek_r1_distill_llama_70b	0	0	2	0	0	0
deepseek_r1_distill_llama_8b	0	0	4	0	0	0
deepseek_v2_lite_chat	0	0	3	0	0	0
deepseek_r1_distill_qwen_7b	0	0	4	0	0	0
deepseek_r1_distill_qwen_32b	0	0	2	0	0	0
mistralai_mathstral_7b_v0.1	0	0	4	0	0	0
mistralai_mistral_7b_instruct_v0.2	0	0	4	0	0	0
mistralai_mistral_7b_instruct_v0.3	0	0	4	0	0	0
mistralai_mistral_7b_instruct_v0.1	0	0	4	0	0	0
mistralai_ministral_8b_instruct_2410	0	0	4	0	0	0
qwen1.5-1.8b-chat	0	0	3	0	0	0
qwen1.5-14b-chat	0	0	3	0	0	0
qwen1.5-32b-chat	0	0	3	0	0	0
qwen1.5-72b-chat	0	0	2	0	0	0
qwen1.5-7b-chat	0	0	3	0	0	0
qwen2-0.5b-instruct	0	0	5	0	0	0
mistralai_mixtral_8x7b_instruct_v0.1	0	0	3	0	0	0
qwen1.5-0.5b-chat	0	0	5	0	0	0
qwen2-72b-instruct	0	0	2	0	0	0
qwen2-1.5b-instruct	0	0	4	0	0	0
qwen2-math-72b-instruct	0	0	2	0	0	0
qwen2-7b-instruct	0	0	4	0	0	0
qwen2-math-7b-instruct	0	0	2	0	0	0
qwen2.5-coder-0.5b-instruct	0	0	5	0	0	0
qwen2.5-coder-1.5b-instruct	0	0	4	0	0	0
qwen2-math-1.5b-instruct	0	0	3	0	0	0
qwen2.5-coder-14b-instruct	0	0	3	0	0	0
qwen2.5-coder-3b-instruct	0	0	4	0	0	0
qwen3-0.6b	0	0	5	0	0	0
qwen2.5-coder-7b-instruct	0	0	4	0	0	0
qwen3-1.7b	0	0	4	0	0	0
qwen3-14b	0	0	3	0	0	0
qwen3-4b	0	0	4	0	0	0
qwen3-8b	0	0	3	0	0	0
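The three SE columns above appear to be related in quadrature: for the rows with non-zero values, SE(A) is close to sqrt(SE_x(A)^2 + SE_pred(A)^2), up to rounding. That is an observation about the reported numbers, not a definition given on this page. The sketch below checks it on rows copied from the table; the helper names `quadrature_sum` and `residuals` are hypothetical.

```python
import math

# (model, SE(A), SE_x(A), SE_pred(A)) copied from the table above.
ROWS = [
    ("google_gemma_3_27b_it", 3.7, 3.0, 2.2),
    ("google_gemma_2_27b_it", 3.1, 1.9, 2.4),
    ("google_gemma_3_4b_it", 2.9, 2.3, 1.6),
    ("google_gemma_2_9b_it", 2.7, 1.8, 2.0),
    ("llama-3.2-3B-instruct", 2.1, 2.1, 0.51),
    ("llama-3.1-8B-instruct", 2.0, 1.9, 0.5),
    ("google_gemma_3_12b_it", 2.0, 1.4, 1.4),
    ("google_codegemma_1.1_7b_it", 1.6, 0.71, 1.5),
    ("google_gemma_7b_it", 1.3, 0.82, 0.95),
    ("llama-3.2-1B-instruct", 1.2, 1.2, 0.24),
    ("mistralai_mixtral_8x22b_instruct_v0.1", 1.2, 0.41, 1.1),
    ("google_gemma_3_1b_it", 0.62, 0.22, 0.58),
]


def quadrature_sum(se_x: float, se_pred: float) -> float:
    """Combine two standard-error components as sqrt(a^2 + b^2)."""
    return math.hypot(se_x, se_pred)


def residuals(rows):
    """Absolute gap between the reported SE(A) and the quadrature sum, per model."""
    return {
        name: abs(se - quadrature_sum(se_x, se_pred))
        for name, se, se_x, se_pred in rows
    }


for name, gap in residuals(ROWS).items():
    print(f"{name:40s} gap={gap:.3f}")
```

A small gap on every row is what one would expect if the total variance were split into two additive components, but the table values are rounded, so this check cannot by itself establish how the columns were computed.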