leetcode: by models

SE predicted by accuracy

The typical standard error between pairs of models on this dataset, as a function of absolute accuracy.
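To make the dependence on accuracy concrete, here is a minimal sketch of how a standard error varies with accuracy under a simple binomial model. It assumes independent questions and, for the pairwise case, ignores the correlation between two models answering the same questions, which a paired analysis would account for. The function names and `N_QUESTIONS = 200` are hypothetical and not taken from this page.

```python
import math


def binomial_se(accuracy: float, n_questions: int) -> float:
    """Standard error of the mean of n independent 0/1 outcomes."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


def unpaired_diff_se(acc_a: float, acc_b: float, n_questions: int) -> float:
    """SE of the accuracy difference A - B, treating the two models as independent."""
    se_a = binomial_se(acc_a, n_questions)
    se_b = binomial_se(acc_b, n_questions)
    return math.sqrt(se_a ** 2 + se_b ** 2)


N_QUESTIONS = 200  # hypothetical dataset size, for illustration only

for acc in (0.01, 0.05, 0.2, 0.4, 0.5):
    se = binomial_se(acc, N_QUESTIONS)
    diff = unpaired_diff_se(acc, acc, N_QUESTIONS)
    print(f"accuracy={acc:.2f}  SE={100 * se:.2f} pts  SE(A-B)={100 * diff:.2f} pts")
```

Under this model the standard error peaks at 50% accuracy and shrinks toward zero as accuracy approaches 0% or 100%.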

CDF of question-level accuracy
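As an illustration of what such a plot summarizes, the sketch below computes an empirical CDF of per-question accuracy. It assumes that question-level accuracy means the fraction of models that solve each question. This page does not define it, so that reading, the toy `outcomes` matrix, and the helper names are all hypothetical.

```python
from bisect import bisect_right


def question_accuracy(outcomes):
    """outcomes[m][q] is 1 if model m solved question q, else 0.

    Returns, for each question, the fraction of models that solved it.
    """
    n_models = len(outcomes)
    n_questions = len(outcomes[0])
    return [sum(row[q] for row in outcomes) / n_models for q in range(n_questions)]


def empirical_cdf(values):
    """Return a function t -> fraction of values that are <= t."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(t):
        return bisect_right(ordered, t) / n

    return cdf


# Toy data: 3 models (rows) by 4 questions (columns), for illustration only.
outcomes = [
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
]

per_question = question_accuracy(outcomes)
cdf = empirical_cdf(per_question)
for threshold in (0.0, 0.25, 0.5, 1.0):
    print(f"P(accuracy <= {threshold:.2f}) = {cdf(threshold):.2f}")
```

A CDF that jumps sharply near zero would indicate that most questions are solved by almost no model.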

Results table by model

model                                  pass1  win_rate  count  SE(A)  SE_x(A)  SE_pred(A)
google_gemma_3_27b_it                  40.2   38.8      3      3.7    3        2.2
google_gemma_2_27b_it                  22.2   21.1      2      3.1    1.9      2.4
google_gemma_3_4b_it                   18     17        5      2.9    2.3      1.6
google_gemma_2_9b_it                   15     14.1      4      2.7    1.8      2
llama-3.2-3B-instruct                  8.94   8.24      10     2.1    2.1      0.51
llama-3.1-8B-instruct                  7.78   7.09      7      2      1.9      0.5
google_gemma_3_12b_it                  7.5    7.09      4      2      1.4      1.4
google_codegemma_1.1_7b_it             5      4.62      5      1.6    0.71     1.5
google_gemma_7b_it                     2.92   2.64      4      1.3    0.82     0.95
llama-3.2-1B-instruct                  2.65   2.48      13     1.2    1.2      0.24
mistralai_mixtral_8x22b_instruct_v0.1  2.59   2.4       3      1.2    0.41     1.1
google_gemma_3_1b_it                   0.694  0.636     4      0.62   0.22     0.58
qwen2.5-coder-32b-instruct             0.278  0.278     2      0.39   0        0.39
qwen3-32b                              0.185  0.176     3      0.32   0        0.32
google_gemma_2b_it                     0.139  0.129     4      0.28   0        0.28
deepseek_r1_distill_qwen_14b           0      0         4      0      0        0
deepseek_r1_distill_qwen_1.5b          0      0         4      0      0        0
deepseek_r1_distill_llama_70b          0      0         2      0      0        0
deepseek_r1_distill_llama_8b           0      0         4      0      0        0
deepseek_v2_lite_chat                  0      0         3      0      0        0
deepseek_r1_distill_qwen_7b            0      0         4      0      0        0
deepseek_r1_distill_qwen_32b           0      0         2      0      0        0
mistralai_mathstral_7b_v0.1            0      0         4      0      0        0
mistralai_mistral_7b_instruct_v0.2     0      0         4      0      0        0
mistralai_mistral_7b_instruct_v0.3     0      0         4      0      0        0
mistralai_mistral_7b_instruct_v0.1     0      0         4      0      0        0
mistralai_ministral_8b_instruct_2410   0      0         4      0      0        0
qwen1.5-1.8b-chat                      0      0         3      0      0        0
qwen1.5-14b-chat                       0      0         3      0      0        0
qwen1.5-32b-chat                       0      0         3      0      0        0
qwen1.5-72b-chat                       0      0         2      0      0        0
qwen1.5-7b-chat                        0      0         3      0      0        0
qwen2-0.5b-instruct                    0      0         5      0      0        0
mistralai_mixtral_8x7b_instruct_v0.1   0      0         3      0      0        0
qwen1.5-0.5b-chat                      0      0         5      0      0        0
qwen2-72b-instruct                     0      0         2      0      0        0
qwen2-1.5b-instruct                    0      0         4      0      0        0
qwen2-math-72b-instruct                0      0         2      0      0        0
qwen2-7b-instruct                      0      0         4      0      0        0
qwen2-math-7b-instruct                 0      0         2      0      0        0
qwen2.5-coder-0.5b-instruct            0      0         5      0      0        0
qwen2.5-coder-1.5b-instruct            0      0         4      0      0        0
qwen2-math-1.5b-instruct               0      0         3      0      0        0
qwen2.5-coder-14b-instruct             0      0         3      0      0        0
qwen2.5-coder-3b-instruct              0      0         4      0      0        0
qwen3-0.6b                             0      0         5      0      0        0
qwen2.5-coder-7b-instruct              0      0         4      0      0        0
qwen3-1.7b                             0      0         4      0      0        0
qwen3-14b                              0      0         3      0      0        0
qwen3-4b                               0      0         4      0      0        0
qwen3-8b                               0      0         3      0      0        0
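
The column names are not defined on this page. One plausible reading, offered here only as an assumption, is that SE(A) is the total standard error of pass1, SE_x(A) is the part due to the sample of questions, and SE_pred(A) is the part due to sampling noise in the model's predictions. If so, the two components should combine roughly in quadrature. The sketch below checks that on the top four rows, and also back-computes the question count implied by a binomial model, assuming all values are percentages. The helper names are hypothetical.

```python
import math

# (model, pass1, SE(A), SE_x(A), SE_pred(A)), copied from the table above.
# Values are assumed to be percentages.
ROWS = [
    ("google_gemma_3_27b_it", 40.2, 3.7, 3.0, 2.2),
    ("google_gemma_2_27b_it", 22.2, 3.1, 1.9, 2.4),
    ("google_gemma_3_4b_it", 18.0, 2.9, 2.3, 1.6),
    ("google_gemma_2_9b_it", 15.0, 2.7, 1.8, 2.0),
]


def combine_in_quadrature(se_x: float, se_pred: float) -> float:
    """Total SE if the two components are independent: sqrt(se_x^2 + se_pred^2)."""
    return math.hypot(se_x, se_pred)


def implied_question_count(pass1_pct: float, se_pct: float) -> float:
    """Solve SE = sqrt(p * (1 - p) / n) for n, with inputs given in percent."""
    p = pass1_pct / 100.0
    se = se_pct / 100.0
    return p * (1.0 - p) / se ** 2


for model, pass1, se, se_x, se_pred in ROWS:
    total = combine_in_quadrature(se_x, se_pred)
    n = implied_question_count(pass1, se)
    print(f"{model}: SE(A)={se}  quadrature={total:.2f}  implied n={n:.0f}")
```

Because the table values are rounded to about two significant figures, only approximate agreement should be expected from either check.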