lcb_codegen_v5: by examples



Not solved by any model

There are 56 examples that no model solved. Provided these problems are sound, solving some of them is a good signal that your model is genuinely better than the leading models.
atcoder.abc301_f, atcoder.abc311_c, atcoder.abc314_e, atcoder.abc315_e, atcoder.abc315_f, atcoder.abc319_c, atcoder.abc324_f, atcoder.abc327_e, atcoder.abc333_e, atcoder.abc337_e, atcoder.abc338_f, atcoder.abc343_a, atcoder.abc343_e, atcoder.abc350_c, atcoder.abc350_e, atcoder.abc355_e, atcoder.abc359_c, atcoder.abc359_e, atcoder.abc362_c, atcoder.abc363_f, atcoder.abc371_f, atcoder.abc372_f, atcoder.abc373_g, atcoder.abc374_d, atcoder.abc374_g, atcoder.abc375_b, atcoder.abc375_f, atcoder.abc376_f, atcoder.abc376_g, atcoder.abc378_g, atcoder.abc382_g, atcoder.abc385_f, atcoder.arc181_a, atcoder.arc181_c, atcoder.arc181_d, atcoder.arc182_d, atcoder.arc182_e, atcoder.arc183_b, atcoder.arc183_c, atcoder.arc183_d, atcoder.arc184_c, atcoder.arc184_d, atcoder.arc185_c, atcoder.arc186_a, atcoder.arc186_b, atcoder.arc186_c, atcoder.arc186_d, atcoder.arc186_e, atcoder.arc187_b, atcoder.arc188_c, atcoder.arc189_a, atcoder.arc189_b, leetcode.3211, leetcode.3327, leetcode.3584, leetcode.3638
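A list like the one above could be derived from per-model results roughly as follows. This is a minimal sketch, not the page's actual pipeline: the `results` layout, the model names, and the problem ids are all hypothetical, and "solved" is assumed to be a single boolean per model and problem.

```python
# Hypothetical results: model -> {problem id -> True if the model solved it}.
results = {
    "model_a": {"p1": True, "p2": False, "p3": False},
    "model_b": {"p1": True, "p2": True, "p3": False},
}


def unsolved_by_all(results):
    """Return the sorted problem ids that no model solved."""
    problems = set()
    for per_problem in results.values():
        problems.update(per_problem)
    return sorted(
        p
        for p in problems
        if not any(per_problem.get(p, False) for per_problem in results.values())
    )


print(unsolved_by_all(results))  # ['p3']
```

A problem missing from a model's results is treated here as not solved by that model, which is an assumption of the sketch.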

Problems solved by 1 model only

example_link      model                             min_pass1_of_model
atcoder.arc183_a  Kimi-k1.6-IOI-high                0.860
atcoder.abc325_d  O1-2024-12-17 (High)              0.832
atcoder.arc185_d  O1-2024-12-17 (High)              0.832
leetcode.3688     O1-2024-12-17 (High)              0.832
atcoder.arc182_a  O1-2024-12-17 (High)              0.832
atcoder.abc354_d  O1-2024-12-17 (High)              0.832
atcoder.arc184_e  DeepSeek-R1-Preview               0.779
leetcode.3551     DeepSeek-R1-Preview               0.779
atcoder.abc364_f  Llama-3_1-Nemotron-Ultra-253B-v1  0.777
atcoder.abc366_g  DeepCoder-14B-Preview             0.733
atcoder.abc368_g  DeepCoder-14B-Preview             0.733
atcoder.abc367_g  DeepCoder-14B-Preview             0.733
atcoder.abc373_e  DeepCoder-14B-Preview             0.733
atcoder.abc370_f  DeepCoder-14B-Preview             0.733
atcoder.abc373_f  DeepCoder-14B-Preview             0.733
atcoder.abc370_g  DeepCoder-14B-Preview             0.733
atcoder.abc372_g  DeepCoder-14B-Preview             0.733
leetcode.3478     DeepCoder-14B-Preview             0.733
leetcode.3562     O1-Preview-2024-09-12             0.556
atcoder.arc188_d  DeepSeek-V3 copy                  0.545
leetcode.3344     DeepSeek-V3 copy                  0.545
leetcode.3233     Claude-3.5-Sonnet-20240620        0.480
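The table above can be reproduced in spirit with a short sketch. It assumes, based only on the fact that every row for a given model carries the same value, that `min_pass1_of_model` is the overall pass@1 of the sole solving model. The data, model names, and the boolean "solved" representation are hypothetical.

```python
# Hypothetical results: model -> {problem id -> True if the model solved it}.
results = {
    "model_a": {"p1": True, "p2": True, "p3": True, "p4": False},
    "model_b": {"p1": True, "p2": False, "p3": False, "p4": False},
    "model_c": {"p1": True, "p2": False, "p3": False, "p4": True},
}


def solved_by_one(results):
    """Return (problem, model, model_pass1) rows for problems with exactly one solver."""
    # Assumed definition: a model's overall pass@1 is the fraction of problems it solved.
    pass1 = {m: sum(r.values()) / len(r) for m, r in results.items()}
    problems = sorted({p for r in results.values() for p in r})
    rows = []
    for p in problems:
        solvers = [m for m, r in results.items() if r.get(p, False)]
        if len(solvers) == 1:
            rows.append((p, solvers[0], pass1[solvers[0]]))
    # Strongest sole solver first, matching the ordering of the table.
    rows.sort(key=lambda row: -row[2])
    return rows


for row in solved_by_one(results):
    print(row)
```

With a single solver, the minimum over solving models is simply that model's score, which is why one column suffices.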

Suspect problems

These are the 10 problems whose results correlate least with the overall evaluation (i.e., better models tend to do worse on them).

example_link       pass1_of_ex  tau
codeforces.1883_B  0.188        -0.294
atcoder.abc379_f   0.183        -0.244
atcoder.abc384_f   0.571        -0.170
leetcode.3233      0.012        -0.138
leetcode.3347      0.958        -0.138
atcoder.abc372_a   0.917        -0.091
leetcode.3344      0.008        -0.063
atcoder.arc188_d   0.004        -0.063
leetcode.3261      0.008        -0.054
leetcode.3562      0.042        -0.038
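The page does not say how `tau` is computed. One plausible reading is a Kendall rank correlation between each model's overall score and its outcome on the problem in question. The sketch below implements the tau-b variant, chosen here because binary outcomes produce many ties; both the variant and the input data are assumptions.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length numeric sequences."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    # If either sequence is constant, tau-b is undefined.
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical data: four models ordered by overall score, and whether each
# solved one particular problem (1) or not (0).
overall_score = [0.2, 0.4, 0.6, 0.8]
suspect_problem = [1, 1, 0, 0]  # only the weaker models solve it
healthy_problem = [0, 0, 1, 1]  # only the stronger models solve it

print(round(kendall_tau_b(overall_score, suspect_problem), 3))  # -0.816
print(round(kendall_tau_b(overall_score, healthy_problem), 3))  # 0.816
```

A negative value flags a problem that weaker models solve more often than stronger ones, which may point to a faulty test or an ambiguous statement rather than genuine difficulty.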

Histogram of accuracies

Histogram of problems by the accuracy on each problem.
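As a rough illustration of how such a histogram is binned, the sketch below counts per-problem accuracies in five equal-width bins. The input is only the ten `pass1_of_ex` values from the suspect-problems table, not the full benchmark, and the bin count is an arbitrary choice.

```python
def histogram(values, bins=5):
    """Count values in [0, 1] into equal-width bins; 1.0 goes in the last bin."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)
        counts[idx] += 1
    return counts


# pass1_of_ex values of the ten suspect problems listed on this page.
suspect_accuracies = [0.188, 0.183, 0.571, 0.012, 0.958,
                      0.917, 0.008, 0.004, 0.008, 0.042]

counts = histogram(suspect_accuracies)
for i, count in enumerate(counts):
    low, high = i / 5, (i + 1) / 5
    print(f"{low:.1f}-{high:.1f} {'#' * count} ({count})")
```

For these ten values the counts are 7, 0, 1, 0, 2: most suspect problems sit at the very low end of accuracy, with two near the top.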

Histogram of difficulties

Histogram of problems by the minimum win rate to solve each problem.
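The page does not define "minimum win rate to solve". One possible reading, sketched below, scores each problem by the lowest overall score among the models that solved it, so a problem solved only by strong models counts as hard. The `results` and `score` inputs are hypothetical, and unsolved problems are given `None`.

```python
# Hypothetical inputs: per-model solve outcomes and an overall score per model.
results = {
    "model_a": {"p1": True, "p2": True, "p3": False},
    "model_b": {"p1": True, "p2": False, "p3": False},
    "model_c": {"p1": False, "p2": True, "p3": False},
}
score = {"model_a": 0.75, "model_b": 0.25, "model_c": 0.5}


def min_score_to_solve(results, score):
    """Map each problem to the lowest score among its solvers (None if unsolved)."""
    problems = sorted({p for r in results.values() for p in r})
    difficulty = {}
    for p in problems:
        solver_scores = [score[m] for m, r in results.items() if r.get(p, False)]
        difficulty[p] = min(solver_scores) if solver_scores else None
    return difficulty


print(min_score_to_solve(results, score))
# {'p1': 0.25, 'p2': 0.5, 'p3': None}
```

Binning these values, and setting the `None` entries aside as a separate "unsolved" bucket, would give a histogram of the kind described above.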