There is 1 example not solved by any model.
If these are good problems, solving some of them can be a good signal that your model is indeed better than the leading models.
145
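
As a minimal sketch of how such a list could be derived, the snippet below assumes "not solved" means a pass@1 of exactly 0 for every model. The report does not state that definition. The `pass1` mapping, the model names, and the example ids are hypothetical illustration data, not results from this evaluation.

```python
# Hypothetical pass@1 scores per model and example (made-up values).
pass1 = {
    "model_a": {"ex1": 0.9, "ex2": 0.0, "ex3": 0.4},
    "model_b": {"ex1": 0.7, "ex2": 0.0, "ex3": 0.0},
}


def unsolved_examples(pass1):
    """Return the sorted example ids whose pass@1 is 0 for every model.

    Assumption: a missing score is treated as 0 (not solved).
    """
    examples = set()
    for scores in pass1.values():
        examples.update(scores)
    return sorted(
        ex
        for ex in examples
        if all(scores.get(ex, 0.0) == 0.0 for scores in pass1.values())
    )


print(unsolved_examples(pass1))
```
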
| example_link | model | min_pass1_of_model |
|---|---|---|
| 132 | qwen3-14b | 0.860 |
These are the 10 problems with the lowest correlation with the overall evaluation (i.e. better models tend to do worse on these).
| example_link | pass1_of_ex | tau |
|---|---|---|
| 127 | 0.153 | 0.090 |
| 132 | 0.002 | 0.182 |
| 83 | 0.061 | 0.231 |
| 121 | 0.501 | 0.232 |
| 160 | 0.086 | 0.235 |
| 54 | 0.421 | 0.249 |
| 65 | 0.130 | 0.285 |
| 35 | 0.766 | 0.288 |
| 163 | 0.028 | 0.339 |
| 116 | 0.446 | 0.341 |
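
The `tau` column is presumably a rank correlation between a problem's per-model pass rate and the models' overall scores. The sketch below assumes Kendall's tau-b, which the report does not confirm. The function names, the model scores, and the example ids are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2) pairs)."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: contributes to neither factor
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt(
        (concordant + discordant + only_x_tied)
        * (concordant + discordant + only_y_tied)
    )
    return (concordant - discordant) / denom if denom else 0.0


def lowest_tau(per_example, overall, k=10):
    """Return the k (example, tau) pairs with the lowest tau, ascending."""
    taus = [(ex, kendall_tau_b(scores, overall)) for ex, scores in per_example.items()]
    return sorted(taus, key=lambda pair: pair[1])[:k]


# Hypothetical data: overall accuracy of four models, weakest to strongest,
# and each example's pass@1 for the same four models in the same order.
overall = [0.2, 0.4, 0.6, 0.8]
per_example = {
    "tracks_overall": [0.1, 0.3, 0.5, 0.9],
    "inverted": [0.9, 0.5, 0.3, 0.1],
}
print(lowest_tau(per_example, overall, k=2))
```

A low or negative tau under this definition flags a problem whose per-model results disagree with the overall ranking, which can point to a noisy or mis-specified problem worth inspecting.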
Histogram of problems by the accuracy on each problem.
Histogram of problems by the minimum win rate to solve each problem.