There are 2 examples that no model solved.
If these are well-posed problems, solving some of them is a good signal that your model is indeed better than the leading models.
132, 145
| example_link | model | min_pass1_of_model |
|---|---|---|

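
As a rough illustration of how a list of unsolved examples like the one above could be derived, here is a minimal sketch. It assumes a hypothetical nested dict `pass1[model][example_id]` holding pass@1 values; the function name, the data layout, and the numbers are all made up for illustration and are not taken from the actual evaluation code.

```python
def unsolved_examples(pass1):
    """Return the sorted ids of examples whose pass@1 is 0 for every model.

    `pass1` is a hypothetical mapping: model name -> {example id -> pass@1}.
    A model with no entry for an example is treated as not having solved it.
    """
    example_ids = set()
    for scores in pass1.values():
        example_ids.update(scores)
    return sorted(
        ex for ex in example_ids
        if all(scores.get(ex, 0.0) == 0.0 for scores in pass1.values())
    )


# Toy data (invented numbers): two models, three examples.
pass1 = {
    "model_a": {1: 0.9, 2: 0.0, 3: 0.0},
    "model_b": {1: 0.4, 2: 0.0, 3: 0.0},
}
print(unsolved_examples(pass1))  # [2, 3]
```
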
These are the 10 problems with the lowest correlation (`tau`) with the overall evaluation, i.e., the problems on which better models gain the least; a negative value would mean better models tend to do worse.
| example_link | pass1_of_ex | tau |
|---|---|---|
| 127 | 0.082 | 0.156 |
| 32 | 0.025 | 0.181 |
| 160 | 0.110 | 0.185 |
| 163 | 0.031 | 0.185 |
| 83 | 0.051 | 0.187 |
| 115 | 0.123 | 0.215 |
| 134 | 0.072 | 0.226 |
| 140 | 0.050 | 0.253 |
| 65 | 0.091 | 0.257 |
| 121 | 0.437 | 0.263 |
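
To make the `tau` column more concrete, the sketch below computes a rank correlation between each model's overall score and its pass@1 on a single problem. It assumes `tau` is a Kendall-style rank correlation, which the report does not state explicitly, and it uses the simple tau-a variant (tied pairs count as neither concordant nor discordant). The function name and the numbers are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in xs or ys; tau-a ignores it in the numerator.
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy numbers: overall scores of four hypothetical models, and their
# pass@1 on two hypothetical problems.
overall = [0.30, 0.45, 0.60, 0.75]
tracks_overall = [0.10, 0.20, 0.50, 0.90]   # better models do better
inverts_overall = [0.40, 0.30, 0.20, 0.10]  # better models do worse

print(kendall_tau_a(overall, tracks_overall))   # 1.0
print(kendall_tau_a(overall, inverts_overall))  # -1.0
```

Under this reading, a problem with a low positive value still weakly favors better models, while only a negative value indicates that better models tend to do worse on it.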
Histogram of problems by the accuracy on each problem.
Histogram of problems by the minimum win rate to solve each problem.