There is 1 example not solved by any model.
Solving such examples can be a good signal that your model is indeed better than the leading models, provided the problems themselves are sound.
34
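A minimal sketch of how the "not solved by any model" list could be computed. It assumes a hypothetical layout where `pass1` maps each model name to a `{example_id: pass@1}` dict, and it treats an example as unsolved when every model scores exactly 0 on it. The report does not state its data format, so both `pass1` and `unsolved_examples` are illustrative names only.

```python
from typing import Dict, List


def unsolved_examples(pass1: Dict[str, Dict[str, float]]) -> List[str]:
    """Return IDs of examples with pass@1 == 0 for every model.

    `pass1` is a hypothetical layout: model name -> {example_id: pass@1}.
    A missing entry is treated as a score of 0.
    """
    models = list(pass1)
    # Union of all example IDs seen for any model.
    example_ids = set().union(*(pass1[m].keys() for m in models))
    return sorted(
        ex for ex in example_ids
        if all(pass1[m].get(ex, 0.0) == 0.0 for m in models)
    )


scores = {
    "model_a": {"27": 0.2, "34": 0.0},
    "model_b": {"27": 0.0, "34": 0.0},
}
print(unsolved_examples(scores))  # ['34']
```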
| example_link | model | min_pass1_of_model |
|---|---|---|
These are the 10 problems with the lowest (most negative) correlation with the overall evaluation (i.e., better models tend to do worse on these).
| example_link | pass1_of_ex | tau |
|---|---|---|
| 142 | 0.069 | -0.418 |
| 27 | 0.056 | -0.411 |
| 33 | 0.112 | -0.403 |
| 44 | 0.107 | -0.356 |
| 57 | 0.064 | -0.326 |
| 238 | 0.104 | -0.322 |
| 171 | 0.036 | -0.315 |
| 226 | 0.200 | -0.314 |
| 144 | 0.059 | -0.292 |
| 75 | 0.091 | -0.291 |
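The sketch below ranks problems by a per-problem rank correlation. It rests on two assumptions that the report does not confirm. First, `tau` is Kendall's tau-b between each model's overall score and that model's pass@1 on the problem. Second, the "overall evaluation" is a per-model score supplied in the hypothetical `overall` dict. The data layout (`pass1`: model -> `{example_id: pass@1}`) and the function names are illustrative.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b with tie correction; O(n^2) over pairs."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: excluded from both normalizers
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_y_only)
                      * (concordant + discordant + ties_x_only))
    return (concordant - discordant) / denom if denom else math.nan


def lowest_correlation(pass1: Dict[str, Dict[str, float]],
                       overall: Dict[str, float],
                       k: int = 10) -> List[Tuple[str, float]]:
    """Return the k (example_id, tau) pairs with the most negative tau."""
    models = sorted(overall)
    xs = [overall[m] for m in models]  # hypothetical per-model overall score
    example_ids = set().union(*(pass1[m].keys() for m in models))
    scored = []
    for ex in example_ids:
        ys = [pass1[m].get(ex, 0.0) for m in models]
        tau = kendall_tau_b(xs, ys)
        if not math.isnan(tau):  # skip problems every model scores identically
            scored.append((ex, tau))
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:k]


overall = {"a": 0.3, "b": 0.5, "c": 0.7}
pass1 = {"a": {"142": 0.9, "27": 0.1},
         "b": {"142": 0.5, "27": 0.5},
         "c": {"142": 0.1, "27": 0.9}}
print(lowest_correlation(pass1, overall, k=1))  # [('142', -1.0)]
```

Tau-b is used here rather than tau-a because per-problem pass@1 values are often tied (many models score 0), and tau-b corrects for such ties.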
Histogram of problems by the accuracy on each problem.
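One way the counts behind an accuracy histogram could be produced is sketched below. It assumes that a problem's accuracy is its pass@1 averaged over all models. This matches the likely meaning of `pass1_of_ex` but is not confirmed by the report. It also uses the same hypothetical `pass1` layout as above and splits [0, 1] into `n_bins` equal-width bins.

```python
from typing import Dict, List


def accuracy_histogram(pass1: Dict[str, Dict[str, float]],
                       n_bins: int = 10) -> List[int]:
    """Count problems per equal-width accuracy bin on [0, 1].

    Accuracy of a problem = mean pass@1 over models (an assumption).
    """
    models = list(pass1)
    example_ids = set().union(*(pass1[m].keys() for m in models))
    counts = [0] * n_bins
    for ex in example_ids:
        acc = sum(pass1[m].get(ex, 0.0) for m in models) / len(models)
        # Clamp so that acc == 1.0 lands in the last bin instead of overflowing.
        counts[min(int(acc * n_bins), n_bins - 1)] += 1
    return counts


pass1 = {"a": {"1": 0.0, "2": 1.0, "3": 1.0},
         "b": {"1": 0.0, "2": 0.0, "3": 1.0}}
print(accuracy_histogram(pass1))  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 1]
```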
Histogram of problems by the minimum win rate to solve each problem.