There are 0 examples not solved by any model.
When such examples exist and are well-posed problems, solving some of them is a good signal that your model is indeed better than the leading models.
| example_link | model | min_pass1_of_model |
|---|---|---|
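
As a rough illustration of how a list like the one above could be produced, here is a minimal sketch. It assumes results are available as a mapping from model name to per-example pass@1 values; the function name `unsolved_examples` and the data layout are hypothetical and are not taken from the tool that generated this report.

```python
def unsolved_examples(pass1_by_model):
    """Return the sorted ids of examples on which every model has pass@1 == 0.

    pass1_by_model: dict mapping model name -> dict of example id -> pass@1.
    This layout is an assumption made for illustration only.
    """
    # Collect every example id seen for any model.
    example_ids = set()
    for scores in pass1_by_model.values():
        example_ids.update(scores)

    # An example counts as unsolved when no model has a non-zero pass@1 on it.
    # A missing entry is treated as 0.0 (another assumption).
    return sorted(
        ex
        for ex in example_ids
        if all(scores.get(ex, 0.0) == 0.0 for scores in pass1_by_model.values())
    )


# Hypothetical data: example 1 is unsolved, example 2 is solved by model_a.
results = {
    "model_a": {1: 0.0, 2: 0.5},
    "model_b": {1: 0.0, 2: 0.0},
}
print(unsolved_examples(results))
```

An empty result corresponds to the "0 examples" case reported here.
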
These are the 10 problems with the lowest correlation with the overall evaluation (i.e., better models tend to do worse on these).
| example_link | pass1_of_ex | tau |
|---|---|---|
| 590 | 0.125 | -0.448 |
| 568 | 0.246 | -0.444 |
| 703 | 0.149 | -0.392 |
| 330 | 0.131 | -0.357 |
| 598 | 0.057 | -0.328 |
| 195 | 0.046 | -0.246 |
| 320 | 0.147 | -0.241 |
| 67 | 0.051 | -0.210 |
| 683 | 0.019 | -0.165 |
| 274 | 0.084 | -0.140 |
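
The report does not say how `tau` is computed. One plausible reading is a Kendall rank correlation, taken across models, between each model's overall score and its pass@1 on the given problem. The sketch below implements Kendall's tau-b under that assumption; the function name and the sample numbers are hypothetical and are not values from this evaluation.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to neither term
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1

    denom = sqrt(
        (concordant + discordant + ties_x_only)
        * (concordant + discordant + ties_y_only)
    )
    # If either variable is constant, tau is undefined.
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical data: four models, ordered from weakest to strongest overall.
overall_score = [0.30, 0.45, 0.60, 0.75]
problem_pass1 = [0.40, 0.25, 0.10, 0.05]  # stronger models do worse here
print(kendall_tau_b(overall_score, problem_pass1))
```

Under this reading, a negative value means the ordering of models on that problem runs against their overall ordering, which is what the table above flags.
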
Histogram of problems by the accuracy on each problem.
Histogram of problems by the minimum win rate to solve each problem.