There are 2 examples not solved by any model.
If these are good problems, solving some of them is a good signal that your model is indeed better than the leading models.
422, 96
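As a minimal sketch of how such a list could be derived, assume a hypothetical `results` mapping from model name to per-example pass@1 scores. That layout and the function name `unsolved_examples` are illustrative assumptions, not the report's actual code. An example counts as unsolved when every model scores 0 on it.

```python
def unsolved_examples(results):
    """Return sorted example ids whose pass@1 is 0 for every model.

    `results` is a hypothetical layout: {model_name: {example_id: pass1}}.
    An example missing from a model's results is treated as unsolved by it.
    """
    example_ids = set()
    for per_example in results.values():
        example_ids.update(per_example)
    return sorted(
        ex
        for ex in example_ids
        if all(per_example.get(ex, 0.0) == 0.0 for per_example in results.values())
    )


# Toy data for illustration only, not taken from the report.
results = {
    "model_a": {1: 0.5, 2: 0.0, 3: 0.0},
    "model_b": {1: 0.0, 2: 0.0, 3: 0.2},
}
print(unsolved_examples(results))  # prints [2]
```

In the toy data only example 2 has a pass@1 of 0 for both models, so it is the only one reported.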
| example_link | model | min_pass1_of_model |
|---|---|---|
These are the 10 problems with the lowest correlation with the overall evaluation (i.e., where tau is negative, better models tend to do worse on these).
| example_link | pass1_of_ex | tau |
|---|---|---|
| 383 | 0.008 | -0.323 |
| 204 | 0.080 | -0.237 |
| 305 | 0.004 | -0.133 |
| 444 | 0.077 | -0.056 |
| 154 | 0.000 | -0.037 |
| 80 | 0.052 | -0.003 |
| 264 | 0.006 | 0.089 |
| 110 | 0.015 | 0.097 |
| 460 | 0.094 | 0.102 |
| 284 | 0.001 | 0.106 |
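The report does not say how `tau` is computed. One plausible reading is a Kendall rank correlation between each model's overall score and its pass@1 on the given problem. The sketch below implements the simple tau-a variant under that assumption. The function name `kendall_tau` and the toy numbers are illustrative and are not taken from the report.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    score = 0
    pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        pairs += 1
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1  # both orderings agree
        elif product < 0:
            score -= 1  # orderings disagree
    return score / pairs


# Toy data: one entry per model (assumed, for illustration only).
overall = [0.30, 0.45, 0.60, 0.75]      # overall benchmark accuracy
per_example = [0.20, 0.10, 0.05, 0.00]  # pass@1 on a single problem
print(kendall_tau(overall, per_example))  # prints -1.0
```

Here the per-problem score falls as the overall score rises, so every pair of models is discordant and tau is -1. A value near 0, as for several rows in the table, would mean the problem says little about overall model quality.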
Histogram of problems by the accuracy on each problem.
Histogram of problems by the minimum win rate to solve each problem.