ap_cot: by examples



Not solved by any model

There are 0 examples that no model solves. When such examples exist, solving some of them can be a good signal that a model is genuinely better than the leading models, provided the problems themselves are sound.
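A minimal sketch of how such a list could be computed, assuming per-model results are available as a hypothetical mapping `results` from model name to `{example_id: pass@1}`. The function name and data layout are illustrative, not the tool's actual code.

```python
def unsolved_examples(results):
    """Return example ids whose pass@1 is 0 for every model.

    results: hypothetical mapping model name -> {example_id: pass@1 in [0, 1]}.
    A missing entry is treated as unsolved (pass@1 = 0).
    """
    example_ids = set()
    for per_example in results.values():
        example_ids.update(per_example)
    return sorted(
        ex for ex in example_ids
        if all(per_example.get(ex, 0.0) == 0.0 for per_example in results.values())
    )


results = {
    "model_a": {"1": 0.0, "2": 0.5, "3": 0.0},
    "model_b": {"1": 0.0, "2": 1.0, "3": 0.25},
}
print(unsolved_examples(results))  # ['1']
```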

Problems solved by 1 model only

example_link model min_pass1_of_model
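The table above is empty for this run. As a hedged sketch, the rows could be produced as below, using the same hypothetical `results` layout (model name to `{example_id: pass@1}`). It assumes "solved" means pass@1 > 0 and treats `min_pass1_of_model` as the overall pass@1 (mean over examples) of the solving model; both readings are assumptions, not confirmed by the page.

```python
def solved_by_one_model(results):
    """Return (example_id, model, overall pass@1 of that model) for examples
    solved (pass@1 > 0) by exactly one model. Layout and names are hypothetical."""
    # Overall pass@1 of each model: mean of its per-example pass@1 values.
    overall = {m: sum(p.values()) / len(p) for m, p in results.items()}
    example_ids = sorted({ex for p in results.values() for ex in p})
    rows = []
    for ex in example_ids:
        solvers = [m for m, p in results.items() if p.get(ex, 0.0) > 0.0]
        if len(solvers) == 1:
            model = solvers[0]
            rows.append((ex, model, overall[model]))
    return rows


results = {
    "model_a": {"1": 0.0, "2": 0.5, "3": 0.0},
    "model_b": {"1": 0.0, "2": 1.0, "3": 0.25},
}
print(solved_by_one_model(results))
```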

Suspect problems

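These are the 10 problems with the lowest correlation with the overall evaluation (i.e., better models tend to do worse on these). In the table, pass1_of_ex is the pass@1 on the problem and tau is the correlation between models' results on the problem and their overall performance; a negative tau means stronger models solve the problem less often.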

example_link pass1_of_ex tau
590 0.125 -0.448
568 0.246 -0.444
703 0.149 -0.392
330 0.131 -0.357
598 0.057 -0.328
195 0.046 -0.246
320 0.147 -0.241
67 0.051 -0.210
683 0.019 -0.165
274 0.084 -0.140
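A minimal sketch of how such a ranking could be built, under stated assumptions: `results` is a hypothetical mapping from model name to `{example_id: pass@1}`, pass1_of_ex is taken as pass@1 averaged over models, and tau is computed as Kendall's tau-b between each model's pass@1 on the problem and its overall pass@1. The page does not say which tau variant it uses, so tau-b (which handles the many ties in pass@1 values) is a swapped-in choice.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (O(n^2) pair scan)."""
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs not tied in x, pairs not tied in y
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx != 0:
                untied_x += 1
            if dy != 0:
                untied_y += 1
            s = dx * dy
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else 0.0


def suspect_problems(results, k=10):
    """Return the k (example_id, pass1_of_ex, tau) rows with the lowest tau."""
    models = sorted(results)
    overall = [sum(results[m].values()) / len(results[m]) for m in models]
    example_ids = sorted({ex for m in models for ex in results[m]})
    scored = []
    for ex in example_ids:
        per_model = [results[m].get(ex, 0.0) for m in models]
        pass1_of_ex = sum(per_model) / len(per_model)
        scored.append((ex, pass1_of_ex, kendall_tau_b(per_model, overall)))
    scored.sort(key=lambda row: row[2])  # most negative correlation first
    return scored[:k]


results = {
    "weak": {"a": 1.0, "b": 0.0, "c": 0.0},
    "mid": {"a": 0.5, "b": 0.5, "c": 0.5},
    "strong": {"a": 0.0, "b": 1.0, "c": 1.0},
}
print(suspect_problems(results, k=1))  # [('a', 0.5, -1.0)]
```

Problem "a" is solved best by the weakest model and not at all by the strongest, so its tau is -1, which is exactly the pattern this section flags.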

Histogram of accuracies

Histogram of problems by their per-problem accuracy.
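As a hedged sketch, the counts behind such a histogram could be computed as below. It assumes a hypothetical `results` mapping (model name to `{example_id: pass@1}`), takes a problem's accuracy to be its pass@1 averaged over models, and uses 10 equal-width bins on [0, 1]; the page does not state its binning.

```python
def accuracy_histogram(results, num_bins=10):
    """Count problems per equal-width accuracy bin on [0, 1].

    Accuracy of a problem = mean pass@1 over all models (an assumption).
    """
    models = list(results)
    example_ids = sorted({ex for p in results.values() for ex in p})
    counts = [0] * num_bins
    for ex in example_ids:
        acc = sum(results[m].get(ex, 0.0) for m in models) / len(models)
        # Put acc == 1.0 into the last bin instead of an out-of-range index.
        b = min(int(acc * num_bins), num_bins - 1)
        counts[b] += 1
    return counts


results = {
    "m1": {"x": 0.0, "y": 1.0, "z": 0.5},
    "m2": {"x": 0.0, "y": 1.0, "z": 0.25},
}
print(accuracy_histogram(results))  # [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
```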

Histogram of difficulties

Histogram of problems by the minimum win rate among the models that solve each problem.
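A minimal sketch of the per-problem difficulty that such a histogram could bin, under loudly stated assumptions: `results` is a hypothetical mapping from model name to `{example_id: pass@1}`; a model's win rate is approximated as the fraction of other models it beats on overall pass@1 (ties count as half); and "solves" means pass@1 > 0. The page's actual win-rate definition may differ.

```python
def win_rates(results):
    """Hypothetical win rate: share of other models with lower overall pass@1."""
    overall = {m: sum(p.values()) / len(p) for m, p in results.items()}
    rates = {}
    for m, score in overall.items():
        others = [o for o in overall if o != m]
        wins = sum(
            1.0 if score > overall[o] else 0.5 if score == overall[o] else 0.0
            for o in others
        )
        rates[m] = wins / len(others) if others else 0.0
    return rates


def difficulties(results):
    """Map each problem to the lowest win rate among models that solve it
    (None if no model solves it)."""
    rates = win_rates(results)
    example_ids = sorted({ex for p in results.values() for ex in p})
    out = {}
    for ex in example_ids:
        solver_rates = [rates[m] for m, p in results.items() if p.get(ex, 0.0) > 0.0]
        out[ex] = min(solver_rates) if solver_rates else None
    return out


results = {
    "weak": {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0},
    "mid": {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.0},
    "strong": {"a": 0.0, "b": 1.0, "c": 1.0, "d": 0.0},
}
print(difficulties(results))  # {'a': 0.0, 'b': 0.5, 'c': 0.5, 'd': None}
```

A low value means even a weak model can solve the problem; unsolved problems have no difficulty under this definition and would be left out of the histogram.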