ap_cot: by examples



Not solved by any model

There are 0 examples not solved by any model. If such problems are well-posed, solving some of them can be a good signal that a model is genuinely better than the leading models.
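A minimal sketch of how the "not solved by any model" list could be computed. It assumes a hypothetical layout `pass1[model][example_id]` holding each model's pass@1 on each example, and it treats an example as solved when some model has pass@1 greater than 0. Both the layout and the function name `unsolved_examples` are illustrative assumptions, not the page's actual code.

```python
from typing import Dict, List

# Hypothetical layout (assumption): pass1[model][example_id] = pass@1 of that model.
PassTable = Dict[str, Dict[int, float]]


def unsolved_examples(pass1: PassTable) -> List[int]:
    """Return ids of examples where every model has pass@1 == 0 (assumed 'unsolved')."""
    examples = sorted({ex for scores in pass1.values() for ex in scores})
    # A missing entry is treated as pass@1 == 0 (assumption).
    return [
        ex
        for ex in examples
        if all(scores.get(ex, 0.0) == 0.0 for scores in pass1.values())
    ]


if __name__ == "__main__":
    toy = {"A": {1: 1.0, 2: 0.0}, "B": {1: 0.0, 2: 0.0, 3: 0.5}}
    print(unsolved_examples(toy))
```

On the toy table, only example 2 has pass@1 of 0 for both models.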

Problems solved by 1 model only

example_link model min_pass1_of_model
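Under the same assumed `pass1[model][example_id]` layout, problems solved by exactly one model could be found as sketched below. The function name `solved_by_one_model` is hypothetical, "solved" again means pass@1 greater than 0 (an assumption), and the sketch only reports which model solves each example. It does not attempt to reproduce the `min_pass1_of_model` column, whose exact definition is not given here.

```python
from typing import Dict

# Hypothetical layout (assumption): pass1[model][example_id] = pass@1 of that model.
PassTable = Dict[str, Dict[int, float]]


def solved_by_one_model(pass1: PassTable) -> Dict[int, str]:
    """Map each example solved by exactly one model to that model's name."""
    examples = sorted({ex for scores in pass1.values() for ex in scores})
    result: Dict[int, str] = {}
    for ex in examples:
        # Models with a nonzero pass@1 on this example (assumed definition of "solved").
        solvers = [m for m, scores in pass1.items() if scores.get(ex, 0.0) > 0.0]
        if len(solvers) == 1:
            result[ex] = solvers[0]
    return result


if __name__ == "__main__":
    toy = {"A": {1: 1.0, 2: 0.0}, "B": {1: 0.0, 2: 0.0, 3: 0.5}}
    print(solved_by_one_model(toy))
```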

Suspect problems

These are the 10 problems with the lowest correlation (tau) with the overall evaluation, i.e., problems on which better models tend to do worse.

example_link  pass1_of_ex     tau
568           0.258        -0.509
598           0.050        -0.470
590           0.130        -0.458
703           0.104        -0.338
558           0.068        -0.299
67            0.035        -0.289
683           0.013        -0.288
195           0.079        -0.215
172           0.170        -0.176
274           0.113        -0.163
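The ranking above can be sketched as follows, under several assumptions not stated on this page: `tau` is taken to be Kendall's tau-b between each model's pass@1 on the example and each model's overall score; the overall score is taken to be mean pass@1 across all examples; and `pass1_of_ex` is taken to be the example's mean pass@1 across models. The data layout and the names `kendall_tau_b` and `suspect_problems` are hypothetical. Tau-b is computed by hand so the sketch needs only the standard library; it should agree with the default variant of `scipy.stats.kendalltau`.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout (assumption): pass1[model][example_id] = pass@1 of that model.
PassTable = Dict[str, Dict[int, float]]


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b via O(n^2) pair counting; NaN if either input is constant."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            ties_x += dx == 0  # pairs tied in x (including pairs tied in both)
            ties_y += dy == 0  # pairs tied in y (including pairs tied in both)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def suspect_problems(pass1: PassTable, k: int = 10) -> List[Tuple[int, float, float]]:
    """Return (example_id, mean pass@1 on example, tau) for the k lowest-tau examples."""
    models = sorted(pass1)
    examples = sorted({ex for scores in pass1.values() for ex in scores})
    # Overall model score: mean pass@1 over all examples (assumption).
    overall = [sum(pass1[m].get(ex, 0.0) for ex in examples) / len(examples) for m in models]
    rows = []
    for ex in examples:
        per_model = [pass1[m].get(ex, 0.0) for m in models]
        tau = kendall_tau_b(per_model, overall)
        if math.isnan(tau):  # every model scored the same on this example
            continue
        rows.append((ex, sum(per_model) / len(models), tau))
    rows.sort(key=lambda row: row[2])  # most negative correlation first
    return rows[:k]


if __name__ == "__main__":
    toy = {
        "A": {1: 1.0, 2: 1.0, 3: 0.0},
        "B": {1: 0.5, 2: 0.5, 3: 0.5},
        "C": {1: 0.0, 2: 0.0, 3: 1.0},
    }
    print(suspect_problems(toy, k=1))
```

In the toy table, model A is strongest overall, yet example 3 is solved best by the weakest model C, so it gets tau = -1 and is flagged first.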

Histogram of accuracies

Histogram of problems by per-problem accuracy.
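One way such a histogram could be binned is sketched below. It assumes that a problem's accuracy is its mean pass@1 across models and that bins are equal-width over [0, 1], with an accuracy of exactly 1.0 placed in the last bin. The `pass1[model][example_id]` layout and the name `accuracy_histogram` are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout (assumption): pass1[model][example_id] = pass@1 of that model.
PassTable = Dict[str, Dict[int, float]]


def accuracy_histogram(pass1: PassTable, num_bins: int = 10) -> List[int]:
    """Count problems per equal-width accuracy bin on [0, 1]."""
    models = list(pass1)
    examples = sorted({ex for scores in pass1.values() for ex in scores})
    counts = [0] * num_bins
    for ex in examples:
        # Per-problem accuracy: mean pass@1 over models (assumption).
        acc = sum(pass1[m].get(ex, 0.0) for m in models) / len(models)
        # Clamp so that acc == 1.0 falls into the last bin.
        counts[min(int(acc * num_bins), num_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    toy = {"A": {1: 1.0, 2: 0.0, 3: 0.5}, "B": {1: 1.0, 2: 0.0, 3: 0.0}}
    print(accuracy_histogram(toy))
```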

Histogram of difficulties

Histogram of problems by the minimum model win rate needed to solve each problem.
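A possible reading of "minimum win rate to solve" is the win rate of the weakest model that solves the problem; the sketch below assumes that reading. It also assumes each model's win rate is given in [0, 1], that "solved" means pass@1 greater than 0, and that problems no model solves are left out of the histogram. The inputs `pass1` and `win_rate` and the function names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical layout (assumption): pass1[model][example_id] = pass@1 of that model.
PassTable = Dict[str, Dict[int, float]]


def difficulty(pass1: PassTable, win_rate: Dict[str, float]) -> Dict[int, float]:
    """Lowest win rate among models that solve each example; inf if none solves it."""
    examples = sorted({ex for scores in pass1.values() for ex in scores})
    out: Dict[int, float] = {}
    for ex in examples:
        rates = [win_rate[m] for m, scores in pass1.items() if scores.get(ex, 0.0) > 0.0]
        out[ex] = min(rates) if rates else math.inf
    return out


def difficulty_histogram(diff: Dict[int, float], num_bins: int = 10) -> List[int]:
    """Equal-width histogram of difficulties on [0, 1], skipping unsolved problems."""
    counts = [0] * num_bins
    for d in diff.values():
        if math.isinf(d):  # no model solves this problem
            continue
        counts[min(int(d * num_bins), num_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    toy = {"weak": {1: 1.0, 2: 0.0, 3: 0.0}, "strong": {1: 1.0, 2: 1.0, 3: 0.0}}
    rates = {"weak": 0.2, "strong": 0.8}
    print(difficulty_histogram(difficulty(toy, rates)))
```

Here example 1 is solved even by the weak model (difficulty 0.2), example 2 only by the strong one (0.8), and example 3 by neither, so it is excluded.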