lsat_cot: by examples


Not solved by any model

There is 1 example that no model solved. If it is a sound problem, solving it is a good signal that your model is genuinely better than the leading models.
34
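A minimal sketch of how the unsolved and single-solver lists could be derived. The `pass1` table, the model names, and the rule "solved means pass@1 above zero" are illustrative assumptions, not details taken from this page.

```python
# Hypothetical data: pass1[model][example_id] = fraction of samples answered correctly.
pass1 = {
    "model_a": {1: 0.9, 2: 0.0, 3: 0.4},
    "model_b": {1: 0.7, 2: 0.0, 3: 0.0},
    "model_c": {1: 0.8, 2: 0.0, 3: 0.0},
}

def solvers(pass1, example_id):
    """Models with a non-zero pass@1 on the example (assumed definition of 'solved')."""
    return sorted(m for m, scores in pass1.items() if scores[example_id] > 0)

example_ids = sorted(next(iter(pass1.values())))

# Examples that no model solves.
unsolved = [e for e in example_ids if not solvers(pass1, e)]

# Examples solved by exactly one model, mapped to that model.
solved_by_one = {
    e: solvers(pass1, e)[0] for e in example_ids if len(solvers(pass1, e)) == 1
}

print(unsolved)
print(solved_by_one)
```

With this toy data, example 2 is unsolved and example 3 is solved by `model_a` only.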

Problems solved by 1 model only

example_link model min_pass1_of_model

Suspect problems

These are the 10 problems whose results correlate least with the overall evaluation (i.e., better models tend to do worse on them).

example_link  pass1_of_ex  tau
142           0.111        -0.414
112           0.081        -0.391
238           0.140        -0.385
27            0.073        -0.368
171           0.038        -0.364
276           0.140        -0.327
57            0.057        -0.324
33            0.087        -0.300
44            0.129        -0.300
262           0.082        -0.298
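The tau column can be read as a rank correlation between how models score on one example and how they score overall. The sketch below assumes it is Kendall's tau-b, computed across models. The function name and the sample numbers are hypothetical, and returning 0.0 when one variable is constant is a choice made here, not something the page states.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length sequences, with tie correction."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted nowhere
        elif dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = sqrt(
        (concordant + discordant + only_x_tied)
        * (concordant + discordant + only_y_tied)
    )
    # Assumed convention: report 0.0 when a variable is constant.
    return (concordant - discordant) / denom if denom else 0.0

# Hypothetical inputs: overall score per model, and pass@1 on one example per model.
overall = [0.2, 0.4, 0.6, 0.8]
example_pass1 = [0.3, 0.2, 0.1, 0.0]

tau = kendall_tau_b(overall, example_pass1)
print(tau)  # -1.0: every stronger model does worse on this example
```

A strongly negative tau, as in the table above, means the ranking of models on that example runs against their overall ranking, which is why such problems are worth inspecting for labeling or wording issues.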

Histogram of accuracies

Histogram of problems by the accuracy on each problem.
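As a rough illustration of the binning behind such a histogram, the sketch below counts problems per accuracy bin. The function name and the bin count are assumptions. The sample input is only the ten `pass1_of_ex` values from the suspect-problems table, not the full distribution plotted on the page.

```python
from collections import Counter

def accuracy_histogram(accuracies, n_bins=10):
    """Count values in n_bins equal-width bins over [0, 1]; 1.0 falls in the last bin."""
    counts = Counter(min(int(a * n_bins), n_bins - 1) for a in accuracies)
    return [counts.get(i, 0) for i in range(n_bins)]

# pass1_of_ex values of the ten suspect problems listed earlier.
suspect_pass1 = [0.111, 0.081, 0.140, 0.073, 0.038, 0.140, 0.057, 0.087, 0.129, 0.082]

hist = accuracy_histogram(suspect_pass1)
for i, count in enumerate(hist):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}) {'#' * count}")
```

All ten suspect problems fall in the two lowest bins, below 0.2.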

Histogram of difficulties

Histogram of problems by the minimum win rate to solve each problem.
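The page does not define "minimum win rate to solve". One possible reading, sketched below purely as an assumption, is that a problem's difficulty is the lowest overall win rate among the models that solve it. The function name, the model names, and the numbers are all hypothetical.

```python
def min_win_rate_to_solve(win_rate, solved_by):
    """For each example, the lowest overall win rate among models that solve it.

    win_rate:  model name -> overall win rate (hypothetical values)
    solved_by: example id -> set of model names that solve it
    Examples solved by no model map to None.
    """
    return {
        example: min((win_rate[m] for m in models), default=None)
        for example, models in solved_by.items()
    }

win_rate = {"model_a": 0.3, "model_b": 0.6, "model_c": 0.9}
solved_by = {1: {"model_a", "model_c"}, 2: {"model_c"}, 3: set()}

difficulty = min_win_rate_to_solve(win_rate, solved_by)
print(difficulty)
```

Under this reading, a problem solved even by a weak model has low difficulty, and a problem solved only by the strongest model has high difficulty.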