lsat_cot: by examples



Not solved by any model

There is 1 example not solved by any model. Solving such examples can be a good signal that your model is genuinely better than the leading models, provided the examples themselves are well-posed.
34
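A minimal sketch of how an "unsolved by any model" list could be derived. It assumes a hypothetical layout `results[model][example_id]` holding each model's pass@1 on each example. The page does not show its actual data format, so the names and structure here are illustrative only:

```python
# Hypothetical data layout (an assumption, not the page's format):
# results[model][example_id] = pass@1 of that model on that example, in [0, 1].

def unsolved_examples(results):
    """Return sorted example ids on which every model has pass@1 == 0."""
    example_ids = set()
    for per_model in results.values():
        example_ids.update(per_model)
    return sorted(
        ex for ex in example_ids
        # A missing entry is treated as a failure (pass@1 of 0).
        if all(per_model.get(ex, 0.0) == 0.0 for per_model in results.values())
    )

results = {
    "model_a": {1: 1.0, 2: 0.0, 3: 0.5},
    "model_b": {1: 0.0, 2: 0.0, 3: 1.0},
}
print(unsolved_examples(results))  # [2]
```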

Problems solved by 1 model only

example_link model min_pass1_of_model
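A comparable sketch for the "solved by 1 model only" list, under the same hypothetical `results[model][example_id]` layout. It returns the single solving model for each such example. It does not attempt to reproduce the `min_pass1_of_model` column, whose exact definition is not given here:

```python
# Hypothetical layout: results[model][example_id] = pass@1 in [0, 1].

def solved_by_one_model(results):
    """Map example id -> the only model with nonzero pass@1 on that example."""
    solvers = {}
    for model, per_model in results.items():
        for ex, p in per_model.items():
            if p > 0:
                solvers.setdefault(ex, []).append(model)
    # Keep only examples that exactly one model solved at least once.
    return {ex: models[0] for ex, models in sorted(solvers.items()) if len(models) == 1}

results = {
    "model_a": {1: 1.0, 2: 0.0, 3: 0.5},
    "model_b": {1: 0.0, 2: 0.0, 3: 1.0},
}
print(solved_by_one_model(results))  # {1: 'model_a'}
```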

Suspect problems

These are the 10 problems with the lowest correlation with the overall evaluation (i.e., better models tend to do worse on these).

example_link pass1_of_ex tau
142 0.069 -0.418
27 0.056 -0.411
33 0.112 -0.403
44 0.107 -0.356
57 0.064 -0.326
238 0.104 -0.322
171 0.036 -0.315
226 0.200 -0.314
144 0.059 -0.292
75 0.091 -0.291
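The page does not say exactly how `tau` is computed. One plausible reading, sketched below as an assumption, is Kendall's tau-b between each model's pass@1 on the example and that model's overall accuracy. Under that reading, a negative value means stronger models score lower on the example. `scipy.stats.kendalltau` computes the same tau-b statistic by default. The pure-Python version here makes the pair counting explicit, and the `results` layout is hypothetical:

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length sequences (tie-corrected)."""
    concordant = discordant = 0
    tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both: contributes to neither denominator factor
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only)
                      * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0

def example_tau(results, example_id):
    """Assumed definition: tau between per-example pass@1 and overall accuracy."""
    models = sorted(results)
    overall = [sum(results[m].values()) / len(results[m]) for m in models]
    on_example = [results[m].get(example_id, 0.0) for m in models]
    return kendall_tau_b(on_example, overall)

# Hypothetical results[model][example_id] = pass@1.
results = {
    "weak":   {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0},
    "mid":    {1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0},
    "strong": {1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0},
}
print(round(example_tau(results, 1), 3))  # -0.816: only the weakest model solves it
```

Sorting examples by this value and taking the lowest ten would give a "suspect problems" list in the sense described above.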

Histogram of accuracies

Histogram of problems binned by their per-problem accuracy.
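A minimal sketch of the binning behind such a histogram. It assumes that a problem's accuracy is the mean pass@1 across models, and it uses the same hypothetical `results[model][example_id]` layout. The bin count of 10 is also an assumption:

```python
# Hypothetical layout: results[model][example_id] = pass@1 in [0, 1].

def accuracy_histogram(results, num_bins=10):
    """Count problems per equal-width accuracy bin over [0, 1]."""
    models = list(results)
    example_ids = sorted({ex for per_model in results.values() for ex in per_model})
    counts = [0] * num_bins
    for ex in example_ids:
        # Assumed per-problem accuracy: mean pass@1 over all models.
        acc = sum(results[m].get(ex, 0.0) for m in models) / len(models)
        idx = min(int(acc * num_bins), num_bins - 1)  # accuracy 1.0 goes in the last bin
        counts[idx] += 1
    return counts

results = {
    "weak":   {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0},
    "mid":    {1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0},
    "strong": {1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0},
}
print(accuracy_histogram(results))  # [0, 0, 0, 2, 0, 0, 2, 0, 0, 0]
```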

Histogram of difficulties

Histogram of problems by the minimum win rate to solve each problem.
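One reading of "minimum win rate to solve", sketched here as an assumption: each model has an overall win rate (a hypothetical `win_rate[model]` mapping), and a problem's difficulty is the lowest win rate among the models that solve it. Problems that no model solves have no such value and are marked `None`:

```python
# Hypothetical inputs (assumptions, not the page's data):
#   results[model][example_id] = pass@1 in [0, 1]
#   win_rate[model] = that model's overall win rate in [0, 1]

def min_win_rate_to_solve(results, win_rate):
    """Map example id -> lowest win rate among models with nonzero pass@1 on it."""
    example_ids = sorted({ex for per_model in results.values() for ex in per_model})
    difficulty = {}
    for ex in example_ids:
        rates = [win_rate[m] for m, per_model in results.items()
                 if per_model.get(ex, 0.0) > 0]
        difficulty[ex] = min(rates) if rates else None  # None: solved by no model
    return difficulty

results = {
    "weak":   {1: 1.0, 2: 0.0, 3: 0.0},
    "mid":    {1: 0.0, 2: 1.0, 3: 0.0},
    "strong": {1: 0.0, 2: 1.0, 3: 0.0},
}
win_rate = {"weak": 0.2, "mid": 0.5, "strong": 0.8}
print(min_win_rate_to_solve(results, win_rate))  # {1: 0.2, 2: 0.5, 3: None}
```

A higher value means only stronger models solve the problem, so it serves as a difficulty score.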