lsat_cot: by examples


Not solved by any model

There is 1 example not solved by any model. Solving it can be a good signal that your model is indeed better than the leading models, provided it is a good problem.
34
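A list like the one above could be derived roughly as follows. This is a minimal sketch, not the page's actual pipeline: it assumes per-example pass@1 scores are available as a mapping from model name to a list of scores, and that "not solved" means a pass@1 of exactly 0 for every model. The names `unsolved_examples` and `pass1` and the sample data are hypothetical.

```python
def unsolved_examples(pass1):
    """Return indices of examples that no model solves.

    pass1 maps a model name to a list of per-example pass@1 values
    (assumed layout); an example counts as unsolved when every
    model's pass@1 on it is 0.
    """
    n_examples = len(next(iter(pass1.values())))
    return [
        i for i in range(n_examples)
        if all(scores[i] == 0 for scores in pass1.values())
    ]


# Hypothetical scores for two models on three examples.
pass1 = {
    "model_a": [1.0, 0.0, 0.5],
    "model_b": [0.0, 0.0, 1.0],
}
unsolved = unsolved_examples(pass1)
```

Here only the example at index 1 has a pass@1 of 0 for both models.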

Problems solved by 1 model only

example_link	model	min_pass1_of_model

Suspect problems

These are the 10 problems with the lowest correlation with the overall evaluation (i.e., better models tend to do worse on them).

example_link	pass1_of_ex	tau
142	0.111	-0.414
112	0.081	-0.391
238	0.140	-0.385
27	0.073	-0.368
171	0.038	-0.364
276	0.140	-0.327
57	0.057	-0.324
33	0.087	-0.300
44	0.129	-0.300
262	0.082	-0.298
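The `tau` column can be illustrated with a small sketch. It assumes, without confirmation from the page, that `tau` is a Kendall rank correlation between each model's overall accuracy and its pass@1 on the single problem. The function below computes the simple tau-a variant (ties count as neither concordant nor discordant); the page may use a different variant, and the sample numbers are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a between two equal-length sequences.

    Counts concordant minus discordant pairs and divides by the
    total number of pairs; tied pairs contribute zero.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: four models, ordered from weakest to strongest.
overall_accuracy = [0.2, 0.4, 0.6, 0.8]
pass1_on_problem = [0.9, 0.5, 0.3, 0.1]  # stronger models do worse here
tau = kendall_tau_a(overall_accuracy, pass1_on_problem)
```

A negative value, as in every row of the table, means the ranking of models on that problem runs against their overall ranking.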

Histogram of accuracies

Histogram of problems by the accuracy on each problem.

Histogram of difficulties

Histogram of problems by the minimum win rate to solve each problem.
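One possible reading of "minimum win rate to solve each problem" is the win rate of the weakest model that solves the problem. The sketch below follows that reading only as an assumption; the page does not define the measure. The function name, the `win_rate` and `pass1` mappings, and the sample values are all hypothetical.

```python
def min_win_rate_to_solve(win_rate, pass1):
    """For each example, return the lowest win rate among models that solve it.

    win_rate maps model name -> overall win rate (assumed given).
    pass1 maps model name -> list of per-example pass@1 values.
    An example no model solves gets None.
    """
    n_examples = len(next(iter(pass1.values())))
    difficulties = []
    for i in range(n_examples):
        solvers = [win_rate[m] for m, scores in pass1.items() if scores[i] > 0]
        difficulties.append(min(solvers) if solvers else None)
    return difficulties


# Hypothetical data: a weak and a strong model on three examples.
win_rate = {"weak": 0.3, "strong": 0.7}
pass1 = {
    "weak": [1.0, 0.0, 0.0],
    "strong": [1.0, 0.5, 0.0],
}
difficulty = min_win_rate_to_solve(win_rate, pass1)
```

Under this reading, a higher value marks a harder problem, and examples with no solver fall outside the histogram.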