lsat_cot: by examples


Not solved by any model

There is 1 example not solved by any model. Solving examples like this can be a good signal that your model is better than the leading models, provided the problems themselves are well-posed.
34
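A minimal sketch of how an "unsolved by every model" list could be derived. It assumes a hypothetical layout, `pass1[model][example_id]`, holding each model's pass@1 on each example; the function name `unsolved_by_all` is illustrative, not part of the dashboard.

```python
from typing import Dict, List

# Hypothetical layout: pass1[model][example_id] = pass@1 of that model on that example.
Pass1 = Dict[str, Dict[int, float]]


def unsolved_by_all(pass1: Pass1) -> List[int]:
    """Return example ids on which every model has pass@1 == 0."""
    examples = set()
    for scores in pass1.values():
        examples.update(scores)
    return sorted(
        ex for ex in examples
        if all(scores.get(ex, 0.0) == 0.0 for scores in pass1.values())
    )


pass1 = {
    "model_a": {27: 0.0, 34: 0.0, 44: 0.5},
    "model_b": {27: 1.0, 34: 0.0, 44: 0.0},
}
print(unsolved_by_all(pass1))  # [34]
```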

Problems solved by 1 model only

example_link	model	min_pass1_of_model
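Under the same hypothetical `pass1[model][example_id]` layout, and assuming "solved" means pass@1 > 0, the examples solved by exactly one model could be found as sketched below. The helper name is illustrative, and this sketch does not attempt to reproduce the `min_pass1_of_model` column, whose exact definition is not given here.

```python
from typing import Dict, List

# Hypothetical layout: pass1[model][example_id] = pass@1 of that model on that example.
Pass1 = Dict[str, Dict[int, float]]


def solved_by_one_model(pass1: Pass1) -> Dict[int, str]:
    """Map each example solved (pass@1 > 0) by exactly one model to that model."""
    solvers: Dict[int, List[str]] = {}
    for model, scores in pass1.items():
        for ex, p in scores.items():
            if p > 0.0:
                solvers.setdefault(ex, []).append(model)
    return {ex: ms[0] for ex, ms in sorted(solvers.items()) if len(ms) == 1}


pass1 = {
    "model_a": {1: 0.2, 2: 0.0, 3: 1.0},
    "model_b": {1: 0.0, 2: 0.0, 3: 0.4},
}
print(solved_by_one_model(pass1))  # {1: 'model_a'}
```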

Suspect problems

These are the 10 problems with the lowest correlation with the overall evaluation, i.e., better models tend to do worse on them. The tau column gives that correlation.

example_link	pass1_of_ex	tau
142	0.069	-0.418
27	0.056	-0.411
33	0.112	-0.403
44	0.107	-0.356
57	0.064	-0.326
238	0.104	-0.322
171	0.036	-0.315
226	0.200	-0.314
144	0.059	-0.292
75	0.091	-0.291
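A sketch of how a per-example correlation like the tau column might be computed. It assumes tau is Kendall's tau-b (the tie-corrected variant, which suits pass@1 scores with many ties) between each model's pass@1 on the example and each model's overall score, with the overall score taken here as mean pass@1. Both the choice of tau-b and the overall-score definition are assumptions, and the `pass1` layout and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: pass1[model][example_id] = pass@1 of that model on that example.
Pass1 = Dict[str, Dict[int, float]]


def kendall_tau_b(x: List[float], y: List[float]) -> float:
    """Kendall's tau-b between two equal-length sequences (tie-corrected)."""
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs not tied in x, pairs not tied in y
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            untied_x += int(dx != 0)
            untied_y += int(dy != 0)
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    denom = math.sqrt(untied_x * untied_y)
    return (concordant - discordant) / denom if denom else float("nan")


def suspect_examples(pass1: Pass1, k: int = 10) -> List[Tuple[int, float, float]]:
    """Return (example_id, mean pass@1, tau) for the k lowest-tau examples."""
    models = sorted(pass1)
    # Assumed overall score per model: mean pass@1 over its examples.
    overall = [sum(pass1[m].values()) / len(pass1[m]) for m in models]
    examples = sorted(set().union(*(pass1[m] for m in models)))
    rows = []
    for ex in examples:
        per_model = [pass1[m].get(ex, 0.0) for m in models]
        tau = kendall_tau_b(per_model, overall)
        if not math.isnan(tau):  # tau is undefined when either sequence is constant
            rows.append((ex, sum(per_model) / len(models), tau))
    rows.sort(key=lambda row: row[2])
    return rows[:k]


pass1 = {
    "weak": {7: 1.0, 8: 0.0, 9: 0.0},
    "mid": {7: 0.5, 8: 0.5, 9: 0.5},
    "strong": {7: 0.0, 8: 1.0, 9: 1.0},
}
print(suspect_examples(pass1, k=1))  # [(7, 0.5, -1.0)]
```

Example 7 is solved best by the weakest model, so its tau is -1.0, the pattern the table above flags.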

Histogram of accuracies

Histogram of problems by per-problem accuracy.
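A minimal sketch of the binning behind such a histogram, assuming a problem's accuracy is its mean pass@1 across all models and using equal-width bins over [0, 1]. The `pass1` layout and function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: pass1[model][example_id] = pass@1 of that model on that example.
Pass1 = Dict[str, Dict[int, float]]


def accuracy_histogram(pass1: Pass1, bins: int = 10) -> List[int]:
    """Bin examples by accuracy, taken here as mean pass@1 across all models."""
    models = list(pass1)
    examples = sorted(set().union(*(pass1[m] for m in models)))
    counts = [0] * bins
    for ex in examples:
        acc = sum(pass1[m].get(ex, 0.0) for m in models) / len(models)
        counts[min(int(acc * bins), bins - 1)] += 1  # acc == 1.0 goes in the last bin
    return counts


pass1 = {"a": {1: 0.0, 2: 0.5, 3: 1.0}, "b": {1: 0.0, 2: 0.5, 3: 1.0}}
print(accuracy_histogram(pass1, bins=4))  # [1, 0, 1, 1]
```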

Histogram of difficulties

Histogram of problems by the minimum win rate needed to solve each problem.
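One possible reading of "minimum win rate to solve" is the overall win rate of the weakest model that solves the problem (pass@1 > 0). The sketch below follows that interpretation, which is an assumption; the per-model `win_rate` mapping, the `pass1` layout, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical layout: pass1[model][example_id] = pass@1 of that model on that example.
Pass1 = Dict[str, Dict[int, float]]


def min_win_rate_to_solve(
    pass1: Pass1, win_rate: Dict[str, float]
) -> Dict[int, Optional[float]]:
    """For each example, the lowest win rate among models with pass@1 > 0 on it."""
    examples = sorted(set().union(*(pass1[m] for m in pass1)))
    difficulty: Dict[int, Optional[float]] = {}
    for ex in examples:
        rates = [win_rate[m] for m in pass1 if pass1[m].get(ex, 0.0) > 0.0]
        difficulty[ex] = min(rates) if rates else None  # None: unsolved by every model
    return difficulty


def histogram(values: List[float], bins: int = 10) -> List[int]:
    """Count values in [0, 1] into equal-width bins; 1.0 falls in the last bin."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    return counts


pass1 = {"weak": {1: 1.0, 2: 0.0, 3: 0.0}, "strong": {1: 1.0, 2: 0.5, 3: 0.0}}
win_rate = {"weak": 0.2, "strong": 0.8}
difficulty = min_win_rate_to_solve(pass1, win_rate)
print(difficulty)  # {1: 0.2, 2: 0.8, 3: None}
print(histogram([d for d in difficulty.values() if d is not None], bins=4))  # [1, 0, 0, 1]
```

Problems no model solves have no defined difficulty under this reading, so the sketch leaves them out of the histogram.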