Raw data: summary.csv
| benchmark_id | size | models | SE(A) | SE_x(A) | SE(A-B) | SE_x(A-B) | corr(A,B) | no_solve | tau- | details |
|---|---|---|---|---|---|---|---|---|---|---|
| swebench-pro | 731 | 150 | 1.6 | 1.3 | 1.5 | 0.25 | 71 | 39 | 18 | models, examples, data, raw |
| swebench-verified | 500 | 150 | 2.2 | 1.8 | 1.9 | 0.22 | 76 | 20 | 24 | models, examples, data, raw |