There are 98 examples not solved by any model.
Provided these are well-posed problems, solving some of them is a good signal that your model is indeed better than the leading models.
astropy__astropy-13033, astropy__astropy-13398, astropy__astropy-13977, astropy__astropy-14369, astropy__astropy-14598, astropy__astropy-8707, astropy__astropy-8872, django__django-10097, django__django-10554, django__django-10999, django__django-11087, django__django-11138, django__django-11239, django__django-11400, django__django-11433, django__django-11477, django__django-11734, django__django-11820, django__django-11885, django__django-12273, django__django-12406, django__django-13195, django__django-13212, django__django-13344, django__django-13513, django__django-13794, django__django-14011, django__django-14034, django__django-14155, django__django-14170, django__django-14315, django__django-14534, django__django-14725, django__django-15098, django__django-15252, django__django-15280, django__django-15503, django__django-15629, django__django-15916, django__django-15957, django__django-16256, django__django-16263, django__django-16502, django__django-16631, django__django-16667, django__django-16950, matplotlib__matplotlib-20488, matplotlib__matplotlib-21568, matplotlib__matplotlib-24177, matplotlib__matplotlib-24870, matplotlib__matplotlib-25479, matplotlib__matplotlib-25960, matplotlib__matplotlib-26208, matplotlib__matplotlib-26466, mwaskom__seaborn-3187, psf__requests-1724, psf__requests-1766, psf__requests-1921, psf__requests-2317, psf__requests-2931, psf__requests-5414, psf__requests-6028, pydata__xarray-4687, pydata__xarray-6992, pydata__xarray-7229, pylint-dev__pylint-4551, pylint-dev__pylint-4604, pylint-dev__pylint-4661, pytest-dev__pytest-10356, pytest-dev__pytest-5840, scikit-learn__scikit-learn-26194, sphinx-doc__sphinx-10435, sphinx-doc__sphinx-10614, sphinx-doc__sphinx-11510, sphinx-doc__sphinx-7462, sphinx-doc__sphinx-7590, sphinx-doc__sphinx-7748, sphinx-doc__sphinx-7985, sphinx-doc__sphinx-8265, sphinx-doc__sphinx-8548, sphinx-doc__sphinx-8638, sphinx-doc__sphinx-9229, sphinx-doc__sphinx-9461, sphinx-doc__sphinx-9602, sympy__sympy-13091, 
sympy__sympy-13852, sympy__sympy-13878, sympy__sympy-15599, sympy__sympy-16597, sympy__sympy-17630, sympy__sympy-18199, sympy__sympy-18698, sympy__sympy-19040, sympy__sympy-20438, sympy__sympy-21596, sympy__sympy-21612, sympy__sympy-21930, sympy__sympy-22080
| example_link | model | min_pass1_of_model |
|---|---|---|
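As a rough illustration of how a list of unsolved examples like the one above could be derived, here is a minimal sketch. It assumes the results are available as a mapping from model name to per-example pass@1 values; the function name, the data layout, and the ids `ex-1` to `ex-3` are all hypothetical and not taken from the evaluation itself.

```python
def unsolved_examples(pass1):
    """Return the sorted ids of examples that no model solved.

    `pass1` is assumed to map model name -> {example id -> pass@1 in [0, 1]}.
    An example counts as unsolved when every model has pass@1 == 0 on it;
    a missing entry is treated as 0.
    """
    all_examples = set()
    for per_example in pass1.values():
        all_examples.update(per_example)
    return sorted(
        ex
        for ex in all_examples
        if all(per_example.get(ex, 0.0) == 0.0 for per_example in pass1.values())
    )


# Toy data with hypothetical model names and example ids.
toy = {
    "model_a": {"ex-1": 0.0, "ex-2": 1.0, "ex-3": 0.0},
    "model_b": {"ex-1": 0.0, "ex-2": 0.0, "ex-3": 0.5},
}
print(unsolved_examples(toy))
```

Only `ex-1` is reported here, because each of the other two examples is solved at least some of the time by one model.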
These are the 10 problems with the lowest correlation (`tau`) with the overall evaluation; a negative value means that better models tend to do worse on them.
| example_link | pass1_of_ex | tau |
|---|---|---|
| scikit-learn__scikit-learn-25232 | 0.868 | -0.359 |
| sympy__sympy-24443 | 0.920 | -0.289 |
| django__django-13023 | 0.109 | -0.251 |
| django__django-13410 | 0.987 | -0.249 |
| django__django-16877 | 0.035 | -0.242 |
| pydata__xarray-4356 | 0.909 | -0.242 |
| sympy__sympy-16450 | 0.987 | -0.239 |
| sphinx-doc__sphinx-10323 | 0.695 | -0.238 |
| matplotlib__matplotlib-22719 | 0.986 | -0.234 |
| django__django-14373 | 0.973 | -0.233 |
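The report does not say how `tau` is computed. One plausible reading is Kendall's tau between each model's overall score and its pass@1 on the single problem. The sketch below implements the tau-b variant (which corrects for ties) in plain Python under that assumption; the function name and the toy scores are hypothetical.

```python
import math


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equal-length sequences of numbers."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = only_x_tied = only_y_tied = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: contributes to neither factor
            if dx == 0:
                only_x_tied += 1
            elif dy == 0:
                only_y_tied += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x_tied) * (untied + only_y_tied))
    if denom == 0:
        return float("nan")  # undefined when one sequence is constant
    return (concordant - discordant) / denom


# Hypothetical scores for four models, ordered from weakest to strongest.
overall_score = [0.2, 0.4, 0.6, 0.8]
pass1_on_problem = [1.0, 0.8, 0.5, 0.1]  # stronger models do worse here
print(kendall_tau_b(overall_score, pass1_on_problem))  # -1.0
```

A problem where every stronger model does strictly worse gives the extreme value of -1. The values in the table above are much closer to zero, so the inverse relationship there is weak.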
Histogram of problems by the accuracy on each problem.
Histogram of problems by the minimum win rate to solve each problem.
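For the accuracy histogram, the binning step might look like the following sketch. It assumes accuracy is the per-problem pass@1 in [0, 1] and uses ten equal-width bins; both the function name and the bin count are illustrative choices. The input is just the ten `pass1_of_ex` values from the table above, not the full data set.

```python
from collections import Counter


def accuracy_histogram(accuracies, n_bins=10):
    """Count problems per equal-width accuracy bin over [0, 1].

    Bin b covers [b / n_bins, (b + 1) / n_bins); an accuracy of exactly 1.0
    is placed in the last bin.
    """
    counts = Counter()
    for acc in accuracies:
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"accuracy out of range: {acc}")
        counts[min(int(acc * n_bins), n_bins - 1)] += 1
    return [counts[b] for b in range(n_bins)]


# pass1_of_ex values of the ten low-correlation problems listed above.
pass1_of_ex = [0.868, 0.920, 0.109, 0.987, 0.035, 0.909, 0.987, 0.695, 0.986, 0.973]
print(accuracy_histogram(pass1_of_ex))
```

Six of these ten problems land in the top bin (accuracy of at least 0.9), which matches the table: most of the low-correlation problems are ones that nearly every model solves.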