humaneval+: by examples

Not solved by any model

There are 7 examples that no model solved. If these are well-posed problems, solving some of them can be a good signal that your model is indeed better than the leading models.
HumanEval/129, HumanEval/130, HumanEval/132, HumanEval/145, HumanEval/163, HumanEval/32, HumanEval/91
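
As a rough illustration of how such a list could be derived, the sketch below assumes per-model results are available as a mapping from model name to a pass/fail flag per problem. The function name `unsolved_problems`, the data layout, and the toy data are all hypothetical and are not taken from this report.

```python
def unsolved_problems(results):
    """Return the sorted IDs of problems that no model solved.

    `results` maps a model name to a dict of {problem_id: passed}.
    This layout is an assumption made for illustration.
    """
    problems = set()
    for per_problem in results.values():
        problems.update(per_problem)
    return sorted(
        p for p in problems
        # A problem missing from a model's results counts as not solved.
        if not any(per_problem.get(p, False) for per_problem in results.values())
    )


# Toy data only, not real HumanEval+ results.
toy_results = {
    "model_a": {"P/1": True, "P/2": False, "P/3": False},
    "model_b": {"P/1": True, "P/2": True, "P/3": False},
}
print(unsolved_problems(toy_results))
```

With real data, a quick sanity check is to inspect each reported problem by hand: a problem that no model solves may be hard, or it may have a faulty test.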

These are the 10 problems with the lowest correlation with the overall evaluation (i.e., better models tend to do worse on these).
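
One plausible way to compute such a score is sketched below: for each problem, take the Pearson correlation between the per-model pass/fail outcome on that problem and each model's overall accuracy. Whether this report uses Pearson correlation is an assumption, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # E.g. a problem solved by every model or by none.
        return None
    return cov / sqrt(var_x * var_y)


def problem_correlations(results):
    """Correlate each problem's outcomes with the models' overall accuracy.

    `results` maps model name -> {problem_id: passed}; every model is
    assumed to have a result for every problem.
    """
    models = sorted(results)
    problems = sorted(results[models[0]])
    overall = [sum(results[m].values()) / len(problems) for m in models]
    return {
        p: pearson([1.0 if results[m][p] else 0.0 for m in models], overall)
        for p in problems
    }


# Toy data only: problem "C" is solved by the weakest model alone.
toy_results = {
    "strong": {"A": True, "B": True, "C": False},
    "mid": {"A": True, "B": False, "C": False},
    "weak": {"A": False, "B": False, "C": True},
}
correlations = problem_correlations(toy_results)
lowest = min(correlations, key=lambda p: correlations[p])
print(lowest, correlations[lowest])
```

A negative value flags a problem that weaker models solve more often than stronger ones, which is the pattern described above.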

Histogram of problems by the accuracy on each problem.
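
The binning behind a histogram like this can be sketched as follows, assuming a problem's accuracy is the fraction of models that solve it. The function name `accuracy_histogram`, the equal-width bins, and the toy data are hypothetical choices made for illustration.

```python
def accuracy_histogram(results, bins=10):
    """Count problems per accuracy bin.

    `results` maps model name -> {problem_id: passed}. Accuracy of a
    problem is taken to be the fraction of models that solved it.
    Bins are equal-width over [0, 1]; an accuracy of exactly 1.0 is
    placed in the last bin.
    """
    models = list(results)
    problems = sorted(results[models[0]])
    counts = [0] * bins
    for p in problems:
        accuracy = sum(1 for m in models if results[m][p]) / len(models)
        index = min(int(accuracy * bins), bins - 1)
        counts[index] += 1
    return counts


# Toy data only, not real HumanEval+ results.
toy_results = {
    "model_a": {"P/1": True, "P/2": False, "P/3": False},
    "model_b": {"P/1": True, "P/2": True, "P/3": False},
}
print(accuracy_histogram(toy_results))
```

The first bin collects problems that few or no models solve, and the last bin collects problems that nearly every model solves.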

Histogram of problems by the minimum win rate to solve each problem.