swebench-pro: by examples



Not solved by any model

There are 285 examples not solved by any model. If these are good problems, solving some of them can be a good signal that your model is indeed better than the leading models.
instance_NodeBB__NodeBB-00c70ce7b0541cfc94afe567921d7668cdc8f4ac-vnan, instance_NodeBB__NodeBB-18c45b44613aecd53e9f60457b9812049ab2998d-v0495b863a912fbff5749c67e860612b91825407c, instance_NodeBB__NodeBB-2657804c1fb6b84dc76ad3b18ecf061aaab5f29f-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e, instance_NodeBB__NodeBB-445b70deda20201b7d9a68f7224da751b3db728c-v4fbcfae8b15e4ce5d132c408bca69ebb9cf146ed, instance_NodeBB__NodeBB-51d8f3b195bddb13a13ddc0de110722774d9bb1b-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e, instance_NodeBB__NodeBB-76c6e30282906ac664f2c9278fc90999b27b1f48-vd59a5728dfc977f44533186ace531248c2917516, instance_NodeBB__NodeBB-8168c6c40707478f71b8af60300830fe554c778c-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e, instance_NodeBB__NodeBB-84dfda59e6a0e8a77240f939a7cb8757e6eaf945-v2c59007b1005cd5cd14cbb523ca5229db1fd2dd8, instance_NodeBB__NodeBB-9c576a0758690f45a6ca03b5884c601e473bf2c1-vd59a5728dfc977f44533186ace531248c2917516, instance_NodeBB__NodeBB-a917210c5b2c20637094545401f85783905c074c-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e, instance_NodeBB__NodeBB-be43cd25974681c9743d424238b7536c357dc8d3-vf2cf3cbd463b7ad942381f1c6d077626485a1e9e, instance_NodeBB__NodeBB-f1a80d48cc45877fcbadf34c2345dd9709722c7f-v4fbcfae8b15e4ce5d132c408bca69ebb9cf146ed, instance_NodeBB__NodeBB-f2082d7de85eb62a70819f4f3396dd85626a0c0a-vd59a5728dfc977f44533186ace531248c2917516, instance_NodeBB__NodeBB-f48ed3658aab7be0f1165d4c1f89af48d7865189-v0495b863a912fbff5749c67e860612b91825407c, instance_ansible__ansible-11c1777d56664b1acb56b387a1ad6aeadef1391d-v0f01c69f1e2528b935359cfe578530722bca2c59, instance_ansible__ansible-1c06c46cc14324df35ac4f39a45fb3ccd602195d-v0f01c69f1e2528b935359cfe578530722bca2c59, instance_ansible__ansible-34db57a47f875d11c4068567b9ec7ace174ec4cf-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-3889ddeb4b780ab4bac9ca2e75f8c1991bcabe83-v0f01c69f1e2528b935359cfe578530722bca2c59, 
instance_ansible__ansible-39bd8b99ec8c6624207bf3556ac7f9626dad9173-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-40ade1f84b8bb10a63576b0ac320c13f57c87d34-v6382ea168a93d80a64aab1fbd8c4f02dc5ada5bf, instance_ansible__ansible-42355d181a11b51ebfc56f6f4b3d9c74e01cb13b-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-4c5ce5a1a9e79a845aff4978cfeb72a0d4ecf7d6-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-622a493ae03bd5e5cf517d336fc426e9d12208c7-v906c969b551b346ef54a2c0b41e04f632b7b73c2, instance_ansible__ansible-6cc97447aac5816745278f3735af128afb255c81-v0f01c69f1e2528b935359cfe578530722bca2c59, instance_ansible__ansible-77658704217d5f166404fc67997203c25381cb6e-v390e508d27db7a51eece36bb6d9698b63a5b638a, instance_ansible__ansible-7e1a347695c7987ae56ef1b6919156d9254010ad-v390e508d27db7a51eece36bb6d9698b63a5b638a, instance_ansible__ansible-811093f0225caa4dd33890933150a81c6a6d5226-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-83909bfa22573777e3db5688773bda59721962ad-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5, instance_ansible__ansible-83fb24b923064d3576d473747ebbe62e4535c9e3-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5, instance_ansible__ansible-935528e22e5283ee3f63a8772830d3d01f55ed8c-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5, instance_ansible__ansible-949c503f2ef4b2c5d668af0492a5c0db1ab86140-v0f01c69f1e2528b935359cfe578530722bca2c59, instance_ansible__ansible-a1569ea4ca6af5480cf0b7b3135f5e12add28a44-v0f01c69f1e2528b935359cfe578530722bca2c59, instance_ansible__ansible-b6290e1d156af608bd79118d209a64a051c55001-v390e508d27db7a51eece36bb6d9698b63a5b638a, instance_ansible__ansible-b748edea457a4576847a10275678127895d2f02f-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-bec27fb4c0a40c5f8bbcf26a475704227d65ee73-v30a923fb5c164d6cd18280c02422f75e611e8fb2, instance_ansible__ansible-c1f2df47538b884a43320f53e787197793b105e8-v906c969b551b346ef54a2c0b41e04f632b7b73c2, 
instance_ansible__ansible-c616e54a6e23fa5616a1d56d243f69576164ef9b-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-cd473dfb2fdbc97acf3293c134b21cbbcfa89ec3-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5, instance_ansible__ansible-d33bedc48fdd933b5abd65a77c081876298e2f07-v0f01c69f1e2528b935359cfe578530722bca2c59, instance_ansible__ansible-d72025be751c894673ba85caa063d835a0ad3a8c-v390e508d27db7a51eece36bb6d9698b63a5b638a, instance_ansible__ansible-de5858f48dc9e1ce9117034e0d7e76806f420ca8-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-e40889e7112ae00a21a2c74312b330e67a766cc0-v1055803c3a812189a1133297f7f5468579283f86, instance_ansible__ansible-e64c6c1ca50d7d26a8e7747d8eb87642e767cd74-v0f01c69f1e2528b935359cfe578530722bca2c59, instance_ansible__ansible-ecea15c508f0e081525be036cf76bbb56dbcdd9d-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5, instance_ansible__ansible-eea46a0d1b99a6dadedbb6a3502d599235fa7ec3-v390e508d27db7a51eece36bb6d9698b63a5b638a, instance_ansible__ansible-f02a62db509dc7463fab642c9c3458b9bc3476cc-v390e508d27db7a51eece36bb6d9698b63a5b638a, instance_ansible__ansible-f86c58e2d235d8b96029d102c71ee2dfafd57997-v0f01c69f1e2528b935359cfe578530722bca2c59, instance_element-hq__element-web-1216285ed2e82e62f8780b6702aa0f9abdda0b34-vnan, instance_element-hq__element-web-33299af5c9b7a7ec5a9c31d578d4ec5b18088fb7-vnan, instance_element-hq__element-web-44b98896a79ede48f5ad7ff22619a39d5f6ff03c-vnan, instance_element-hq__element-web-459df4583e01e4744a52d45446e34183385442d6-vnan, instance_element-hq__element-web-4fec436883b601a3cac2d4a58067e597f737b817-vnan, instance_element-hq__element-web-53a9b6447bd7e6110ee4a63e2ec0322c250f08d1-vnan, instance_element-hq__element-web-56c7fc1948923b4b3f3507799e725ac16bcf8018-vnan, instance_element-hq__element-web-5dfde12c1c1c0b6e48f17e3405468593e39d9492-vnan, instance_element-hq__element-web-66d0b318bc6fee0d17b54c1781d6ab5d5d323135-vnan, 
instance_element-hq__element-web-75c2c1a572fa45d1ea1d1a96e9e36e303332ecaa-vnan, instance_element-hq__element-web-776ffa47641c7ec6d142ab4a47691c30ebf83c2e, instance_element-hq__element-web-ad26925bb6628260cfe0fcf90ec0a8cba381f4a4-vnan, instance_element-hq__element-web-aeabf3b18896ac1eb7ae9757e66ce886120f8309-vnan, instance_element-hq__element-web-ce554276db97b9969073369fefa4950ca8e54f84-vnan, instance_element-hq__element-web-cf3c899dd1f221aa1a1f4c5a80dffc05b9c21c85-vnan, instance_element-hq__element-web-d06cf09bf0b3d4a0fbe6bd32e4115caea2083168-vnan, instance_element-hq__element-web-e15ef9f3de36df7f318c083e485f44e1de8aad17, instance_element-hq__element-web-f63160f38459fb552d00fcc60d4064977a9095a6-vnan, instance_element-hq__element-web-fe14847bb9bb07cab1b9c6c54335ff22ca5e516a-vnan, instance_flipt-io__flipt-02e21636c58e86c51119b63e0fb5ca7b813b07b1, instance_flipt-io__flipt-0fd09def402258834b9d6c0eaa6d3b4ab93b4446, instance_flipt-io__flipt-1737085488ecdcd3299c8e61af45a8976d457b7e, instance_flipt-io__flipt-2ca5dfb3513e4e786d2b037075617cccc286d5c3, instance_flipt-io__flipt-2ce8a0331e8a8f63f2c1b555db8277ffe5aa2e63, instance_flipt-io__flipt-36e62baffae2132f78f9d34dc300a9baa2d7ae0e, instance_flipt-io__flipt-381b90f718435c4694380b5fcd0d5cf8e3b5a25a, instance_flipt-io__flipt-3d5a345f94c2adc8a0eaa102c189c08ad4c0f8e8, instance_flipt-io__flipt-40007b9d97e3862bcef8c20ae6c87b22ea0627f0, instance_flipt-io__flipt-406f9396ad65696d58865b3a6283109cd4eaf40e, instance_flipt-io__flipt-492cc0b158200089dceede3b1aba0ed28df3fb1d, instance_flipt-io__flipt-518ec324b66a07fdd95464a5e9ca5fe7681ad8f9, instance_flipt-io__flipt-524f277313606f8cd29b299617d6565c01642e15, instance_flipt-io__flipt-56a620b8fc9ef7a0819b47709aa541cdfdbba00b, instance_flipt-io__flipt-5aef5a14890aa145c22d864a834694bae3a6f112, instance_flipt-io__flipt-5af0757e96dec4962a076376d1bedc79de0d4249, instance_flipt-io__flipt-5c7037ececb0bead0a8eb56054e224bcd7ac5922, instance_flipt-io__flipt-65581fef4aa807540cb933753d085feb0d7e736f, 
instance_flipt-io__flipt-6fd0f9e2587f14ac1fdd1c229f0bcae0468c8daa, instance_flipt-io__flipt-756f00f79ba8abf9fe53f3c6c818123b42eb7355, instance_flipt-io__flipt-84806a178447e766380cc66b14dee9c6eeb534f4, instance_flipt-io__flipt-86906cbfc3a5d3629a583f98e6301142f5f14bdb-v6bea0cc3a6fc532d7da914314f2944fc1cd04dee, instance_flipt-io__flipt-8bd3604dc54b681f1f0f7dd52cbc70b3024184b6, instance_flipt-io__flipt-96820c3ad10b0b2305e8877b6b303f7fafdf815f, instance_flipt-io__flipt-9d25c18b79bc7829a6fb08ec9e8793d5d17e2868, instance_flipt-io__flipt-9f8127f225a86245fa35dca4885c2daef824ee55, instance_flipt-io__flipt-a0cbc0cb65ae601270bdbe3f5313e2dfd49c80e4, instance_flipt-io__flipt-a42d38a1bb1df267c53d9d4a706cf34825ae3da9, instance_flipt-io__flipt-af7a0be46d15f0b63f16a868d13f3b48a838e7ce, instance_flipt-io__flipt-b2cd6a6dd73ca91b519015fd5924fde8d17f3f06, instance_flipt-io__flipt-b3cd920bbb25e01fdb2dab66a5a913363bc62f6c, instance_flipt-io__flipt-b433bd05ce405837804693bebd5f4b88d87133c8, instance_flipt-io__flipt-b6cef5cdc0daff3ee99e5974ed60a1dc6b4b0d67, instance_flipt-io__flipt-c154dd1a3590954dfd3b901555fc6267f646a289, instance_flipt-io__flipt-c1728053367c753688f114ec26e703c8fdeda125, instance_flipt-io__flipt-c188284ff0c094a4ee281afebebd849555ebee59, instance_flipt-io__flipt-c1fd7a81ef9f23e742501bfb26d914eb683262aa, instance_flipt-io__flipt-c6a7b1fd933e763b1675281b30077e161fa115a1, instance_flipt-io__flipt-c8d71ad7ea98d97546f01cce4ccb451dbcf37d3b, instance_flipt-io__flipt-cd18e54a0371fa222304742c6312e9ac37ea86c1, instance_flipt-io__flipt-cd2f3b0a9d4d8b8a6d3d56afab65851ecdc408e8, instance_flipt-io__flipt-dae029cba7cdb98dfb1a6b416c00d324241e6063, instance_flipt-io__flipt-e2bd19dafa7166c96b082fb2a59eb54b4be0d778, instance_flipt-io__flipt-e50808c03e4b9d25a6a78af9c61a3b1616ea356b, instance_flipt-io__flipt-e594593dae52badf80ffd27878d2275c7f0b20e9, instance_flipt-io__flipt-e91615cf07966da41756017a7d571f9fc0fdbe80, instance_flipt-io__flipt-ea9a2663b176da329b3f574da2ce2a664fc5b4a1, 
instance_flipt-io__flipt-ebb3f84c74d61eee4d8c6875140b990eee62e146, instance_flipt-io__flipt-ee02b164f6728d3227c42671028c67a4afd36918, instance_flipt-io__flipt-f1bc91a1b999656dbdb2495ccb57bf2105b84920, instance_flipt-io__flipt-f36bd61fb1cee4669de1f00e59da462bfeae8765, instance_flipt-io__flipt-f808b4dd6e36b9dc8b011eb26b196f4e2cc64c41, instance_future-architect__vuls-030b2e03525d68d74cb749959aac2d7f3fc0effa, instance_future-architect__vuls-1832b4ee3a20177ad313d806983127cb6e53f5cf, instance_future-architect__vuls-3c1489e588dacea455ccf4c352a3b1006902e2d4, instance_future-architect__vuls-3f8de0268376e1f0fa6d9d61abb0d9d3d580ea7d, instance_future-architect__vuls-4c04acbd9ea5b073efe999e33381fa9f399d6f27, instance_future-architect__vuls-5af1a227339e46c7abf3f2815e4c636a0c01098e, instance_future-architect__vuls-6eff6a9329a65cc412e79b8f82444dfa3d0f0b5a, instance_future-architect__vuls-78b52d6a7f480bd610b692de9bf0c86f57332f23, instance_future-architect__vuls-83bcca6e669ba2e4102f26c4a2b52f78c7861f1a, instance_future-architect__vuls-86b60e1478e44d28b1aff6b9ac7e95ceb05bc5fc, instance_future-architect__vuls-878c25bf5a9c9fd88ac32eb843f5636834d5712d, instance_future-architect__vuls-abd80417728b16c6502067914d27989ee575f0ee, instance_future-architect__vuls-ad2edbb8448e2c41a097f1c0b52696c0f6c5924d, instance_future-architect__vuls-bff6b7552370b55ff76d474860eead4ab5de785a-v1151a6325649aaf997cd541ebe533b53fddf1b07, instance_future-architect__vuls-ca3f6b1dbf2cd24d1537bfda43e788443ce03a0c, instance_future-architect__vuls-d18e7a751d07260d75ce3ba0cd67c4a6aebfd967, instance_future-architect__vuls-e1fab805afcfc92a2a615371d0ec1e667503c254-v264a82e2f4818e30f5a25e4da53b27ba119f62b5, instance_future-architect__vuls-e3c27e1817d68248043bd09d63cc31f3344a6f2c, instance_future-architect__vuls-e6c0da61324a0c04026ffd1c031436ee2be9503a, instance_future-architect__vuls-ef2be3d6ea4c0a13674aaab08b182eca4e2b9a17-v264a82e2f4818e30f5a25e4da53b27ba119f62b5, 
instance_future-architect__vuls-fd18df1dd4e4360f8932bc4b894bd8b40d654e7c, instance_gravitational__teleport-005dcb16bacc6a5d5890c4cd302ccfd4298e275d-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-0ac7334939981cf85b9591ac295c3816954e287e, instance_gravitational__teleport-1316e6728a3ee2fc124e2ea0cc6a02044c87a144-v626ec2a48416b10a88641359a169d99e935ff037, instance_gravitational__teleport-1330415d33a27594c948a36d9d7701f496229e9f, instance_gravitational__teleport-2b15263e49da5625922581569834eec4838a9257-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-2bb3bbbd8aff1164a2353381cb79e1dc93b90d28-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-3587cca7840f636489449113969a5066025dd5bf, instance_gravitational__teleport-3fa6904377c006497169945428e8197158667910-v626ec2a48416b10a88641359a169d99e935ff037, instance_gravitational__teleport-46aa81b1ce96ebb4ebed2ae53fd78cd44a05da6c-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-4d0117b50dc8cdb91c94b537a4844776b224cd3d, instance_gravitational__teleport-4e1c39639edf1ab494dd7562844c8b277b5cfa18-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-4f771403dc4177dc26ee0370f7332f3fe54bee0f-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-59d39dee5a8a66e5b8a18a9085a199d369b1fba8-v626ec2a48416b10a88641359a169d99e935ff037, instance_gravitational__teleport-5dca072bb4301f4579a15364fcf37cc0c39f7f6c, instance_gravitational__teleport-629dc432eb191ca479588a8c49205debb83e80e2, instance_gravitational__teleport-6a14edcf1ff010172fdbac622d0a474ed6af46de, instance_gravitational__teleport-6eaaf3a27e64f4ef4ef855bd35d7ec338cf17460-v626ec2a48416b10a88641359a169d99e935ff037, instance_gravitational__teleport-7744f72c6eb631791434b648ba41083b5f6d2278-vce94f93ad1030e3136852817f2423c1b3ac37bc4, instance_gravitational__teleport-78b0d8c72637df1129fb6ff84fc49ef4b5ab1288, 
instance_gravitational__teleport-82185f232ae8974258397e121b3bc2ed0c3729ed-v626ec2a48416b10a88641359a169d99e935ff037, instance_gravitational__teleport-8302d467d160f869b77184e262adbe2fbc95d9ba-vce94f93ad1030e3136852817f2423c1b3ac37bc4, instance_gravitational__teleport-87a593518b6ce94624f6c28516ce38cc30cbea5a, instance_gravitational__teleport-a95b3ae0667f9e4b2404bf61f51113e6d83f01cd, instance_gravitational__teleport-b4e7cd3a5e246736d3fe8d6886af55030b232277, instance_gravitational__teleport-b5d8169fc0a5e43fee2616c905c6d32164654dc6, instance_gravitational__teleport-ba6c4a135412c4296dd5551bd94042f0dc024504-v626ec2a48416b10a88641359a169d99e935ff037, instance_gravitational__teleport-baeb2697c4e4870c9850ff0cd5c7a2d08e1401c9-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-bb562408da4adeae16e025be65e170959d1ec492-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-c1b1c6a1541c478d7777a48fca993cc8206c73b9, instance_gravitational__teleport-c335534e02de143508ebebc7341021d7f8656e8f, instance_gravitational__teleport-c782838c3a174fdff80cafd8cd3b1aa4dae8beb2, instance_gravitational__teleport-d6ffe82aaf2af1057b69c61bf9df777f5ab5635a-vee9b09fb20c43af7e520f57e9239bbcf46b7113d, instance_gravitational__teleport-d873ea4fa67d3132eccba39213c1ca2f52064dcc-vce94f93ad1030e3136852817f2423c1b3ac37bc4, instance_gravitational__teleport-db89206db6c2969266e664c7c0fb51b70e958b64, instance_gravitational__teleport-e6681abe6a7113cfd2da507f05581b7bdf398540-v626ec2a48416b10a88641359a169d99e935ff037, instance_gravitational__teleport-eda668c30d9d3b56d9c69197b120b01013611186, instance_gravitational__teleport-f432a71a13e698b6e1c4672a2e9e9c1f32d35c12, instance_gravitational__teleport-fd2959260ef56463ad8afa4c973f47a50306edd4, instance_internetarchive__openlibrary-03095f2680f7516fca35a58e665bf2a41f006273-v8717e18970bcdc4e0d2cea3b1527752b21e74866, 
instance_internetarchive__openlibrary-08ac40d050a64e1d2646ece4959af0c42bf6b7b5-v0f5aece3601a5b4419f7ccec1dbda2071be28ee4, instance_internetarchive__openlibrary-0a90f9f0256e4f933523e9842799e39f95ae29ce-v76304ecdb3a5954fcf13feb710e8c40fcf24b73c, instance_internetarchive__openlibrary-111347e9583372e8ef91c82e0612ea437ae3a9c9-v2d9a6c849c60ed19fd0858ce9e40b7cc8e097e59, instance_internetarchive__openlibrary-11838fad1028672eb975c79d8984f03348500173-v0f5aece3601a5b4419f7ccec1dbda2071be28ee4, instance_internetarchive__openlibrary-30bc73a1395fba2300087c7f307e54bb5372b60a-v76304ecdb3a5954fcf13feb710e8c40fcf24b73c, instance_internetarchive__openlibrary-3aeec6afed9198d734b7ee1293f03ca94ff970e1-v13642507b4fc1f8d234172bf8129942da2c2ca26, instance_internetarchive__openlibrary-5fb312632097be7e9ac6ab657964af115224d15d-v0f5aece3601a5b4419f7ccec1dbda2071be28ee4, instance_internetarchive__openlibrary-6fdbbeee4c0a7e976ff3e46fb1d36f4eb110c428-v08d8e8889ec945ab821fb156c04c7d2e2810debb, instance_internetarchive__openlibrary-77c16d530b4d5c0f33d68bead2c6b329aee9b996-ve8c8d62a2b60610a3c4631f5f23ed866bada9818, instance_internetarchive__openlibrary-7edd1ef09d91fe0b435707633c5cc9af41dedddf-v76304ecdb3a5954fcf13feb710e8c40fcf24b73c, instance_internetarchive__openlibrary-8a9d9d323dfcf2a5b4f38d70b1108b030b20ebf3-v13642507b4fc1f8d234172bf8129942da2c2ca26, instance_internetarchive__openlibrary-910b08570210509f3bcfebf35c093a48243fe754-v0f5aece3601a5b4419f7ccec1dbda2071be28ee4, instance_internetarchive__openlibrary-9bdfd29fac883e77dcbc4208cab28c06fd963ab2-v76304ecdb3a5954fcf13feb710e8c40fcf24b73c, instance_internetarchive__openlibrary-9cd47f4dc21e273320d9e30d889c864f8cb20ccf-v0f5aece3601a5b4419f7ccec1dbda2071be28ee4, instance_internetarchive__openlibrary-acdddc590d0b3688f8f6386f43709049622a6e19-vfa6ff903cb27f336e17654595dd900fa943dcd91, instance_internetarchive__openlibrary-b67138b316b1e9c11df8a4a8391fe5cc8e75ff9f-ve8c8d62a2b60610a3c4631f5f23ed866bada9818, 
instance_internetarchive__openlibrary-bb152d23c004f3d68986877143bb0f83531fe401-ve8c8d62a2b60610a3c4631f5f23ed866bada9818, instance_internetarchive__openlibrary-dbbd9d539c6d4fd45d5be9662aa19b6d664b5137-v08d8e8889ec945ab821fb156c04c7d2e2810debb, instance_internetarchive__openlibrary-f8cc11d9c1575fdba5ac66aee0befca970da8d64-v13642507b4fc1f8d234172bf8129942da2c2ca26, instance_navidrome__navidrome-0488fb92cb02a82924fb1181bf1642f2e87096db, instance_navidrome__navidrome-27875ba2dd1673ddf8affca526b0664c12c3b98b, instance_navidrome__navidrome-28389fb05e1523564dfc61fa43ed8eb8a10f938c, instance_navidrome__navidrome-3853c3318f67b41a9e4cb768618315ff77846fdb, instance_navidrome__navidrome-3972616585e82305eaf26aa25697b3f5f3082288, instance_navidrome__navidrome-3982ba725883e71d4e3e618c61d5140eeb8d850a, instance_navidrome__navidrome-55730514ea59d5f1d0b8e3f8745569c29bdbf7b4, instance_navidrome__navidrome-56303cde23a4122d2447cbb266f942601a78d7e4, instance_navidrome__navidrome-5e549255201e622c911621a7b770477b1f5a89be, instance_navidrome__navidrome-69e0a266f48bae24a11312e9efbe495a337e4c84, instance_navidrome__navidrome-6bd4c0f6bfa653e9b8b27cfdc2955762d371d6e9, instance_navidrome__navidrome-7073d18b54da7e53274d11c9e2baef1242e8769e, instance_navidrome__navidrome-874b17b8f614056df0ef021b5d4f977341084185, instance_navidrome__navidrome-89b12b34bea5687c70e4de2109fd1e7330bb2ba2, instance_navidrome__navidrome-8d56ec898e776e7e53e352cb9b25677975787ffc, instance_navidrome__navidrome-b3980532237e57ab15b2b93c49d5cd5b2d050013, instance_navidrome__navidrome-b65e76293a917ee2dfc5d4b373b1c62e054d0dca, instance_navidrome__navidrome-c90468b895f6171e33e937ff20dc915c995274f0, instance_navidrome__navidrome-d0dceae0943b8df16e579c2d9437e11760a0626a, instance_navidrome__navidrome-d5df102f9f97c21715c756069c9e141da2a422dc, instance_navidrome__navidrome-d8e794317f788198227e10fb667e10496b3eb99a, instance_navidrome__navidrome-dfa453cc4ab772928686838dc73d0130740f054e, 
instance_navidrome__navidrome-e12a14a87d392ac70ee4cc8079e3c3e0103dbcb2, instance_navidrome__navidrome-eebfbc5381a1e506ff17b5f1371d1ad83d5fd642, instance_navidrome__navidrome-f78257235ec3429ef42af6687738cd327ec77ce8, instance_navidrome__navidrome-fa85e2a7816a6fe3829a4c0d8e893e982b0985da, instance_protonmail__webclients-01ea5214d11e0df8b7170d91bafd34f23cb0f2b1, instance_protonmail__webclients-08bb09914d0d37b0cd6376d4cab5b77728a43e7b, instance_protonmail__webclients-1917e37f5d9941a3459ce4b0177e201e2d94a622, instance_protonmail__webclients-281a6b3f190f323ec2c0630999354fafb84b2880, instance_protonmail__webclients-2dce79ea4451ad88d6bfe94da22e7f2f988efa60, instance_protonmail__webclients-2f2f6c311c6128fe86976950d3c0c2db07b03921, instance_protonmail__webclients-2f66db85455f4b22a47ffd853738f679b439593c, instance_protonmail__webclients-369fd37de29c14c690cb3b1c09a949189734026f, instance_protonmail__webclients-428cd033fede5fd6ae9dbc7ab634e010b10e4209, instance_protonmail__webclients-51742625834d3bd0d10fe0c7e76b8739a59c6b9f, instance_protonmail__webclients-5e815cfa518b223a088fa9bb232a5fc90ab15691, instance_protonmail__webclients-5f0745dd6993bb1430a951c62a49807c6635cd77, instance_protonmail__webclients-6e165e106d258a442ae849cdf08260329cb92d39, instance_protonmail__webclients-6e1873b06df6529a469599aa1d69d3b18f7d9d37, instance_protonmail__webclients-715dbd4e6999499cd2a576a532d8214f75189116, instance_protonmail__webclients-863d524b5717b9d33ce08a0f0535e3fd8e8d1ed8, instance_protonmail__webclients-8be4f6cb9380fcd2e67bcb18cef931ae0d4b869c, instance_protonmail__webclients-ae36cb23a1682dcfd69587c1b311ae0227e28f39, instance_protonmail__webclients-c6f65d205c401350a226bb005f42fac1754b0b5b, instance_protonmail__webclients-c8117f446c3d1d7e117adc6e0e46b0ece9b0b90e, instance_protonmail__webclients-caf10ba9ab2677761c88522d1ba8ad025779c492, instance_protonmail__webclients-cba6ebbd0707caa524ffee51c62b197f6122c902, instance_protonmail__webclients-cfd7571485186049c10c822f214d474f1edde8d1, 
instance_protonmail__webclients-d8ff92b414775565f496b830c9eb6cc5fa9620e6, instance_protonmail__webclients-da91f084c0f532d9cc8ca385a701274d598057b8, instance_protonmail__webclients-e65cc5f33719e02e1c378146fb981d27bc24bdf4, instance_qutebrowser__qutebrowser-01d1d1494411380d97cac14614a829d3a69cecaf-v2ef375ac784985212b1805e1d0431dc8f1b3c171, instance_qutebrowser__qutebrowser-16de05407111ddd82fa12e54389d532362489da9-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d, instance_qutebrowser__qutebrowser-21b426b6a20ec1cc5ecad770730641750699757b-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d, instance_qutebrowser__qutebrowser-305e7c96d5e2fdb3b248b27dfb21042fb2b7e0b8-v2ef375ac784985212b1805e1d0431dc8f1b3c171, instance_qutebrowser__qutebrowser-36ade4bba504eb96f05d32ceab9972df7eb17bcc-v2ef375ac784985212b1805e1d0431dc8f1b3c171, instance_qutebrowser__qutebrowser-394bfaed6544c952c6b3463751abab3176ad4997-vafb3e8e01b31319c66c4e666b8a3b1d8ba55db24, instance_qutebrowser__qutebrowser-3e21c8214a998cb1058defd15aabb24617a76402-v5fc38aaf22415ab0b70567368332beee7955b367, instance_qutebrowser__qutebrowser-473a15f7908f2bb6d670b0e908ab34a28d8cf7e2-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d, instance_qutebrowser__qutebrowser-7f9713b20f623fc40473b7167a082d6db0f0fd40-va0fd88aac89cde702ec1ba84877234da33adce8a, instance_qutebrowser__qutebrowser-9b71c1ea67a9e7eb70dd83214d881c2031db6541-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d, instance_qutebrowser__qutebrowser-a25e8a09873838ca9efefd36ea8a45170bbeb95c-vc2f56a753b62a190ddb23cd330c257b9cf560d12, instance_qutebrowser__qutebrowser-c0be28ebee3e1837aaf3f30ec534ccd6d038f129-v9f8e9d96c85c85a605e382f1510bd08563afc566, instance_qutebrowser__qutebrowser-deeb15d6f009b3ca0c3bd503a7cef07462bd16b4-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d, instance_qutebrowser__qutebrowser-e34dfc68647d087ca3175d9ad3f023c30d8c9746-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d, instance_qutebrowser__qutebrowser-ec2dcfce9eee9f808efc17a1b99e227fc4421dea-v5149fcda2a9a6fe1d35dfed1bade1444a11ef271, 
instance_qutebrowser__qutebrowser-ef5ba1a0360b39f9eff027fbdc57f363597c3c3b-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d, instance_qutebrowser__qutebrowser-f631cd4422744160d9dcf7a0455da532ce973315-v35616345bb8052ea303186706cec663146f0f184, instance_qutebrowser__qutebrowser-fcfa069a06ade76d91bac38127f3235c13d78eb1-v5fc38aaf22415ab0b70567368332beee7955b367, instance_qutebrowser__qutebrowser-ff1c025ad3210506fc76e1f604d8c8c27637d88e-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d, instance_tutao__tutanota-09c2776c0fce3db5c6e18da92b5a45dce9f013aa-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf, instance_tutao__tutanota-12a6cbaa4f8b43c2f85caca0787ab55501539955-vc4e41fd0029957297843cb9dec4a25c7c756f029, instance_tutao__tutanota-1e516e989b3c0221f4af6b297d9c0e4c43e4adc3-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf, instance_tutao__tutanota-1ff82aa365763cee2d609c9d19360ad87fdf2ec7-vc4e41fd0029957297843cb9dec4a25c7c756f029, instance_tutao__tutanota-40e94dee2bcec2b63f362da283123e9df1874cc1-vc4e41fd0029957297843cb9dec4a25c7c756f029, instance_tutao__tutanota-4b4e45949096bb288f2b522f657610e480efa3e8-vee878bb72091875e912c52fc32bc60ec3760227b, instance_tutao__tutanota-51818218c6ae33de00cbea3a4d30daac8c34142e-vc4e41fd0029957297843cb9dec4a25c7c756f029, instance_tutao__tutanota-8513a9e8114a8b42e64f4348335e0f23efa054c4-vee878bb72091875e912c52fc32bc60ec3760227b, instance_tutao__tutanota-b4934a0f3c34d9d7649e944b183137e8fad3e859-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf, instance_tutao__tutanota-befce4b146002b9abc86aa95f4d57581771815ce-vee878bb72091875e912c52fc32bc60ec3760227b, instance_tutao__tutanota-d1aa0ecec288bfc800cfb9133b087c4f81ad8b38-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf, instance_tutao__tutanota-da4edb7375c10f47f4ed3860a591c5e6557f7b5c-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf, instance_tutao__tutanota-db90ac26ab78addf72a8efaff3c7acc0fbd6d000-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf, instance_tutao__tutanota-f3ffe17af6e8ab007e8d461355057ad237846d9d-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf, 
instance_tutao__tutanota-fb32e5f9d9fc152a00144d56dd0af01760a2d4dc-vc4e41fd0029957297843cb9dec4a25c7c756f029, instance_tutao__tutanota-fbdb72a2bd39b05131ff905780d9d4a2a074de26-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf, instance_tutao__tutanota-fe240cbf7f0fdd6744ef7bef8cb61676bcdbb621-vc4e41fd0029957297843cb9dec4a25c7c756f029
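A list like the one above could be derived from per-model results roughly as follows. This is only a sketch under assumptions: the input layout (a mapping from model name to per-instance pass@1), the function name `unsolved_by_all`, and the toy data are hypothetical, and "not solved" is taken to mean a pass@1 of exactly 0 for every model.

```python
from typing import Dict, List


def unsolved_by_all(pass1: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the instance ids whose pass@1 is 0 for every model.

    `pass1` maps model name -> {instance id -> pass@1}. This layout is an
    assumption made for illustration; a missing entry is treated as 0.
    """
    instances = set()
    for scores in pass1.values():
        instances.update(scores)
    return sorted(
        inst
        for inst in instances
        if all(scores.get(inst, 0.0) == 0.0 for scores in pass1.values())
    )


# Hypothetical toy data, not taken from the benchmark.
pass1 = {
    "model_a": {"ex1": 0.0, "ex2": 1.0, "ex3": 0.0},
    "model_b": {"ex1": 0.0, "ex2": 0.0, "ex3": 0.5},
}
unsolved = unsolved_by_all(pass1)
print(unsolved)
```

Only `ex1` is reported here, because each of the other two toy examples is solved at least some of the time by at least one model.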

Problems solved by 1 model only

example_link model min_pass1_of_model

Suspect problems

These are the 10 problems with the lowest correlation with the overall evaluation (i.e., better models tend to do worse on these).

example_link pass1_of_ex tau
instance_protonmail__webclients-01b519cd49e6a24d9a05d2eb97f54e420740072e 0.964 -0.303
instance_flipt-io__flipt-0b119520afca1cf25c470ff4288c464d4510b944 0.974 -0.260
instance_protonmail__webclients-e7f3f20c8ad86089967498632ace73c1157a9d51 0.906 -0.251
instance_gravitational__teleport-ad41b3c15414b28a6cec8c25424a19bfa7abd0e9-vee9b09fb20c43af7e520f57e9239bbcf46b7113d 0.013 -0.234
instance_gravitational__teleport-dd3977957a67bedaf604ad6ca255ba8c7b6704e9 0.019 -0.221
instance_qutebrowser__qutebrowser-e5340c449f23608803c286da0563b62f58ba25b0-v059c6fdc75567943479b23ebca7c07b5e9a7f34c 0.198 -0.220
instance_flipt-io__flipt-abaa5953795afb9c621605bb18cb32ac48b4508c 0.011 -0.215
instance_navidrome__navidrome-8e640bb8580affb7e0ea6225c0bbe240186b6b08 0.965 -0.205
instance_protonmail__webclients-944adbfe06644be0789f59b78395bdd8567d8547 0.013 -0.198
instance_ansible__ansible-d2f80991180337e2be23d6883064a67dcbaeb662-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5 0.027 -0.198
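One way to read the tau column, assuming it is a Kendall rank correlation between each model's overall score and that model's result on the single problem, is sketched below. The page does not state which tau variant is used; the tau-b form (which corrects for ties), the function name `kendall_tau_b`, and the toy numbers are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two equally long sequences.

    Pairs tied in only one variable enter the denominator; pairs tied in
    both variables are ignored. Returns 0.0 if either input is constant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical toy data: four models ordered by overall score, and
# whether each one solved a single problem (1 = solved, 0 = not solved).
overall_scores = [0.1, 0.2, 0.3, 0.4]
solved_this_problem = [1, 1, 0, 0]
tau = kendall_tau_b(overall_scores, solved_this_problem)
print(f"tau = {tau:.3f}")
```

In this toy case only the weaker models solve the problem, so tau is negative, which is the pattern the table above flags as suspect.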

Histogram of accuracies

Histogram of problems by the accuracy on each problem.
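As a rough illustration of how such a histogram could be binned, the sketch below counts problems in equal-width accuracy buckets. The function name `accuracy_histogram` and the choice of 10 bins are assumptions; the sample values are the `pass1_of_ex` entries from the suspect-problems table, used here only as example input on the assumption that `pass1_of_ex` is the per-problem accuracy.

```python
from typing import List, Sequence


def accuracy_histogram(accuracies: Sequence[float], n_bins: int = 10) -> List[int]:
    """Count problems per equal-width accuracy bin on [0, 1].

    Bin i covers [i / n_bins, (i + 1) / n_bins); an accuracy of exactly 1.0
    is placed in the last bin.
    """
    counts = [0] * n_bins
    for acc in accuracies:
        if not 0.0 <= acc <= 1.0:
            raise ValueError(f"accuracy out of range: {acc}")
        index = min(int(acc * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# pass1_of_ex values copied from the suspect-problems table.
sample = [0.964, 0.974, 0.906, 0.013, 0.019, 0.198, 0.011, 0.965, 0.013, 0.027]
counts = accuracy_histogram(sample)
for i, count in enumerate(counts):
    print(f"[{i / 10:.1f}, {(i + 1) / 10:.1f}) {'#' * count}")
```

For this small sample the counts pile up in the lowest and highest bins, since most of the listed suspect problems are either almost always or almost never solved.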

Histogram of difficulties

Histogram of problems by the minimum win rate to solve each problem.