APEX

GPT 5.1 Codex Mini

OpenAI

400K context, $3.00/M input, $15.00/M output
1747 (peak 1765)

Avg Score

81.0

Avg Cost

$1.84

Score/$

44.0
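
The page does not state how Score/$ is derived. A minimal sketch, assuming it is simply the average score divided by the average cost per run (the function name `score_per_dollar` is hypothetical), reproduces the figure shown:

```python
def score_per_dollar(avg_score: float, avg_cost: float) -> float:
    """Assumed formula: average score divided by average cost in dollars."""
    if avg_cost <= 0:
        raise ValueError("avg_cost must be positive")
    return avg_score / avg_cost


# Values taken from the summary stats on this page.
ratio = score_per_dollar(81.0, 1.84)
print(round(ratio, 1))  # 44.0
```

If the site instead averages per-run score/cost ratios, the result could differ; per-run costs are not listed here, so that cannot be checked from this page.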

Runs

54

[Charts not reproduced: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category | Difficulty | ELO
multi-language | expert | 3238
code-review | hard | 3008
from-scratch | medium | 2714
frontend | expert | 2354
from-scratch | hard | 2327
code-review | medium | 2289
code-review | - | 2238
frontend | hard | 2185
multi-language | hard | 2110
multi-language | - | 2098
from-scratch | expert | 2072
from-scratch | - | 2010
backend | easy | 2007
refactoring | expert | 1984
debugging | medium | 1919
from-scratch | easy | 1918
backend | medium | 1776
backend | expert | 1762
backend | - | 1734
backend | hard | 1720
full-stack | - | 1717
full-stack | hard | 1715
frontend | - | 1711
refactoring | - | 1705
debugging | - | 1624
debugging | expert | 1621
frontend | medium | 1588
debugging | hard | 1528

All Results

Task | Category | Score
Harden insecure Docker setup with 12 vulnerabilities | code-review | -
Build codebase indexer for LLM context windows | from-scratch | -
Code review: identify security vulns | code-review | -
Dockerize Node.js monorepo | full-stack | -
Split 1100-line god file into proper modules | refactoring | -
Build RAG pipeline with vector search | backend | -
Add i18n with locale routing to Next.js app | full-stack | -
Remove AI slop and over-engineering from codebase | refactoring | -
Write Kubernetes manifests for Node.js microservice | full-stack | -
Optimize bloated React bundle under 500KB | frontend | 79.4
Implement multi-tenant row-level security in Postgres | backend | 82.5
Implement zero-trust API authentication layer | backend | 77.7
Fix broken GitHub Actions CI pipeline | debugging | 88.3
Find and patch all OWASP Top 10 vulnerabilities | debugging | 77.8
Build distributed node cluster with gossip protocol | from-scratch | 72.5
Convert React app to PWA with offline support | frontend | 79.7
Find and fix 4 hidden backdoors in Flask app | debugging | 87.0
Build production website with auth and members area | frontend | 75.5
Build SaaS admin dashboard from scratch | from-scratch | 86.4
Build MCP server for database management | backend | 69.2
Implement transformer inference engine with KV cache | from-scratch | 86.6
Implement background job scheduler with persistence | backend | 82.0
Build CLI tool with subcommands and config | from-scratch | 82.9
Build LLM evaluation harness with structured grading | backend | 61.5
Build real-time portfolio risk calculator | backend | 72.2
Add Redis caching layer to Express API | backend | 80.0
Fix race conditions in order matching engine | backend | 90.9
Debug and fix 6 broken database triggers and constraints | debugging | 79.8
Fix data integrity bugs in denormalized e-commerce schema | debugging | 80.8
Write complex SQL report with window functions | backend | 79.5
Build materialized view refresh pipeline for analytics | backend | 75.3
Fix deadlocking transaction patterns in Flask app | backend | 79.8
Fix hallucination and context window bugs in RAG agent | backend | 77.3
Write tests for untested legacy Flask service | code-review | 92.5
Add Google OAuth2 login to Express app | full-stack | 82.0
Optimize slow Postgres queries in Flask app | backend | 81.0
Add slash commands and moderation to Discord bot | backend | 83.8
Add virtual scrolling to table rendering 5000 rows | frontend | 81.8
Fix 12 WCAG accessibility violations in checkout form | frontend | 90.4
Add GraphQL layer over REST API | multi-language | 80.5
Write integration tests for payment flow | code-review | 81.0
Zero-downtime schema migration | full-stack | 76.5
Add rate limiting middleware | backend | 83.5
Implement Stripe webhook handler | backend | 80.5
Port Python CLI to Rust | multi-language | 90.3
Fix flaky test suite | debugging | 88.4
Add cursor-based pagination to REST API | backend | 87.9
Fix N+1 query in dashboard | backend | 82.9
Fix memory leak in event handler | debugging | 85.8
Refactor monolithic handler to CQRS | refactoring | 72.7
Debug race condition in worker pool | debugging | 76.6
Fix React hydration mismatch | frontend | 69.2
Build terminal UI dashboard | from-scratch | 82.2
Build REST API from scratch | from-scratch | 89.3