APEX

GPT 5.3 Codex

OpenRouter

400K context | $1.75/M input | $14.00/M output
ELO: 1639 (peak 1655)

Avg Score: 78.8
Avg Cost: $0.12
Score/$: 661.5
Runs: 65
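
The page does not state how Score/$ is derived. A minimal sketch, assuming it is the average score divided by the average cost per run: the displayed (rounded) values give roughly 656.7 rather than 661.5, which would be consistent with an unrounded average cost of about $0.119. The variable names below are illustrative, and the site may instead average per-run ratios.

```python
# Values as displayed on the page (rounded for display).
avg_score = 78.8
avg_cost_usd = 0.12
score_per_dollar = 661.5

# Assumption (not confirmed by the page): Score/$ = avg score / avg cost per run.
ratio_from_displayed = avg_score / avg_cost_usd

# Average cost implied by the displayed Score/$ under the same assumption.
implied_avg_cost_usd = avg_score / score_per_dollar

print(f"ratio from displayed values: {ratio_from_displayed:.1f}")
print(f"implied average cost: ${implied_avg_cost_usd:.4f}")
```

Under this assumption, the implied cost rounds to the displayed $0.12, so the gap between 656.7 and 661.5 can be explained by display rounding alone.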

[Charts not shown: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category | Difficulty | ELO
backend | easy | 2389
from-scratch | medium | 2359
multi-language | expert | 2331
from-scratch | expert | 2157
multi-language | hard | 2052
from-scratch | easy | 1999
frontend | hard | 1973
frontend | expert | 1854
code-review | hard | 1848
multi-language | - | 1786
backend | medium | 1785
full-stack | medium | 1738
from-scratch | - | 1728
debugging | medium | 1707
backend | hard | 1696
backend | - | 1695
debugging | expert | 1688
frontend | easy | 1679
from-scratch | hard | 1678
full-stack | - | 1674
frontend | - | 1657
frontend | medium | 1647
full-stack | hard | 1641
code-review | medium | 1622
code-review | - | 1615
debugging | - | 1592
backend | expert | 1572
debugging | hard | 1488
refactoring | - | 1371
refactoring | medium | 1355
refactoring | expert | 1136

All Results

Task | Category | Score
Build codebase indexer for LLM context windows | from-scratch | 56.1
Add file upload with S3 presigned URLs | backend | 84.2
Build CLI tool with subcommands and config | from-scratch | 78.8
Add virtual scrolling to table rendering 5000 rows | frontend | 87.3
Replace console.log with structured logging | refactoring | 58.0
Add WebSocket real-time updates | full-stack | 82.7
Implement transformer inference engine with KV cache | from-scratch | 89.8
Build distributed node cluster with gossip protocol | from-scratch | 82.3
Fix memory leak in event handler | debugging | 88.9
Fix broken responsive layout | frontend | 77.3
Add Redis caching layer to Express API | backend | 87.0
Fix auth bypass vulnerability | debugging | 80.2
Add GraphQL layer over REST API | multi-language | 84.4
Convert React app to PWA with offline support | frontend | 88.4
Fix broken GitHub Actions CI pipeline | debugging | 84.9
Fix flaky test suite | debugging | 93.7
Refactor monolithic handler to CQRS | refactoring | 58.9
Build LLM evaluation harness with structured grading | backend | 83.0
Add cursor-based pagination to REST API | backend | 87.0
Remove AI slop and over-engineering from codebase | refactoring | 87.3
Dockerize Node.js monorepo | full-stack | 84.4
Implement JWT auth middleware | backend | 53.0
Build materialized view refresh pipeline for analytics | backend | 76.2
Build REST API from scratch | from-scratch | 85.7
Add i18n with locale routing to Next.js app | full-stack | 80.0
Fix race conditions in order matching engine | backend | 90.0
Fix data integrity bugs in denormalized e-commerce schema | debugging | 93.3
Build production website with auth and members area | frontend | 72.1
Add slash commands and moderation to Discord bot | backend | 88.8
Implement zero-trust API authentication layer | backend | 74.9
Add Google OAuth2 login to Express app | full-stack | 87.2
Write complex SQL report with window functions | backend | 76.7
Migrate callback-hell Express app to async/await | refactoring | 62.7
Optimize bloated React bundle under 500KB | frontend | 81.9
Implement Stripe webhook handler | backend | 85.4
Debug and fix 6 broken database triggers and constraints | debugging | 83.5
Find and fix 4 hidden backdoors in Flask app | debugging | 69.5
Add caching layer to eliminate slow SSR page loads | full-stack | 80.3
Fix 12 WCAG accessibility violations in checkout form | frontend | 86.9
Add rate limiting middleware | backend | 91.3
Build RAG pipeline with vector search | backend | 66.8
Optimize slow Postgres queries in Flask app | backend | 91.7
Debug race condition in worker pool | debugging | 93.3
Split 1100-line god file into proper modules | refactoring | 51.1
Build real-time portfolio risk calculator | backend | 73.3
Build terminal UI dashboard | from-scratch | 78.8
Build SaaS admin dashboard from scratch | from-scratch | 54.2
Find and patch all OWASP Top 10 vulnerabilities | debugging | 72.8
Add streaming SSE endpoint for LLM chat | backend | 89.2
Code review: identify security vulns | code-review | 77.5
Implement multi-tenant row-level security in Postgres | backend | 58.8
Fix N+1 query in dashboard | backend | 73.5
Fix hallucination and context window bugs in RAG agent | backend | 67.0
Write Kubernetes manifests for Node.js microservice | full-stack | 94.3
Fix React hydration mismatch | frontend | 65.8
Implement background job scheduler with persistence | backend | 84.4
Harden insecure Docker setup with 12 vulnerabilities | code-review | 90.7
Add retry logic and dead letter queue to Python task queue | backend | 88.0
Write tests for untested legacy Flask service | code-review | 66.4
Write integration tests for payment flow | code-review | 79.6
Fix Node.js stream backpressure causing OOM on large files | backend | 89.3
Port Python CLI to Rust | multi-language | 76.0
Fix deadlocking transaction patterns in Flask app | backend | 61.0
Build MCP server for database management | backend | 91.7
Zero-downtime schema migration | full-stack | 66.0