APEX
GPT 5.2

OpenRouter

400K context | $1.75/M input | $14.00/M output
1828 (peak 1833)

Avg Score: 80.0
Avg Cost: $0.31
Score/$: 258.7
Runs: 122
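
The page does not say how Score/$ is computed. A minimal sketch, assuming it is the average score divided by the average cost in dollars (an assumption, not a documented formula), shows that the displayed 258.7 is consistent with that reading once rounding of the displayed cost is taken into account. The function name `score_per_dollar` is hypothetical.

```python
def score_per_dollar(avg_score: float, avg_cost: float) -> float:
    """Ratio of average score to average cost in dollars (assumed definition)."""
    if avg_cost <= 0:
        raise ValueError("avg_cost must be positive")
    return avg_score / avg_cost


# Values as displayed on this page (already rounded for display).
avg_score = 80.0
avg_cost = 0.31
displayed = 258.7

# Ratio computed directly from the rounded, displayed values.
naive = score_per_dollar(avg_score, avg_cost)

# A cost shown as $0.31 may be any value in [0.305, 0.315) before rounding,
# so the unrounded ratio can fall anywhere between these two bounds.
low = score_per_dollar(avg_score, 0.315)
high = score_per_dollar(avg_score, 0.305)

print(f"ratio from displayed values: {naive:.1f}")
print(f"range allowed by cost rounding: {low:.1f} to {high:.1f}")
print(f"displayed figure inside range: {low <= displayed <= high}")
```

The ratio of the displayed values is about 258.1, slightly below the displayed 258.7. The gap is small enough to come from rounding alone, but the page may also use a different definition, such as averaging per-run ratios.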

Category ELOs

refactoring (expert): 2558
frontend (expert): 2536
backend (easy): 2371
multi-language (hard): 2212
frontend (hard): 2185
from-scratch (hard): 2139
from-scratch (expert): 2090
full-stack (medium): 2061
backend (expert): 2034
frontend (medium): 2017
code-review (hard): 1986
from-scratch (easy): 1972
full-stack: 1962
full-stack (hard): 1932
debugging (medium): 1919
from-scratch: 1904
frontend: 1896
backend: 1864
refactoring: 1861
backend (medium): 1860
debugging (expert): 1841
refactoring (medium): 1811
backend (hard): 1792
debugging: 1754
code-review (medium): 1732
code-review: 1730
debugging (hard): 1711
from-scratch (medium): 1694
frontend (easy): 1555
multi-language: 1554
multi-language (expert): 327

All Results

Task | Category | Score
Debug race condition in worker pool | debugging | 92.1
Add caching layer to eliminate slow SSR page loads | full-stack | 79.7
Write complex SQL report with window functions | backend | 77.0
Add slash commands and moderation to Discord bot | backend | 86.3
Remove AI slop and over-engineering from codebase | refactoring | 90.5
Write Kubernetes manifests for Node.js microservice | full-stack | 92.2
Build SaaS admin dashboard from scratch | from-scratch | 63.6
Fix hallucination and context window bugs in RAG agent | backend | 63.0
Build codebase indexer for LLM context windows | from-scratch | 59.3
Fix deadlocking transaction patterns in Flask app | backend | 85.0
Optimize bloated React bundle under 500KB | frontend | 82.1
Fix memory leak in event handler | debugging | 91.8
Debug and fix 6 broken database triggers and constraints | debugging | 93.3
Build real-time portfolio risk calculator | backend | 86.1
Replace console.log with structured logging | refactoring | 69.0
Add i18n with locale routing to Next.js app | full-stack | 76.5
Build LLM evaluation harness with structured grading | backend | 82.3
Add Redis caching layer to Express API | backend | 90.1
Build materialized view refresh pipeline for analytics | backend | 76.5
Refactor monolithic handler to CQRS | refactoring | 51.1
Build MCP server for database management | backend | 86.9
Fix 12 WCAG accessibility violations in checkout form | frontend | 90.2
Implement background job scheduler with persistence | backend | 72.2
Fix Node.js stream backpressure causing OOM on large files | backend | 91.2
Optimize slow Postgres queries in Flask app | backend | 83.0
Add retry logic and dead letter queue to Python task queue | backend | 78.6
Find and patch all OWASP Top 10 vulnerabilities | debugging | 75.1
Write tests for untested legacy Flask service | code-review | 70.4
Build distributed node cluster with gossip protocol | from-scratch | 83.7
Add file upload with S3 presigned URLs | backend | 84.4
Fix N+1 query in dashboard | backend | 74.7
Add virtual scrolling to table rendering 5000 rows | frontend | 60.5
Add cursor-based pagination to REST API | backend | 88.0
Build CLI tool with subcommands and config | from-scratch | 72.3
Fix race conditions in order matching engine | backend | 84.8
Build RAG pipeline with vector search | backend | 48.3
Fix flaky test suite | debugging | 92.8
Port Python CLI to Rust | multi-language | 40.5
Migrate callback-hell Express app to async/await | refactoring | 68.0
Fix broken GitHub Actions CI pipeline | debugging | 87.6
Fix broken responsive layout | frontend | 76.0
Add Google OAuth2 login to Express app | full-stack | 88.1
Dockerize Node.js monorepo | full-stack | 76.7
Find and fix 4 hidden backdoors in Flask app | debugging | 81.7
Build production website with auth and members area | frontend | 72.5
Zero-downtime schema migration | full-stack | 88.3
Fix auth bypass vulnerability | debugging | 88.3
Implement transformer inference engine with KV cache | from-scratch | 91.5
Harden insecure Docker setup with 12 vulnerabilities | code-review | 94.2
Add rate limiting middleware | backend | 77.9
Split 1100-line god file into proper modules | refactoring | 64.4
Implement multi-tenant row-level security in Postgres | backend | 64.7
Add GraphQL layer over REST API | multi-language | 87.0
Fix data integrity bugs in denormalized e-commerce schema | debugging | 65.9
Add streaming SSE endpoint for LLM chat | backend | 88.5
Implement zero-trust API authentication layer | backend | 78.7
Implement JWT auth middleware | backend | 50.9
Fix React hydration mismatch | frontend | 85.8
Implement Stripe webhook handler | backend | 88.0
Code review: identify security vulns | code-review | 78.5
Add WebSocket real-time updates | full-stack | 78.5
Write integration tests for payment flow | code-review | 84.3
Build REST API from scratch | from-scratch | 67.2
Build terminal UI dashboard | from-scratch | 60.4
Convert React app to PWA with offline support | frontend | 84.7
Build codebase indexer for LLM context windows | from-scratch | 51.5
Harden insecure Docker setup with 12 vulnerabilities | code-review | 92.5
Add streaming SSE endpoint for LLM chat | backend | 81.5
Write Kubernetes manifests for Node.js microservice | full-stack | 93.8
Convert React app to PWA with offline support | frontend | 81.2
Remove AI slop and over-engineering from codebase | refactoring | 90.0
Add file upload with S3 presigned URLs | backend | 83.3
Implement multi-tenant row-level security in Postgres | backend | 85.8
Replace console.log with structured logging | refactoring | 73.0
Add i18n with locale routing to Next.js app | full-stack | 78.0
Split 1100-line god file into proper modules | refactoring | 86.9
Implement zero-trust API authentication layer | backend | 85.3
Implement JWT auth middleware | backend | 87.2
Add caching layer to eliminate slow SSR page loads | full-stack | 93.3
Optimize bloated React bundle under 500KB | frontend | 91.8
Find and patch all OWASP Top 10 vulnerabilities | debugging | 86.1
Fix broken responsive layout | frontend | 75.5
Dockerize Node.js monorepo | full-stack | 90.8
Build production website with auth and members area | frontend | 80.7
Write tests for untested legacy Flask service | code-review | 70.7
Add retry logic and dead letter queue to Python task queue | backend | 80.5
Build SaaS admin dashboard from scratch | from-scratch | 82.2
Implement background job scheduler with persistence | backend | 81.4
Build MCP server for database management | backend | 84.1
Fix React hydration mismatch | frontend | 72.0
Implement transformer inference engine with KV cache | from-scratch | 85.0
Build CLI tool with subcommands and config | from-scratch | 81.0
Build LLM evaluation harness with structured grading | backend | 64.9
Fix data integrity bugs in denormalized e-commerce schema | debugging | 77.5
Build real-time portfolio risk calculator | backend | 67.0
Fix hallucination and context window bugs in RAG agent | backend | 71.8
Build materialized view refresh pipeline for analytics | backend | 82.0
Fix race conditions in order matching engine | backend | 90.0
Debug and fix 6 broken database triggers and constraints | debugging | 84.0
Fix deadlocking transaction patterns in Flask app | backend | 75.7
Write complex SQL report with window functions | backend | 81.2
Add Redis caching layer to Express API | backend | 82.2
Find and fix 4 hidden backdoors in Flask app | debugging | 88.5
Add Google OAuth2 login to Express app | full-stack | 82.0
Add slash commands and moderation to Discord bot | backend | 89.3
Fix 12 WCAG accessibility violations in checkout form | frontend | 87.0
Optimize slow Postgres queries in Flask app | backend | 85.6
Build distributed node cluster with gossip protocol | from-scratch | 83.8
Fix Node.js stream backpressure causing OOM on large files | backend | 91.4
Write integration tests for payment flow | code-review | 67.5
Add virtual scrolling to table rendering 5000 rows | frontend | 88.3
Add GraphQL layer over REST API | multi-language | 67.3
Build terminal UI dashboard | from-scratch | 63.7
Zero-downtime schema migration | full-stack | 79.5
Add cursor-based pagination to REST API | backend | 88.8
Refactor monolithic handler to CQRS | refactoring | 79.5
Add rate limiting middleware | backend | 88.3
Fix flaky test suite | debugging | 93.3
Fix N+1 query in dashboard | backend | 91.1
Fix memory leak in event handler | debugging | 88.0
Build REST API from scratch | from-scratch | 90.3
Debug race condition in worker pool | debugging | 83.2