APEX

Claude Sonnet 4.5

Anthropic

200K context | $3.00/M input | $15.00/M output
1557 (peak 1572)

Avg Score: 76.1
Avg Cost: $0.25
Score/$: 307.0
Runs: 65

Category ELOs

frontend (expert): 2285
multi-language (expert): 2265
from-scratch (easy): 2188
from-scratch (expert): 2113
frontend (hard): 1994
backend (easy): 1974
refactoring (medium): 1834
from-scratch (hard): 1761
refactoring: 1760
from-scratch: 1731
from-scratch (medium): 1709
full-stack (medium): 1693
refactoring (expert): 1691
frontend (easy): 1603
backend (medium): 1601
code-review (hard): 1580
code-review (medium): 1563
debugging (hard): 1558
backend: 1548
debugging (medium): 1544
code-review: 1533
debugging: 1515
backend (hard): 1510
debugging (expert): 1509
full-stack: 1507
backend (expert): 1491
frontend: 1484
multi-language: 1442
full-stack (hard): 1371
frontend (medium): 1311
multi-language (hard): 605

All Results

Task | Category | Score
Migrate callback-hell Express app to async/await | refactoring | 86.2
Add WebSocket real-time updates | full-stack | 82.8
Fix broken GitHub Actions CI pipeline | debugging | 91.7
Write tests for untested legacy Flask service | code-review | 52.0
Implement background job scheduler with persistence | backend | 52.1
Add Redis caching layer to Express API | backend | 84.6
Add Google OAuth2 login to Express app | full-stack | 60.0
Fix memory leak in event handler | debugging | 81.8
Fix Node.js stream backpressure causing OOM on large files | backend | 76.0
Add virtual scrolling to table rendering 5000 rows | frontend | 56.0
Implement Stripe webhook handler | backend | 85.8
Port Python CLI to Rust | multi-language | 74.0
Build SaaS admin dashboard from scratch | from-scratch | 67.5
Code review: identify security vulns | code-review | 82.6
Add retry logic and dead letter queue to Python task queue | backend | 82.3
Build RAG pipeline with vector search | backend | 65.7
Add file upload with S3 presigned URLs | backend | 82.5
Harden insecure Docker setup with 12 vulnerabilities | code-review | 86.2
Add streaming SSE endpoint for LLM chat | backend | 57.5
Optimize bloated React bundle under 500KB | frontend | 82.8
Convert React app to PWA with offline support | frontend | 44.7
Add i18n with locale routing to Next.js app | full-stack | 66.2
Replace console.log with structured logging | refactoring | 76.8
Find and patch all OWASP Top 10 vulnerabilities | debugging | 77.9
Implement multi-tenant row-level security in Postgres | backend | 81.3
Implement JWT auth middleware | backend | 87.3
Add caching layer to eliminate slow SSR page loads | full-stack | 89.0
Remove AI slop and over-engineering from codebase | refactoring | 90.3
Build codebase indexer for LLM context windows | from-scratch | 79.6
Fix broken responsive layout | frontend | 75.3
Write Kubernetes manifests for Node.js microservice | full-stack | 85.5
Split 1100-line god file into proper modules | refactoring | 86.7
Implement zero-trust API authentication layer | backend | 72.4
Dockerize Node.js monorepo | full-stack | 78.3
Build production website with auth and members area | frontend | 78.5
Implement transformer inference engine with KV cache | from-scratch | 87.7
Build CLI tool with subcommands and config | from-scratch | 76.5
Build MCP server for database management | backend | 84.4
Write integration tests for payment flow | code-review | 75.1
Build LLM evaluation harness with structured grading | backend | 74.0
Write complex SQL report with window functions | backend | 83.0
Debug and fix 6 broken database triggers and constraints | debugging | 84.4
Optimize slow Postgres queries in Flask app | backend | 64.5
Build real-time portfolio risk calculator | backend | 63.0
Build materialized view refresh pipeline for analytics | backend | 66.0
Find and fix 4 hidden backdoors in Flask app | debugging | 89.2
Fix data integrity bugs in denormalized e-commerce schema | debugging | 70.0
Add slash commands and moderation to Discord bot | backend | 77.8
Fix deadlocking transaction patterns in Flask app | backend | 66.7
Fix race conditions in order matching engine | backend | 76.8
Fix hallucination and context window bugs in RAG agent | backend | 77.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 87.3
Add GraphQL layer over REST API | multi-language | 52.6
Build distributed node cluster with gossip protocol | from-scratch | 74.1
Fix auth bypass vulnerability | debugging | 93.7
Add rate limiting middleware | backend | 81.5
Zero-downtime schema migration | full-stack | 68.1
Fix flaky test suite | debugging | 72.7
Add cursor-based pagination to REST API | backend | 77.5
Fix N+1 query in dashboard | backend | 88.1
Refactor monolithic handler to CQRS | refactoring | 71.2
Debug race condition in worker pool | debugging | 79.1
Build terminal UI dashboard | from-scratch | 66.3
Build REST API from scratch | from-scratch | 90.0
Fix React hydration mismatch | frontend | 69.2