APEX
Back to models

Claude Sonnet 4.6

Anthropic

200K context$3.00/M input$15.00/M output
1743peak 1760

Avg Score

83.8

Avg Cost

$0.31

Score/$

266.8

Runs

70

Win/Loss/Draw

Scoring Dimensions

Score Distribution

Category ELOs

backendeasy
2440
frontendeasy
2409
frontendexpert
2395
from-scratchmedium
2340
refactoringexpert
2312
from-scratcheasy
2308
frontendhard
2229
multi-languagehard
2200
code-reviewhard
2176
from-scratchexpert
2072
refactoringmedium
2024
full-stackmedium
1996
refactoring
1969
multi-languageexpert
1923
code-reviewmedium
1914
code-review
1875
from-scratchhard
1874
from-scratch
1845
debuggingmedium
1842
backendhard
1841
backendexpert
1821
debuggingexpert
1796
multi-language
1766
backendmaster
1752
backend
1741
full-stack
1721
frontend
1719
debugging
1676
backendmedium
1616
frontendmedium
1615
full-stackhard
1597
debugginghard
1573
frontendmaster
1526

All Results

TaskCategoryScore
Fix and extend Chrome browser extensionfrontend62.6
Build 3D browser game with physics and multiplayer syncfrontend83.3
Build multi-tool LLM agent runtimebackend87.3
Build interactive data visualization dashboardfrontend68.5
Migrate Express monolith to modular architecturebackend84.1
Fix broken GitHub Actions CI pipelinedebugging96.5
Add Redis caching layer to Express APIbackend73.7
Add Google OAuth2 login to Express appfull-stack68.5
Implement Stripe webhook handlerbackend92.5
Add streaming SSE endpoint for LLM chatbackend65.4
Fix hallucination and context window bugs in RAG agentbackend85.7
Implement background job scheduler with persistencebackend81.6
Implement transformer inference engine with KV cachefrom-scratch87.0
Add WebSocket real-time updatesfull-stack81.8
Build real-time portfolio risk calculatorbackend74.8
Add rate limiting middlewarebackend91.8
Build materialized view refresh pipeline for analyticsbackend90.3
Code review: identify security vulnscode-review85.7
Build distributed node cluster with gossip protocolfrom-scratch75.3
Fix data integrity bugs in denormalized e-commerce schemadebugging91.4
Build RAG pipeline with vector searchbackend60.5
Port Python CLI to Rustmulti-language68.5
Write tests for untested legacy Flask servicecode-review86.0
Add retry logic and dead letter queue to Python task queuebackend83.3
Migrate callback-hell Express app to async/awaitrefactoring90.0
Fix flaky test suitedebugging90.0
Implement multi-tenant row-level security in Postgresbackend87.6
Build codebase indexer for LLM context windowsfrom-scratch82.3
Add file upload with S3 presigned URLsbackend75.5
Add i18n with locale routing to Next.js appfull-stack81.6
Implement JWT auth middlewarebackend87.0
Write Kubernetes manifests for Node.js microservicefull-stack90.8
Remove AI slop and over-engineering from codebaserefactoring93.0
Find and patch all OWASP Top 10 vulnerabilitiesdebugging91.6
Implement zero-trust API authentication layerbackend83.7
Optimize bloated React bundle under 500KBfrontend90.7
Replace console.log with structured loggingrefactoring92.8
Split 1100-line god file into proper modulesrefactoring90.1
Add caching layer to eliminate slow SSR page loadsfull-stack94.2
Convert React app to PWA with offline supportfrontend72.7
Dockerize Node.js monorepofull-stack89.2
Fix broken responsive layoutfrontend88.2
Harden insecure Docker setup with 12 vulnerabilitiescode-review95.1
Build SaaS admin dashboard from scratchfrom-scratch75.4
Build terminal UI dashboardfrom-scratch78.3
Build production website with auth and members areafrontend82.2
Build CLI tool with subcommands and configfrom-scratch77.7
Build MCP server for database managementbackend87.0
Build LLM evaluation harness with structured gradingbackend74.4
Fix race conditions in order matching enginebackend89.7
Fix deadlocking transaction patterns in Flask appbackend90.4
Debug and fix 6 broken database triggers and constraintsdebugging83.4
Find and fix 4 hidden backdoors in Flask appdebugging93.7
Write complex SQL report with window functionsbackend84.0
Fix 12 WCAG accessibility violations in checkout formfrontend91.8
Optimize slow Postgres queries in Flask appbackend89.1
Add slash commands and moderation to Discord botbackend79.8
Write integration tests for payment flowcode-review83.5
Add GraphQL layer over REST APImulti-language86.4
Add virtual scrolling to table rendering 5000 rowsfrontend74.4
Fix Node.js stream backpressure causing OOM on large filesbackend91.3
Refactor monolithic handler to CQRSrefactoring82.5
Fix auth bypass vulnerabilitydebugging93.7
Zero-downtime schema migrationfull-stack79.0
Add cursor-based pagination to REST APIbackend80.6
Fix N+1 query in dashboardbackend89.5
Fix memory leak in event handlerdebugging79.3
Fix React hydration mismatchfrontend81.9
Debug race condition in worker pooldebugging80.5
Build REST API from scratchfrom-scratch93.3