APEX

Composer 2.5

xAI

256K context · $0.50/M input · $2.50/M output
1765 (peak 1793)

Avg Score

84.7

Avg Cost

$0.39

Score/$

215.5
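The page does not say how Score/$ is derived. A minimal sketch, assuming it is the average score divided by the average cost per run (the function name `score_per_dollar` is hypothetical). The rounded figures shown here give roughly 217.2 rather than 215.5, which is consistent with the site dividing by an unrounded cost of about $0.393, though that is an inference and not something the page states.

```python
def score_per_dollar(avg_score: float, avg_cost_usd: float) -> float:
    """Return score points earned per US dollar (assumed definition)."""
    if avg_cost_usd <= 0:
        raise ValueError("cost must be positive")
    return avg_score / avg_cost_usd


# Ratio computed from the rounded figures displayed on the page.
displayed = score_per_dollar(84.7, 0.39)

# Average cost that would reproduce the displayed 215.5 exactly.
implied_cost = 84.7 / 215.5

print(f"{displayed:.1f}")      # 217.2
print(f"{implied_cost:.4f}")   # 0.3930
```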

Runs

70

[Charts not captured: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category        Difficulty  ELO
refactoring     expert      2697
from-scratch    medium      2632
multi-language  expert      2534
code-review     hard        2436
frontend        easy        2383
frontend        expert      2320
multi-language  hard        2200
backend         easy        2170
from-scratch    easy        2151
from-scratch    hard        1989
code-review     -           1908
multi-language  -           1906
code-review     medium      1900
frontend        master      1888
refactoring     -           1877
from-scratch    -           1862
backend         expert      1850
refactoring     medium      1837
frontend        hard        1829
from-scratch    expert      1802
debugging       medium      1792
full-stack      hard        1777
frontend        -           1774
backend         hard        1769
debugging       expert      1760
backend         -           1750
full-stack      -           1739
debugging       -           1712
full-stack      medium      1706
backend         master      1706
debugging       hard        1700
backend         medium      1688
frontend        medium      1680

All Results

Task                                                        Category        Score
Migrate Express monolith to modular architecture            backend         87.8
Implement Stripe webhook handler                            backend         77.7
Debug race condition in worker pool                         debugging       90.9
Add Google OAuth2 login to Express app                      full-stack      80.7
Build MCP server for database management                    backend         88.0
Build RAG pipeline with vector search                       backend         78.8
Build interactive data visualization dashboard              frontend        81.7
Build 3D browser game with physics and multiplayer sync     frontend        84.8
Fix race conditions in order matching engine                backend         91.7
Build production website with auth and members area         frontend        79.0
Build LLM evaluation harness with structured grading        backend         76.9
Add i18n with locale routing to Next.js app                 full-stack      84.1
Add streaming SSE endpoint for LLM chat                     backend         87.0
Find and patch all OWASP Top 10 vulnerabilities             debugging       88.0
Build codebase indexer for LLM context windows              from-scratch    78.5
Build materialized view refresh pipeline for analytics      backend         80.3
Fix auth bypass vulnerability                               debugging       91.6
Build terminal UI dashboard                                 from-scratch    83.4
Code review: identify security vulns                        code-review     93.8
Fix Node.js stream backpressure causing OOM on large files  backend         91.3
Harden insecure Docker setup with 12 vulnerabilities        code-review     91.3
Build SaaS admin dashboard from scratch                     from-scratch    78.4
Build multi-tool LLM agent runtime                          backend         85.0
Migrate callback-hell Express app to async/await            refactoring     86.8
Implement background job scheduler with persistence         backend         83.8
Fix memory leak in event handler                            debugging       89.8
Add rate limiting middleware                                backend         86.4
Refactor monolithic handler to CQRS                         refactoring     88.6
Port Python CLI to Rust                                     multi-language  80.8
Dockerize Node.js monorepo                                  full-stack      76.7
Implement zero-trust API authentication layer               backend         81.2
Build REST API from scratch                                 from-scratch    89.4
Zero-downtime schema migration                              full-stack      81.8
Fix 12 WCAG accessibility violations in checkout form       frontend        85.0
Add slash commands and moderation to Discord bot            backend         83.6
Remove AI slop and over-engineering from codebase           refactoring     88.0
Add virtual scrolling to table rendering 5000 rows          frontend        82.0
Fix and extend Chrome browser extension                     frontend        84.8
Write tests for untested legacy Flask service               code-review     81.7
Fix N+1 query in dashboard                                  backend         80.3
Optimize bloated React bundle under 500KB                   frontend        80.8
Implement transformer inference engine with KV cache        from-scratch    82.1
Add cursor-based pagination to REST API                     backend         79.5
Fix broken responsive layout                                frontend        88.0
Add caching layer to eliminate slow SSR page loads          full-stack      88.3
Implement multi-tenant row-level security in Postgres       backend         81.6
Fix React hydration mismatch                                frontend        82.1
Optimize slow Postgres queries in Flask app                 backend         88.3
Implement JWT auth middleware                               backend         76.3
Write Kubernetes manifests for Node.js microservice         full-stack      88.3
Add GraphQL layer over REST API                             multi-language  86.5
Write integration tests for payment flow                    code-review     89.5
Replace console.log with structured logging                 refactoring     80.8
Convert React app to PWA with offline support               frontend        79.5
Add file upload with S3 presigned URLs                      backend         76.0
Find and fix 4 hidden backdoors in Flask app                debugging       91.5
Add Redis caching layer to Express API                      backend         88.3
Fix hallucination and context window bugs in RAG agent      backend         82.6
Debug and fix 6 broken database triggers and constraints    debugging       85.8
Add retry logic and dead letter queue to Python task queue  backend         84.1
Fix flaky test suite                                        debugging       89.7
Fix data integrity bugs in denormalized e-commerce schema   debugging       86.5
Build real-time portfolio risk calculator                   backend         87.4
Build distributed node cluster with gossip protocol         from-scratch    84.4
Write complex SQL report with window functions              backend         86.1
Fix broken GitHub Actions CI pipeline                       debugging       92.7
Build CLI tool with subcommands and config                  from-scratch    80.5
Add WebSocket real-time updates                             full-stack      87.8
Fix deadlocking transaction patterns in Flask app           backend         86.8
Split 1100-line god file into proper modules                refactoring     87.8
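
The per-task scores above can be rolled up by category. The sketch below is illustrative only: it uses the refactoring, code-review, and multi-language rows copied from the results list, and the helper name `mean_by_category` is hypothetical. The page does not state whether its own aggregates weight tasks equally, so a plain arithmetic mean is an assumption.

```python
from collections import defaultdict
from statistics import mean

# (category, score) pairs copied from the results list above.
rows = [
    ("refactoring", 86.8),
    ("refactoring", 88.6),
    ("refactoring", 88.0),
    ("refactoring", 80.8),
    ("refactoring", 87.8),
    ("code-review", 93.8),
    ("code-review", 91.3),
    ("code-review", 81.7),
    ("code-review", 89.5),
    ("multi-language", 80.8),
    ("multi-language", 86.5),
]


def mean_by_category(rows):
    """Group scores by category and return the unweighted mean of each group."""
    buckets = defaultdict(list)
    for category, score in rows:
        buckets[category].append(score)
    return {category: mean(scores) for category, scores in buckets.items()}


averages = mean_by_category(rows)
for category, avg in sorted(averages.items()):
    print(f"{category:<16}{avg:.3f}")
```

Extending `rows` with the remaining categories would give a full breakdown to set beside the page-level average of 84.7.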