APEX

Grok 4.3

xAI

1000K context, $1.25/M input, $2.50/M output
1743 (peak 1793)

Avg Score: 83.5
Avg Cost: $0.39
Score/$: 212.5
Runs: 70
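
The Score/$ figure appears to be the average score divided by the average cost per run. The page does not state this, so treat the relationship as an assumption; the function name `score_per_dollar` is illustrative only. A minimal Python sketch using the displayed values:

```python
def score_per_dollar(avg_score: float, avg_cost_usd: float) -> float:
    """Assumed definition: average score divided by average cost per run."""
    if avg_cost_usd <= 0:
        raise ValueError("average cost must be positive")
    return avg_score / avg_cost_usd


# Values as displayed on the page (both are rounded for display).
ratio_from_rounded = score_per_dollar(83.5, 0.39)

# Average cost implied by the published Score/$ of 212.5,
# under the same assumed definition.
implied_cost = 83.5 / 212.5

print(f"ratio from rounded inputs: {ratio_from_rounded:.1f}")
print(f"implied average cost: ${implied_cost:.4f}")
```

With the rounded inputs the ratio lands near 214 rather than 212.5. That gap is consistent with an unrounded average cost of roughly $0.393, which would still display as $0.39.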

Category ELOs

Category | Difficulty | ELO
multi-language | expert | 2921
from-scratch | medium | 2550
code-review | hard | 2385
from-scratch | easy | 2339
backend | easy | 2201
frontend | easy | 2145
refactoring | expert | 2112
from-scratch | hard | 2020
multi-language | hard | 2008
from-scratch | expert | 1922
from-scratch | - | 1913
multi-language | - | 1894
code-review | - | 1867
code-review | medium | 1860
debugging | expert | 1842
full-stack | hard | 1834
debugging | medium | 1808
full-stack | - | 1806
refactoring | medium | 1806
full-stack | medium | 1794
refactoring | - | 1787
backend | expert | 1785
debugging | - | 1749
backend | hard | 1723
backend | - | 1716
frontend | medium | 1715
debugging | hard | 1711
backend | medium | 1689
frontend | - | 1647
frontend | hard | 1614
backend | master | 1548
frontend | master | 1532
frontend | expert | 1343

All Results

Task | Category | Score
Convert React app to PWA with offline support | frontend | 83.5
Fix and extend Chrome browser extension | frontend | 66.6
Add virtual scrolling to table rendering 5000 rows | frontend | 74.8
Optimize bloated React bundle under 500KB | frontend | 83.5
Code review: identify security vulns | code-review | 91.3
Refactor monolithic handler to CQRS | refactoring | 78.2
Find and patch all OWASP Top 10 vulnerabilities | debugging | 86.6
Build interactive data visualization dashboard | frontend | 72.5
Debug and fix 6 broken database triggers and constraints | debugging | 86.5
Add streaming SSE endpoint for LLM chat | backend | 87.5
Add rate limiting middleware | backend | 87.0
Fix auth bypass vulnerability | debugging | 92.3
Write Kubernetes manifests for Node.js microservice | full-stack | 88.5
Add slash commands and moderation to Discord bot | backend | 78.5
Implement JWT auth middleware | backend | 85.8
Implement Stripe webhook handler | backend | 85.8
Split 1100-line god file into proper modules | refactoring | 84.0
Build materialized view refresh pipeline for analytics | backend | 79.5
Build production website with auth and members area | frontend | 65.1
Build real-time portfolio risk calculator | backend | 75.0
Add WebSocket real-time updates | full-stack | 85.6
Build multi-tool LLM agent runtime | backend | 83.9
Implement zero-trust API authentication layer | backend | 85.8
Add caching layer to eliminate slow SSR page loads | full-stack | 89.6
Write integration tests for payment flow | code-review | 88.7
Implement multi-tenant row-level security in Postgres | backend | 84.2
Write complex SQL report with window functions | backend | 80.1
Fix race conditions in order matching engine | backend | 87.5
Fix broken GitHub Actions CI pipeline | debugging | 90.0
Find and fix 4 hidden backdoors in Flask app | debugging | 88.8
Build SaaS admin dashboard from scratch | from-scratch | 73.7
Build MCP server for database management | backend | 86.8
Add i18n with locale routing to Next.js app | full-stack | 82.7
Add file upload with S3 presigned URLs | backend | 73.5
Fix hallucination and context window bugs in RAG agent | backend | 85.3
Port Python CLI to Rust | multi-language | 87.3
Implement transformer inference engine with KV cache | from-scratch | 83.7
Replace console.log with structured logging | refactoring | 85.2
Optimize slow Postgres queries in Flask app | backend | 86.3
Build 3D browser game with physics and multiplayer sync | frontend | 80.2
Implement background job scheduler with persistence | backend | 71.0
Fix data integrity bugs in denormalized e-commerce schema | debugging | 91.7
Zero-downtime schema migration | full-stack | 89.0
Build CLI tool with subcommands and config | from-scratch | 82.5
Fix Node.js stream backpressure causing OOM on large files | backend | 94.0
Add retry logic and dead letter queue to Python task queue | backend | 83.3
Build distributed node cluster with gossip protocol | from-scratch | 83.6
Remove AI slop and over-engineering from codebase | refactoring | 84.0
Debug race condition in worker pool | debugging | 92.2
Add Google OAuth2 login to Express app | full-stack | 81.6
Harden insecure Docker setup with 12 vulnerabilities | code-review | 90.5
Dockerize Node.js monorepo | full-stack | 81.5
Migrate callback-hell Express app to async/await | refactoring | 86.3
Fix N+1 query in dashboard | backend | 76.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 80.2
Add GraphQL layer over REST API | multi-language | 83.3
Add Redis caching layer to Express API | backend | 88.3
Build LLM evaluation harness with structured grading | backend | 72.8
Fix memory leak in event handler | debugging | 90.5
Write tests for untested legacy Flask service | code-review | 78.3
Migrate Express monolith to modular architecture | backend | 74.7
Build terminal UI dashboard | from-scratch | 82.7
Build RAG pipeline with vector search | backend | 78.0
Build REST API from scratch | from-scratch | 94.3
Fix React hydration mismatch | frontend | 89.5
Fix broken responsive layout | frontend | 83.2
Fix deadlocking transaction patterns in Flask app | backend | 87.6
Build codebase indexer for LLM context windows | from-scratch | 86.1
Fix flaky test suite | debugging | 91.8
Add cursor-based pagination to REST API | backend | 77.7
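
Per-task scores can be grouped by category to get a rough per-category average. The sketch below is illustrative, not the site's method: it takes a plain arithmetic mean, which is not how the Category ELOs are derived, and the helper name `category_means` is hypothetical. For brevity it covers only three categories (code-review, refactoring, multi-language), with scores copied from the results listed above.

```python
from collections import defaultdict
from statistics import mean

# (category, score) pairs copied from the results listing; only three
# categories are included to keep the sketch short.
rows = [
    ("code-review", 91.3), ("code-review", 88.7),
    ("code-review", 90.5), ("code-review", 78.3),
    ("refactoring", 78.2), ("refactoring", 84.0), ("refactoring", 85.2),
    ("refactoring", 84.0), ("refactoring", 86.3),
    ("multi-language", 87.3), ("multi-language", 83.3),
]


def category_means(pairs):
    """Group scores by category and return the mean of each group."""
    grouped = defaultdict(list)
    for category, score in pairs:
        grouped[category].append(score)
    return {category: round(mean(scores), 2) for category, scores in grouped.items()}


means = category_means(rows)

# Print categories from highest to lowest mean score.
for category, value in sorted(means.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {value}")
```

Extending `rows` with the remaining tasks would give means for every category under the same simple-average assumption.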