APEX

Step 3.5 Flash

OpenRouter

256K context | $0.10/M input | $0.30/M output
1472 (peak 1496)

Avg Score: 53.7
Avg Cost: $0.14
Score/$: 396.2
Runs: 123

[Charts: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category | Difficulty | ELO
from-scratch | medium | 2031
backend | easy | 1841
code-review | hard | 1675
debugging | medium | 1585
frontend | medium | 1583
backend | expert | 1571
full-stack | medium | 1570
backend | medium | 1569
debugging | hard | 1555
from-scratch | hard | 1541
code-review | | 1536
code-review | medium | 1533
from-scratch | | 1511
backend | | 1503
full-stack | | 1491
full-stack | hard | 1441
debugging | | 1437
frontend | | 1429
backend | hard | 1393
refactoring | medium | 1363
refactoring | | 1346
from-scratch | easy | 1332
frontend | expert | 1310
debugging | expert | 1264
frontend | easy | 1208
refactoring | expert | 1118
from-scratch | expert | 1024
multi-language | | 949
multi-language | hard | 563
frontend | hard | 403
multi-language | expert | 0

All Results

Task | Category | Score
Add Google OAuth2 login to Express app | full-stack |
Build codebase indexer for LLM context windows | from-scratch | 0.0
Add retry logic and dead letter queue to Python task queue | backend | 28.0
Build real-time portfolio risk calculator | backend | 0.0
Implement JWT auth middleware | backend | 28.0
Fix N+1 query in dashboard | backend | 0.0
Implement background job scheduler with persistence | backend | 0.0
Fix auth bypass vulnerability | debugging |
Add streaming SSE endpoint for LLM chat | backend |
Write integration tests for payment flow | code-review |
Implement Stripe webhook handler | backend |
Fix broken GitHub Actions CI pipeline | debugging |
Write tests for untested legacy Flask service | code-review | 0.0
Add file upload with S3 presigned URLs | backend |
Add cursor-based pagination to REST API | backend |
Add rate limiting middleware | backend | 28.0
Add WebSocket real-time updates | full-stack | 0.0
Fix flaky test suite | debugging | 0.0
Fix memory leak in event handler | debugging |
Refactor monolithic handler to CQRS | refactoring | 0.0
Fix 12 WCAG accessibility violations in checkout form | frontend | 28.0
Port Python CLI to Rust | multi-language | 0.0
Find and patch all OWASP Top 10 vulnerabilities | debugging | 28.0
Fix data integrity bugs in denormalized e-commerce schema | debugging | 28.0
Code review: identify security vulns | code-review | 22.0
Optimize bloated React bundle under 500KB | frontend | 0.0
Debug and fix 6 broken database triggers and constraints | debugging | 0.0
Migrate callback-hell Express app to async/await | refactoring | 22.0
Add i18n with locale routing to Next.js app | full-stack | 28.0
Optimize slow Postgres queries in Flask app | backend | 0.0
Dockerize Node.js monorepo | full-stack | 28.0
Fix race conditions in order matching engine | backend | 22.0
Fix broken responsive layout | frontend | 0.0
Build LLM evaluation harness with structured grading | backend | 22.0
Implement transformer inference engine with KV cache | from-scratch | 0.0
Build CLI tool with subcommands and config | from-scratch | 0.0
Remove AI slop and over-engineering from codebase | refactoring | 22.0
Fix React hydration mismatch | frontend | 62.9
Find and fix 4 hidden backdoors in Flask app | debugging | 90.9
Add virtual scrolling to table rendering 5000 rows | frontend | 59.1
Build terminal UI dashboard | from-scratch | 58.5
Add GraphQL layer over REST API | multi-language | 54.0
Build MCP server for database management | backend | 76.9
Build REST API from scratch | from-scratch | 72.7
Add Redis caching layer to Express API | backend | 57.8
Fix deadlocking transaction patterns in Flask app | backend | 79.2
Convert React app to PWA with offline support | frontend | 84.8
Build production website with auth and members area | frontend | 51.8
Split 1100-line god file into proper modules | refactoring | 80.8
Write Kubernetes manifests for Node.js microservice | full-stack | 81.7
Build RAG pipeline with vector search | backend | 43.0
Implement multi-tenant row-level security in Postgres | backend | 74.8
Build SaaS admin dashboard from scratch | from-scratch | 44.3
Write complex SQL report with window functions | backend | 41.8
Add slash commands and moderation to Discord bot | backend | 53.3
Fix hallucination and context window bugs in RAG agent | backend | 50.5
Implement zero-trust API authentication layer | backend | 77.3
Debug race condition in worker pool | debugging | 87.9
Add caching layer to eliminate slow SSR page loads | full-stack | 81.7
Replace console.log with structured logging | refactoring | 60.0
Harden insecure Docker setup with 12 vulnerabilities | code-review | 88.4
Zero-downtime schema migration | full-stack | 82.5
Build materialized view refresh pipeline for analytics | backend | 80.0
Fix Node.js stream backpressure causing OOM on large files | backend | 52.3
Build distributed node cluster with gossip protocol | from-scratch | 51.6
Replace console.log with structured logging | refactoring | 62.8
Split 1100-line god file into proper modules | refactoring | 71.9
Implement multi-tenant row-level security in Postgres | backend | 73.1
Add file upload with S3 presigned URLs | backend | 62.4
Optimize bloated React bundle under 500KB | frontend | 79.8
Add caching layer to eliminate slow SSR page loads | full-stack | 82.7
Add i18n with locale routing to Next.js app | full-stack | 65.9
Build codebase indexer for LLM context windows | from-scratch | 28.1
Harden insecure Docker setup with 12 vulnerabilities | code-review | 93.2
Convert React app to PWA with offline support | frontend | 67.3
Remove AI slop and over-engineering from codebase | refactoring | 80.4
Fix broken responsive layout | frontend | 69.2
Implement JWT auth middleware | backend | 78.6
Dockerize Node.js monorepo | full-stack | 75.9
Implement zero-trust API authentication layer | backend | 73.3
Write Kubernetes manifests for Node.js microservice | full-stack | 85.7
Find and patch all OWASP Top 10 vulnerabilities | debugging | 66.4
Build real-time portfolio risk calculator | backend | 60.8
Implement transformer inference engine with KV cache | from-scratch | 67.2
Build production website with auth and members area | frontend | 59.5
Build CLI tool with subcommands and config | from-scratch | 47.3
Implement background job scheduler with persistence | backend | 48.5
Build MCP server for database management | backend | 59.1
Build SaaS admin dashboard from scratch | from-scratch | 74.6
Debug and fix 6 broken database triggers and constraints | debugging | 36.5
Add Redis caching layer to Express API | backend | 78.8
Build materialized view refresh pipeline for analytics | backend | 43.3
Write integration tests for payment flow | code-review | 68.5
Add Google OAuth2 login to Express app | full-stack | 76.0
Add virtual scrolling to table rendering 5000 rows | frontend | 43.0
Add rate limiting middleware | backend | 79.5
Fix data integrity bugs in denormalized e-commerce schema | debugging | 76.8
Fix N+1 query in dashboard | backend | 83.8
Write complex SQL report with window functions | backend | 75.5
Zero-downtime schema migration | full-stack | 61.6
Build distributed node cluster with gossip protocol | from-scratch | 43.3
Fix 12 WCAG accessibility violations in checkout form | frontend | 62.0
Add retry logic and dead letter queue to Python task queue | backend | 77.2
Fix race conditions in order matching engine | backend | 88.0
Find and fix 4 hidden backdoors in Flask app | debugging | 74.2
Refactor monolithic handler to CQRS | refactoring | 56.6
Fix hallucination and context window bugs in RAG agent | backend | 56.7
Add slash commands and moderation to Discord bot | backend | 56.1
Fix flaky test suite | debugging | 78.5
Debug race condition in worker pool | debugging | 89.3
Fix deadlocking transaction patterns in Flask app | backend | 72.3
Fix Node.js stream backpressure causing OOM on large files | backend | 90.4
Build LLM evaluation harness with structured grading | backend | 46.0
Add cursor-based pagination to REST API | backend | 55.6
Fix memory leak in event handler | debugging | 63.4
Write tests for untested legacy Flask service | code-review | 64.5
Build terminal UI dashboard | from-scratch | 70.9
Add GraphQL layer over REST API | multi-language | 42.6
Implement Stripe webhook handler | backend | 78.6
Optimize slow Postgres queries in Flask app | backend | 44.0
Fix React hydration mismatch | frontend | 73.0
Fix auth bypass vulnerability | debugging | 89.9
Build REST API from scratch | from-scratch | 79.5