APEX
Gemini 2.5 Pro

OpenRouter

1049K context, $1.25/M input, $10.00/M output
ELO 1575, peak 1595

Avg Score: 67.9
Avg Cost: $0.27
Score/$: 255.8
Runs: 116

[Charts: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category | Difficulty | ELO
refactoring | expert | 2558
multi-language | expert | 2482
from-scratch | medium | 2031
from-scratch | expert | 1999
debugging | medium | 1949
frontend | easy | 1876
frontend | hard | 1806
backend | easy | 1800
from-scratch | easy | 1788
from-scratch | hard | 1757
from-scratch | - | 1756
code-review | hard | 1675
multi-language | - | 1674
refactoring | - | 1664
multi-language | hard | 1656
frontend | medium | 1606
full-stack | medium | 1606
debugging | - | 1603
debugging | expert | 1586
debugging | hard | 1578
frontend | - | 1574
backend | hard | 1572
refactoring | medium | 1559
full-stack | - | 1550
full-stack | hard | 1542
backend | - | 1525
backend | medium | 1505
code-review | - | 1489
code-review | medium | 1467
backend | expert | 1457
frontend | expert | 1256

All Results

Task | Category | Score
Build SaaS admin dashboard from scratch | from-scratch | 50.1
Split 1100-line god file into proper modules | refactoring | 62.7
Implement JWT auth middleware | backend | 52.0
Convert React app to PWA with offline support | frontend | 52.1
Add file upload with S3 presigned URLs | backend | 74.2
Implement multi-tenant row-level security in Postgres | backend | 66.7
Code review: identify security vulns | code-review | 73.0
Build terminal UI dashboard | from-scratch | 49.3
Build production website with auth and members area | frontend | 56.7
Add retry logic and dead letter queue to Python task queue | backend | 70.5
Add slash commands and moderation to Discord bot | backend | 78.9
Dockerize Node.js monorepo | full-stack | 75.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 86.5
Add virtual scrolling to table rendering 5000 rows | frontend | 85.6
Implement zero-trust API authentication layer | backend | 68.8
Write integration tests for payment flow | code-review | 41.4
Fix React hydration mismatch | frontend | 38.5
Add WebSocket real-time updates | full-stack | 58.7
Harden insecure Docker setup with 12 vulnerabilities | code-review | 65.2
Build distributed node cluster with gossip protocol | from-scratch | 45.1
Implement background job scheduler with persistence | backend | 40.2
Add caching layer to eliminate slow SSR page loads | full-stack | 85.2
Write complex SQL report with window functions | backend | 40.0
Fix hallucination and context window bugs in RAG agent | backend | 63.8
Fix N+1 query in dashboard | backend | 69.7
Zero-downtime schema migration | full-stack | 48.1
Build real-time portfolio risk calculator | backend | 58.6
Refactor monolithic handler to CQRS | refactoring | 28.3
Optimize bloated React bundle under 500KB | frontend | 79.7
Build CLI tool with subcommands and config | from-scratch | 73.3
Implement transformer inference engine with KV cache | from-scratch | 82.6
Fix broken GitHub Actions CI pipeline | debugging | 88.8
Add GraphQL layer over REST API | multi-language | 34.6
Implement Stripe webhook handler | backend | 66.5
Find and patch all OWASP Top 10 vulnerabilities | debugging | 31.8
Replace console.log with structured logging | refactoring | 60.8
Add streaming SSE endpoint for LLM chat | backend | 82.4
Fix race conditions in order matching engine | backend | 79.1
Build materialized view refresh pipeline for analytics | backend | 72.8
Fix Node.js stream backpressure causing OOM on large files | backend | 43.4
Build MCP server for database management | backend | 82.4
Build codebase indexer for LLM context windows | from-scratch | 52.9
Fix flaky test suite | debugging | 93.0
Find and fix 4 hidden backdoors in Flask app | debugging | 90.9
Add i18n with locale routing to Next.js app | full-stack | 75.7
Add rate limiting middleware | backend | 73.5
Debug and fix 6 broken database triggers and constraints | debugging | 88.8
Write tests for untested legacy Flask service | code-review | 33.4
Optimize slow Postgres queries in Flask app | backend | 86.3
Fix auth bypass vulnerability | debugging | 78.5
Debug race condition in worker pool | debugging | 88.0
Fix broken responsive layout | frontend | 75.0
Build RAG pipeline with vector search | backend | 37.8
Fix memory leak in event handler | debugging | 34.3
Migrate callback-hell Express app to async/await | refactoring | 64.4
Build LLM evaluation harness with structured grading | backend | 68.3
Add Redis caching layer to Express API | backend | 50.5
Remove AI slop and over-engineering from codebase | refactoring | 75.3
Port Python CLI to Rust | multi-language | 52.2
Build REST API from scratch | from-scratch | 70.2
Convert React app to PWA with offline support | frontend | 66.8
Dockerize Node.js monorepo | full-stack | 69.0
Implement multi-tenant row-level security in Postgres | backend | 56.5
Remove AI slop and over-engineering from codebase | refactoring | 84.5
Write Kubernetes manifests for Node.js microservice | full-stack | 82.3
Implement JWT auth middleware | backend | 75.0
Harden insecure Docker setup with 12 vulnerabilities | code-review | 78.3
Build codebase indexer for LLM context windows | from-scratch | 35.0
Add caching layer to eliminate slow SSR page loads | full-stack | 88.1
Add streaming SSE endpoint for LLM chat | backend | 72.7
Fix broken responsive layout | frontend | 68.8
Add i18n with locale routing to Next.js app | full-stack | 63.7
Split 1100-line god file into proper modules | refactoring | 75.0
Optimize bloated React bundle under 500KB | frontend | 70.1
Replace console.log with structured logging | refactoring | 40.9
Implement zero-trust API authentication layer | backend | 70.5
Find and patch all OWASP Top 10 vulnerabilities | debugging | 69.4
Add Redis caching layer to Express API | backend | 62.7
Implement background job scheduler with persistence | backend | 73.2
Implement transformer inference engine with KV cache | from-scratch | 84.0
Build MCP server for database management | backend | 51.9
Build SaaS admin dashboard from scratch | from-scratch | 68.0
Build real-time portfolio risk calculator | backend | 53.7
Fix hallucination and context window bugs in RAG agent | backend | 74.1
Build production website with auth and members area | frontend | 44.6
Build LLM evaluation harness with structured grading | backend | 58.0
Build CLI tool with subcommands and config | from-scratch | 60.1
Fix race conditions in order matching engine | backend | 67.2
Build materialized view refresh pipeline for analytics | backend | 69.9
Fix deadlocking transaction patterns in Flask app | backend | 65.5
Debug and fix 6 broken database triggers and constraints | debugging | 81.8
Write complex SQL report with window functions | backend | 72.1
Fix data integrity bugs in denormalized e-commerce schema | debugging | 82.2
Build RAG pipeline with vector search | backend | 72.8
Find and fix 4 hidden backdoors in Flask app | debugging | 82.0
Write tests for untested legacy Flask service | code-review | 60.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 81.8
Optimize slow Postgres queries in Flask app | backend | 85.9
Add retry logic and dead letter queue to Python task queue | backend | 72.8
Add slash commands and moderation to Discord bot | backend | 81.8
Fix Node.js stream backpressure causing OOM on large files | backend | 63.9
Build distributed node cluster with gossip protocol | from-scratch | 79.0
Write integration tests for payment flow | code-review | 68.5
Add GraphQL layer over REST API | multi-language | 73.0
Fix auth bypass vulnerability | debugging | 95.0
Add rate limiting middleware | backend | 78.7
Zero-downtime schema migration | full-stack | 76.5
Add cursor-based pagination to REST API | backend | 90.0
Fix flaky test suite | debugging | 83.1
Fix N+1 query in dashboard | backend | 77.7
Refactor monolithic handler to CQRS | refactoring | 79.9
Fix memory leak in event handler | debugging | 62.0
Fix React hydration mismatch | frontend | 76.4
Build terminal UI dashboard | from-scratch | 70.5
Debug race condition in worker pool | debugging | 88.8
Build REST API from scratch | from-scratch | 86.0