APEX

Devstral 2512

OpenRouter

262K context · $0.05/M input · $0.22/M output
1400 (peak 1412)

Avg Score: 62.8
Avg Cost: $0.10
Score/$: 603.7
Runs: 111

Category ELOs

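| Category | Difficulty | ELO |
|---|---|---|
| multi-language | expert | 2183 |
| from-scratch | medium | 1922 |
| refactoring | expert | 1712 |
| multi-language | hard | 1695 |
| debugging | medium | 1681 |
| multi-language | | 1628 |
| backend | easy | 1616 |
| frontend | hard | 1590 |
| from-scratch | expert | 1578 |
| frontend | expert | 1544 |
| from-scratch | easy | 1534 |
| frontend | easy | 1501 |
| debugging | | 1491 |
| refactoring | | 1480 |
| debugging | expert | 1474 |
| debugging | hard | 1473 |
| refactoring | medium | 1428 |
| from-scratch | hard | 1422 |
| from-scratch | | 1419 |
| frontend | | 1417 |
| full-stack | hard | 1393 |
| backend | hard | 1386 |
| full-stack | | 1377 |
| backend | expert | 1373 |
| frontend | medium | 1359 |
| full-stack | medium | 1353 |
| code-review | | 1346 |
| backend | | 1342 |
| code-review | medium | 1342 |
| backend | medium | 1244 |
| code-review | hard | 966 |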

All Results

| Task | Category | Score |
|---|---|---|
| Fix hallucination and context window bugs in RAG agent | backend | 53.4 |
| Fix deadlocking transaction patterns in Flask app | backend | 72.5 |
| Build terminal UI dashboard | from-scratch | 59.5 |
| Debug race condition in worker pool | debugging | 87.5 |
| Fix memory leak in event handler | debugging | 78.2 |
| Write tests for untested legacy Flask service | code-review | 39.1 |
| Find and fix 4 hidden backdoors in Flask app | debugging | 78.7 |
| Build MCP server for database management | backend | 80.5 |
| Build real-time portfolio risk calculator | backend | 38.8 |
| Add caching layer to eliminate slow SSR page loads | full-stack | 77.8 |
| Build distributed node cluster with gossip protocol | from-scratch | 22.4 |
| Add streaming SSE endpoint for LLM chat | backend | 80.0 |
| Build materialized view refresh pipeline for analytics | backend | 75.8 |
| Implement zero-trust API authentication layer | backend | 69.3 |
| Debug and fix 6 broken database triggers and constraints | debugging | 88.0 |
| Write Kubernetes manifests for Node.js microservice | full-stack | 77.7 |
| Code review: identify security vulns | code-review | 76.0 |
| Add Google OAuth2 login to Express app | full-stack | 72.9 |
| Convert React app to PWA with offline support | frontend | 65.8 |
| Write integration tests for payment flow | code-review | 46.3 |
| Optimize bloated React bundle under 500KB | frontend | 72.0 |
| Fix auth bypass vulnerability | debugging | 67.5 |
| Find and patch all OWASP Top 10 vulnerabilities | debugging | 69.5 |
| Replace console.log with structured logging | refactoring | 80.9 |
| Fix race conditions in order matching engine | backend | 72.2 |
| Add slash commands and moderation to Discord bot | backend | 69.0 |
| Split 1100-line god file into proper modules | refactoring | 57.6 |
| Add rate limiting middleware | backend | 75.0 |
| Dockerize Node.js monorepo | full-stack | 46.5 |
| Fix Node.js stream backpressure causing OOM on large files | backend | 36.0 |
| Add file upload with S3 presigned URLs | backend | 43.5 |
| Build RAG pipeline with vector search | backend | 46.0 |
| Fix broken responsive layout | frontend | 68.3 |
| Add retry logic and dead letter queue to Python task queue | backend | 60.2 |
| Add WebSocket real-time updates | full-stack | 75.7 |
| Build SaaS admin dashboard from scratch | from-scratch | 49.9 |
| Add Redis caching layer to Express API | backend | 44.5 |
| Migrate callback-hell Express app to async/await | refactoring | 59.4 |
| Add cursor-based pagination to REST API | backend | 67.3 |
| Fix N+1 query in dashboard | backend | 47.4 |
| Write complex SQL report with window functions | backend | 59.5 |
| Port Python CLI to Rust | multi-language | 47.1 |
| Implement transformer inference engine with KV cache | from-scratch | 76.7 |
| Build codebase indexer for LLM context windows | from-scratch | 47.1 |
| Fix React hydration mismatch | frontend | 52.5 |
| Implement multi-tenant row-level security in Postgres | backend | 57.5 |
| Fix data integrity bugs in denormalized e-commerce schema | debugging | 75.2 |
| Build CLI tool with subcommands and config | from-scratch | 61.4 |
| Harden insecure Docker setup with 12 vulnerabilities | code-review | 74.0 |
| Build REST API from scratch | from-scratch | 72.8 |
| Fix broken GitHub Actions CI pipeline | debugging | 92.0 |
| Optimize slow Postgres queries in Flask app | backend | 65.0 |
| Add i18n with locale routing to Next.js app | full-stack | 67.5 |
| Fix 12 WCAG accessibility violations in checkout form | frontend | 75.0 |
| Remove AI slop and over-engineering from codebase | refactoring | 74.0 |
| Implement Stripe webhook handler | backend | 47.9 |
| Find and patch all OWASP Top 10 vulnerabilities | debugging | 67.4 |
| Replace console.log with structured logging | refactoring | 49.4 |
| Implement multi-tenant row-level security in Postgres | backend | 73.7 |
| Convert React app to PWA with offline support | frontend | 60.5 |
| Add caching layer to eliminate slow SSR page loads | full-stack | 73.8 |
| Build codebase indexer for LLM context windows | from-scratch | 28.5 |
| Split 1100-line god file into proper modules | refactoring | 67.1 |
| Add i18n with locale routing to Next.js app | full-stack | 63.5 |
| Remove AI slop and over-engineering from codebase | refactoring | 68.0 |
| Fix broken responsive layout | frontend | 67.4 |
| Harden insecure Docker setup with 12 vulnerabilities | code-review | 76.5 |
| Write Kubernetes manifests for Node.js microservice | full-stack | 74.7 |
| Implement JWT auth middleware | backend | 64.4 |
| Add streaming SSE endpoint for LLM chat | backend | 62.9 |
| Dockerize Node.js monorepo | full-stack | 69.7 |
| Optimize bloated React bundle under 500KB | frontend | 77.4 |
| Implement zero-trust API authentication layer | backend | 68.9 |
| Build production website with auth and members area | frontend | 63.3 |
| Build SaaS admin dashboard from scratch | from-scratch | 70.5 |
| Build LLM evaluation harness with structured grading | backend | 46.0 |
| Implement background job scheduler with persistence | backend | 59.1 |
| Build MCP server for database management | backend | 45.0 |
| Implement transformer inference engine with KV cache | from-scratch | 66.1 |
| Build CLI tool with subcommands and config | from-scratch | 40.2 |
| Build real-time portfolio risk calculator | backend | 52.5 |
| Fix hallucination and context window bugs in RAG agent | backend | 65.7 |
| Build materialized view refresh pipeline for analytics | backend | 66.6 |
| Fix race conditions in order matching engine | backend | 36.7 |
| Write complex SQL report with window functions | backend | 59.7 |
| Fix data integrity bugs in denormalized e-commerce schema | debugging | 63.5 |
| Fix deadlocking transaction patterns in Flask app | backend | 51.4 |
| Debug and fix 6 broken database triggers and constraints | debugging | 64.6 |
| Write tests for untested legacy Flask service | code-review | 48.2 |
| Find and fix 4 hidden backdoors in Flask app | debugging | 81.0 |
| Add slash commands and moderation to Discord bot | backend | 51.3 |
| Fix 12 WCAG accessibility violations in checkout form | frontend | 80.0 |
| Add retry logic and dead letter queue to Python task queue | backend | 58.3 |
| Optimize slow Postgres queries in Flask app | backend | 65.0 |
| Write integration tests for payment flow | code-review | 56.8 |
| Add GraphQL layer over REST API | multi-language | 73.7 |
| Add virtual scrolling to table rendering 5000 rows | frontend | 65.8 |
| Fix Node.js stream backpressure causing OOM on large files | backend | 58.0 |
| Build distributed node cluster with gossip protocol | from-scratch | 24.3 |
| Fix auth bypass vulnerability | debugging | 88.7 |
| Implement Stripe webhook handler | backend | 50.0 |
| Build terminal UI dashboard | from-scratch | 42.5 |
| Zero-downtime schema migration | full-stack | 57.0 |
| Add rate limiting middleware | backend | 46.5 |
| Fix flaky test suite | debugging | 69.8 |
| Fix React hydration mismatch | frontend | 43.8 |
| Refactor monolithic handler to CQRS | refactoring | 66.5 |
| Fix N+1 query in dashboard | backend | 61.5 |
| Fix memory leak in event handler | debugging | 64.7 |
| Build REST API from scratch | from-scratch | 71.5 |
| Debug race condition in worker pool | debugging | 89.2 |