APEX
GPT OSS 120b

OpenRouter

131K context, $0.04/M input, $0.19/M output
ELO: 1378 (peak 1384)

Avg Score: 58.9
Avg Cost: $0.12
Score/$: 481.7
Runs: 116
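
The Score/$ figure is not exactly the displayed Avg Score divided by the displayed Avg Cost. The sketch below works through the arithmetic under two assumptions the page does not confirm: that Score/$ is average score divided by average cost, and that the displayed cost is rounded to the nearest cent. The variable names are illustrative only.

```python
# Values as displayed on the page.
avg_score = 58.9
avg_cost_displayed = 0.12          # dollars, presumably rounded to cents
score_per_dollar_displayed = 481.7

# Ratio computed from the rounded, displayed values.
naive_ratio = avg_score / avg_cost_displayed

# Average cost that would reproduce the displayed Score/$ (assumption:
# Score/$ = average score / average cost).
implied_cost = avg_score / score_per_dollar_displayed

print(f"score / displayed cost: {naive_ratio:.1f}")
print(f"implied unrounded cost: ${implied_cost:.4f}")
```

Under these assumptions the displayed values give about 490.8, and an unrounded average cost near $0.1223 would reproduce 481.7. If Score/$ is instead averaged per run, the two figures need not agree at all.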

[Charts: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

code-review (hard): 1876
debugging (medium): 1582
backend (hard): 1565
backend: 1468
backend (medium): 1468
debugging: 1461
debugging (expert): 1452
from-scratch (hard): 1448
debugging (hard): 1418
full-stack (hard): 1407
backend (expert): 1385
from-scratch: 1329
frontend (medium): 1294
full-stack: 1293
frontend: 1237
frontend (hard): 1233
code-review: 1200
multi-language: 1188
multi-language (expert): 1110
full-stack (medium): 1101
from-scratch (expert): 1024
refactoring (medium): 1023
refactoring: 958
backend (easy): 923
code-review (medium): 856
from-scratch (medium): 818
frontend (expert): 529
from-scratch (easy): 171
multi-language (hard): 152
frontend (easy): 10
refactoring (expert): 0

All Results

Task | Category | Score
Implement JWT auth middleware | backend | 55.5
Zero-downtime schema migration | full-stack | 63.8
Optimize bloated React bundle under 500KB | frontend | 64.0
Code review: identify security vulns | code-review | 30.3
Write complex SQL report with window functions | backend | 65.5
Build real-time portfolio risk calculator | backend | 53.4
Find and patch all OWASP Top 10 vulnerabilities | debugging | 63.3
Build SaaS admin dashboard from scratch | from-scratch | 44.8
Fix Node.js stream backpressure causing OOM on large files | backend | 82.4
Port Python CLI to Rust | multi-language | 53.8
Add file upload with S3 presigned URLs | backend | 73.8
Fix hallucination and context window bugs in RAG agent | backend | 31.5
Migrate callback-hell Express app to async/await | refactoring | 55.2
Add Redis caching layer to Express API | backend | 78.0
Debug race condition in worker pool | debugging | 78.3
Implement transformer inference engine with KV cache | from-scratch | 70.7
Find and fix 4 hidden backdoors in Flask app | debugging | 78.8
Harden insecure Docker setup with 12 vulnerabilities | code-review | 61.2
Add retry logic and dead letter queue to Python task queue | backend | 65.8
Add slash commands and moderation to Discord bot | backend | 65.7
Implement zero-trust API authentication layer | backend | 59.1
Split 1100-line god file into proper modules | refactoring | 43.3
Build distributed node cluster with gossip protocol | from-scratch | 33.8
Add GraphQL layer over REST API | multi-language | 48.0
Fix auth bypass vulnerability | debugging | 89.7
Build CLI tool with subcommands and config | from-scratch | 63.6
Build LLM evaluation harness with structured grading | backend | 38.5
Fix broken GitHub Actions CI pipeline | debugging | 79.8
Add i18n with locale routing to Next.js app | full-stack | 55.8
Add cursor-based pagination to REST API | backend | 86.5
Fix N+1 query in dashboard | backend | 64.0
Build production website with auth and members area | frontend | 48.3
Implement multi-tenant row-level security in Postgres | backend | 57.9
Write integration tests for payment flow | code-review | 81.5
Fix data integrity bugs in denormalized e-commerce schema | debugging | 78.1
Write Kubernetes manifests for Node.js microservice | full-stack | 73.5
Fix race conditions in order matching engine | backend | 63.0
Build MCP server for database management | backend | 78.8
Build materialized view refresh pipeline for analytics | backend | 49.9
Build terminal UI dashboard | from-scratch | 27.8
Add caching layer to eliminate slow SSR page loads | full-stack | 76.3
Add streaming SSE endpoint for LLM chat | backend | 84.8
Optimize slow Postgres queries in Flask app | backend | 70.0
Implement background job scheduler with persistence | backend | 53.1
Fix flaky test suite | debugging | 81.3
Dockerize Node.js monorepo | full-stack | 58.8
Convert React app to PWA with offline support | frontend | 58.6
Fix memory leak in event handler | debugging | 84.1
Add virtual scrolling to table rendering 5000 rows | frontend | 41.5
Add Google OAuth2 login to Express app | full-stack | 78.9
Implement Stripe webhook handler | backend | 82.8
Fix broken responsive layout | frontend | 50.0
Build codebase indexer for LLM context windows | from-scratch | 40.9
Remove AI slop and over-engineering from codebase | refactoring | 65.8
Debug and fix 6 broken database triggers and constraints | debugging | 82.0
Replace console.log with structured logging | refactoring | 50.0
Add rate limiting middleware | backend | 63.8
Write tests for untested legacy Flask service | code-review | 35.2
Add WebSocket real-time updates | full-stack | 70.7
Fix 12 WCAG accessibility violations in checkout form | frontend | 75.0
Build RAG pipeline with vector search | backend | 45.4
Fix React hydration mismatch | frontend | 77.2
Refactor monolithic handler to CQRS | refactoring | 28.6
Fix deadlocking transaction patterns in Flask app | backend | 33.3
Build REST API from scratch | from-scratch | 61.4
Dockerize Node.js monorepo | full-stack | 53.4
Split 1100-line god file into proper modules | refactoring | 10.3
Optimize bloated React bundle under 500KB | frontend | 68.5
Implement multi-tenant row-level security in Postgres | backend | 64.3
Remove AI slop and over-engineering from codebase | refactoring | 62.1
Harden insecure Docker setup with 12 vulnerabilities | code-review | 62.5
Replace console.log with structured logging | refactoring | 33.9
Convert React app to PWA with offline support | frontend | 42.5
Find and patch all OWASP Top 10 vulnerabilities | debugging | 61.7
Fix broken responsive layout | frontend | 54.7
Implement zero-trust API authentication layer | backend | 58.5
Write Kubernetes manifests for Node.js microservice | full-stack | 72.5
Build codebase indexer for LLM context windows | from-scratch | 28.1
Add caching layer to eliminate slow SSR page loads | full-stack | 53.8
Add i18n with locale routing to Next.js app | full-stack | 49.8
Implement JWT auth middleware | backend | 58.0
Build REST API from scratch | from-scratch | 25.8
Build SaaS admin dashboard from scratch | from-scratch | 51.5
Fix race conditions in order matching engine | backend | 76.8
Fix N+1 query in dashboard | backend | 77.5
Write integration tests for payment flow | code-review | 73.8
Add rate limiting middleware | backend | 52.7
Build distributed node cluster with gossip protocol | from-scratch | 58.9
Fix React hydration mismatch | frontend | 69.5
Write complex SQL report with window functions | backend | 80.2
Fix hallucination and context window bugs in RAG agent | backend | 71.5
Fix auth bypass vulnerability | debugging | 54.2
Build real-time portfolio risk calculator | backend | 52.8
Optimize slow Postgres queries in Flask app | backend | 74.2
Implement transformer inference engine with KV cache | from-scratch | 17.6
Fix data integrity bugs in denormalized e-commerce schema | debugging | 71.4
Build RAG pipeline with vector search | backend | 42.5
Implement Stripe webhook handler | backend | 42.7
Build LLM evaluation harness with structured grading | backend | 60.7
Add virtual scrolling to table rendering 5000 rows | frontend | 52.9
Refactor monolithic handler to CQRS | refactoring | 26.8
Implement background job scheduler with persistence | backend | 44.5
Find and fix 4 hidden backdoors in Flask app | debugging | 61.8
Build production website with auth and members area | frontend | 43.6
Build CLI tool with subcommands and config | from-scratch | 61.7
Fix flaky test suite | debugging | 55.3
Debug and fix 6 broken database triggers and constraints | debugging | 53.3
Build MCP server for database management | backend | 79.2
Zero-downtime schema migration | full-stack | 63.2
Fix deadlocking transaction patterns in Flask app | backend | 75.0
Add retry logic and dead letter queue to Python task queue | backend | 68.0
Build materialized view refresh pipeline for analytics | backend | 59.4
Write tests for untested legacy Flask service | code-review | 40.3
Add slash commands and moderation to Discord bot | backend | 67.5
Build terminal UI dashboard | from-scratch | 48.8
Debug race condition in worker pool | debugging | 71.5