APEX

GPT OSS 120b

OpenRouter

131K context | $0.04/M input tokens | $0.19/M output tokens
Rating: 1295 (peak 1312)

Avg Score: 61.9
Avg Cost: $0.02
Score/$: 3269.9
Runs: 65


Category ELOs

code-review (hard): 1530
multi-language (expert): 1475
debugging (hard): 1455
backend (medium): 1454
debugging (medium): 1427
debugging: 1424
debugging (expert): 1403
backend: 1335
full-stack (hard): 1333
backend (hard): 1302
multi-language: 1296
full-stack: 1270
frontend (hard): 1261
from-scratch (hard): 1230
frontend (medium): 1228
from-scratch: 1199
frontend: 1182
code-review: 1139
backend (expert): 1120
full-stack (medium): 1115
backend (easy): 1073
refactoring (medium): 1046
refactoring: 1027
from-scratch (expert): 1003
code-review (medium): 935
from-scratch (easy): 881
multi-language (hard): 442
frontend (expert): 252
frontend (easy): 240
refactoring (expert): 0
from-scratch (medium): 0

All Results

Task | Category | Score
Implement JWT auth middleware | backend | 55.5
Zero-downtime schema migration | full-stack | 63.8
Optimize bloated React bundle under 500KB | frontend | 64.0
Code review: identify security vulns | code-review | 30.3
Write complex SQL report with window functions | backend | 65.5
Build real-time portfolio risk calculator | backend | 53.4
Find and patch all OWASP Top 10 vulnerabilities | debugging | 63.3
Build SaaS admin dashboard from scratch | from-scratch | 44.8
Fix Node.js stream backpressure causing OOM on large files | backend | 82.4
Port Python CLI to Rust | multi-language | 53.8
Add file upload with S3 presigned URLs | backend | 73.8
Fix hallucination and context window bugs in RAG agent | backend | 31.5
Migrate callback-hell Express app to async/await | refactoring | 55.2
Add Redis caching layer to Express API | backend | 78.0
Debug race condition in worker pool | debugging | 78.3
Implement transformer inference engine with KV cache | from-scratch | 70.7
Find and fix 4 hidden backdoors in Flask app | debugging | 78.8
Harden insecure Docker setup with 12 vulnerabilities | code-review | 61.2
Add retry logic and dead letter queue to Python task queue | backend | 65.8
Add slash commands and moderation to Discord bot | backend | 65.7
Implement zero-trust API authentication layer | backend | 59.1
Split 1100-line god file into proper modules | refactoring | 43.3
Build distributed node cluster with gossip protocol | from-scratch | 47.8
Add GraphQL layer over REST API | multi-language | 48.0
Fix auth bypass vulnerability | debugging | 89.7
Build CLI tool with subcommands and config | from-scratch | 63.6
Build LLM evaluation harness with structured grading | backend | 38.5
Fix broken GitHub Actions CI pipeline | debugging | 79.8
Add i18n with locale routing to Next.js app | full-stack | 55.8
Add cursor-based pagination to REST API | backend | 86.5
Fix N+1 query in dashboard | backend | 64.0
Build production website with auth and members area | frontend | 48.3
Implement multi-tenant row-level security in Postgres | backend | 57.9
Write integration tests for payment flow | code-review | 73.8
Fix data integrity bugs in denormalized e-commerce schema | debugging | 78.1
Write Kubernetes manifests for Node.js microservice | full-stack | 73.5
Fix race conditions in order matching engine | backend | 63.0
Build MCP server for database management | backend | 78.8
Build materialized view refresh pipeline for analytics | backend | 49.9
Build terminal UI dashboard | from-scratch | 33.0
Add caching layer to eliminate slow SSR page loads | full-stack | 76.3
Add streaming SSE endpoint for LLM chat | backend | 84.8
Optimize slow Postgres queries in Flask app | backend | 70.0
Implement background job scheduler with persistence | backend | 53.1
Fix flaky test suite | debugging | 81.3
Dockerize Node.js monorepo | full-stack | 58.8
Convert React app to PWA with offline support | frontend | 58.6
Fix memory leak in event handler | debugging | 84.1
Add virtual scrolling to table rendering 5000 rows | frontend | 41.5
Add Google OAuth2 login to Express app | full-stack | 78.9
Implement Stripe webhook handler | backend | 82.8
Fix broken responsive layout | frontend | 50.0
Build codebase indexer for LLM context windows | from-scratch | 40.9
Remove AI slop and over-engineering from codebase | refactoring | 65.8
Debug and fix 6 broken database triggers and constraints | debugging | 82.0
Replace console.log with structured logging | refactoring | 50.0
Add rate limiting middleware | backend | 63.8
Write tests for untested legacy Flask service | code-review | 35.2
Add WebSocket real-time updates | full-stack | 70.7
Fix 12 WCAG accessibility violations in checkout form | frontend | 75.0
Build RAG pipeline with vector search | backend | 45.4
Fix React hydration mismatch | frontend | 77.2
Refactor monolithic handler to CQRS | refactoring | 28.6
Fix deadlocking transaction patterns in Flask app | backend | 33.3
Build REST API from scratch | from-scratch | 68.0