APEX

GPT OSS 20b

OpenRouter

131K context, $0.03/M input, $0.14/M output
1265 (peak 1270)

Avg Score: 53.3
Avg Cost: $0.11
Score/$: 490.2
Runs: 114

[Charts not captured: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

multi-language (hard): 2101
multi-language: 1554
debugging (medium): 1522
backend (easy): 1407
debugging (expert): 1403
debugging: 1378
code-review (hard): 1367
backend (hard): 1360
backend: 1305
backend (medium): 1300
frontend (medium): 1265
full-stack (hard): 1262
debugging (hard): 1244
code-review: 1207
backend (expert): 1197
full-stack: 1177
frontend: 1163
code-review (medium): 1097
from-scratch: 1089
from-scratch (expert): 1024
refactoring (medium): 999
full-stack (medium): 999
from-scratch (hard): 986
refactoring: 937
from-scratch (medium): 818
multi-language (expert): 677
frontend (hard): 667
frontend (expert): 357
frontend (easy): 10
from-scratch (easy): 0
refactoring (expert): 0

All Results

Task | Category | Score
Build distributed node cluster with gossip protocol | from-scratch | 28.6
Split 1100-line god file into proper modules | refactoring | 22.9
Find and patch all OWASP Top 10 vulnerabilities | debugging | 56.1
Write integration tests for payment flow | code-review | 74.2
Add streaming SSE endpoint for LLM chat | backend | 82.8
Add Google OAuth2 login to Express app | full-stack | 68.0
Migrate callback-hell Express app to async/await | refactoring | 50.6
Implement background job scheduler with persistence | backend | 0.6
Write Kubernetes manifests for Node.js microservice | full-stack | 61.2
Build REST API from scratch | from-scratch | 52.4
Add i18n with locale routing to Next.js app | full-stack | 35.6
Build LLM evaluation harness with structured grading | backend | 44.0
Find and fix 4 hidden backdoors in Flask app | debugging | 65.3
Fix broken GitHub Actions CI pipeline | debugging | 82.2
Add file upload with S3 presigned URLs | backend | 69.0
Add caching layer to eliminate slow SSR page loads | full-stack | 62.0
Fix race conditions in order matching engine | backend | 67.0
Replace console.log with structured logging | refactoring | 43.3
Fix hallucination and context window bugs in RAG agent | backend | 57.3
Build SaaS admin dashboard from scratch | from-scratch | 36.6
Refactor monolithic handler to CQRS | refactoring | 28.0
Build terminal UI dashboard | from-scratch | 48.1
Add GraphQL layer over REST API | multi-language | 85.0
Fix auth bypass vulnerability | debugging | 34.5
Fix data integrity bugs in denormalized e-commerce schema | debugging | 78.9
Fix Node.js stream backpressure causing OOM on large files | backend | 87.0
Port Python CLI to Rust | multi-language | 48.8
Build RAG pipeline with vector search | backend | 39.0
Fix broken responsive layout | frontend | 52.7
Add Redis caching layer to Express API | backend | 51.0
Harden insecure Docker setup with 12 vulnerabilities | code-review | 64.2
Fix React hydration mismatch | frontend | 59.0
Implement transformer inference engine with KV cache | from-scratch | 66.7
Debug race condition in worker pool | debugging | 80.5
Build materialized view refresh pipeline for analytics | backend | 54.8
Write tests for untested legacy Flask service | code-review | 15.1
Implement JWT auth middleware | backend | 47.5
Add slash commands and moderation to Discord bot | backend | 58.0
Code review: identify security vulns | code-review | 76.3
Add cursor-based pagination to REST API | backend | 80.0
Add rate limiting middleware | backend | 63.9
Implement multi-tenant row-level security in Postgres | backend | 24.2
Implement Stripe webhook handler | backend | 84.3
Implement zero-trust API authentication layer | backend | 58.8
Build MCP server for database management | backend | 70.6
Fix flaky test suite | debugging | 73.0
Debug and fix 6 broken database triggers and constraints | debugging | 79.3
Dockerize Node.js monorepo | full-stack | 53.2
Optimize bloated React bundle under 500KB | frontend | 55.3
Build production website with auth and members area | frontend | 42.3
Fix memory leak in event handler | debugging | 78.9
Add retry logic and dead letter queue to Python task queue | backend | 69.3
Build real-time portfolio risk calculator | backend | 45.9
Add virtual scrolling to table rendering 5000 rows | frontend | 80.3
Build CLI tool with subcommands and config | from-scratch | 29.8
Convert React app to PWA with offline support | frontend | 50.9
Fix N+1 query in dashboard | backend | 52.4
Optimize slow Postgres queries in Flask app | backend | 73.4
Zero-downtime schema migration | full-stack | 58.9
Fix deadlocking transaction patterns in Flask app | backend | 57.7
Fix 12 WCAG accessibility violations in checkout form | frontend | 67.2
Remove AI slop and over-engineering from codebase | refactoring | 66.6
Build codebase indexer for LLM context windows | from-scratch | 37.5
Write complex SQL report with window functions | backend | 47.3
Add WebSocket real-time updates | full-stack | 73.8
Split 1100-line god file into proper modules | refactoring | 52.6
Remove AI slop and over-engineering from codebase | refactoring | 61.8
Optimize bloated React bundle under 500KB | frontend | 61.9
Fix broken responsive layout | frontend | 52.0
Replace console.log with structured logging | refactoring | 41.5
Build codebase indexer for LLM context windows | from-scratch | 16.6
Implement JWT auth middleware | backend | 23.6
Find and patch all OWASP Top 10 vulnerabilities | debugging | 48.5
Implement multi-tenant row-level security in Postgres | backend | 57.0
Harden insecure Docker setup with 12 vulnerabilities | code-review | 58.6
Convert React app to PWA with offline support | frontend | 50.4
Add i18n with locale routing to Next.js app | full-stack | 3.5
Add caching layer to eliminate slow SSR page loads | full-stack | 66.5
Dockerize Node.js monorepo | full-stack | 54.0
Implement zero-trust API authentication layer | backend | 54.6
Add streaming SSE endpoint for LLM chat | backend | 63.0
Write Kubernetes manifests for Node.js microservice | full-stack | 77.0
Implement transformer inference engine with KV cache | from-scratch | 43.6
Add slash commands and moderation to Discord bot | backend | 61.8
Optimize slow Postgres queries in Flask app | backend | 76.5
Zero-downtime schema migration | full-stack | 63.8
Build distributed node cluster with gossip protocol | from-scratch | 12.4
Fix hallucination and context window bugs in RAG agent | backend | 19.8
Implement background job scheduler with persistence | backend | 3.1
Write complex SQL report with window functions | backend | 71.0
Build LLM evaluation harness with structured grading | backend | 53.4
Build RAG pipeline with vector search | backend | 33.9
Write integration tests for payment flow | code-review | 68.5
Fix data integrity bugs in denormalized e-commerce schema | debugging | 71.9
Fix flaky test suite | debugging | 58.0
Build terminal UI dashboard | from-scratch | 29.6
Fix React hydration mismatch | frontend | 64.5
Build REST API from scratch | from-scratch | 46.4
Fix N+1 query in dashboard | backend | 36.5
Find and fix 4 hidden backdoors in Flask app | debugging | 33.6
Build SaaS admin dashboard from scratch | from-scratch | 19.5
Write tests for untested legacy Flask service | code-review | 9.8
Add virtual scrolling to table rendering 5000 rows | frontend | 38.3
Add rate limiting middleware | backend | 71.8
Build CLI tool with subcommands and config | from-scratch | 20.0
Debug and fix 6 broken database triggers and constraints | debugging | 76.2
Add retry logic and dead letter queue to Python task queue | backend | 47.6
Debug race condition in worker pool | debugging | 76.0
Build real-time portfolio risk calculator | backend | 51.3
Fix race conditions in order matching engine | backend | 59.1
Build MCP server for database management | backend | 72.0
Fix deadlocking transaction patterns in Flask app | backend | 57.5
Build production website with auth and members area | frontend | 30.8
Build materialized view refresh pipeline for analytics | backend | 46.4