APEX

GPT OSS 20b

OpenRouter

131K context, $0.03/M input, $0.14/M output
1220 (peak 1235)

Avg Score: 57.5
Avg Cost: <$0.01
Score/$: 6031.5
Runs: 65
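
The summary shows the average cost only as "<$0.01". A rough figure can be backed out of the other numbers, assuming (the page does not say so) that Score/$ is the average score divided by the average cost per run. The variable names below are illustrative only.

```python
# Figures copied from the summary block above.
avg_score = 57.5
score_per_dollar = 6031.5
runs = 65

# Assumption: Score/$ = average score / average cost per run.
# If the site computes it differently (for example as a mean of
# per-run ratios), these implied figures will not hold.
implied_avg_cost = avg_score / score_per_dollar
implied_total_cost = implied_avg_cost * runs

print(f"implied average cost per run: ${implied_avg_cost:.4f}")
print(f"implied total cost for {runs} runs: ${implied_total_cost:.2f}")
```

Under that assumption the implied per-run cost comes out just under one cent, which is consistent with the "<$0.01" shown for Avg Cost.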

Win/Loss/Draw

Scoring Dimensions

Score Distribution

Category ELOs

multi-language (hard): 2085
multi-language: 1577
code-review (hard): 1530
debugging (medium): 1360
backend (medium): 1350
debugging: 1309
debugging (expert): 1303
debugging (hard): 1251
backend: 1250
code-review: 1230
frontend (medium): 1226
backend (hard): 1218
full-stack (hard): 1202
multi-language (expert): 1155
frontend: 1140
code-review (medium): 1122
full-stack: 1113
backend (easy): 1073
backend (expert): 1064
from-scratch: 1061
from-scratch (hard): 957
from-scratch (expert): 936
refactoring (medium): 901
refactoring: 890
frontend (hard): 805
full-stack (medium): 727
from-scratch (medium): 710
from-scratch (easy): 449
frontend (easy): 240
frontend (expert): 22
refactoring (expert): 0

All Results

Task | Category | Score
Build distributed node cluster with gossip protocol | from-scratch | 33.8
Split 1100-line god file into proper modules | refactoring | 22.9
Find and patch all OWASP Top 10 vulnerabilities | debugging | 56.1
Write integration tests for payment flow | code-review | 74.2
Add streaming SSE endpoint for LLM chat | backend | 82.8
Add Google OAuth2 login to Express app | full-stack | 68.0
Migrate callback-hell Express app to async/await | refactoring | 50.6
Implement background job scheduler with persistence | backend | 0.6
Write Kubernetes manifests for Node.js microservice | full-stack | 61.2
Build REST API from scratch | from-scratch | 63.0
Add i18n with locale routing to Next.js app | full-stack | 35.6
Build LLM evaluation harness with structured grading | backend | 44.0
Find and fix 4 hidden backdoors in Flask app | debugging | 65.3
Fix broken GitHub Actions CI pipeline | debugging | 82.2
Add file upload with S3 presigned URLs | backend | 69.0
Add caching layer to eliminate slow SSR page loads | full-stack | 62.0
Fix race conditions in order matching engine | backend | 67.0
Replace console.log with structured logging | refactoring | 43.3
Fix hallucination and context window bugs in RAG agent | backend | 57.3
Build SaaS admin dashboard from scratch | from-scratch | 36.6
Refactor monolithic handler to CQRS | refactoring | 28.0
Build terminal UI dashboard | from-scratch | 50.5
Add GraphQL layer over REST API | multi-language | 85.0
Fix auth bypass vulnerability | debugging | 34.5
Fix data integrity bugs in denormalized e-commerce schema | debugging | 78.9
Fix Node.js stream backpressure causing OOM on large files | backend | 87.0
Port Python CLI to Rust | multi-language | 48.8
Build RAG pipeline with vector search | backend | 39.0
Fix broken responsive layout | frontend | 52.7
Add Redis caching layer to Express API | backend | 51.0
Harden insecure Docker setup with 12 vulnerabilities | code-review | 64.2
Fix React hydration mismatch | frontend | 59.0
Implement transformer inference engine with KV cache | from-scratch | 66.7
Debug race condition in worker pool | debugging | 80.5
Build materialized view refresh pipeline for analytics | backend | 54.8
Write tests for untested legacy Flask service | code-review | 15.1
Implement JWT auth middleware | backend | 47.5
Add slash commands and moderation to Discord bot | backend | 58.0
Code review: identify security vulns | code-review | 76.3
Add cursor-based pagination to REST API | backend | 80.0
Add rate limiting middleware | backend | 63.9
Implement multi-tenant row-level security in Postgres | backend | 24.2
Implement Stripe webhook handler | backend | 84.3
Implement zero-trust API authentication layer | backend | 58.8
Build MCP server for database management | backend | 70.6
Fix flaky test suite | debugging | 73.0
Debug and fix 6 broken database triggers and constraints | debugging | 79.3
Dockerize Node.js monorepo | full-stack | 53.2
Optimize bloated React bundle under 500KB | frontend | 55.3
Build production website with auth and members area | frontend | 42.3
Fix memory leak in event handler | debugging | 78.9
Add retry logic and dead letter queue to Python task queue | backend | 69.3
Build real-time portfolio risk calculator | backend | 45.9
Add virtual scrolling to table rendering 5000 rows | frontend | 80.3
Build CLI tool with subcommands and config | from-scratch | 39.8
Convert React app to PWA with offline support | frontend | 50.9
Fix N+1 query in dashboard | backend | 52.4
Optimize slow Postgres queries in Flask app | backend | 73.4
Zero-downtime schema migration | full-stack | 58.9
Fix deadlocking transaction patterns in Flask app | backend | 57.7
Fix 12 WCAG accessibility violations in checkout form | frontend | 67.2
Remove AI slop and over-engineering from codebase | refactoring | 66.6
Build codebase indexer for LLM context windows | from-scratch | 37.5
Write complex SQL report with window functions | backend | 47.3
Add WebSocket real-time updates | full-stack | 73.8
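
As a cross-check on the summary figures, the sketch below groups the scores from the results table by category and averages them. The scores are transcribed from the table; the dictionary layout and variable names are illustrative and not something the page provides. The sketch also assumes Avg Score is a plain unweighted mean over all runs, which the page does not state.

```python
from statistics import mean

# Scores transcribed from the All Results table, grouped by category.
scores_by_category = {
    "from-scratch": [33.8, 63.0, 36.6, 50.5, 66.7, 39.8, 37.5],
    "refactoring": [22.9, 50.6, 43.3, 28.0, 66.6],
    "debugging": [56.1, 65.3, 82.2, 34.5, 78.9, 80.5, 73.0, 79.3, 78.9],
    "code-review": [74.2, 64.2, 15.1, 76.3],
    "backend": [
        82.8, 0.6, 44.0, 69.0, 67.0, 57.3, 87.0, 39.0,
        51.0, 54.8, 47.5, 58.0, 80.0, 63.9, 24.2, 84.3,
        58.8, 70.6, 69.3, 45.9, 52.4, 73.4, 57.7, 47.3,
    ],
    "full-stack": [68.0, 61.2, 35.6, 62.0, 53.2, 58.9, 73.8],
    "multi-language": [85.0, 48.8],
    "frontend": [52.7, 59.0, 55.3, 42.3, 80.3, 50.9, 67.2],
}

# Flatten to one list to compare against the Runs and Avg Score figures.
all_scores = [s for scores in scores_by_category.values() for s in scores]
overall_mean = mean(all_scores)

# Unweighted mean score per category, highest first.
category_means = {c: mean(v) for c, v in scores_by_category.items()}
ranked = sorted(category_means.items(), key=lambda kv: kv[1], reverse=True)

print(f"runs: {len(all_scores)}, overall mean: {overall_mean:.1f}")
for category, avg in ranked:
    print(f"{category:<15} n={len(scores_by_category[category]):<3} mean={avg:.1f}")
```

The table holds 65 rows, and their mean rounds to 57.5, so both agree with the Runs and Avg Score figures in the summary. Categories with few runs, such as multi-language with two, give averages that are much less stable than backend with 24.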