APEX

GLM 4.5 Air

Z.ai

131K context, $0.20/M input, $1.10/M output
1299 (peak 1317)

Avg Score: 62.1
Avg Cost: $0.03
Score/$: 1946.4
Runs: 65

[Charts not shown: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category (difficulty) | ELO
refactoring (expert) | 2588
from-scratch (expert) | 1793
refactoring | 1625
refactoring (medium) | 1526
code-review (hard) | 1505
backend (easy) | 1480
debugging (medium) | 1419
backend (medium) | 1361
debugging | 1332
debugging (hard) | 1318
backend | 1302
backend (expert) | 1280
frontend (medium) | 1268
debugging (expert) | 1259
full-stack (medium) | 1238
full-stack | 1229
code-review | 1227
backend (hard) | 1202
from-scratch | 1199
full-stack (hard) | 1190
frontend | 1187
frontend (easy) | 1144
code-review (medium) | 1116
from-scratch (easy) | 1067
multi-language | 1044
from-scratch (hard) | 1012
multi-language (hard) | 687
from-scratch (medium) | 589
multi-language (expert) | 285
frontend (hard) | 213
frontend (expert) | 0

All Results

Task | Category | Score
Write integration tests for payment flow | code-review | 72.9
Build production website with auth and members area | frontend | 28.7
Implement transformer inference engine with KV cache | from-scratch | 82.0
Find and patch all OWASP Top 10 vulnerabilities | debugging | 68.2
Add retry logic and dead letter queue to Python task queue | backend | 71.4
Add Redis caching layer to Express API | backend | 63.0
Remove AI slop and over-engineering from codebase | refactoring | 88.7
Implement Stripe webhook handler | backend | 72.2
Optimize bloated React bundle under 500KB | frontend | 70.5
Fix flaky test suite | debugging | 76.4
Write tests for untested legacy Flask service | code-review | 35.0
Fix Node.js stream backpressure causing OOM on large files | backend | 59.0
Build codebase indexer for LLM context windows | from-scratch | 39.5
Add rate limiting middleware | backend | 73.0
Add caching layer to eliminate slow SSR page loads | full-stack | 79.4
Build REST API from scratch | from-scratch | 71.2
Build terminal UI dashboard | from-scratch | 48.8
Add cursor-based pagination to REST API | backend | 71.5
Optimize slow Postgres queries in Flask app | backend | 79.2
Fix React hydration mismatch | frontend | 69.3
Add WebSocket real-time updates | full-stack | 67.2
Fix race conditions in order matching engine | backend | 71.5
Add GraphQL layer over REST API | multi-language | 56.9
Write complex SQL report with window functions | backend | 57.7
Implement background job scheduler with persistence | backend | 27.8
Add Google OAuth2 login to Express app | full-stack | 65.8
Convert React app to PWA with offline support | frontend | 65.8
Build CLI tool with subcommands and config | from-scratch | 41.5
Add slash commands and moderation to Discord bot | backend | 69.4
Fix N+1 query in dashboard | backend | 56.5
Add i18n with locale routing to Next.js app | full-stack | 56.2
Fix broken responsive layout | frontend | 67.4
Debug and fix 6 broken database triggers and constraints | debugging | 50.4
Fix auth bypass vulnerability | debugging | 78.0
Zero-downtime schema migration | full-stack | 64.1
Harden insecure Docker setup with 12 vulnerabilities | code-review | 78.3
Fix 12 WCAG accessibility violations in checkout form | frontend | 56.4
Fix hallucination and context window bugs in RAG agent | backend | 25.1
Refactor monolithic handler to CQRS | refactoring | 86.6
Implement JWT auth middleware | backend | 50.9
Build distributed node cluster with gossip protocol | from-scratch | 40.0
Port Python CLI to Rust | multi-language | 31.3
Build SaaS admin dashboard from scratch | from-scratch | 31.9
Build real-time portfolio risk calculator | backend | 33.9
Find and fix 4 hidden backdoors in Flask app | debugging | 74.6
Build LLM evaluation harness with structured grading | backend | 62.3
Fix deadlocking transaction patterns in Flask app | backend | 65.8
Add streaming SSE endpoint for LLM chat | backend | 79.2
Dockerize Node.js monorepo | full-stack | 67.0
Fix memory leak in event handler | debugging | 79.1
Build MCP server for database management | backend | 20.1
Build materialized view refresh pipeline for analytics | backend | 73.1
Add file upload with S3 presigned URLs | backend | 79.8
Build RAG pipeline with vector search | backend | 45.8
Add virtual scrolling to table rendering 5000 rows | frontend | 51.5
Implement zero-trust API authentication layer | backend | 67.5
Debug race condition in worker pool | debugging | 78.7
Fix data integrity bugs in denormalized e-commerce schema | debugging | 77.3
Migrate callback-hell Express app to async/await | refactoring | 62.5
Implement multi-tenant row-level security in Postgres | backend | 66.7
Fix broken GitHub Actions CI pipeline | debugging | 84.3
Write Kubernetes manifests for Node.js microservice | full-stack | 70.0
Replace console.log with structured logging | refactoring | 68.9
Split 1100-line god file into proper modules | refactoring | 72.7
Code review: identify security vulns | code-review | 37.8