APEX

Gemini 3 Flash Preview

Google

1049K context · $0.50/M input · $3.00/M output
1544 (peak 1559)

Avg Score: 72.0
Avg Cost: $0.02
Score/$: 4126.4
Runs: 46

[Charts: Win/Loss/Draw, Scoring Dimensions, Score Distribution]

Category ELOs

Category | Difficulty | ELO
refactoring | expert | 2558
from-scratch | expert | 2072
debugging | medium | 2010
backend | easy | 1964
from-scratch | easy | 1758
frontend | hard | 1756
from-scratch | hard | 1725
full-stack | hard | 1699
multi-language | hard | 1695
backend | medium | 1684
refactoring | - | 1651
from-scratch | - | 1649
full-stack | - | 1629
code-review | medium | 1603
backend | - | 1541
full-stack | medium | 1526
multi-language | - | 1514
code-review | - | 1498
frontend | - | 1488
frontend | medium | 1488
backend | hard | 1475
debugging | - | 1458
backend | expert | 1453
refactoring | medium | 1428
debugging | expert | 1378
debugging | hard | 1342
frontend | expert | 1136
from-scratch | medium | 1004
multi-language | expert | 917
code-review | hard | 188

All Results

Task | Category | Score
Build codebase indexer for LLM context windows | from-scratch | 60.7
Optimize bloated React bundle under 500KB | frontend | 72.0
Migrate callback-hell Express app to async/await | refactoring | 72.7
Add WebSocket real-time updates | full-stack | 81.0
Port Python CLI to Rust | multi-language | 51.3
Harden insecure Docker setup with 12 vulnerabilities | code-review | 88.8
Write Kubernetes manifests for Node.js microservice | full-stack | 84.5
Split 1100-line god file into proper modules | refactoring | 68.5
Implement transformer inference engine with KV cache | from-scratch | 87.7
Build MCP server for database management | backend | 70.3
Build SaaS admin dashboard from scratch | from-scratch | 75.0
Implement background job scheduler with persistence | backend | 65.5
Build production website with auth and members area | frontend | 55.6
Build CLI tool with subcommands and config | from-scratch | 48.1
Fix hallucination and context window bugs in RAG agent | backend | 55.5
Write complex SQL report with window functions | backend | 71.9
Find and fix 4 hidden backdoors in Flask app | debugging | 71.5
Fix 12 WCAG accessibility violations in checkout form | frontend | 84.8
Fix race conditions in order matching engine | backend | 78.1
Build real-time portfolio risk calculator | backend | 62.4
Build LLM evaluation harness with structured grading | backend | 72.0
Fix deadlocking transaction patterns in Flask app | backend | 44.3
Fix data integrity bugs in denormalized e-commerce schema | debugging | 78.4
Debug and fix 6 broken database triggers and constraints | debugging | 66.6
Add Redis caching layer to Express API | backend | 79.0
Optimize slow Postgres queries in Flask app | backend | 86.3
Add Google OAuth2 login to Express app | full-stack | 80.9
Write tests for untested legacy Flask service | code-review | 47.3
Add retry logic and dead letter queue to Python task queue | backend | 72.0
Add GraphQL layer over REST API | multi-language | 73.8
Fix Node.js stream backpressure causing OOM on large files | backend | 90.4
Build distributed node cluster with gossip protocol | from-scratch | 67.9
Write integration tests for payment flow | code-review | 35.8
Add rate limiting middleware | backend | 82.9
Implement Stripe webhook handler | backend | 70.9
Zero-downtime schema migration | full-stack | 73.0
Fix flaky test suite | debugging | 91.2
Add cursor-based pagination to REST API | backend | 69.0
Fix N+1 query in dashboard | backend | 91.7
Fix memory leak in event handler | debugging | 66.0
Refactor monolithic handler to CQRS | refactoring | 80.3
Code review: identify security vulns | code-review | 88.1
Debug race condition in worker pool | debugging | 84.0
Fix React hydration mismatch | frontend | 76.7
Build terminal UI dashboard | from-scratch | 51.0
Build REST API from scratch | from-scratch | 85.7