Machine learning at scale 🤖


End-to-End Flow

Understanding the complete journey from user request to served ad is critical for building efficient systems.

The 100-Millisecond Journey: From Page Load to Served Ad

The entire ad serving process must complete in roughly 100 milliseconds:

User Request: Page load or content request triggers ad request
Ad Request: Platform receives request with user context
Candidate Retrieval: Billions of ads filtered to thousands of candidates
Filtering: Hard constraints applied (targeting, policy, eligibility)
Prediction: ML models predict CTR, CVR, and other signals
Ranking: Candidates scored and ranked
Auction: Winners selected and prices determined
Serving: Ad creative retrieved and served to user
Logging: All signals captured for model training and optimization
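The core stages above, from retrieval through logging, can be sketched as one function per stage. This is a toy, in-memory illustration under stated assumptions, not a production design. The Ad fields, the segment-overlap targeting, the "1% per matching segment" stand-in for a CTR model, ranking by bid * pCTR, and the single-slot, second-price-style pricing are all hypothetical choices. A linear scan also stands in for the index-based retrieval a real system needs in order to handle billions of ads.

```python
from dataclasses import dataclass, field


@dataclass
class Ad:
    ad_id: str
    bid: float                                              # hypothetical max bid per click
    targets: frozenset = field(default_factory=frozenset)   # hypothetical targeting segments
    policy_ok: bool = True                                  # hypothetical policy/eligibility flag


def retrieve(all_ads, user_segments, limit=1000):
    """Candidate retrieval: keep ads whose targeting overlaps the user's segments."""
    return [ad for ad in all_ads if ad.targets & user_segments][:limit]


def filter_eligible(candidates):
    """Filtering: apply hard constraints (only a policy flag in this sketch)."""
    return [ad for ad in candidates if ad.policy_ok]


def predict_ctr(ad, user_segments):
    """Prediction: toy stand-in for a CTR model, 1% per matching segment."""
    return 0.01 * len(ad.targets & user_segments)


def rank(candidates, user_segments):
    """Ranking: score = bid * pCTR (expected value per impression), highest first."""
    scored = []
    for ad in candidates:
        pctr = predict_ctr(ad, user_segments)
        scored.append((ad.bid * pctr, pctr, ad))
    return sorted(scored, key=lambda t: t[0], reverse=True)


def run_auction(ranked):
    """Auction: one slot; the winner pays just enough per click to match the runner-up's score."""
    if not ranked:
        return None
    _, pctr, winner = ranked[0]
    runner_up_score = ranked[1][0] if len(ranked) > 1 else 0.0
    price = min(winner.bid, runner_up_score / pctr) if pctr > 0 else 0.0
    return winner, price


def serve(all_ads, user_segments, log):
    """Run the pipeline for one request and log what happened."""
    candidates = filter_eligible(retrieve(all_ads, user_segments))
    ranked = rank(candidates, user_segments)
    result = run_auction(ranked)
    # Logging: keep candidates, winner, and price for later training and analysis.
    log.append({
        "candidates": [ad.ad_id for _, _, ad in ranked],
        "winner": result[0].ad_id if result else None,
        "price": result[1] if result else None,
    })
    return result


ads = [
    Ad("a", bid=2.0, targets=frozenset({"sports", "news"})),
    Ad("b", bid=3.0, targets=frozenset({"sports"})),
    Ad("c", bid=5.0, targets=frozenset({"sports"}), policy_ok=False),  # filtered out
    Ad("d", bid=9.0, targets=frozenset({"cooking"})),                  # never retrieved
]
request_log = []
winner, price = serve(ads, frozenset({"sports", "news"}), request_log)
print(winner.ad_id, round(price, 2))  # a 1.5
```

Ranking by bid * pCTR turns a per-click bid into expected value per impression. This puts ads with different bids and click rates on one scale. Ad "a" wins here despite its lower bid because its predicted CTR is twice as high.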

Latency Budgets and the Critical Path

Every millisecond matters. Typical per-stage budgets are:

Network Latency: 20-40ms for request/response
Retrieval: 10-20ms to fetch candidates
ML Inference: 20-40ms for predictions
Auction: 5-10ms for ranking and selection
Serving: 5-10ms for creative retrieval
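Adding up the ranges above shows how tight the budget is. The following is a minimal sketch that assumes the stages run strictly one after another, with no overlap. Under that assumption the lower bounds sum to 60ms, but the upper bounds sum to 120ms. That overshoots the roughly 100ms target, so not every stage can run at the top of its range on the same request.

```python
# Per-stage latency ranges (ms), taken from the list above.
BUDGET_MS = {
    "network": (20, 40),
    "retrieval": (10, 20),
    "ml_inference": (20, 40),
    "auction": (5, 10),
    "serving": (5, 10),
}
TARGET_MS = 100


def total_range(budget):
    """Sum lower and upper bounds, assuming the stages run strictly in sequence."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def headroom(budget, target=TARGET_MS):
    """Milliseconds left over at best and worst case (negative means over budget)."""
    low, high = total_range(budget)
    return target - low, target - high


low, high = total_range(BUDGET_MS)
print(f"sequential total: {low}-{high} ms")  # sequential total: 60-120 ms
print(f"headroom: {headroom(BUDGET_MS)}")    # headroom: (40, -20)
```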

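Stages that depend on one another add their latencies together, so the end-to-end time is set by the critical path, the longest chain of dependent stages. Optimizing that path is essential for meeting latency targets.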
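When some stages can run concurrently, end-to-end latency is the length of the longest dependency chain rather than the plain sum of all stages. The sketch below computes that critical path over a small stage graph. The stage durations, the separate filtering cost, and the choice to run the CTR and CVR models in parallel are hypothetical numbers for illustration, and network time is left out.

```python
# Stage graph: name -> (duration_ms, stages it must wait for). Numbers are hypothetical.
STAGES = {
    "retrieval": (20, []),
    "filtering": (5, ["retrieval"]),
    "ctr_model": (25, ["filtering"]),
    "cvr_model": (35, ["filtering"]),  # runs in parallel with ctr_model
    "auction": (10, ["ctr_model", "cvr_model"]),
    "serving": (10, ["auction"]),
}


def critical_path(stages):
    """Return (total_ms, path) for the longest dependency chain in an acyclic stage graph."""
    finish = {}  # earliest finish time of each stage
    via = {}     # predecessor on the longest chain into each stage

    def visit(name):
        if name not in finish:
            duration, deps = stages[name]
            start, slowest = 0, None
            for dep in deps:
                dep_finish = visit(dep)
                if dep_finish > start:
                    start, slowest = dep_finish, dep
            finish[name] = start + duration
            via[name] = slowest
        return finish[name]

    last = max(stages, key=visit)  # stage that finishes latest
    path, node = [], last
    while node is not None:
        path.append(node)
        node = via[node]
    return finish[last], path[::-1]


sequential_ms = sum(duration for duration, _ in STAGES.values())
total_ms, path = critical_path(STAGES)
print(sequential_ms)   # 105
print(total_ms, path)  # 80 ['retrieval', 'filtering', 'cvr_model', 'auction', 'serving']
```

Here the sequential sum is 105ms. Because the CTR model finishes inside the CVR model's window, only the CVR branch lies on the critical path (80ms). Speeding up the CTR model would therefore not reduce latency at all, while shaving time off the CVR model would.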

The Three Planes: Real-Time Serving, Near-Real-Time Streaming, Batch Processing

Real-Time Serving Plane

The request-response path that must complete in <100ms:

Candidate retrieval
Real-time predictions
Auction execution
Ad serving
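One common way to keep the real-time path inside its limit is to carry a per-request deadline through the stages and degrade gracefully when time runs short. The sketch below is hypothetical throughout. Deadline, model_score, fallback_score, and the 30ms model cost are illustrative stand-ins, and a fake clock makes the example deterministic.

```python
import time


class Deadline:
    """Tracks how much of a fixed per-request budget is left."""

    def __init__(self, budget_ms, clock=time.monotonic):
        self._clock = clock  # injectable so tests can control time
        self._expires_at = clock() + budget_ms / 1000.0

    def remaining_ms(self):
        return max(0.0, (self._expires_at - self._clock()) * 1000.0)


def model_score(candidate):
    return 0.05  # placeholder for a real CTR model call (hypothetical)


def fallback_score(candidate):
    return 0.01  # placeholder for a cheap precomputed score (hypothetical)


def score_candidates(candidates, deadline, model_cost_ms=30):
    """Run the model only if the remaining budget covers its expected cost."""
    if deadline.remaining_ms() >= model_cost_ms:
        return {c: model_score(c) for c in candidates}, "model"
    return {c: fallback_score(c) for c in candidates}, "fallback"


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
deadline = Deadline(budget_ms=100, clock=clock)
clock.now = 0.050  # 50 ms already spent
_, mode_early = score_candidates(["ad1", "ad2"], deadline)
clock.now = 0.080  # 80 ms spent, about 20 ms left
_, mode_late = score_candidates(["ad1", "ad2"], deadline)
print(mode_early, mode_late)  # model fallback
```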

Near-Real-Time Streaming Plane

Processing that happens within seconds to minutes:

Feature updates (user behavior, recent clicks)
Budget pacing adjustments
Frequency cap updates
Real-time model scoring updates
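Frequency cap updates fit the streaming plane well. Impression events update a counter within seconds, and the serving path only reads it. The following is a minimal sketch that assumes events arrive in timestamp order and that the state fits in one process's memory; a real deployment would keep this state in a shared low-latency store. FrequencyCapper and its parameters are hypothetical names.

```python
from collections import defaultdict, deque


class FrequencyCapper:
    """Counts impressions per (user, campaign) within a sliding time window."""

    def __init__(self, max_impressions, window_s):
        self.max_impressions = max_impressions
        self.window_s = window_s
        self._events = defaultdict(deque)  # (user, campaign) -> impression timestamps

    def _evict(self, key, now):
        # Drop impressions that have aged out of the window.
        q = self._events[key]
        while q and q[0] <= now - self.window_s:
            q.popleft()

    def record(self, user, campaign, ts):
        """Streaming side: consume one impression event."""
        key = (user, campaign)
        self._evict(key, ts)
        self._events[key].append(ts)

    def allowed(self, user, campaign, now):
        """Serving side: may this user see the campaign again?"""
        key = (user, campaign)
        self._evict(key, now)
        return len(self._events[key]) < self.max_impressions


capper = FrequencyCapper(max_impressions=3, window_s=3600)
for ts in (0, 10, 20):
    capper.record("user42", "campaign7", ts)
blocked = capper.allowed("user42", "campaign7", now=30)     # 3 impressions in the last hour
reopened = capper.allowed("user42", "campaign7", now=3601)  # the t=0 impression has aged out
print(blocked, reopened)  # False True
```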

Batch Processing Plane

Offline processing that happens hourly or daily:

Model training
Feature engineering
Historical analysis
Reporting and optimization
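A typical batch job joins logged impressions with logged clicks to produce labeled training data, then aggregates the same data for reporting. The following is a minimal in-memory sketch. It assumes a hypothetical log schema (impression_id, ad_id, features) and treats any click with a matching impression_id as a positive label. At scale, the same join would run on a distributed batch engine rather than in one process.

```python
from collections import defaultdict


def build_training_examples(impressions, clicks):
    """Label each logged impression 1 if a click with the same impression_id exists, else 0."""
    clicked_ids = {c["impression_id"] for c in clicks}
    return [
        {**imp["features"], "ad_id": imp["ad_id"],
         "label": int(imp["impression_id"] in clicked_ids)}
        for imp in impressions
    ]


def ctr_by_ad(examples):
    """Historical CTR per ad, computed from the labeled examples (for reporting)."""
    shown, clicked = defaultdict(int), defaultdict(int)
    for ex in examples:
        shown[ex["ad_id"]] += 1
        clicked[ex["ad_id"]] += ex["label"]
    return {ad: clicked[ad] / shown[ad] for ad in shown}


impressions = [
    {"impression_id": "i1", "ad_id": "a", "features": {"hour": 9}},
    {"impression_id": "i2", "ad_id": "a", "features": {"hour": 13}},
    {"impression_id": "i3", "ad_id": "b", "features": {"hour": 13}},
    {"impression_id": "i4", "ad_id": "a", "features": {"hour": 20}},
]
clicks = [{"impression_id": "i2"}]
examples = build_training_examples(impressions, clicks)
rates = ctr_by_ad(examples)  # ad "a": 1 click in 3 impressions; ad "b": no clicks
```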

Understanding which operations belong in which plane is crucial for system design.
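One way to make the placement decision explicit is to ask two questions about each operation. Must its result be available inside the request? And how stale can its output be? The sketch below encodes that rule. The one-hour cutoff between streaming and batch is a hypothetical threshold, chosen to match the "seconds to minutes" versus "hourly or daily" descriptions of the planes above.

```python
from enum import Enum


class Plane(Enum):
    REAL_TIME = "real-time serving"
    STREAMING = "near-real-time streaming"
    BATCH = "batch"


STREAMING_MAX_STALENESS_S = 3600  # hypothetical cutoff: up to an hour counts as "minutes"


def assign_plane(needed_in_request, staleness_tolerance_s):
    """Pick a plane from two requirements of an operation."""
    if needed_in_request:
        return Plane.REAL_TIME
    if staleness_tolerance_s < STREAMING_MAX_STALENESS_S:
        return Plane.STREAMING
    return Plane.BATCH


for op, in_request, staleness in [
    ("auction", True, 0),
    ("frequency cap update", False, 60),
    ("model training", False, 86_400),
]:
    print(f"{op}: {assign_plane(in_request, staleness).value}")
```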
