End-to-End Flow

Tracing the complete journey from user request to served ad shows where latency accumulates and which work has to stay on the request path.

The 100-Millisecond Journey: From Page Load to Served Ad

The entire ad serving process must complete in roughly 100 milliseconds:

  1. User Request: Page load or content request triggers ad request
  2. Ad Request: Platform receives request with user context
  3. Candidate Retrieval: Billions of ads filtered to thousands of candidates
  4. Filtering: Hard constraints applied (targeting, policy, eligibility)
  5. Prediction: ML models predict CTR, CVR, and other signals
  6. Ranking: Candidates scored and ranked
  7. Auction: Winners selected and prices determined
  8. Serving: Ad creative retrieved and served to user
  9. Logging: All signals captured for model training and optimization
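
Steps 3 through 7 and step 9 can be sketched as a chain of small functions. This is a minimal illustration, not a production design: the Ad fields, the country-only targeting check, the lookup-table "model", the bid × pCTR score, and the single-slot second-price rule are all assumptions made for the example, since the text above does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    bid: float           # advertiser bid per click (hypothetical units)
    target_country: str  # the only targeting rule in this sketch
    pctr: float = 0.0    # predicted click-through rate, set by predict()


def retrieve(inventory, limit):
    # Step 3: cut the full inventory down to a bounded candidate set.
    return inventory[:limit]


def apply_filters(candidates, user):
    # Step 4: hard constraints; here just a country targeting check.
    return [ad for ad in candidates if ad.target_country == user["country"]]


def predict(candidates, user):
    # Step 5: stand-in for an ML model, using a per-user lookup table.
    table = user["ctr_by_ad"]
    for ad in candidates:
        ad.pctr = table.get(ad.ad_id, 0.01)
    return candidates


def rank(candidates):
    # Step 6: score by expected value per impression (bid * pCTR).
    return sorted(candidates, key=lambda ad: ad.bid * ad.pctr, reverse=True)


def run_auction(ranked):
    # Step 7: single slot, second-price rule (assumed for illustration).
    # The winner pays the lowest per-click price that still beats the runner-up.
    if not ranked:
        return None, 0.0
    winner = ranked[0]
    if len(ranked) == 1 or winner.pctr <= 0:
        return winner, 0.0
    runner_up = ranked[1]
    return winner, runner_up.bid * runner_up.pctr / winner.pctr


def serve_ad(inventory, user, log):
    candidates = retrieve(inventory, limit=1000)
    eligible = apply_filters(candidates, user)
    ranked = rank(predict(eligible, user))
    winner, price = run_auction(ranked)
    # Step 9: record the outcome for later training and analysis.
    log.append({
        "user": user["id"],
        "winner": winner.ad_id if winner else None,
        "price": price,
    })
    return winner, price
```

With two eligible ads bidding 2.0 and 1.0 at predicted CTRs of 0.05 and 0.08, the first ad wins (score 0.10 versus 0.08) and pays about 1.6 per click under this rule.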

Latency Budgets and the Critical Path

Each stage receives a share of the total budget:

  • Network Latency: 20-40ms for request/response
  • Retrieval: 10-20ms to fetch candidates
  • ML Inference: 20-40ms for predictions
  • Auction: 5-10ms for ranking and selection
  • Serving: 5-10ms for creative retrieval

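Taken together, these ranges sum to 60-120ms, so the stages cannot all run at the top of their ranges and still meet a target of roughly 100ms. Optimization effort belongs on the critical path, the chain of sequential stages that determines total latency.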
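
A budget like this can be checked mechanically and enforced at request time. The sketch below sums the per-stage ranges listed above and tracks the time left in a request. The `Deadline` class and the stage keys are hypothetical names introduced for this example.

```python
import time

# Per-stage budgets in milliseconds (low, high), taken from the list above.
BUDGET_MS = {
    "network": (20, 40),
    "retrieval": (10, 20),
    "ml_inference": (20, 40),
    "auction": (5, 10),
    "serving": (5, 10),
}


def total_range(budget):
    """Return the best-case and worst-case totals across all stages."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


class Deadline:
    """Tracks how much of a request's total budget is left."""

    def __init__(self, total_ms, clock=time.monotonic):
        self._clock = clock
        self._expires = clock() + total_ms / 1000.0

    def remaining_ms(self):
        # Never negative: an overdue request simply has 0 ms left.
        return max(0.0, (self._expires - self._clock()) * 1000.0)

    def expired(self):
        return self.remaining_ms() == 0.0
```

One way to use it: create a `Deadline(100)` when the request arrives and have each stage check `remaining_ms()` before starting, so a slow retrieval can trigger a cheaper fallback model instead of a timeout.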

The Three Planes: Real-Time Serving, Near-Real-Time Streaming, Batch Processing

Real-Time Serving Plane

The request-response path that must complete in <100ms:

  • Candidate retrieval
  • Real-time predictions
  • Auction execution
  • Ad serving

Near-Real-Time Streaming Plane

Processing that happens within seconds to minutes:

  • Feature updates (user behavior, recent clicks)
  • Budget pacing adjustments
  • Frequency cap updates
  • Model scoring updates driven by fresh signals
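
As one illustration of streaming-plane work, a frequency cap can be maintained from an impression event stream and read by the serving plane. This is a minimal in-memory sketch: the `FrequencyCap` class, its sliding-window semantics, and the use of plain numeric timestamps are assumptions for the example, and a real system would likely keep this state in a shared store.

```python
from collections import defaultdict, deque


class FrequencyCap:
    """Sliding-window impression counter per (user, ad) pair."""

    def __init__(self, max_impressions, window_s):
        self.max_impressions = max_impressions
        self.window_s = window_s
        self._events = defaultdict(deque)  # (user_id, ad_id) -> timestamps

    def record(self, user_id, ad_id, ts):
        # Called by the streaming consumer for each impression event.
        self._events[(user_id, ad_id)].append(ts)

    def is_capped(self, user_id, ad_id, now):
        # Called on the serving path; drops events older than the window.
        q = self._events.get((user_id, ad_id))
        if q is None:
            return False
        while q and q[0] <= now - self.window_s:
            q.popleft()
        return len(q) >= self.max_impressions
```

The split matters for latency: the serving path only performs a cheap lookup, while the cost of ingesting events is paid off the request path.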

Batch Processing Plane

Offline processing that happens hourly or daily:

  • Model training
  • Feature engineering
  • Historical analysis
  • Reporting and optimization

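Assigning each operation to the right plane is a central design decision: anything placed on the serving plane consumes the latency budget, while work moved to the streaming or batch planes trades freshness for headroom.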
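
The assignment can be expressed as a simple rule of thumb. The sketch below is one possible heuristic, not a definitive rule: the `choose_plane` function and the one-hour cutoff between streaming and batch are assumptions, chosen to match the "seconds to minutes" and "hourly or daily" descriptions above.

```python
from enum import Enum


class Plane(Enum):
    REAL_TIME = "real-time serving"
    STREAMING = "near-real-time streaming"
    BATCH = "batch processing"


def choose_plane(on_request_path, max_staleness_s):
    """Pick a plane from two questions about an operation.

    on_request_path: must the result be computed while the user waits?
    max_staleness_s: how old may the result be before it is useless?
    """
    if on_request_path:
        return Plane.REAL_TIME
    # Assumed cutoff: anything that must be fresher than an hour streams.
    if max_staleness_s < 3600:
        return Plane.STREAMING
    return Plane.BATCH
```

For example, auction execution is on the request path, budget pacing that tolerates about a minute of staleness fits the streaming plane, and daily model training fits the batch plane.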
