Candidate Retrieval

The first and most critical step: finding relevant ads from billions of candidates in milliseconds.

The Scale Problem: Billions of Ads, Milliseconds to Decide

The fundamental challenge:

  • Billions of ads in the inventory
  • Milliseconds to make decisions
  • High recall needed to ensure good candidates aren't missed
  • Low latency required for user experience

This requires sophisticated retrieval techniques that balance speed and accuracy.

Inverted Indexes and Targeting Constraints

Inverted Indexes

Traditional information retrieval technique:

  • Index ads by targeting attributes (demographics, interests, keywords)
  • Fast lookup for matching constraints
  • Efficient for structured targeting

Targeting Constraints

Common constraints include:

  • Geographic location
  • Demographics (age, gender)
  • Interests and behaviors
  • Device type
  • Time of day

Early filtering by constraints dramatically reduces candidate set.

Embedding-Based Retrieval: Two-Tower Models

Two-Tower Architecture

  • User Tower: Encodes user features into embedding
  • Ad Tower: Encodes ad features into embedding
  • Similarity: Cosine similarity or dot product between embeddings

This allows semantic matching beyond exact targeting constraints.

Training

  • Positive pairs: User-ad pairs that resulted in clicks/conversions
  • Negative sampling: Random or hard negatives
  • Loss function: Typically contrastive loss or softmax cross-entropy

Approximate Nearest Neighbor Search at Scale

The Challenge

Exact nearest neighbor search is too slow for billions of vectors. We need approximate methods.

Techniques

  • Locality-Sensitive Hashing (LSH): Hash similar vectors to same buckets
  • Product Quantization: Compress vectors for faster search
  • Hierarchical Navigable Small World (HNSW): Graph-based approximate search
  • IVF (Inverted File Index): Cluster-based indexing

Tradeoffs

  • Recall vs. Latency: More accurate methods are slower
  • Index Size: Larger indices enable better recall but use more memory
  • Update Frequency: How often the index is rebuilt as new ads are added

Balancing Recall and Latency

The Tradeoff

  • Higher recall: More relevant candidates, better final results, but slower
  • Lower latency: Faster response, but may miss good candidates

Strategies

  • Multi-stage Retrieval: Coarse retrieval followed by fine-grained ranking
  • Caching: Cache frequent queries and user embeddings
  • Parallel Processing: Retrieve from multiple indexes in parallel
  • Early Termination: Stop searching once enough candidates found

The goal is to retrieve enough high-quality candidates (typically 100-1000) for downstream ranking while staying within latency budgets.

Content to be expanded...