Machine learning at scale 🤖

Ranking and Scoring

Converting predictions into a ranked list that optimizes for multiple objectives.

From Predictions to a Ranked List

Ranking transforms ML predictions into an ordered list:

Predictions: CTR, CVR, quality scores from models
Scoring: Combine predictions with bids and other signals
Ranking: Sort by score to create ordered list
Selection: Top-k ads proceed to auction

The scoring function determines which ads rank highest.
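The four steps above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the AdCandidate class, the rank_top_k helper, and the use of pCTR × bid as the scoring function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AdCandidate:
    ad_id: str
    p_ctr: float  # predicted click-through rate from an upstream model
    bid: float    # advertiser's per-click bid, in currency units


def rank_top_k(candidates, k):
    """Score each ad by pCTR x bid, sort descending, keep the top k."""
    scored = [(c.p_ctr * c.bid, c.ad_id) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ad_id for _, ad_id in scored[:k]]


candidates = [
    AdCandidate("a", p_ctr=0.02, bid=1.00),  # score 0.020
    AdCandidate("b", p_ctr=0.05, bid=0.50),  # score 0.025
    AdCandidate("c", p_ctr=0.01, bid=1.50),  # score 0.015
]
top_two = rank_top_k(candidates, k=2)  # ["b", "a"]
```

Ad "b" outranks ad "a" despite bidding half as much, because its predicted CTR is 2.5 times higher.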

The Role of Bid in Scoring: Expected Value Formulations

Expected Value

The fundamental scoring formula:

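eCPM = pCTR × bid: Expected revenue per impression for click-priced ads. By convention eCPM is quoted per 1,000 impressions; the constant factor is omitted here because it does not change the ranking
eCPM = pCTR × pCVR × bid: Expected revenue per impression for conversion-optimized campaigns, where the bid is paid per conversion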

Bid Interpretation

CPC campaigns: Bid is cost-per-click, so eCPM = pCTR × bid
CPA campaigns: Bid is cost-per-action, so eCPM = pCTR × pCVR × bid
CPM campaigns: Bid is already per-impression (a per-thousand CPM bid divided by 1,000), so eCPM = bid

Understanding bid semantics is crucial for correct scoring.
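One way to make bid semantics explicit is to normalize every bid to the same unit before scoring. The sketch below assumes a hypothetical expected_value_per_impression helper and assumes that CPM bids arrive quoted per 1,000 impressions; neither is prescribed by the text above.

```python
def expected_value_per_impression(bid_type, bid, p_ctr=0.0, p_cvr=0.0):
    """Convert a bid into expected revenue for a single impression."""
    if bid_type == "CPC":
        # Paid per click: weight the bid by the click probability.
        return p_ctr * bid
    if bid_type == "CPA":
        # Paid per action: weight by click and conversion probability.
        return p_ctr * p_cvr * bid
    if bid_type == "CPM":
        # Assumption: the CPM bid is quoted per 1,000 impressions.
        return bid / 1000.0
    raise ValueError(f"unknown bid type: {bid_type}")


cpc_value = expected_value_per_impression("CPC", bid=1.00, p_ctr=0.02)
cpa_value = expected_value_per_impression("CPA", bid=20.00, p_ctr=0.02, p_cvr=0.10)
cpm_value = expected_value_per_impression("CPM", bid=5.00)
```

With these inputs the CPA ad is worth the most per impression, followed by the CPC ad, then the CPM ad. Mixing up the units (for example, treating the CPA bid as a per-click bid) would inflate its score tenfold.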

Incorporating Advertiser Objectives (CPC, CPA, ROAS Optimization)

CPC Optimization

Advertisers want to maximize clicks within budget:

Score = pCTR × bid
Higher CTR or higher bid increases score

CPA Optimization

Advertisers want conversions:

Score = pCTR × pCVR × bid
Must predict both click and conversion probability

ROAS Optimization

Advertisers want to maximize return on ad spend:

Score = pCTR × pCVR × expected_revenue × bid_multiplier
Requires predicting conversion value, not just probability

Different objectives require different scoring functions.
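The ROAS score can be sketched as follows. The text does not define bid_multiplier, so this example assumes it is the reciprocal of the advertiser's target ROAS (the share of predicted revenue the advertiser is willing to spend); the function name and the numbers are illustrative.

```python
def roas_score(p_ctr, p_cvr, expected_revenue, target_roas):
    """Score = pCTR x pCVR x expected_revenue x bid_multiplier."""
    # Assumption: a target ROAS of 4 means spending at most 1/4 of revenue.
    bid_multiplier = 1.0 / target_roas
    return p_ctr * p_cvr * expected_revenue * bid_multiplier


# Fewer conversions, but each one is worth much more.
high_value = roas_score(p_ctr=0.04, p_cvr=0.05, expected_revenue=50.0, target_roas=4.0)
# Twice the conversion rate, but low value per conversion.
high_volume = roas_score(p_ctr=0.04, p_cvr=0.10, expected_revenue=10.0, target_roas=4.0)
```

Here the high-value ad scores higher even though it converts half as often, which is why a ROAS objective needs a value prediction and not only a conversion probability.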

Diversity and Exploration in Ranking

Diversity

Showing variety in ad types, advertisers, or categories:

User experience: Prevents ad fatigue
Advertiser fairness: Gives more advertisers opportunities
Platform health: Reduces dependency on single advertisers

Techniques

Maximal marginal relevance: Balance relevance and diversity
Category diversity: Ensure multiple ad categories represented
Advertiser diversity: Limit ads from same advertiser
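A minimal sketch of maximal marginal relevance (MMR) re-ranking follows. The trade-off weight lam, the category-based similarity function, and the ad names are assumptions chosen for illustration; real systems would typically use a learned or embedding-based similarity.

```python
def mmr_rerank(relevance, similarity, k, lam=0.7):
    """Greedily pick k ads, trading relevance against redundancy.

    relevance: dict mapping ad id to relevance score
    similarity: function (ad_id, ad_id) returning a value in [0, 1]
    lam: weight on relevance; (1 - lam) is the weight on the penalty
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        def mmr(ad):
            # Penalize by similarity to the closest already-selected ad.
            max_sim = max((similarity(ad, s) for s in selected), default=0.0)
            return lam * relevance[ad] - (1 - lam) * max_sim
        best = max(sorted(remaining), key=mmr)  # sorted for stable ties
        selected.append(best)
        remaining.remove(best)
    return selected


category = {"shoes_1": "shoes", "shoes_2": "shoes", "travel_1": "travel"}
relevance = {"shoes_1": 0.90, "shoes_2": 0.85, "travel_1": 0.60}


def same_category(a, b):
    return 1.0 if category[a] == category[b] else 0.0


diverse = mmr_rerank(relevance, same_category, k=2)
by_relevance = sorted(relevance, key=relevance.get, reverse=True)[:2]
```

Sorting by relevance alone returns two shoe ads. MMR keeps the best shoe ad and then prefers the travel ad, because the second shoe ad is penalized for duplicating a category already shown.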

Exploration

Showing new or uncertain ads to gather data:

Cold start problem: New ads have no history
Exploration-exploitation: Balance showing known good ads vs. learning about new ones
Multi-armed bandits: Formal framework for exploration
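As one concrete bandit policy, the sketch below uses UCB1, which adds an uncertainty bonus to each ad's observed CTR so that rarely shown ads get a chance. UCB1 is chosen here only as a simple example (Thompson sampling is a common alternative), and the ad names and counts are made up.

```python
import math


def ucb1_select(clicks, impressions):
    """Pick the ad with the highest upper confidence bound on CTR."""
    total = sum(impressions.values())
    best_ad, best_bound = None, float("-inf")
    for ad in sorted(impressions):
        n = impressions[ad]
        if n == 0:
            return ad  # cold start: an ad with no history is shown first
        # Observed CTR plus a bonus that shrinks as impressions accumulate.
        bound = clicks[ad] / n + math.sqrt(2 * math.log(total) / n)
        if bound > best_bound:
            best_ad, best_bound = ad, bound
    return best_ad


impressions = {"established": 10_000, "recent": 10}
clicks = {"established": 500, "recent": 0}
choice = ucb1_select(clicks, impressions)
```

The established ad has a 5% observed CTR and the recent ad has none, yet the recent ad is selected because ten impressions are too few to rule out a high CTR. In an ads system the bonus would usually be combined with the bid rather than applied to CTR alone.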

Position Bias and How to Correct for It

The Problem

Users are more likely to click ads in higher positions, regardless of relevance. This creates bias in:

Training data: Higher positions have artificially higher CTR
Model predictions: Models learn position as a strong signal
Evaluation: Offline metrics computed from logged clicks favor ads that were shown in higher positions

Correction Techniques

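Inverse propensity weighting: Weight each example by the inverse of the probability that its position was examined, so clicks from rarely examined low positions count more
Position as feature: Include position as a training feature so the model can attribute part of the click to placement, then fix it to a constant value at serving time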
Randomization: Occasionally randomize positions to collect unbiased data
Causal modeling: Explicitly model position effect
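Inverse propensity weighting can be illustrated with a position-corrected CTR estimate. The sketch assumes a simple examination model (a click happens only if the slot is examined and the ad is relevant) and assumes the per-position examination probabilities are already known, for instance from a randomization experiment; the numbers are invented.

```python
def ipw_ctr(logs, propensity):
    """Estimate position-independent CTR from logged impressions.

    logs: list of (position, clicked) pairs, clicked is 0 or 1
    propensity: dict mapping position to P(slot is examined)
    """
    weighted_clicks = sum(clicked / propensity[pos] for pos, clicked in logs)
    return weighted_clicks / len(logs)


propensity = {1: 1.0, 2: 0.5}  # assumed: slot 2 is examined half as often

# Ad A: 100 impressions in slot 1, 10 clicks (naive CTR 0.10).
logs_a = [(1, 1)] * 10 + [(1, 0)] * 90
# Ad B: 100 impressions in slot 2, 6 clicks (naive CTR 0.06).
logs_b = [(2, 1)] * 6 + [(2, 0)] * 94

ctr_a = ipw_ctr(logs_a, propensity)
ctr_b = ipw_ctr(logs_b, propensity)
```

Naive CTR ranks ad A above ad B, but after correcting for position ad B comes out ahead (about 0.12 versus 0.10). The same weights can be applied to per-example training losses instead of to a summary statistic.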

Serving-Time Considerations

At serving time, we can't know position yet (it depends on auction outcome). We need to:

Train models without position bias
Score every ad as if it were shown in the same reference position (typically the top slot), so that scores are comparable across ads
Let auction determine final positions
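The serving-time approach can be sketched with a small logistic model that takes position as an input during training and is queried with a fixed position when scoring. The weights, feature names, and the choice of slot 1 as the reference position are hypothetical values for illustration, not outputs of a trained model.

```python
import math

TOP_POSITION = 1  # assumed reference slot used for every ad at serving time


def predict_click(weights, features, position):
    """Logistic click model with position as an extra input."""
    z = weights["bias"] + weights["position"] * position
    z += sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def serving_score(weights, features, bid):
    """Score an ad as if it were in the top slot, then apply the bid."""
    p_ctr = predict_click(weights, features, position=TOP_POSITION)
    return p_ctr * bid


# Hypothetical weights; the negative position weight means lower slots
# (larger position numbers) receive fewer clicks.
weights = {"bias": -3.0, "position": -0.5, "relevance": 2.0}
features = {"relevance": 0.8}

p_top = predict_click(weights, features, position=1)
p_third = predict_click(weights, features, position=3)
score = serving_score(weights, features, bid=2.0)
```

Because every candidate is scored at the same position, differences in score reflect the ad and the bid, not the slot where the ad happened to appear in the training logs.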