Filtering — What Gets Cut and When

Understanding the filtering hierarchy and why order matters for system efficiency.

The Filtering Hierarchy: Why Order Matters

Apply filters in an order that follows three principles:

  1. Cheapest filters first: Eliminate candidates early to save compute
  2. Highest rejection rate: Apply filters that remove the most candidates first
  3. Deterministic before ML: Use rules before expensive model inference

In the wrong order, expensive stages such as feature computation and model inference run on ads that a cheap check would have removed anyway.
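
The first two principles can pull in different directions: a filter may be cheap but reject little, or expensive but reject a lot. One common way to reconcile them, sketched here under the assumption that filters reject independently of one another, is to sort by cost per unit of rejection. The FilterStats type, the filter names, and all the numbers are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterStats:
    name: str
    cost_us: float      # assumed average cost per candidate, in microseconds
    reject_rate: float  # assumed fraction of candidates removed (0..1)


def expected_cost_per_candidate(filters):
    """Expected cost of running the filters in the given order.

    Assumes each filter rejects independently of the others.
    """
    total = 0.0
    surviving = 1.0  # fraction of candidates that reach the current filter
    for f in filters:
        total += surviving * f.cost_us
        surviving *= 1.0 - f.reject_rate
    return total


def order_filters(filters):
    """Sort by cost divided by rejection rate (lower runs first)."""
    def rank(f):
        return f.cost_us / f.reject_rate if f.reject_rate > 0 else float("inf")
    return sorted(filters, key=rank)


# Hypothetical measurements, deliberately listed in a bad order.
filters = [
    FilterStats("ml_quality_model", cost_us=500.0, reject_rate=0.30),
    FilterStats("policy_rules", cost_us=5.0, reject_rate=0.10),
    FilterStats("geo_targeting", cost_us=1.0, reject_rate=0.90),
]
ordered = order_filters(filters)
```

With these made-up numbers the cheap, highly selective targeting check runs first and the model runs last, and the expected cost per candidate drops from roughly 504 to 46.5 microseconds.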

Hard Constraints: Targeting, Eligibility, Policy Compliance

Targeting Constraints

  • Geographic restrictions
  • Demographic targeting
  • Device type requirements
  • Time-based restrictions

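These are typically checked first, using inverted indexes that map each targeting value (for example, a country or a device type) to the set of ads that target it.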
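
A minimal sketch of such an index follows. It assumes every ad lists explicit values for every targeting attribute; a real system would also need to handle ads that leave an attribute untargeted. The function names build_index and match, and the sample ads, are illustrative only.

```python
from collections import defaultdict


def build_index(ads):
    """Map each (attribute, value) pair to the set of ad ids targeting it."""
    index = defaultdict(set)
    for ad_id, targeting in ads.items():
        for attribute, values in targeting.items():
            for value in values:
                index[(attribute, value)].add(ad_id)
    return index


def match(index, request):
    """Return ad ids whose targeting matches every attribute in the request.

    In this sketch an empty request matches nothing.
    """
    result = None
    for attribute, value in request.items():
        postings = index.get((attribute, value), set())
        result = set(postings) if result is None else result & postings
        if not result:
            break  # no ad can match; skip the remaining attributes
    return result or set()


# Hypothetical ads and their targeting.
ads = {
    "ad_1": {"geo": ["US", "CA"], "device": ["mobile"]},
    "ad_2": {"geo": ["US"], "device": ["desktop", "mobile"]},
    "ad_3": {"geo": ["DE"], "device": ["mobile"]},
}
index = build_index(ads)
matched = match(index, {"geo": "US", "device": "mobile"})
```

Each request costs a handful of set lookups and intersections rather than a scan over every ad, which is why this check is cheap enough to run first.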

Eligibility Checks

  • Advertiser account status (active, suspended, etc.)
  • Campaign status (running, paused, exhausted)
  • Ad creative approval status
  • Budget availability

Policy Compliance

  • Content policies (prohibited content, brand safety)
  • Ad format requirements
  • Legal restrictions (age-gated products, etc.)

Targeting, eligibility, and policy checks are deterministic and cheap to evaluate, making them ideal for early-stage filtering.
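
The eligibility and legal checks above can be sketched as a chain of plain comparisons that stops at the first failure. The Candidate fields, the status strings, and the single age-gate rule are assumptions made for illustration, not a real schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    ad_id: str
    account_status: str    # assumed values: "active", "suspended", ...
    campaign_status: str   # assumed values: "running", "paused", "exhausted"
    creative_approved: bool
    remaining_budget: float
    age_gated: bool = False


def first_failed_check(c: Candidate, bid_cost: float,
                       user_is_adult: bool) -> Optional[str]:
    """Return the name of the first failed check, or None if all pass."""
    if c.account_status != "active":
        return "account_status"
    if c.campaign_status != "running":
        return "campaign_status"
    if not c.creative_approved:
        return "creative_approval"
    if c.remaining_budget < bid_cost:
        return "budget"
    if c.age_gated and not user_is_adult:
        return "age_gate"
    return None
```

Returning the reason instead of a bare boolean costs nothing extra and makes it easy to count rejections per check.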

Brand Safety Filtering: Advertiser and Publisher Controls

Advertiser Controls

Advertisers can specify:

  • Block lists: Categories or sites to avoid
  • Allow lists: Only show on specific sites
  • Content categories: Avoid certain content types

Publisher Controls

Publishers can specify:

  • Ad quality standards: Minimum quality scores
  • Content restrictions: What types of ads are acceptable
  • Brand safety requirements: Protect their brand reputation

Implementation

  • Pre-computed lists: Fast lookup tables
  • Content classification: ML models for content categorization
  • Real-time checks: Verify against current policies
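
The pre-computed lists above might look like the following sketch, which checks advertiser and publisher controls with set lookups. It assumes page categories and the ad quality score were computed ahead of time (for example, by an offline classifier); all type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdvertiserControls:
    blocked_sites: frozenset = frozenset()
    blocked_categories: frozenset = frozenset()
    allowed_sites: frozenset = frozenset()  # empty means "no allow list"


@dataclass(frozen=True)
class PublisherControls:
    min_quality_score: float = 0.0
    blocked_ad_categories: frozenset = frozenset()


def is_allowed(adv, pub, site, page_categories, ad_category, ad_quality_score):
    """Check both sides' controls; page_categories must be a set."""
    # Advertiser side: where the ad may appear.
    if site in adv.blocked_sites:
        return False
    if adv.allowed_sites and site not in adv.allowed_sites:
        return False
    if adv.blocked_categories & page_categories:
        return False
    # Publisher side: which ads may appear on the site.
    if ad_category in pub.blocked_ad_categories:
        return False
    if ad_quality_score < pub.min_quality_score:
        return False
    return True
```

Keeping the expensive part (content classification) offline and the serving-time part as set membership is what lets brand safety sit in the early, cheap stages.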

Why Most Filtering Belongs Early (and What Doesn't)

Early Filtering Benefits

  • Saves compute: Don't run expensive ML on filtered ads
  • Reduces latency: Fewer candidates to process downstream
  • Lowers costs: Less infrastructure needed

What Shouldn't Be Filtered Early

  • Quality-based filtering: Requires ML predictions
  • Diversity requirements: Need to see full candidate set
  • Exploration: New ads need evaluation before filtering
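
The three late-stage concerns above can be sketched as a single pass over candidates that have already been scored. This is one illustrative arrangement, not a standard algorithm: the per-advertiser cap stands in for diversity, and exempting low-impression ads from the score threshold stands in for exploration. The dictionary keys and the default of 1000 impressions are assumptions.

```python
def select_late_stage(scored, min_score, max_per_advertiser, slots,
                      exploration_impressions=1000):
    """Pick up to `slots` ad ids from candidates that already have scores.

    Each candidate is a dict with keys: ad_id, advertiser_id, score,
    impressions (all hypothetical field names).
    """
    selected = []
    per_advertiser = {}
    for cand in sorted(scored, key=lambda c: c["score"], reverse=True):
        if len(selected) == slots:
            break
        # Exploration: ads with little history are not cut on score yet.
        still_exploring = cand["impressions"] < exploration_impressions
        if cand["score"] < min_score and not still_exploring:
            continue
        # Diversity: needs the running view of what was already selected.
        count = per_advertiser.get(cand["advertiser_id"], 0)
        if count >= max_per_advertiser:
            continue
        per_advertiser[cand["advertiser_id"]] = count + 1
        selected.append(cand["ad_id"])
    return selected
```

None of these decisions can be made before scoring: the threshold needs the model's prediction, and the cap depends on which other candidates survived.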

The Cost of Late-Stage Filtering: Wasted Compute and Lost Revenue

Wasted Compute

If filtering happens after ML inference:

  • Models run on ads that will be filtered
  • Feature computation wasted
  • Ranking computation unnecessary
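
A toy comparison makes the waste concrete. The sketch below runs the same hard filter either before or after a stand-in scoring function and counts how many times the "model" is called; the candidate set, the filter, and the scoring function are all made up.

```python
def run(candidates, passes_hard_filter, score, filter_first):
    """Return (survivors with scores, number of model calls)."""
    model_calls = 0
    survivors = []
    for cand in candidates:
        if filter_first and not passes_hard_filter(cand):
            continue  # rejected before any model work
        s = score(cand)
        model_calls += 1
        if not filter_first and not passes_hard_filter(cand):
            continue  # rejected after the model already ran
        survivors.append((cand, s))
    return survivors, model_calls


# Hypothetical workload: 1000 candidates, 10% pass the hard filter.
candidates = list(range(1000))
passes = lambda c: c % 10 == 0
score = lambda c: c / 1000

early, early_calls = run(candidates, passes, score, filter_first=True)
late, late_calls = run(candidates, passes, score, filter_first=False)
```

Both orders return the same survivors, but the late-filter order calls the model on all 1000 candidates while the early-filter order calls it on only the 100 that pass.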

Lost Revenue

Late filtering can also hurt revenue:

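  • Budget exhaustion: An ad that passes the budget check and is filtered afterward may tie up budget it never spends
  • Frequency caps: An ad that passes the frequency check and is filtered afterward may use up a capped slot without ever being shown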
  • Opportunity cost: Time spent on filtered ads could be used for better candidates

Best Practices

  • Filter as early as possible
  • Use approximate checks when exact checks are expensive
  • Cache filtering results when possible
  • Monitor filtering rates at each stage
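
Monitoring per-stage filtering rates can be as simple as counting what each stage sees and rejects. The FilterFunnel class below is a hypothetical helper, shown only as a sketch; a production system would more likely emit these counts to its metrics pipeline.

```python
from collections import Counter


class FilterFunnel:
    """Tracks how many candidates each stage sees and rejects."""

    def __init__(self):
        self.seen = Counter()
        self.rejected = Counter()

    def apply(self, stage, predicate, candidates):
        """Run one filter stage and record its counts."""
        survivors = [c for c in candidates if predicate(c)]
        self.seen[stage] += len(candidates)
        self.rejected[stage] += len(candidates) - len(survivors)
        return survivors

    def rejection_rates(self):
        """Fraction of candidates rejected at each stage that saw any."""
        return {
            stage: self.rejected[stage] / self.seen[stage]
            for stage in self.seen
            if self.seen[stage] > 0
        }


# Hypothetical stages and predicates.
funnel = FilterFunnel()
candidates = list(range(100))
after_targeting = funnel.apply("targeting", lambda c: c % 2 == 0, candidates)
after_policy = funnel.apply("policy", lambda c: c % 10 != 0, after_targeting)
rates = funnel.rejection_rates()
```

Measured rejection rates show whether a stage that was assumed to be highly selective actually is, which is the information needed to revisit filter order.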
