Monitoring, Debugging, and Closing the Loop
How to monitor production systems, detect issues, and continuously improve.
Metrics to Monitor in Production
Revenue Metrics
- RPM (Revenue Per Mille): Overall revenue per 1000 impressions
- Revenue per query: Average revenue per user request
- Fill rate: Percentage of requests that result in served ads
- eCPM: Effective cost per mille (what advertisers pay per 1000 impressions)
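A minimal sketch of how the revenue metrics above can be derived from raw serving counters; the function name, counter names, and sample values are illustrative assumptions, not a specific platform's API:

```python
def revenue_metrics(requests, impressions, revenue):
    """Derive RPM, revenue per query, and fill rate from aggregate counters."""
    return {
        "rpm": revenue / impressions * 1000 if impressions else 0.0,   # revenue per 1000 impressions
        "revenue_per_query": revenue / requests if requests else 0.0,  # revenue per request
        "fill_rate": impressions / requests if requests else 0.0,      # share of requests serving an ad
    }

# Illustrative counters for one reporting window.
m = revenue_metrics(requests=50_000, impressions=42_000, revenue=630.0)
print(m)
```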
User Experience Metrics
- CTR: Click-through rate (engagement indicator)
- Ad load: Number of ads per page
- User satisfaction: Surveys, negative feedback rates
- Page load time: Impact of ads on page performance
Advertiser Metrics
- ROAS: Return on ad spend for advertisers
- Conversion rates: Clicks to conversions
- Budget delivery: How smoothly budgets are spent
- Campaign performance: Overall advertiser satisfaction
System Health Metrics
- Latency: P50, P95, P99 response times
- Error rates: Failed requests, timeouts
- Throughput: Requests per second
- Resource utilization: CPU, memory, network
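The latency percentiles above (P50, P95, P99) can be computed over a window of response times; this is a minimal nearest-rank sketch with illustrative sample data, not a production telemetry client:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples."""
    ordered = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(ordered)))  # 1-based rank
    return ordered[k - 1]

# Illustrative latency window in milliseconds; note how tail
# percentiles expose the slow outliers that the median hides.
latencies_ms = [12, 15, 11, 90, 14, 13, 250, 16, 12, 14]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```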
Detecting Model Degradation and Drift
Model Degradation
Performance decline over time:
- Accuracy: Predictions become less accurate
- Calibration: Probabilities drift from actual rates
- Revenue impact: System generates less revenue
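One way to watch for the calibration decline described above is to compare mean predicted CTR against observed CTR over a logging window; the 10% alert threshold and sample numbers below are illustrative assumptions:

```python
def calibration_ratio(predicted_ctrs, clicks, impressions):
    """Ratio of mean predicted CTR to observed CTR; ~1.0 means well calibrated."""
    mean_pred = sum(predicted_ctrs) / len(predicted_ctrs)
    observed = clicks / impressions
    return mean_pred / observed

# Illustrative window: model predicts ~2% CTR but users clicked at 1.5%.
ratio = calibration_ratio(predicted_ctrs=[0.021, 0.019, 0.020],
                          clicks=150, impressions=10_000)
if abs(ratio - 1.0) > 0.1:  # alert if predictions are off by more than 10%
    print(f"calibration drift: ratio={ratio:.2f}")
```

A ratio well above 1.0 means the model is overconfident (predicting more clicks than occur), the symptom seen in Case Study 1 below.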
Drift Detection
Data Drift
- Feature distributions: User behavior changes
- Ad inventory: New ads, new advertisers
- Market conditions: Economic changes affect behavior
Concept Drift
- CTR patterns: User clicking behavior changes
- Conversion patterns: What drives conversions shifts
- Quality signals: Relevance standards evolve
Detection Methods
- Statistical tests: Compare current vs. historical distributions
- Model performance: Track accuracy on holdout data
- A/B testing: Compare new models to current
- Anomaly detection: Identify unusual patterns
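As one concrete statistical test, the Population Stability Index (PSI) compares a current feature distribution against a historical baseline; the bucketing and the common 0.2 alert threshold are illustrative choices in this sketch:

```python
import math

def psi(expected, actual, eps=1e-6):
    """PSI over pre-bucketed distributions (lists of proportions summing to 1)."""
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

baseline = [0.25, 0.25, 0.25, 0.25]  # historical bucket shares of a feature
current = [0.40, 0.30, 0.20, 0.10]   # today's bucket shares
score = psi(baseline, current)
print(f"PSI={score:.3f}", "DRIFT" if score > 0.2 else "stable")
```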
Diagnosing Revenue Drops: Model, Market, or Bug?
Model Issues
- Stale models: Not retrained with recent data
- Overfitting: Model doesn't generalize
- Feature bugs: Incorrect feature computation
- Calibration drift: Predictions no longer calibrated
Market Changes
- Advertiser behavior: Bids change, budgets shift
- User behavior: Clicking patterns change
- Competition: New platforms, market saturation
- Seasonality: Expected patterns (holidays, events)
Bugs
- Code bugs: Logic errors in serving pipeline
- Data bugs: Incorrect data in features or logs
- Infrastructure bugs: System failures, network issues
- Configuration bugs: Wrong settings, thresholds
Diagnosis Process
- Check system health: Is infrastructure working?
- Review recent changes: What was deployed recently?
- Analyze metrics: Which metrics changed and when?
- Compare segments: Is issue global or specific?
- Trace examples: Follow specific requests through system
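The "compare segments" step can be sketched as a check of whether a revenue drop is global or concentrated in one slice; the segment names, figures, and 20% threshold are illustrative assumptions:

```python
def degraded_segments(baseline, current, threshold=0.2):
    """Return segments whose revenue fell more than `threshold` vs baseline."""
    return {
        seg: (baseline[seg] - current[seg]) / baseline[seg]
        for seg in baseline
        if baseline[seg] and (baseline[seg] - current[seg]) / baseline[seg] > threshold
    }

# Illustrative revenue by device segment for two comparable windows.
baseline = {"mobile": 1000.0, "desktop": 800.0, "tablet": 150.0}
current = {"mobile": 950.0, "desktop": 450.0, "tablet": 140.0}
print(degraded_segments(baseline, current))  # the drop is desktop-specific
```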
Tracing a Bad Ad Through the System
The Problem
An ad that shouldn't have been shown (low quality, wrong targeting, etc.) was served. Why?
Tracing Steps
- Retrieval: Was ad in candidate set? Why?
- Filtering: Did it pass all filters? Should it have?
- Prediction: What were model predictions? Were they correct?
- Ranking: What was the score? Why did it rank high?
- Auction: Did it win fairly? Was price correct?
- Serving: Was correct ad served? Any last-minute changes?
Tools Needed
- Request IDs: Track single request through entire pipeline
- Distributed tracing: See all service calls for a request
- Feature logs: See exact features used in predictions
- Decision logs: See all filtering and ranking decisions
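The request IDs and decision logs above can be combined into a simple trace: each pipeline stage records its decision under the same request ID, so a bad ad can be replayed end to end. This is a minimal in-memory sketch; the stage names mirror the tracing steps, and the log store and entries are illustrative:

```python
import uuid

decision_log = {}  # request_id -> ordered list of (stage, detail)

def log_decision(request_id, stage, detail):
    """Append one pipeline decision under the request's ID."""
    decision_log.setdefault(request_id, []).append((stage, detail))

rid = str(uuid.uuid4())
log_decision(rid, "retrieval", "ad_123 in candidate set (keyword match)")
log_decision(rid, "filtering", "ad_123 passed quality filter (score 0.61)")
log_decision(rid, "ranking", "ad_123 ranked #1 (score 2.4)")
log_decision(rid, "auction", "ad_123 won, price $0.32")

# Tracing the bad ad: replay every decision made for this request.
for stage, detail in decision_log[rid]:
    print(f"[{rid[:8]}] {stage}: {detail}")
```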
Case Studies: Real Production Incidents
Case Study 1: Model Calibration Drift
- Symptom: Revenue dropped 5% over 2 weeks
- Investigation: Found CTR predictions were overconfident
- Root cause: Model not retrained while user behavior shifted
- Fix: Retrained model with recent data, improved calibration
- Prevention: Automated retraining pipeline, calibration monitoring
Case Study 2: Feature Bug
- Symptom: Certain user segments had unusually low CTR
- Investigation: Traced to user feature computation
- Root cause: Bug in feature engineering pipeline
- Fix: Corrected feature computation, backfilled historical data
- Prevention: Feature validation tests, monitoring feature distributions
Case Study 3: Auction Mechanism Issue
- Symptom: Fill rate dropped, many auctions had no winners
- Investigation: Found reserve prices too high
- Root cause: Recent change to reserve price algorithm
- Fix: Rolled back change, fixed algorithm
- Prevention: Gradual rollouts, A/B testing for revenue changes
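The fill-rate collapse in Case Study 3 can be illustrated with a small simulation of how an aggressive reserve price prevents auctions from clearing; the bids and reserve values are illustrative:

```python
def fill_rate(auctions, reserve):
    """Share of auctions whose top bid clears the reserve price."""
    won = sum(1 for bids in auctions if bids and max(bids) >= reserve)
    return won / len(auctions)

# Illustrative auctions, each a list of bids in dollars.
auctions = [[0.30, 0.12], [0.22], [0.45, 0.40], [0.18, 0.05], [0.50]]
print(fill_rate(auctions, reserve=0.20))  # moderate reserve: most auctions clear
print(fill_rate(auctions, reserve=0.40))  # aggressive reserve: fill rate collapses
```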
These case studies illustrate the importance of comprehensive monitoring and debugging capabilities.