Introduction
Meta's GEM (Generative Embeddings Model) applies LLM-scale architectures to ads recommendation. This case study examines how Meta improved ad relevance while holding to strict latency requirements.
The Challenge
Ads recommendation at Meta scale presents unique challenges:
- Billions of daily predictions across Facebook and Instagram
- Strict latency SLAs (single-digit milliseconds)
- Complex multi-stakeholder objectives (users, advertisers, platform)
GEM Architecture
Foundation Model Approach
GEM treats ads recommendation as a generative modeling problem:
- Pre-training: Learn rich representations from ad content and user interactions
- Fine-tuning: Adapt to specific prediction tasks
- Efficient inference: Deploy with optimized serving
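The pre-train-then-fine-tune split above can be sketched in miniature. This is a toy illustration under stated assumptions, not Meta's actual setup: the contrastive-style pre-training objective, the shapes, and the CTR head are all hypothetical stand-ins.

```python
import numpy as np

# Toy sketch of the two learning stages. Objectives, shapes, and
# hyperparameters are illustrative assumptions, not Meta's actual setup.
rng = np.random.default_rng(0)
n_items, dim = 50, 8
emb = rng.normal(0.0, 0.1, (n_items, dim))     # shared representation table

# --- Pre-training: pull co-occurring items together (contrastive-style) ---
def pretrain(emb, pairs, lr=0.1):
    for i, j in pairs:
        diff = emb[i] - emb[j]
        emb[i] -= lr * diff                    # move the pair closer
        emb[j] += lr * diff
    return emb

pairs = [(int(rng.integers(n_items)), int(rng.integers(n_items)))
         for _ in range(200)]
emb = pretrain(emb, pairs)

# --- Fine-tuning: freeze the table, fit a small task head (CTR here) ---
def ctr_head(emb, w, item_id):
    return 1.0 / (1.0 + np.exp(-(emb[item_id] @ w)))

w = np.zeros(dim)                              # head weights, trained per task
p = ctr_head(emb, w, item_id=3)                # sigmoid(0) = 0.5 before tuning
```

The key design point is that the expensive representation learning happens once, while each downstream prediction task only trains a lightweight head.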
Model Components
- Transformer encoder: Process ad creative and metadata
- User sequence model: Capture temporal patterns
- Cross-attention layers: Model user-ad interactions
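The cross-attention component can be sketched as follows: user-history tokens form the queries, and ad-side tokens form the keys and values, so each user event attends over the ad's creative and metadata. This is a single-head, unmasked sketch with assumed dimensions, not the production layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(user_seq, ad_tokens, Wq, Wk, Wv):
    """User-history tokens (queries) attend over ad tokens (keys/values)."""
    Q = user_seq @ Wq                          # (T_user, d)
    K = ad_tokens @ Wk                         # (T_ad, d)
    V = ad_tokens @ Wv                         # (T_ad, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # scaled dot-product scores
    return softmax(scores, axis=-1) @ V        # (T_user, d)

rng = np.random.default_rng(0)
d = 16
user_seq  = rng.normal(size=(12, d))           # 12 recent user events
ad_tokens = rng.normal(size=(5, d))            # 5 tokens from ad creative/metadata
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_attention(user_seq, ad_tokens, Wq, Wk, Wv)
```

Output shape `(12, 16)`: one ad-conditioned vector per user event, which downstream layers can pool into a single user-ad interaction representation.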
Technical Deep Dive
Scaling Embeddings
GEM uses massive embedding tables:
- Trillions of parameters in embedding layers
- Distributed storage across GPU clusters
- Gradient compression for efficient training
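Two of the ideas above, sharded embedding storage and gradient compression, can be sketched in a few lines. The shard count, modulo placement, and top-k compression scheme here are illustrative assumptions; production systems use hashed placement across GPU hosts and more sophisticated compressors.

```python
import numpy as np

# Hypothetical sketch: a table with trillions of parameters cannot live on
# one device, so rows are placed on shards by their id, and gradients are
# sparsified before crossing the network.
NUM_SHARDS, DIM = 4, 8
shards = [dict() for _ in range(NUM_SHARDS)]   # shard -> {row_id: vector}

def shard_of(row_id: int) -> int:
    return row_id % NUM_SHARDS                 # stand-in for hash placement

def lookup(row_id: int) -> np.ndarray:
    """Fetch (lazily materializing) an embedding row from its owning shard."""
    table = shards[shard_of(row_id)]
    if row_id not in table:
        table[row_id] = np.zeros(DIM)
    return table[row_id]

def topk_compress(grad: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries as (index, value) pairs."""
    idx = np.argsort(np.abs(grad))[-k:]
    return idx, grad[idx]

vec = lookup(12345)
idx, vals = topk_compress(np.array([0.1, -3.0, 0.02, 2.0]), k=2)
```

Sending only the top-k `(index, value)` pairs per row trades a small amount of gradient fidelity for a large reduction in cross-host traffic during training.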
Multi-Task Learning
The model jointly optimizes:
- Click prediction (CTR)
- Conversion prediction (CVR)
- Long-term value estimation
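Joint optimization over these heads typically reduces to a weighted sum of per-task losses on a shared backbone. The task names, loss choices (cross-entropy for the binary heads, squared error for the value head), and weights below are illustrative assumptions.

```python
import numpy as np

def bce(p, y):
    """Binary cross-entropy, clipped for numerical stability."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def multitask_loss(preds, labels, weights):
    """Weighted sum of per-task losses over one shared backbone's heads."""
    total = 0.0
    for task, w in weights.items():
        if task == "ltv":                      # value head: squared error
            total += w * float(np.mean((preds[task] - labels[task]) ** 2))
        else:                                  # binary heads: cross-entropy
            total += w * bce(preds[task], labels[task])
    return total

preds  = {"ctr": np.array([0.9, 0.2]), "cvr": np.array([0.1]),
          "ltv": np.array([4.0])}
labels = {"ctr": np.array([1.0, 0.0]), "cvr": np.array([0.0]),
          "ltv": np.array([5.0])}
loss = multitask_loss(preds, labels, {"ctr": 1.0, "cvr": 1.0, "ltv": 0.5})
```

Because every task's gradient flows through the same backbone, each task acts as a data-driven regularizer for the others, which is the regularization effect noted in the lessons below.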
Serving Optimization
Key serving-path optimizations:
- Embedding caching
- Model quantization
- Batched inference
- Hardware acceleration
Results
- X% improvement in ads relevance metrics
- Maintained latency within strict SLAs
- Reduced model complexity vs. ensemble approaches
Lessons for ML Engineers
- Generative approaches can improve discriminative tasks
- Scale requires careful infrastructure investment
- Multi-task learning provides natural regularization
Dive deeper into ads systems in our Ads Systems at Scale course.