Introduction
Personalization at LinkedIn requires processing user actions in near real-time to provide relevant content. This case study explores how LinkedIn's feature store enables this through efficient feature serving.
The Personalization Challenge
Requirements
- Freshness: Features updated within seconds of user action
- Scale: Millions of feature requests per second
- Reliability: 99.99% availability
User Signals
- Page views and content interactions
- Search queries
- Connection requests and messages
- Time spent on content
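The signals above can be modeled as a single event type flowing into the pipeline. A minimal sketch, assuming a simple flat schema (field and type names here are illustrative, not LinkedIn's actual event schema):

```java
import java.time.Instant;

// Hypothetical user-action event covering the signal types listed above.
class UserActionEvent {
    enum ActionType { PAGE_VIEW, SEARCH, CONNECTION_REQUEST, MESSAGE, DWELL }

    final String userId;
    final ActionType type;
    final long dwellMillis;   // time spent on content; 0 for non-dwell events
    final Instant timestamp;

    UserActionEvent(String userId, ActionType type, long dwellMillis, Instant timestamp) {
        this.userId = userId;
        this.type = type;
        this.dwellMillis = dwellMillis;
        this.timestamp = timestamp;
    }
}
```

Keeping every signal under one event type lets a single stream carry all of them, with downstream jobs filtering on `type`.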
Feature Store Architecture
Components
```
User Action -> Event Stream -> Feature Compute -> Feature Store -> Serving
                  (Kafka)       (Flink/Spark)    (Redis/Custom)
```
Storage Tiers
- Hot tier: in-memory storage for the most recent features
- Warm tier: SSD for frequently accessed features
- Cold tier: HDFS for historical features
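The tiered layout implies a read path that tries hot storage first and falls back tier by tier. A sketch with the backends stubbed as maps (the real tiers would be Redis/custom in-memory stores, SSD-backed stores, and HDFS; the promotion policy shown is an assumption):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Tiered read: try hot, then warm (promoting the value to hot on access),
// then cold. Each map stands in for a real storage backend.
class TieredFeatureStore {
    private final Map<String, double[]> hot = new HashMap<>();
    private final Map<String, double[]> warm = new HashMap<>();
    private final Map<String, double[]> cold = new HashMap<>();

    Optional<double[]> get(String key) {
        if (hot.containsKey(key)) return Optional.of(hot.get(key));
        if (warm.containsKey(key)) {
            double[] v = warm.get(key);
            hot.put(key, v);            // promote recently accessed features
            return Optional.of(v);
        }
        return Optional.ofNullable(cold.get(key));
    }

    void putHot(String key, double[] value)  { hot.put(key, value); }
    void putWarm(String key, double[] value) { warm.put(key, value); }
    void putCold(String key, double[] value) { cold.put(key, value); }
}
```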
Real-Time Feature Computation
Streaming Pipeline
```java
// Simplified Flink example: per-user engagement aggregated over a
// 5-minute sliding window that advances every 30 seconds.
stream
    .keyBy(event -> event.userId)
    .window(SlidingEventTimeWindows.of(Time.minutes(5), Time.seconds(30)))
    .aggregate(new EngagementAggregator())  // incremental per-window aggregation
    .addSink(featureStoreSink);             // write results to the feature store
```
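A plain-Java sketch of what the aggregation inside each window might compute. In the real pipeline this logic would live in a Flink `AggregateFunction`; the specific counts and the click-through-rate feature are assumptions for illustration:

```java
// Accumulator for per-user engagement within one window.
// Counts clicks and views and sums dwell time; exposes a derived
// click-through-rate feature, guarded against empty windows.
class EngagementAccumulator {
    long clicks;
    long views;
    long dwellMillis;

    void add(String actionType, long dwell) {
        switch (actionType) {
            case "CLICK": clicks++; break;
            case "VIEW":  views++;  break;
            default: break;          // other action types only contribute dwell
        }
        dwellMillis += dwell;
    }

    double clickThroughRate() {
        return views == 0 ? 0.0 : (double) clicks / views;
    }
}
```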
Feature Types
| Feature | Update Frequency | Latency |
|---|---|---|
| Click history | Real-time | <1s |
| View counts | Near real-time | <10s |
| Engagement rates | Batch | Hours |
| Demographics | Batch | Days |
Serving Architecture
Read Path
- Request arrives with user ID
- Check local cache
- Query distributed feature store
- Aggregate features for model
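The read-path steps above can be sketched end to end, with the cache and distributed store stubbed as maps (key format and defaulting behavior are assumptions):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Serving read path: 1) check the local cache, 2) fall back to the
// distributed feature store, 3) assemble the model's feature vector.
class FeatureReader {
    private final Map<String, Double> localCache = new HashMap<>();
    private final Map<String, Double> remoteStore;  // stand-in for the distributed store

    FeatureReader(Map<String, Double> remoteStore) {
        this.remoteStore = remoteStore;
    }

    double[] featuresFor(String userId, List<String> featureNames) {
        double[] vector = new double[featureNames.size()];
        for (int i = 0; i < featureNames.size(); i++) {
            String key = userId + ":" + featureNames.get(i);
            Double v = localCache.get(key);             // 1. local cache
            if (v == null) {
                v = remoteStore.getOrDefault(key, 0.0); // 2. distributed store
                localCache.put(key, v);                 //    populate cache for next read
            }
            vector[i] = v;                              // 3. aggregate for the model
        }
        return vector;
    }
}
```

Defaulting missing features to 0.0 is one common choice; production systems often distinguish "missing" from "zero" instead.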
Optimizations
- Batch fetching: Retrieve multiple features in one call
- Compression: Reduce network overhead
- Locality: Co-locate compute and storage
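The batch-fetching optimization amounts to one multi-get call instead of N single gets, amortizing the per-request network round trip. A sketch with an assumed store interface (not LinkedIn's actual API):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One multi-get retrieves many features per round trip; the interface
// and in-memory implementation are illustrative stand-ins.
interface BatchFeatureStore {
    Map<String, Double> multiGet(List<String> keys);
}

class InMemoryBatchStore implements BatchFeatureStore {
    private final Map<String, Double> data = new HashMap<>();

    void put(String key, double value) {
        data.put(key, value);
    }

    @Override
    public Map<String, Double> multiGet(List<String> keys) {
        Map<String, Double> out = new HashMap<>();
        for (String key : keys) {
            if (data.containsKey(key)) {
                out.put(key, data.get(key));  // missing keys are simply absent
            }
        }
        return out;
    }
}
```

Against a real remote store (e.g. Redis's MGET), this collapses N network round trips into one, which is where the latency win comes from.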
Impact
- X% improvement in feed engagement
- Faster model iteration cycles
- Reduced feature engineering overhead
Lessons Learned
- Latency budgets are real constraints
- Feature freshness vs. cost tradeoff
- Monitoring feature quality is essential
Build your own feature store with our Recommendation Systems at Scale course.