Case study · 2024-10-20 · 10 min read

Near Real-Time Personalization at LinkedIn: The Feature Store Approach

How LinkedIn achieves near real-time personalization through its online feature store architecture.

Tags: LinkedIn, personalization, feature store, real-time, ML infrastructure

Introduction

Personalization at LinkedIn requires processing user actions in near real-time to provide relevant content. This case study explores how LinkedIn's feature store enables this through efficient feature serving.

The Personalization Challenge

Requirements

  • Freshness: Features updated within seconds of user action
  • Scale: Millions of feature requests per second
  • Reliability: 99.99% availability

User Signals

  • Page views and content interactions
  • Search queries
  • Connection requests and messages
  • Time spent on content

Feature Store Architecture

Components

User Action -> Event Stream -> Feature Compute -> Feature Store -> Serving
                    |                 |                  |
                 (Kafka)        (Flink/Spark)      (Redis/Custom)

Storage Tiers

  1. Hot tier: in-memory storage for recently updated features
  2. Warm tier: SSD-backed storage for frequently accessed features
  3. Cold tier: HDFS for historical features
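The tiered layout above can be sketched as a fall-through lookup: check the hot tier first, then warm, then cold, promoting hits back into the hot tier. This is a minimal illustration only; the class name, in-memory maps standing in for SSD/HDFS stores, and the promotion policy are all assumptions, not LinkedIn's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical tiered lookup: hot (in-memory) -> warm (SSD stand-in) -> cold (HDFS stand-in).
public class TieredFeatureStore {
    private final Map<String, String> hotTier = new HashMap<>();
    private final Map<String, String> warmTier = new HashMap<>();
    private final Map<String, String> coldTier = new HashMap<>();

    public Optional<String> get(String key) {
        String v = hotTier.get(key);
        if (v == null) v = warmTier.get(key);
        if (v == null) v = coldTier.get(key);
        // Promote any hit into the hot tier so repeated reads stay fast.
        if (v != null) hotTier.put(key, v);
        return Optional.ofNullable(v);
    }

    public void putWarm(String key, String value) { warmTier.put(key, value); }
}
```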

Real-Time Feature Computation

Streaming Pipeline

// Simplified Flink example: per-user engagement aggregates over a
// 5-minute sliding window that emits a fresh value every 30 seconds
stream
  .keyBy(event -> event.userId)                  // partition the stream by user
  .window(SlidingEventTimeWindows.of(Time.minutes(5), Time.seconds(30)))
  .aggregate(new EngagementAggregator())         // incremental, per-window aggregation
  .addSink(featureStoreSink);                    // write fresh features to the store

Feature Types

Feature            Update Frequency    Latency
Click history      Real-time           <1s
View counts        Near real-time      <10s
Engagement rates   Batch               Hours
Demographics       Batch               Days

Serving Architecture

Read Path

  1. Request arrives with user ID
  2. Check local cache
  3. Query distributed feature store
  4. Aggregate features for model
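The read path above can be sketched as a cache-first lookup that falls back to the distributed store on a miss. This is a hedged sketch: the `FeatureReader` class, the map standing in for the remote store, and the cache-fill policy are illustrative assumptions, not LinkedIn's serving API.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the read path: local cache first, distributed store on miss.
public class FeatureReader {
    private final Map<String, Map<String, Double>> localCache = new HashMap<>();
    private final Map<String, Map<String, Double>> distributedStore;

    public FeatureReader(Map<String, Map<String, Double>> store) {
        this.distributedStore = store;
    }

    // Steps 1-4: request arrives with a user ID, check cache, query store,
    // return the assembled feature map for the model.
    public Map<String, Double> featuresFor(String userId) {
        Map<String, Double> cached = localCache.get(userId);      // step 2
        if (cached != null) return cached;
        Map<String, Double> fetched =
            distributedStore.getOrDefault(userId, Map.of());      // step 3
        localCache.put(userId, fetched);                          // fill cache
        return fetched;                                           // step 4
    }
}
```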

Optimizations

  • Batch fetching: Retrieve multiple features in one call
  • Compression: Reduce network overhead
  • Locality: Co-locate compute and storage
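Batch fetching, the first optimization above, can be illustrated as a single multi-get that amortizes one round trip over many keys. The class and the round-trip counter are hypothetical, added only to make the saving visible; a local map stands in for the remote store.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Batch-fetching sketch: one multi-get instead of N single gets.
public class BatchFetcher {
    private final Map<String, Double> store;
    public int remoteCalls = 0;  // counts round trips, for illustration

    public BatchFetcher(Map<String, Double> store) { this.store = store; }

    // A single round trip retrieves all requested features at once.
    public Map<String, Double> multiGet(List<String> keys) {
        remoteCalls++;
        Map<String, Double> out = new HashMap<>();
        for (String k : keys) {
            Double v = store.get(k);
            if (v != null) out.put(k, v);  // missing keys are simply omitted
        }
        return out;
    }
}
```

With a real remote store, the one-call version also gives the server a chance to compress the combined response, which is where the second optimization (compression) compounds the first.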

Impact

  • X% improvement in feed engagement
  • Faster model iteration cycles
  • Reduced feature engineering overhead

Lessons Learned

  1. Latency budgets are real constraints
  2. Feature freshness vs. cost tradeoff
  3. Monitoring feature quality is essential
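One concrete form of the third lesson is a staleness check: record when each feature was last updated and alert when it exceeds its freshness budget. A minimal sketch, assuming staleness is simply now minus last update time; the class and method names are illustrative.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Minimal freshness monitor: staleness = now - last update time.
public class FreshnessMonitor {
    private final Map<String, Instant> lastUpdate = new HashMap<>();

    public void recordUpdate(String feature, Instant at) {
        lastUpdate.put(feature, at);
    }

    // True if the feature was refreshed within its freshness budget.
    public boolean isFresh(String feature, Duration budget, Instant now) {
        Instant at = lastUpdate.get(feature);
        return at != null && Duration.between(at, now).compareTo(budget) <= 0;
    }
}
```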

Build your own feature store with our Recommendation Systems at Scale course.

Want to Go Deeper?

This article is part of our comprehensive curriculum on building ML systems at scale. Explore our full courses for hands-on learning.