Machine learning at scale 🤖

MLSys Case Studies

ByteDance's TokenMixer-Large: Scaling Ranking Models

Embedding Features in Weights to Kill Retrieval Latency

A Blueprint for Scaling Recommender Systems

Why Your $130K ML Pipeline Is Starving 65 Percent of New Merchants [Edition #11]

$9,000 Monthly Vector Index That Failed on 2 Million Documents [Edition #10]

12M Dollars Lost to an AUC Metric That Ignored Probability Calibration [Edition #9]

$4.2M Lost Because of a 48-hour Labeling Loop [Edition #8]

A $1.1M Generative Recommender That Collapsed Into a 2000 Video Loop [Edition #7]

The $22K Neural Search Pipeline That Was Silently 7 Days Behind [Edition #6]

$220K Lost to a Fraud Model That Passed a 0.82 Accuracy Check [Edition #5]

A $27K/Month Ranking System That Silently Buried 45,000 New Listings Daily [Edition #4]

The $5800 FAISS Index That Was Stale for 168 Hours Straight [Edition #3]

800ms Latency Spikes From A $45K Redis Cluster That Looked Healthy [Edition #2]

VectoScale Is Paying $237k/Month to Hide a Bad Architectural Decision [Edition #1]

Decoupling Compute from Sequence Length in CTR Scaling

LinkedIn Semantic Search

Deep Neural Networks for YouTube Recommendations

LinkedIn's MixLM: 10x Faster LLM Ranking via Embedding Injection

xAI - Recommendation System deep dive [Part 2]

xAI Recommendation System Deep Dive

Meta's GEM: Bringing LLM-Scale Architectures to Ads Recommendation

Engineering Airbnb's Embedding-Based Retrieval System

vLLM @ LinkedIn

Deep dive into "Memory for LLMs" architectures

Pinterest recommendation system evolutions through the years

Long sequence for recommendation systems

How LinkedIn built its GenAI platform

Compound AI systems

Near real-time personalization at LinkedIn

TikTok Real Time Recommendation algorithm scales to billions

Uber optimal feature discovery

Netflix ML platform

Reddit's ML Model Deployment and Serving Architecture

Meta AI platform

Doordash monitoring

Uber model deployment

Wait time prediction
