Feature Stores & Feature Engineering

Why Feature Stores?

Feature stores solve one of the most common production ML problems: training-serving skew. They ensure your model sees the same features in production as it did during training.

The Training-Serving Skew Problem

Training Pipeline                     Serving Pipeline
┌──────────────────┐                 ┌──────────────────┐
│  SQL Query A     │                 │  Python Code B   │
│  (PostgreSQL)    │                 │  (Real-time API) │
└────────┬─────────┘                 └────────┬─────────┘
         │                                    │
         ▼                                    ▼
┌──────────────────┐                 ┌──────────────────┐
│  Feature X = 10  │      ≠         │  Feature X = 10.1│
└──────────────────┘                 └──────────────────┘
         │                                    │
         ▼                                    ▼
      Model                              Model
    (accurate)                         (degraded)

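The problem: two separate code paths compute what is supposed to be the same feature. Small differences in logic (rounding, null handling, time windows) produce slightly different values, and model performance degrades on inputs it was never trained on.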
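To make the skew concrete, here is a minimal sketch in plain Python. Both functions are hypothetical stand-ins, not code from any real pipeline: one mimics a batch job that averages over every order, the other mimics a serving path that was written separately and silently drops refunds. The data and function names are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical order amounts for one customer; the negative value is a refund.
orders = [20.0, 35.0, -5.0, 50.0]


def avg_order_value_training(amounts):
    """Training path: averages every row, refunds included."""
    return mean(amounts)


def avg_order_value_serving(amounts):
    """Serving path, written separately: filters out refunds first."""
    purchases = [a for a in amounts if a > 0]
    return mean(purchases)


train_value = avg_order_value_training(orders)
serve_value = avg_order_value_serving(orders)

# Same feature name, different values: the model is served inputs
# that do not match what it saw during training.
print(f"training={train_value:.2f} serving={serve_value:.2f}")
```

Neither function is obviously wrong on its own, which is why this kind of skew tends to go unnoticed until model quality drops.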

What is a Feature Store?

A feature store is a centralized repository for:

  • Storing feature definitions
  • Computing features consistently
  • Serving features for training and inference
  • Tracking feature lineage and versions

                    ┌─────────────────────┐
                    │    Feature Store    │
                    │  ┌───────────────┐  │
   Raw Data ──────▶ │  │  Transform    │  │ ──────▶ Training
                    │  │  & Store      │  │
                    │  ├───────────────┤  │ ──────▶ Serving
                    │  │  Online/      │  │
                    │  │  Offline      │  │
                    │  └───────────────┘  │
                    └─────────────────────┘

Online vs Offline Stores

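────────────────────────────────────────────────
Aspect   │ Offline Store    │ Online Store
────────────────────────────────────────────────
Use case │ Training         │ Inference
Latency  │ Minutes to hours │ Milliseconds
Storage  │ Data warehouse   │ Key-value store
Volume   │ Historical data  │ Latest values
Access   │ Batch queries    │ Point lookups
────────────────────────────────────────────────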

Offline Store (Training)

# Query historical features for training
training_data = feature_store.get_historical_features(
    entity_df=entity_dataframe,
    features=[
        "customer_features:total_purchases",
        "customer_features:avg_order_value",
        "customer_features:days_since_last_order"
    ]
)

Online Store (Inference)

# Get latest features for real-time prediction
features = feature_store.get_online_features(
    features=[
        "customer_features:total_purchases",
        "customer_features:avg_order_value"
    ],
    entity_rows=[{"customer_id": 12345}]
)
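The split between the two stores can be sketched with a toy in-memory class. This is not how any particular feature store is implemented; `TinyFeatureStore` and its method names are hypothetical, and integer timestamps are an assumption to keep the example short. The point is only that every write goes to both stores, while reads differ: the offline side returns full history, the online side returns the latest value.

```python
from collections import defaultdict


class TinyFeatureStore:
    """Toy store: an append-only offline log plus a latest-value online map."""

    def __init__(self):
        self.offline = defaultdict(list)  # entity_id -> [(timestamp, features)]
        self.online = {}                  # entity_id -> (timestamp, features)

    def write(self, entity_id, timestamp, features):
        # Offline store keeps every version for training.
        self.offline[entity_id].append((timestamp, dict(features)))
        # Online store keeps only the most recent version for serving.
        current = self.online.get(entity_id)
        if current is None or timestamp >= current[0]:
            self.online[entity_id] = (timestamp, dict(features))

    def get_history(self, entity_id):
        """Batch-style read: all versions, oldest first."""
        return sorted(self.offline[entity_id], key=lambda row: row[0])

    def get_latest(self, entity_id):
        """Point lookup: latest feature values only."""
        return self.online[entity_id][1]


store = TinyFeatureStore()
store.write(12345, 1, {"total_purchases": 3})
store.write(12345, 2, {"total_purchases": 4})

history = store.get_history(12345)  # both versions, for training
latest = store.get_latest(12345)    # newest version, for inference
```

In a real deployment the offline side would typically be a warehouse or data lake and the online side a key-value store, as in the table above.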

Feature Store Benefits

1. Consistency

┌─────────────────────────────────────────────────────────────┐
│                  Single Feature Definition                  │
│                                                             │
│  def avg_order_value(orders):                               │
│      return orders.groupby('customer_id')['amount'].mean()  │
└──────────────────────────────┬──────────────────────────────┘
               ┌───────────────┴───────────────┐
               │                               │
               ▼                               ▼
           Training                         Serving
         (Same result)                   (Same result)
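The same idea in runnable form, as a minimal sketch using plain Python instead of pandas. The helper names `build_training_rows` and `serve_feature` and the sample data are assumptions for illustration; what matters is that both paths call the one shared `avg_order_value` definition.

```python
def avg_order_value(amounts):
    """Single feature definition shared by every caller."""
    return sum(amounts) / len(amounts) if amounts else 0.0


# Hypothetical raw data: (customer_id, order amount).
orders = [(1, 20.0), (1, 40.0), (2, 15.0)]


def build_training_rows(order_rows):
    """Batch path: compute the feature for every customer."""
    by_customer = {}
    for customer_id, amount in order_rows:
        by_customer.setdefault(customer_id, []).append(amount)
    return {cid: avg_order_value(vals) for cid, vals in by_customer.items()}


def serve_feature(order_rows, customer_id):
    """Request path: compute the feature for one customer on demand."""
    amounts = [amt for cid, amt in order_rows if cid == customer_id]
    return avg_order_value(amounts)


training_rows = build_training_rows(orders)
served = serve_feature(orders, 1)

# Both paths agree because they share one definition.
print(training_rows[1], served)
```

Any change to the feature logic now happens in one place, so training and serving cannot drift apart.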

2. Reusability

Feature: customer_lifetime_value
    ├── Used by: Churn Prediction Model
    ├── Used by: Upsell Model
    ├── Used by: Risk Assessment Model
    └── Used by: Marketing Segmentation

3. Discovery

Teams can browse and reuse existing features:

Feature Catalog
───────────────────────────────────────────────
Name                    │ Owner   │ Last Updated
───────────────────────────────────────────────
customer_ltv            │ Team A  │ 2025-01-15
product_avg_rating      │ Team B  │ 2025-01-10
user_session_count      │ Team C  │ 2025-01-12
order_frequency_30d     │ Team A  │ 2025-01-14
───────────────────────────────────────────────
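Discovery usually comes down to searching feature metadata. The sketch below models the catalog listing above as a list of records and filters it by name or owner. `FeatureEntry` and `search` are hypothetical names, not part of any feature store API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FeatureEntry:
    name: str
    owner: str
    last_updated: date


# Entries copied from the catalog listing above.
CATALOG = [
    FeatureEntry("customer_ltv", "Team A", date(2025, 1, 15)),
    FeatureEntry("product_avg_rating", "Team B", date(2025, 1, 10)),
    FeatureEntry("user_session_count", "Team C", date(2025, 1, 12)),
    FeatureEntry("order_frequency_30d", "Team A", date(2025, 1, 14)),
]


def search(catalog, keyword=None, owner=None):
    """Filter by name substring and by owner; filters left as None are ignored."""
    results = []
    for entry in catalog:
        if keyword is not None and keyword not in entry.name:
            continue
        if owner is not None and entry.owner != owner:
            continue
        results.append(entry)
    return results


team_a = [e.name for e in search(CATALOG, owner="Team A")]
order_features = [e.name for e in search(CATALOG, keyword="order")]
```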

4. Time Travel

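# Get features as they were at each row's event timestamp (point-in-time join).
# entity_dataframe needs one timestamp per entity row in the "event_timestamp" column.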
point_in_time_features = feature_store.get_historical_features(
    entity_df=entity_dataframe,
    features=["customer_features:total_purchases"],
    timestamp_field="event_timestamp"
)
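Under the hood, time travel is a point-in-time lookup: for each training row, take the most recent feature value recorded at or before that row's timestamp. Here is a minimal sketch using the standard library's `bisect`. The function `value_as_of`, the integer timestamps, and the sample history are assumptions for illustration, not the mechanism of any specific feature store.

```python
from bisect import bisect_right

# Hypothetical feature history for one customer: (timestamp, total_purchases),
# sorted by timestamp. Timestamps are plain integers (for example, day numbers).
history = [(1, 3), (5, 4), (9, 7)]


def value_as_of(rows, as_of):
    """Return the latest value with timestamp <= as_of, or None if none exists."""
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx > 0 else None


# Each training label is joined to the feature value known at label time,
# so no information from the future leaks into the training set.
label_times = [0, 5, 8, 10]
joined = [(t, value_as_of(history, t)) for t in label_times]
```

A label at time 8 is paired with the value written at time 5, not the later value from time 9, which is what keeps future data out of training.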

Common Use Cases

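──────────────────────────────────────────────────────────────────
Use Case        │ Features Needed                      │ Latency
──────────────────────────────────────────────────────────────────
Fraud detection │ Transaction patterns, device info    │ < 50 ms
Recommendations │ User preferences, item embeddings    │ < 100 ms
Credit scoring  │ Financial history, behavior patterns │ < 1 s
Dynamic pricing │ Demand signals, competitor prices    │ < 500 ms
──────────────────────────────────────────────────────────────────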

Feature Store Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Data Sources                            │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐        │
│  │ Database│  │ Streams │  │  Files  │  │  APIs   │        │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘        │
└───────┼────────────┼────────────┼────────────┼──────────────┘
        │            │            │            │
        └────────────┴─────┬──────┴────────────┘
                           │
                           ▼
              ┌────────────────────────┐
              │   Feature Engineering  │
              │   (Transformations)    │
              └───────────┬────────────┘
        ┌─────────────────┴─────────────────┐
        │                                   │
        ▼                                   ▼
┌───────────────────┐            ┌───────────────────┐
│   Offline Store   │            │   Online Store    │
│   (Data Lake)     │            │   (Redis/DynamoDB)│
└─────────┬─────────┘            └─────────┬─────────┘
          │                                │
          ▼                                ▼
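   Training Pipeline              Inference Service

Common feature store tools:

──────────────────────────────────────────────────────────
Tool          │ Type        │ Best For
──────────────────────────────────────────────────────────
Feast         │ Open-source │ General purpose, self-hosted
Tecton        │ Managed     │ Enterprise, real-time ML
Databricks    │ Managed     │ Spark-based workflows
AWS SageMaker │ Managed     │ AWS ecosystem
Vertex AI     │ Managed     │ GCP ecosystem
──────────────────────────────────────────────────────────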

When Do You Need a Feature Store?

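──────────────────────────────────────────────────────
Situation                        │ Need Feature Store?
──────────────────────────────────────────────────────
Single model, batch inference    │ Maybe
Multiple models sharing features │ Yes
Real-time inference              │ Yes
Training-serving skew issues     │ Yes
Feature discovery/governance     │ Yes
──────────────────────────────────────────────────────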

Key insight: Feature stores aren't just storage—they're the bridge between data engineering and ML, ensuring consistency, reusability, and governance across your ML platform.

Next, we'll dive deep into Feast, one of the most popular open-source feature stores.
