Feature Stores & Feature Engineering

Feast Deep Dive

Feast (short for Feature Store) is a widely used open-source feature store. It gives you one place to define features, one offline path for building training sets, and one online path for serving the same features at inference time.

Installation

# Install Feast
pip install feast

# Verify installation
feast version

Project Structure

# Initialize a new Feast project
feast init my_feature_store
cd my_feature_store

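The files that feast init generates vary by Feast version (recent releases produce a single example definitions file inside feature_repo/). This lesson organizes the repository as: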

my_feature_store/
├── feature_store.yaml    # Project configuration
├── feature_repo/
│   ├── __init__.py
│   ├── entities.py       # Entity definitions
│   ├── features.py       # Feature view definitions
│   └── data/             # Sample data
└── README.md

Core Concepts

1. Entity

Entities are the objects you're computing features for:

from feast import Entity

# Define a customer entity
customer = Entity(
    name="customer_id",
    description="Unique customer identifier"
)

# Define a product entity
product = Entity(
    name="product_id",
    description="Unique product identifier"
)

2. Feature View

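A feature view groups related features, names the entities they describe, and points to the data source they are read from. Feast does not compute the features itself; it reads precomputed values from that source: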

from feast import FeatureView, Field, FileSource
from feast.types import Float32, Int64, String
from datetime import timedelta

# Data source
customer_source = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp"
)

# Feature view definition
customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=365),
    schema=[
        Field(name="total_purchases", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="customer_segment", dtype=String),
    ],
    source=customer_source,
)

3. Feature Service

A feature service bundles features from one or more feature views so that a model can request them as a single named set:

from feast import FeatureService

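# Feature service for churn prediction
# Note: transaction_features is assumed to be a second FeatureView, defined
# like customer_features above, that exposes transaction_count_30d.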
churn_prediction_service = FeatureService(
    name="churn_prediction",
    features=[
        customer_features[["total_purchases", "avg_order_value"]],
        transaction_features[["transaction_count_30d"]],
    ],
)
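The relationship between a feature service and the "view:feature" reference strings used later in this lesson can be sketched in plain Python. The helper below is hypothetical and not part of Feast; it only shows how a bundle of selected features per view expands into individual references.

```python
from typing import Dict, List


def to_feature_refs(bundle: Dict[str, List[str]]) -> List[str]:
    """Expand {view_name: [feature, ...]} into "view:feature" strings.

    Hypothetical helper for illustration only; Feast resolves feature
    services internally and does not expose this function.
    """
    return [
        f"{view}:{feature}"
        for view, features in bundle.items()
        for feature in features
    ]


# Mirrors the churn_prediction service defined above.
churn_bundle = {
    "customer_features": ["total_purchases", "avg_order_value"],
    "transaction_features": ["transaction_count_30d"],
}

refs = to_feature_refs(churn_bundle)
for ref in refs:
    print(ref)
```

Requesting features through a named service rather than a hand-written list of references keeps the training and serving code pointed at the same set.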

Configuration

feature_store.yaml

project: my_feature_store
registry: data/registry.db
provider: local

online_store:
  type: sqlite
  path: data/online_store.db

offline_store:
  type: file

entity_key_serialization_version: 2

Production Configuration (AWS)

project: my_feature_store
registry: s3://my-bucket/registry.db
provider: aws

online_store:
  type: dynamodb
  region: us-east-1

offline_store:
  type: redshift
  cluster_id: my-redshift-cluster
  region: us-east-1
  database: ml_features
  user: feast_user
  s3_staging_location: s3://my-bucket/feast-staging/

Complete Example

Define Features

# feature_repo/features.py
from feast import Entity, FeatureView, Field, FileSource, FeatureService
from feast.types import Float32, Int64
from datetime import timedelta

# Entities
customer = Entity(
    name="customer_id",
    description="Customer identifier"
)

# Data sources
customer_source = FileSource(
    path="data/customer_stats.parquet",
    timestamp_field="event_timestamp"
)

transaction_source = FileSource(
    path="data/transaction_stats.parquet",
    timestamp_field="event_timestamp"
)

# Feature views
customer_stats = FeatureView(
    name="customer_stats",
    entities=[customer],
    ttl=timedelta(days=365),
    schema=[
        Field(name="lifetime_value", dtype=Float32),
        Field(name="total_orders", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
    ],
    source=customer_source,
)

transaction_stats = FeatureView(
    name="transaction_stats",
    entities=[customer],
    ttl=timedelta(days=30),
    schema=[
        Field(name="transactions_7d", dtype=Int64),
        Field(name="transactions_30d", dtype=Int64),
        Field(name="amount_7d", dtype=Float32),
    ],
    source=transaction_source,
)

# Feature service
prediction_service = FeatureService(
    name="prediction_features",
    features=[
        customer_stats,
        transaction_stats[["transactions_30d", "amount_7d"]],
    ],
)

Apply Feature Definitions

# Register features with Feast
feast apply

Materialize to Online Store

# Load the latest feature values between the start and end timestamps into the online store
feast materialize 2024-01-01 2025-01-01
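Conceptually, materialization copies the newest feature row per entity from the offline store into a key-value online store. The following is a minimal stdlib sketch of that idea, not Feast's implementation; the function name, the dict-based online store, and the inclusive window boundaries are all assumptions made for illustration.

```python
from datetime import datetime
from typing import Any, Dict, List

Row = Dict[str, Any]


def materialize_sketch(
    offline_rows: List[Row],
    online_store: Dict[int, Row],
    start: datetime,
    end: datetime,
    key: str = "customer_id",
) -> Dict[int, Row]:
    """Keep the newest row per entity whose timestamp falls in [start, end]."""
    for row in offline_rows:
        ts = row["event_timestamp"]
        if not (start <= ts <= end):
            continue  # outside the requested window
        current = online_store.get(row[key])
        # Only overwrite when the incoming row is at least as recent.
        if current is None or current["event_timestamp"] <= ts:
            online_store[row[key]] = row
    return online_store


offline_rows = [
    {"customer_id": 1001, "event_timestamp": datetime(2024, 3, 1), "total_orders": 20},
    {"customer_id": 1001, "event_timestamp": datetime(2024, 11, 15), "total_orders": 25},
    {"customer_id": 1002, "event_timestamp": datetime(2024, 6, 10), "total_orders": 12},
    {"customer_id": 1003, "event_timestamp": datetime(2025, 2, 1), "total_orders": 7},
]

online = materialize_sketch(
    offline_rows, {}, start=datetime(2024, 1, 1), end=datetime(2025, 1, 1)
)
print(sorted(online))
```

Customer 1003 is absent from the result because its only row falls after the end of the window; a later run with a newer window would pick it up.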

Using Features

Training (Offline Store)

from feast import FeatureStore
import pandas as pd

# Initialize feature store
store = FeatureStore(repo_path=".")

# Entity dataframe with timestamps
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1004],
    "event_timestamp": pd.to_datetime([
        "2025-01-01", "2025-01-02", "2025-01-03", "2025-01-04"
    ])
})

# Get historical features for training
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_stats:lifetime_value",
        "customer_stats:total_orders",
        "transaction_stats:transactions_30d",
    ],
).to_df()

print(training_df)
# Output:
#    customer_id  event_timestamp  lifetime_value  total_orders  transactions_30d
# 0         1001       2025-01-01          1500.0            25                 5
# 1         1002       2025-01-02           800.0            12                 3
# ...
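The timestamps in the entity dataframe matter because historical retrieval is point-in-time: for each row, only feature values observed at or before that row's timestamp (and not older than the feature view's TTL) should be joined. Here is a minimal stdlib sketch of that rule, assuming in-memory rows; the function name is hypothetical and this is not how Feast executes the join internally.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def point_in_time_lookup(
    feature_rows: List[Row],
    entity_id: int,
    as_of: datetime,
    ttl: timedelta,
    key: str = "customer_id",
) -> Optional[Row]:
    """Return the newest row for entity_id with as_of - ttl <= timestamp <= as_of."""
    candidates = [
        row
        for row in feature_rows
        if row[key] == entity_id
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    return max(candidates, key=lambda r: r["event_timestamp"], default=None)


rows = [
    {"customer_id": 1001, "event_timestamp": datetime(2024, 12, 20), "transactions_30d": 4},
    {"customer_id": 1001, "event_timestamp": datetime(2024, 12, 30), "transactions_30d": 5},
    # Recorded after the training timestamp, so it must not leak into the row.
    {"customer_id": 1001, "event_timestamp": datetime(2025, 1, 5), "transactions_30d": 9},
    # Older than the 30-day TTL relative to the training timestamp.
    {"customer_id": 1002, "event_timestamp": datetime(2024, 10, 1), "transactions_30d": 3},
]

ttl = timedelta(days=30)
hit = point_in_time_lookup(rows, 1001, datetime(2025, 1, 1), ttl)
miss = point_in_time_lookup(rows, 1002, datetime(2025, 1, 2), ttl)
print(hit["transactions_30d"] if hit else None, miss)
```

Excluding the row from 2025-01-05 is what prevents label leakage: a model trained on it would see information that was not available at prediction time.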

Inference (Online Store)

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Get features for real-time inference
features = store.get_online_features(
    features=[
        "customer_stats:lifetime_value",
        "customer_stats:total_orders",
        "transaction_stats:transactions_30d",
    ],
    entity_rows=[
        {"customer_id": 1001},
        {"customer_id": 1002},
    ],
).to_dict()

print(features)
# Output:
# {
#     "customer_id": [1001, 1002],
#     "lifetime_value": [1500.0, 800.0],
#     "total_orders": [25, 12],
#     "transactions_30d": [5, 3]
# }
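The dictionary shown above is column-oriented: each key maps to a list with one value per entity row. Many model-serving code paths want one record per entity instead. The following sketch reshapes that structure using only the standard library; to_rows is a hypothetical helper name, and the input is the example output from above rather than a live Feast response.

```python
from typing import Any, Dict, List


def to_rows(columnar: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn {"col": [v1, v2, ...]} into [{"col": v1}, {"col": v2}, ...]."""
    names = list(columnar)
    return [dict(zip(names, values)) for values in zip(*columnar.values())]


# Same shape as the example output above.
features = {
    "customer_id": [1001, 1002],
    "lifetime_value": [1500.0, 800.0],
    "total_orders": [25, 12],
    "transactions_30d": [5, 3],
}

rows = to_rows(features)
for row in rows:
    print(row)
```

In a real response, a feature that is missing for an entity may come back as None, so downstream code should handle that case before passing rows to a model.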

Using Feature Services

# Pass the feature service through the features argument to request a consistent feature set
feature_service = store.get_feature_service("prediction_features")
features = store.get_online_features(
    features=feature_service,
    entity_rows=[{"customer_id": 1001}],
).to_dict()

Feast CLI Commands

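| Command | Purpose |
| --- | --- |
| feast init | Create a new project |
| feast apply | Register or update feature definitions |
| feast materialize | Populate the online store for a time range |
| feast materialize-incremental | Materialize only data newer than the last run |
| feast ui | Launch the web UI |
| feast teardown | Remove all deployed resources |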

Best Practices

| Practice | Why |
| --- | --- |
| Use feature services | Consistent feature sets across environments |
| Set an appropriate TTL | Balance freshness against storage |
| Version your features | Track changes over time |
| Test locally first | Use file-based stores for development |
| Monitor staleness | Alert on stale features |
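The staleness check from the table can be as simple as comparing each entity's newest feature timestamp against an age limit. Below is a minimal sketch under stated assumptions: the function name, the 7-day threshold, and the in-memory mapping of latest timestamps are all hypothetical, and a real setup would read those timestamps from the online store or a monitoring table.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def stale_entities(
    latest_timestamps: Dict[int, datetime],
    now: datetime,
    max_age: timedelta,
) -> List[int]:
    """Return entity IDs whose newest feature value is older than max_age."""
    return sorted(
        entity_id
        for entity_id, ts in latest_timestamps.items()
        if now - ts > max_age
    )


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
latest = {
    1001: datetime(2025, 1, 9, tzinfo=timezone.utc),   # 1 day old
    1002: datetime(2024, 12, 1, tzinfo=timezone.utc),  # 40 days old
    1003: datetime(2025, 1, 3, tzinfo=timezone.utc),   # exactly 7 days old
}

stale = stale_entities(latest, now, max_age=timedelta(days=7))
print(stale)
```

The comparison is strict, so a value that is exactly at the limit still counts as fresh; whether the boundary should be inclusive is a policy choice for the team running the alert.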

Key insight: Feast bridges the gap between your data warehouse (offline) and your inference service (online), so training and serving read the same feature definitions and values instead of two separately maintained pipelines.

Next, we'll explore feature engineering pipelines and best practices.