Feature Stores & Feature Engineering
Feast Deep Dive
4 min read
Feast (short for Feature Store) is the most popular open-source feature store. It provides a consistent way to define, store, and serve features for both training and inference.
Installation
```bash
# Install Feast
pip install feast

# Verify the installation
feast version
```
Project Structure
```bash
# Initialize a new Feast project
feast init my_feature_store
cd my_feature_store
```
This creates:
```text
my_feature_store/
├── feature_store.yaml      # Project configuration
├── feature_repo/
│   ├── __init__.py
│   ├── entities.py         # Entity definitions
│   ├── features.py         # Feature view definitions
│   └── data/               # Sample data
└── README.md
```
Core Concepts
1. Entity
Entities are the objects you're computing features for:
```python
from feast import Entity

# Define a customer entity
customer = Entity(
    name="customer_id",
    description="Unique customer identifier",
)

# Define a product entity
product = Entity(
    name="product_id",
    description="Unique product identifier",
)
```
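By default, the entity name doubles as the join key. Recent Feast releases also let you declare the join key explicitly via `join_keys`, which helps when the entity's name differs from the column name in your source data. A minimal sketch (the `customer` naming here is illustrative):

```python
from feast import Entity

# Entity is named "customer", but rows are joined
# on the "customer_id" column in the source data
customer = Entity(
    name="customer",
    join_keys=["customer_id"],
    description="Unique customer identifier",
)
```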
2. Feature View
Feature views group a set of features, their schema, and the data source they are read from:
```python
from feast import FeatureView, Field, FileSource
from feast.types import Float32, Int64, String
from datetime import timedelta

# Data source backing the feature view
customer_source = FileSource(
    path="data/customer_features.parquet",
    timestamp_field="event_timestamp",
)

# Feature view definition
customer_features = FeatureView(
    name="customer_features",
    entities=[customer],
    ttl=timedelta(days=365),
    schema=[
        Field(name="total_purchases", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
        Field(name="days_since_last_order", dtype=Int64),
        Field(name="customer_segment", dtype=String),
    ],
    source=customer_source,
)
```
3. Feature Service
Feature services bundle features from one or more feature views into a named set for serving:
```python
from feast import FeatureService

# Feature service for churn prediction
# (assumes transaction_features is another FeatureView, defined like customer_features)
churn_prediction_service = FeatureService(
    name="churn_prediction",
    features=[
        customer_features[["total_purchases", "avg_order_value"]],
        transaction_features[["transaction_count_30d"]],
    ],
)
```
Configuration
feature_store.yaml
```yaml
project: my_feature_store
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file
entity_key_serialization_version: 2
```
Production Configuration (AWS)
```yaml
project: my_feature_store
registry: s3://my-bucket/registry.db
provider: aws
online_store:
  type: dynamodb
  region: us-east-1
offline_store:
  type: redshift
  cluster_id: my-redshift-cluster
  region: us-east-1
  database: ml_features
  user: feast_user
  s3_staging_location: s3://my-bucket/feast-staging/
```
Complete Example
Define Features
```python
# feature_repo/features.py
from feast import Entity, FeatureView, Field, FileSource, FeatureService
from feast.types import Float32, Int64
from datetime import timedelta

# Entities
customer = Entity(
    name="customer_id",
    description="Customer identifier",
)

# Data sources
customer_source = FileSource(
    path="data/customer_stats.parquet",
    timestamp_field="event_timestamp",
)

transaction_source = FileSource(
    path="data/transaction_stats.parquet",
    timestamp_field="event_timestamp",
)

# Feature views
customer_stats = FeatureView(
    name="customer_stats",
    entities=[customer],
    ttl=timedelta(days=365),
    schema=[
        Field(name="lifetime_value", dtype=Float32),
        Field(name="total_orders", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
    ],
    source=customer_source,
)

transaction_stats = FeatureView(
    name="transaction_stats",
    entities=[customer],
    ttl=timedelta(days=30),
    schema=[
        Field(name="transactions_7d", dtype=Int64),
        Field(name="transactions_30d", dtype=Int64),
        Field(name="amount_7d", dtype=Float32),
    ],
    source=transaction_source,
)

# Feature service: all of customer_stats, plus two transaction features
prediction_service = FeatureService(
    name="prediction_features",
    features=[
        customer_stats,
        transaction_stats[["transactions_30d", "amount_7d"]],
    ],
)
```
Apply Feature Definitions
```bash
# Register feature definitions with the Feast registry
feast apply
```
Materialize to Online Store
```bash
# Materialize features into the online store for the given time range
feast materialize 2024-01-01 2025-01-01
```
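For scheduled jobs, `feast materialize-incremental` is usually more convenient: it tracks the last materialized timestamp per feature view, so you only supply the end time. A common cron-style invocation (the `date` incantation is just one way to produce a UTC timestamp):

```bash
# Materialize everything since the last run, up to the current UTC time
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")
```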
Using Features
Training (Offline Store)
```python
from feast import FeatureStore
import pandas as pd

# Initialize the feature store from the repo config
store = FeatureStore(repo_path=".")

# Entity dataframe: which entities, as of which timestamps
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003, 1004],
    "event_timestamp": pd.to_datetime([
        "2025-01-01", "2025-01-02", "2025-01-03", "2025-01-04"
    ]),
})

# Point-in-time join against the offline store
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_stats:lifetime_value",
        "customer_stats:total_orders",
        "transaction_stats:transactions_30d",
    ],
).to_df()

print(training_df)
# Output:
#    customer_id event_timestamp  lifetime_value  total_orders  transactions_30d
# 0         1001      2025-01-01          1500.0            25                 5
# 1         1002      2025-01-02           800.0            12                 3
# ...
```
Inference (Online Store)
```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Low-latency feature lookup for real-time inference
features = store.get_online_features(
    features=[
        "customer_stats:lifetime_value",
        "customer_stats:total_orders",
        "transaction_stats:transactions_30d",
    ],
    entity_rows=[
        {"customer_id": 1001},
        {"customer_id": 1002},
    ],
).to_dict()

print(features)
# Output:
# {
#     "customer_id": [1001, 1002],
#     "lifetime_value": [1500.0, 800.0],
#     "total_orders": [25, 12],
#     "transactions_30d": [5, 3]
# }
```
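Because the dict is column-oriented (one list per feature), it converts straight into a DataFrame for scoring. A minimal sketch, where `model` stands in for whatever trained estimator you serve (not part of Feast):

```python
import pandas as pd

# Column-oriented dict -> DataFrame, one row per entity
feature_df = pd.DataFrame(features)

# Entity keys are identifiers, not model inputs
X = feature_df.drop(columns=["customer_id"])
# predictions = model.predict(X)  # hypothetical trained model
```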
Using Feature Services
```python
# Use a feature service for a consistent, named feature set
# (get_online_features accepts a FeatureService via the features argument)
features = store.get_online_features(
    features=store.get_feature_service("prediction_features"),
    entity_rows=[{"customer_id": 1001}],
).to_dict()
```
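The same service works against the offline store: `get_historical_features` also accepts a feature service in place of a feature list, which guarantees training and serving use the identical feature set. Reusing the `entity_df` from the training example above:

```python
# Identical feature set offline, resolved from the registry
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=store.get_feature_service("prediction_features"),
).to_df()
```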
Feast CLI Commands
| Command | Purpose |
|---|---|
| `feast init` | Create a new project |
| `feast apply` | Register/update feature definitions |
| `feast materialize` | Populate the online store |
| `feast materialize-incremental` | Incremental materialization |
| `feast ui` | Launch the web UI |
| `feast teardown` | Remove all deployed resources |
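Put together, a typical local development loop looks something like this (the project name is illustrative):

```bash
# One-time setup
feast init my_feature_store
cd my_feature_store

# Iterate: edit feature definitions, then re-register and backfill
feast apply
feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

# Inspect entities, feature views, and services in the browser
feast ui
```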
Best Practices
| Practice | Why |
|---|---|
| Use feature services | Consistent feature sets across environments |
| Set appropriate TTL | Balance freshness vs storage |
| Version your features | Track changes over time |
| Test locally first | Use file-based stores for development |
| Monitor staleness | Alert on stale features |
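For the last point, a lightweight staleness check can run directly against the offline source. A minimal sketch, assuming the parquet source above stores naive UTC timestamps in `event_timestamp` (the 24-hour SLA and the `print` alert are placeholders):

```python
import pandas as pd
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=24)  # assumed freshness SLA; tune per feature view

# Read only the timestamp column from the offline source
ts = pd.read_parquet("data/customer_stats.parquet", columns=["event_timestamp"])
age = datetime.utcnow() - ts["event_timestamp"].max()  # assumes naive UTC timestamps

if age > MAX_AGE:
    # Replace with a real alerting hook (Slack, PagerDuty, ...)
    print(f"ALERT: customer_stats features are stale by {age}")
```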
Key insight: Feast bridges the gap between your data warehouse (offline) and your inference service (online), ensuring the same feature logic is used everywhere.
Next, we'll explore feature engineering pipelines and best practices.