Advanced Builds: Trading & Computer Vision

The agents we have built so far automate existing workflows — things people already do, just faster. Now we step into territory where agents enable entirely new capabilities. A trading agent that executes market strategies around the clock. A vision agent that sees the physical world through smart glasses and acts on what it observes. These are not incremental improvements — they represent a fundamentally different relationship between humans and autonomous systems.

Build 1: Trading Agent with Alpaca Markets API

Alpaca Markets (alpaca.markets) provides an API for stock trading, including a paper trading mode that lets you test strategies with simulated money before risking real capital.

Disclaimer: Trading involves financial risk. The following is for educational purposes only. Never deploy a trading bot with real money without thorough testing, risk management, and an understanding of the financial regulations in your jurisdiction. Past performance of any strategy does not guarantee future results.

Architecture:

┌──────────────────────────────────────┐
│          Trading Agent               │
├──────────────────────────────────────┤
│  Triggers: Market hours schedule     │
│  + Price alert conditions            │
├──────────────┬───────────────────────┤
│ Alpaca API   │  Market Data Feed     │
│ (orders,     │  (prices, volume,     │
│  positions,  │   news sentiment)     │
│  account)    │                       │
├──────────────┴───────────────────────┤
│  Strategy Engine                     │
│  (rule-based + LLM analysis)        │
├──────────────────────────────────────┤
│  Risk Management Layer              │
│  (position limits, stop-losses,     │
│   daily loss caps)                  │
├──────────────────────────────────────┤
│  Actions: Buy, sell, hold, alert    │
└──────────────────────────────────────┘

Tools needed:

Alpaca Markets API (paper trading account for testing)
Market data feed (Alpaca provides this with their API)
LLM for news sentiment analysis (optional but powerful)
Risk management rules (hard-coded limits the agent cannot override)

Workflow:

Agent activates during market hours
Pulls current positions, account balance, and watchlist prices
Analyzes market data against predefined rules (moving averages, volume spikes, price thresholds)
Optionally feeds recent news headlines to the LLM for sentiment analysis
If a trade signal triggers AND passes all risk checks, executes the order via Alpaca API
Logs every decision with full reasoning for later review

# Trading agent - paper trading with Alpaca
import alpaca_trade_api as tradeapi

# Paper trading setup - uses simulated money
api = tradeapi.REST(
    key_id="your_paper_key",
    secret_key="your_paper_secret",
    base_url="https://paper-api.alpaca.markets"  # Paper trading endpoint
)

def check_trading_signal(symbol: str, strategy: dict) -> dict:
    """Evaluate whether a stock meets the trading criteria."""
    # Get recent price data
    bars = api.get_bars(symbol, "1Day", limit=50).df

    # Simple moving average crossover (example strategy)
    bars["sma_20"] = bars["close"].rolling(window=20).mean()
    bars["sma_50"] = bars["close"].rolling(window=50).mean()

    latest = bars.iloc[-1]
    previous = bars.iloc[-2]

    signal = {
        "symbol": symbol,
        "action": "hold",
        "reason": "No signal detected",
    }

    # Buy signal: short-term average crosses above long-term
    if previous["sma_20"] <= previous["sma_50"] and latest["sma_20"] > latest["sma_50"]:
        signal["action"] = "buy"
        signal["reason"] = "SMA 20 crossed above SMA 50"

    # Sell signal: short-term average crosses below long-term
    if previous["sma_20"] >= previous["sma_50"] and latest["sma_20"] < latest["sma_50"]:
        signal["action"] = "sell"
        signal["reason"] = "SMA 20 crossed below SMA 50"

    return signal

def execute_with_risk_checks(signal: dict, risk_params: dict) -> dict:
    """Execute a trade only if it passes all risk management checks."""
    account = api.get_account()
    portfolio_value = float(account.portfolio_value)

    # Risk check: maximum position size
    max_position = portfolio_value * risk_params["max_position_pct"]

    # Risk check: daily loss limit
    daily_pnl = float(account.portfolio_value) - float(account.last_equity)
    if daily_pnl < -risk_params["max_daily_loss"]:
        return {"executed": False, "reason": "Daily loss limit reached"}

    # Risk check: maximum number of open positions
    positions = api.list_positions()
    if len(positions) >= risk_params["max_open_positions"]:
        return {"executed": False, "reason": "Maximum positions reached"}

    # All checks passed — execute the trade
    if signal["action"] == "buy":
        order = api.submit_order(
            symbol=signal["symbol"],
            qty=calculate_position_size(signal["symbol"], max_position),
            side="buy",
            type="market",
            time_in_force="day"
        )
        return {"executed": True, "order_id": order.id}

    return {"executed": False, "reason": "No action taken"}

Critical design decisions:

Paper trading first. Always develop and test with Alpaca's paper trading endpoint. Never connect to live trading until a strategy has been validated extensively.
Hard risk limits. The agent must have non-negotiable limits: maximum position size, maximum daily loss, maximum number of positions. These are not suggestions — they are circuit breakers.
Full logging. Every decision, every signal, every trade must be logged with the reasoning. You need to audit what the agent did and why.

What the agent handles: Data collection, signal detection, and order execution within strict risk parameters. What stays human: Strategy design, risk parameter setting, and the decision to move from paper to live trading.

Build 2: VisionCrew — Computer Vision Meets Agent Orchestration

VisionCrew is a conceptual project that combines Meta Ray-Ban smart glasses with the Gemini Live API and CrewAI to create an agent that can see and describe the world in real time. The pattern draws on open-source community explorations of combining wearable cameras with agent frameworks.

This is where agent orchestration meets the physical world.

How it works:

┌──────────────────────────────────────┐
│           VisionCrew Stack           │
├──────────────────────────────────────┤
│  Hardware: Meta Ray-Ban glasses     │
│  (camera + microphone + speaker)    │
├──────────────────────────────────────┤
│  Vision API: Gemini Live API        │
│  (real-time image understanding)    │
├──────────────────────────────────────┤
│  Orchestration: CrewAI              │
│  (tool routing, memory, actions)    │
├──────────────────────────────────────┤
│  Output: Voice response via glasses │
│  + optional actions (search, save,  │
│    navigate, identify)              │
└──────────────────────────────────────┘

The three layers:

Wearable hardware (Meta Ray-Bans): Captures what you see through the built-in camera and what you say through the microphone. Delivers audio responses through the speakers. The glasses are the agent's eyes, ears, and voice.
Multimodal AI (Gemini Live API): Processes the camera feed in real time. Understands what is in the frame — objects, text, scenes, people, landmarks. This is not simple image classification — it is contextual scene understanding that can answer questions about what it sees.
Agent orchestration (CrewAI): Takes the visual understanding and routes it through a crew of specialized agents. Each agent can use tools — search for information about what it sees, save observations to memory, provide navigation directions, or trigger other actions based on visual input.

What this enables:

Walk through a grocery store and ask "What is the cheapest organic option here?" — the agent reads prices and labels through the glasses
Visit a new city and get contextual information about buildings, landmarks, and signs in real time
Attend a conference and have the agent identify speakers, summarize their slides, and save notes

The convergence pattern:

VisionCrew illustrates a broader trend — the convergence of three capabilities:

Capability	What It Provides	Example
Multimodal AI	Understanding across text, image, audio, video	Gemini Live API
Wearable hardware	Persistent sensors in the physical world	Meta Ray-Ban glasses
Agent orchestration	Autonomous action based on understanding	CrewAI framework

Each capability alone is interesting. Combined, they create something qualitatively different: an agent that exists in the physical world, understands what it perceives, and can act on that understanding.

The Frontier of Agent Capabilities

Both of these builds represent agents that go beyond automating existing workflows:

Trading agents operate in environments that move faster than humans can process. The agent's value is not just speed — it is consistency. It follows the strategy without emotion, fatigue, or distraction.
Vision agents extend perception itself. They give you a second pair of eyes that can process, remember, and act on visual information continuously.

The common thread is autonomous action in complex environments. The agent is not waiting for you to tell it what to do next — it is perceiving, deciding, and acting within the boundaries you have defined.

Key takeaway: Advanced agents do not just automate tasks — they enable capabilities that humans alone cannot sustain. Trading agents provide emotionless, consistent execution. Vision agents provide continuous perception and contextual understanding. Both require careful boundary-setting: the agent acts within strict limits, and humans define those limits.

Next: Turning your agent-building skills into a business — the models, pricing strategies, and client acquisition tactics that work. :::

Build 1: Trading Agent with Alpaca Markets API

Build 2: VisionCrew — Computer Vision Meets Agent Orchestration

The Frontier of Agent Capabilities

Quiz