Memory Graph, Voice & Email Integration

So far your agent can think and communicate. But it cannot remember what happened last week, hear your voice, or send an email on your behalf. These three integrations — memory, voice, and email — transform your agent from a reactive chatbot into a proactive system that accumulates knowledge, accepts natural input, and takes real-world action.

Obsidian as a Memory Graph

An agent without persistent memory is like an employee who forgets everything at the end of each day. You need a structured, searchable knowledge base that your agent can both read from and write to.

Obsidian is a markdown-based note-taking application that stores files locally as plain text. This makes it an excellent memory backend for agents because:

Plain markdown files: the agent can read and write notes with standard file operations
Link graph: Obsidian's [[wikilink]] syntax turns a folder of notes into a navigable knowledge graph
Local-first: no cloud dependency, so data stays on your machine or VPS
Human-readable: you can browse and edit the same knowledge base your agent uses

How the Agent Uses Obsidian

The agent interacts with an Obsidian vault (a folder of markdown files) through file system access:

# Memory configuration pointing to Obsidian vault
memory:
  type: obsidian
  vault_path: /home/agent/obsidian-vault
  index_on_startup: true

When configured, the agent can:

Store new knowledge: After completing a research task, the agent creates a new note with findings and links it to related notes
Recall past context: Before responding, the agent searches the vault for relevant prior knowledge
Build connections: The agent creates [[links]] between related concepts, building a knowledge graph over time

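# Example: Agent creating a linked note
from pathlib import Path

def save_research(vault_path, topic, findings, related_topics):
    """Write a note for `topic` and link it to related notes."""
    note_content = f"# {topic}\n\n"
    note_content += f"{findings}\n\n"
    note_content += "## Related\n"
    for related in related_topics:
        note_content += f"- [[{related}]]\n"

    # Write to the Obsidian vault; the file name becomes the note title,
    # so `topic` must not contain path separators such as "/"
    note_path = Path(vault_path) / f"{topic}.md"
    note_path.write_text(note_content, encoding="utf-8")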
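
The example above covers storing knowledge. Recall can be sketched in a similar way. The following is a minimal illustration, assuming a plain keyword match is good enough; the function names `search_vault` and `linked_notes` are hypothetical and not part of Obsidian or any agent framework. A real setup might use an index or embeddings instead.

```python
import re
from pathlib import Path

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def search_vault(vault_path, query, limit=5):
    """Return up to `limit` (note_name, hit_count) pairs, best match first."""
    terms = [term.lower() for term in query.split()]
    scored = []
    for note in Path(vault_path).rglob("*.md"):
        text = note.read_text(encoding="utf-8", errors="ignore").lower()
        hits = sum(text.count(term) for term in terms)
        if hits:
            scored.append((note.stem, hits))
    # Most hits first; ties are broken alphabetically for stable output
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]

def linked_notes(vault_path, note_name):
    """Return the wikilink targets found in one note, in order of appearance."""
    text = (Path(vault_path) / f"{note_name}.md").read_text(encoding="utf-8")
    return [target.strip() for target in WIKILINK.findall(text)]
```

Before responding, the agent could call `search_vault` with the user's request, then follow `linked_notes` from the best match to pull in neighboring context.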

The compound effect is significant — after weeks of operation, the agent has built a rich knowledge graph of your projects, preferences, and decisions that it can reference in future interactions.

Whisper for Voice Transcription

Sometimes typing is impractical. You are walking, driving, or simply thinking faster than you can type. Voice input solves this by letting you speak naturally and having the agent process your words as text.

OpenAI's Whisper is an open-source speech recognition model that converts audio to text with high accuracy across many languages.

Integration Flow

The voice pipeline works like this:

1. You send a voice message (via Telegram voice note, a recording app, or a microphone)
2. The audio file is passed to Whisper for transcription
3. The transcribed text is fed to your agent as a regular text input
4. The agent processes it and responds through the normal channel

# Voice transcription integration
# Install with `pip install openai-whisper`; Whisper also needs ffmpeg on the PATH
import whisper

# Load the model once at startup (options: tiny, base, small, medium, large)
model = whisper.load_model("base")

def transcribe_voice(audio_path):
    """Convert voice audio to text using Whisper."""
    result = model.transcribe(audio_path)
    return result["text"]

# Example usage in an agent pipeline
# (`agent` stands for your already-configured agent object)
voice_text = transcribe_voice("/tmp/voice_message.ogg")
agent_response = agent.process(voice_text)

Model size trade-offs:

Model	Speed	Accuracy	Approx. VRAM Required
`tiny`	Fastest	Good for clear speech	~1 GB
`base`	Fast	Good general accuracy	~1 GB
`small`	Moderate	Better accuracy	~2 GB
`medium`	Slower	High accuracy	~5 GB
`large`	Slowest	Highest accuracy	~10 GB

For most agent use cases, the base or small model provides the best balance of speed and accuracy.
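
If the agent runs on machines with different GPUs, the choice can be made at startup. The sketch below assumes the approximate VRAM figures published in the Whisper README (about 1, 1, 2, 5, and 10 GB for `tiny` through `large`); the helper name `pick_whisper_model` is hypothetical, and the numbers are rough guides, not guarantees.

```python
# Approximate VRAM needs in GB, smallest model first (rough guides only)
APPROX_VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]

def pick_whisper_model(available_vram_gb, preferred="small"):
    """Return the preferred model if it fits, else the largest model that fits."""
    sizes = dict(APPROX_VRAM_GB)
    if preferred not in sizes:
        raise ValueError(f"unknown model size: {preferred}")
    if sizes[preferred] <= available_vram_gb:
        return preferred
    fitting = [name for name, need in APPROX_VRAM_GB if need <= available_vram_gb]
    # Nothing fits: fall back to the smallest model, which can also run on CPU
    return fitting[-1] if fitting else "tiny"
```

The returned name can be passed straight to `whisper.load_model`.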

AgentMail for Email Integration

Email remains one of the most important communication channels in professional life. Giving your agent its own email address unlocks powerful workflows — the agent can receive emails, draft responses, send notifications, and manage correspondence.

AgentMail (agentmail.to) is a service designed specifically for giving AI agents their own email addresses. It is a Y Combinator-backed company (YC S25) that has raised $6M in funding and has delivered over 100 million emails.

Why a Dedicated Email Service

You could configure a standard email provider, but agent-specific email services offer advantages:

API-first design — built for programmatic access, not human inboxes
Deliverability — pre-configured for high deliverability rates
Multiple addresses — easily provision different email addresses for different agent roles
Webhook support — incoming emails trigger agent actions automatically

Configuration Example

# Email integration configuration
email:
  provider: agentmail
  api_key: ${AGENTMAIL_API_KEY}
  addresses:
    - address: "assistant@yourdomain.com"
      purpose: "General correspondence"
    - address: "reports@yourdomain.com"
      purpose: "Automated report delivery"
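
The `${AGENTMAIL_API_KEY}` placeholder in the configuration above keeps the secret out of the file itself. How your agent framework expands such placeholders is not specified here, so the following is only a sketch of one way to do it, with a hypothetical helper named `resolve_env` that fails loudly when the variable is missing.

```python
import os
import re

# Matches placeholders such as ${AGENTMAIL_API_KEY}
_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def resolve_env(value, env=None):
    """Replace ${NAME} placeholders with values from `env` (default: os.environ)."""
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            # Fail early instead of sending an empty or literal API key
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

Raising an error at startup is a deliberate choice: a missing key is easier to diagnose there than as a rejected request later.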

# Handling incoming email
# (`agent` and `send_email` are assumed to be provided by your agent setup)
def on_email_received(email):
    """Process incoming email and generate a response."""
    sender = email["from"]
    subject = email["subject"]
    body = email["body"]

    # Agent analyzes the email
    analysis = agent.process(
        f"Incoming email from {sender}, "
        f"subject: {subject}. Body: {body}"
    )

    # Draft response if appropriate; avoid stacking "Re: Re:" on replies
    if analysis.should_respond:
        is_reply = subject.lower().startswith("re:")
        send_email(
            to=sender,
            subject=subject if is_reply else f"Re: {subject}",
            body=analysis.draft_response
        )

Combined with the text channels you already have, these three integrations give your agent several ways to receive and recall information:

Channel	When to Use	Agent Action
Text (Telegram/Discord)	At your desk, precise instructions	Direct processing
Voice (Whisper)	On the go, brainstorming	Transcribe then process
Email (AgentMail)	Formal communication, external contacts	Parse, respond, or escalate
Memory (Obsidian)	Background reference	Retrieve relevant context

The power is in the combination. You send a voice note while walking: "Research the competitor's pricing page and email me a summary." The agent transcribes your voice, performs web research, stores findings in Obsidian, and emails you a formatted summary — all triggered by a single voice message.
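
The walk-through above can be sketched as a single function. This is an illustration only: `handle_voice_request` and its parameters are hypothetical names, and each step is passed in as a callable so that the real Whisper, research, Obsidian, and email pieces can be plugged in without this code knowing their details.

```python
def handle_voice_request(audio_path, transcribe, research, save_note, send_email, reply_to):
    """Run one voice note through transcription, research, memory, and email."""
    # 1. Voice to text
    request = transcribe(audio_path).strip()
    if not request:
        return None  # nothing intelligible was said, so do nothing

    # 2. Do the work the user asked for
    findings = research(request)

    # 3. Store the result in the memory graph
    save_note(request, findings)

    # 4. Email the summary back to the user
    send_email(to=reply_to, subject=f"Summary: {request[:60]}", body=findings)
    return findings
```

Passing the steps in as arguments also makes the flow easy to test with stand-in functions before any real service is connected.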

Key takeaway: Memory, voice, and email transform your agent from a simple conversational interface into a knowledge-accumulating, multi-modal system. Obsidian gives it persistent memory, Whisper gives it ears, and AgentMail gives it a professional communication channel. Together, they create an agent that grows more capable over time.

Next: Keeping your agent running around the clock with mission control and always-on operations.