Setting Up Your Agent Environment
Memory Graph, Voice & Email Integration
So far your agent can think and communicate. But it cannot remember what happened last week, hear your voice, or send an email on your behalf. These three integrations — memory, voice, and email — transform your agent from a reactive chatbot into a proactive system that accumulates knowledge, accepts natural input, and takes real-world action.
Obsidian as a Memory Graph
An agent without persistent memory is like an employee who forgets everything at the end of each day. You need a structured, searchable knowledge base that your agent can both read from and write to.
Obsidian is a markdown-based note-taking application that stores files locally as plain text. This makes it an excellent memory backend for agents because:
- Files are plain markdown: the agent can read and write them with standard file operations
- The link graph: Obsidian's [[wikilink]] syntax creates a navigable knowledge graph between notes
- Local-first: no cloud dependency; data stays on your machine or VPS
- Human-readable: you can browse and edit the same knowledge base your agent uses
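To make the link-graph idea concrete, here is a minimal sketch of how wikilinks could be read back out of a vault with nothing but the standard library. The function names `extract_links` and `build_link_graph` are hypothetical, and the regular expression covers only the common `[[Target]]`, `[[Target|alias]]`, and `[[Target#heading]]` forms; it is not a full parser for Obsidian's syntax.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]];
# only the target note name is captured.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]")


def extract_links(markdown_text):
    """Return the note names referenced by wikilinks, in order of appearance."""
    return [target.strip() for target in WIKILINK.findall(markdown_text)]


def build_link_graph(vault_dir):
    """Map each note name (file stem) to the list of notes it links to."""
    graph = {}
    for note in sorted(Path(vault_dir).rglob("*.md")):
        graph[note.stem] = extract_links(note.read_text(encoding="utf-8"))
    return graph
```

Because the graph is just a dictionary of note names, an agent could walk it to find neighbouring notes without any Obsidian-specific tooling.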
How the Agent Uses Obsidian
The agent interacts with an Obsidian vault (a folder of markdown files) through file system access:
# Memory configuration pointing to Obsidian vault
memory:
  type: obsidian
  vault_path: /home/agent/obsidian-vault
  index_on_startup: true
When configured, the agent can:
- Store new knowledge: After completing a research task, the agent creates a new note with findings and links it to related notes
- Recall past context: Before responding, the agent searches the vault for relevant prior knowledge
- Build connections: The agent creates [[links]] between related concepts, building a knowledge graph over time
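# Example: Agent creating a linked note
from pathlib import Path

# Same location as vault_path in the memory configuration
VAULT_PATH = Path("/home/agent/obsidian-vault")

def save_research(topic, findings, related_topics, vault_path=VAULT_PATH):
    """Write a note for `topic` that links to each related topic."""
    note_content = f"# {topic}\n\n"
    note_content += f"{findings}\n\n"
    note_content += "## Related\n"
    for related in related_topics:
        note_content += f"- [[{related}]]\n"
    # Write to Obsidian vault (the topic doubles as the file name)
    with open(Path(vault_path) / f"{topic}.md", "w", encoding="utf-8") as f:
        f.write(note_content)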
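The "recall past context" step can be sketched in the same spirit. The example below is a deliberately naive keyword search, assuming the vault is a folder of `.md` files; the function name `recall` and its scoring rule are illustrative assumptions, not the search your agent framework actually performs.

```python
from pathlib import Path


def recall(query, vault_dir, limit=3):
    """Return up to `limit` note names ranked by how many query words they contain."""
    # Ignore very short words such as "a" or "of".
    words = {w.lower() for w in query.split() if len(w) > 2}
    scored = []
    for note in Path(vault_dir).rglob("*.md"):
        text = note.read_text(encoding="utf-8").lower()
        score = sum(1 for w in words if w in text)
        if score:
            scored.append((score, note.stem))
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

A real deployment would more likely use an index or embeddings, but even this substring match shows why plain markdown is convenient: recall is just reading files.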
The compound effect is significant — after weeks of operation, the agent has built a rich knowledge graph of your projects, preferences, and decisions that it can reference in future interactions.
Whisper for Voice Transcription
Sometimes typing is impractical. You are walking, driving, or simply thinking faster than you can type. Voice input solves this by letting you speak naturally and having the agent process your words as text.
OpenAI's Whisper is an open-source speech recognition model that converts audio to text with high accuracy across many languages.
Integration Flow
The voice pipeline works like this:
- You send a voice message (via Telegram voice note, a recording app, or a microphone)
- The audio file is passed to Whisper for transcription
- The transcribed text is fed to your agent as a regular text input
- The agent processes it and responds through the normal channel
# Voice transcription integration
import whisper

# Load the model once at startup (options: tiny, base, small, medium, large)
model = whisper.load_model("base")

def transcribe_voice(audio_path):
    """Convert voice audio to text using Whisper."""
    result = model.transcribe(audio_path)
    return result["text"]

# Example usage in an agent pipeline
# (`agent` is the agent object you have already set up)
voice_text = transcribe_voice("/tmp/voice_message.ogg")
agent_response = agent.process(voice_text)
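One way to keep the four pipeline steps testable is to pass the transcriber and the agent in as callables. This is a small sketch under that assumption; `handle_voice_message` is a hypothetical helper, and the choice to skip empty transcripts is a design suggestion rather than something Whisper requires.

```python
def handle_voice_message(audio_path, transcribe, process):
    """Transcribe an audio file, then hand the text to the agent.

    `transcribe` and `process` are injected callables, for example the
    transcribe_voice function and agent.process from the example above.
    """
    text = transcribe(audio_path).strip()
    if not text:
        # Silence or unintelligible audio: skip the agent call entirely.
        return None
    return process(text)
```

Injecting the callables means the flow can be exercised with stand-in functions, without loading a Whisper model.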
Model size trade-offs:
| Model | Speed | Accuracy | VRAM Required |
|---|---|---|---|
| tiny | Fastest | Good for clear speech | Minimal |
| base | Fast | Good general accuracy | Low |
| small | Moderate | Better accuracy | Moderate |
| medium | Slower | High accuracy | Higher |
| large | Slowest | Highest accuracy | Significant |
For most agent use cases, the base or small model provides the best balance of speed and accuracy.
AgentMail for Email Integration
Email remains one of the most important communication channels in professional life. Giving your agent its own email address unlocks powerful workflows — the agent can receive emails, draft responses, send notifications, and manage correspondence.
AgentMail (agentmail.to) is a service designed specifically for giving AI agents their own email addresses. It is a Y Combinator (YC S25) backed company that has raised $6M in funding and has delivered over 100 million emails.
Why a Dedicated Email Service
You could configure a standard email provider, but agent-specific email services offer advantages:
- API-first design — built for programmatic access, not human inboxes
- Deliverability — pre-configured for high deliverability rates
- Multiple addresses — easily provision different email addresses for different agent roles
- Webhook support — incoming emails trigger agent actions automatically
Configuration Example
# Email integration configuration
email:
  provider: agentmail
  api_key: ${AGENTMAIL_API_KEY}
  addresses:
    - address: "assistant@yourdomain.com"
      purpose: "General correspondence"
    - address: "reports@yourdomain.com"
      purpose: "Automated report delivery"
# Handling incoming email
# (`agent` and `send_email` are assumed to be provided by your agent setup)
def on_email_received(email):
    """Process incoming email and generate a response."""
    sender = email["from"]
    subject = email["subject"]
    body = email["body"]

    # Agent analyzes the email
    analysis = agent.process(
        f"Incoming email from {sender}, "
        f"subject: {subject}. Body: {body}"
    )

    # Draft response if appropriate
    if analysis.should_respond:
        send_email(
            to=sender,
            subject=f"Re: {subject}",
            body=analysis.draft_response,
        )
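Before an agent replies automatically, it helps to filter out senders that should never get a reply, since answering an automated mailbox can start a mail loop. The sketch below is one possible pre-check, not part of AgentMail's API: the `triage` function, the prefix list, and the `trusted_domains` parameter are all hypothetical. It assumes the same `email["from"]` field used above.

```python
from email.utils import parseaddr

# Local-part prefixes that usually indicate an automated mailbox (illustrative list).
AUTOMATED_PREFIXES = ("no-reply", "noreply", "donotreply", "mailer-daemon", "postmaster")


def triage(message, trusted_domains):
    """Return 'ignore', 'respond', or 'escalate' for an incoming email dict."""
    # parseaddr handles both "Name <addr@host>" and bare "addr@host".
    _, address = parseaddr(message["from"])
    address = address.lower()
    local_part, _, domain = address.partition("@")
    if local_part.startswith(AUTOMATED_PREFIXES):
        return "ignore"
    # Unknown senders go to a human instead of getting an automatic reply.
    return "respond" if domain in trusted_domains else "escalate"
```

A handler could call `triage` first and only run the agent's analysis when the result is `"respond"`.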
The Multi-Modal Input System
When you combine these three integrations, you create a multi-modal input system where your agent can receive information through multiple channels:
| Input Method | When to Use | Agent Action |
|---|---|---|
| Text (Telegram/Discord) | At your desk, precise instructions | Direct processing |
| Voice (Whisper) | On the go, brainstorming | Transcribe then process |
| Email (AgentMail) | Formal communication, external contacts | Parse, respond, or escalate |
| Memory (Obsidian) | Background reference | Retrieve relevant context |
The power is in the combination. You send a voice note while walking: "Research the competitor's pricing page and email me a summary." The agent transcribes your voice, performs web research, stores findings in Obsidian, and emails you a formatted summary — all triggered by a single voice message.
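The channels in the table can all be reduced to plain text before the agent sees them. The following sketch shows one way to do that; `normalize_input`, the channel names, and the payload shapes are assumptions made for illustration, with the email payload mirroring the `from`, `subject`, and `body` fields used earlier.

```python
def normalize_input(channel, payload, transcribe=None):
    """Turn a channel-specific payload into plain text for the agent."""
    if channel == "text":
        # Chat messages are already text.
        return payload
    if channel == "voice":
        if transcribe is None:
            raise ValueError("voice input needs a transcribe callable")
        # payload is a path to an audio file.
        return transcribe(payload)
    if channel == "email":
        # payload is a dict with 'from', 'subject', and 'body' keys.
        return (
            f"Incoming email from {payload['from']}, "
            f"subject: {payload['subject']}. Body: {payload['body']}"
        )
    raise ValueError(f"unknown channel: {channel}")
```

With every input normalized to a string, the rest of the agent only needs a single text entry point, whichever channel the request arrived on.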
Key takeaway: Memory, voice, and email transform your agent from a simple conversational interface into a knowledge-accumulating, multi-modal system. Obsidian gives it persistent memory, Whisper gives it ears, and AgentMail gives it a professional communication channel. Together, they create an agent that grows more capable over time.
Next: Keeping your agent running around the clock with mission control and always-on operations.