Setting Up Your Agent Environment
Mission Control & Always-On Operations
Your agent is installed, connected to channels, and integrated with memory, voice, and email. There is one problem left — the moment you close your terminal or your machine restarts, everything stops. A truly useful agent needs to run around the clock, survive crashes, and let you monitor its activity from anywhere.
Keeping Agents Running 24/7
The simplest way to keep an agent running is to never close the terminal. But that is not a real solution. Process managers handle this properly by automatically restarting your agent if it crashes, starting it on system boot, and managing logs.
Using systemd (Linux VPS)
systemd is the init system and service manager on most modern Linux distributions, so it is almost certainly already installed on your VPS.
Create a service file for your agent:
sudo nano /etc/systemd/system/openclaw-agent.service
[Unit]
Description=OpenClaw AI Agent
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=agent
WorkingDirectory=/home/agent/my-agent
ExecStart=/usr/bin/node /usr/lib/node_modules/openclaw/bin/openclaw start
Restart=always
RestartSec=10
Environment=NODE_ENV=production
EnvironmentFile=/home/agent/.env
[Install]
WantedBy=multi-user.target
# Enable and start the service
sudo systemctl enable openclaw-agent
sudo systemctl start openclaw-agent
# Check status
sudo systemctl status openclaw-agent
# View logs
sudo journalctl -u openclaw-agent -f
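The Restart=always directive tells systemd to restart your agent whenever the process exits, whether it crashed, was killed, or exited cleanly; only an explicit systemctl stop keeps it down. RestartSec=10 adds a 10-second delay before each restart, which prevents a tight restart loop when the error is persistent.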
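To confirm that the restart policy actually works on your server, you can crash the service on purpose and watch systemd bring it back. The following is a minimal Python sketch, not part of OpenClaw: the helper names (`run`, `main_pid`, `crash_and_verify`) are hypothetical, and it assumes the unit is named openclaw-agent as in the service file above. It relies only on standard systemctl subcommands (`show`, `is-active`, `kill`).

```python
import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run(cmd: List[str]) -> str:
    """Run a command and return its stripped stdout ('' if it printed nothing)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.strip()


def main_pid(service: str, runner: Runner = run) -> str:
    """Return the service's main PID as a string ('0' when it is not running)."""
    return runner(["systemctl", "show", "--property=MainPID", "--value", service])


def is_active(service: str, runner: Runner = run) -> bool:
    """True when `systemctl is-active` reports the unit as active."""
    return runner(["systemctl", "is-active", service]) == "active"


def crash_and_verify(
    service: str,
    timeout_s: float = 60.0,
    poll_s: float = 2.0,
    runner: Runner = run,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """SIGKILL the service, then wait for systemd to start a new process.

    Recovery means the unit is active again AND has a different main PID,
    so a poll that races the kill cannot be mistaken for a restart.
    """
    old_pid = main_pid(service, runner)
    runner(["sudo", "systemctl", "kill", "--signal=SIGKILL", service])
    waited = 0.0
    while waited <= timeout_s:
        new_pid = main_pid(service, runner)
        if is_active(service, runner) and new_pid not in ("", "0", old_pid):
            return True
        sleep(poll_s)
        waited += poll_s
    return False
```

Calling `crash_and_verify("openclaw-agent")` on the VPS should return True a little after the 10-second RestartSec delay. A False result means the unit did not come back within the timeout and the journal is the next place to look.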
Using pm2 (Cross-Platform)
If you prefer a Node.js-native solution that works on both local machines and servers, pm2 is a popular process manager:
# Install pm2 globally
npm install -g pm2
# Start your agent with pm2
pm2 start openclaw -- start
# Save the current process list so pm2 can restore it after a reboot
pm2 save
# Generate the boot-time startup script; when run without root, pm2 prints a command to copy and run with sudo
pm2 startup
# Useful pm2 commands
pm2 status # View all managed processes
pm2 logs # Stream live logs
pm2 restart all # Restart all processes
pm2 monit # Real-time monitoring dashboard
pm2 also provides a built-in monitoring dashboard (pm2 monit) that shows CPU usage, memory consumption, and restart counts — useful for quick health checks.
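If you want the same numbers in a script rather than an interactive screen, pm2 can print its process list as JSON with pm2 jlist. The Python sketch below summarizes that output and flags processes that look unhealthy. The function names and the restart threshold are hypothetical, and the field names used (pm2_env.status, pm2_env.restart_time, monit.cpu, monit.memory) reflect pm2's JSON output as commonly documented, so check them against your installed version.

```python
import json
import subprocess
from typing import Any, Dict, List


def summarize(processes: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Reduce pm2's verbose process objects to the fields worth watching."""
    rows = []
    for proc in processes:
        env = proc.get("pm2_env", {})
        monit = proc.get("monit", {})
        rows.append({
            "name": proc.get("name"),
            "status": env.get("status"),
            "restarts": env.get("restart_time", 0),
            "cpu_percent": monit.get("cpu"),
            # pm2 reports memory in bytes; convert to MiB for readability
            "memory_mb": round(monit.get("memory", 0) / (1024 * 1024), 1),
        })
    return rows


def unhealthy(rows: List[Dict[str, Any]], max_restarts: int = 5) -> List[Dict[str, Any]]:
    """Return rows that are not online or have restarted more than max_restarts times."""
    return [r for r in rows if r["status"] != "online" or r["restarts"] > max_restarts]


def read_pm2() -> List[Dict[str, Any]]:
    """Call `pm2 jlist` and parse its JSON output (assumes pm2 is on PATH)."""
    out = subprocess.run(
        ["pm2", "jlist"], capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)
```

A cron job could call `unhealthy(summarize(read_pm2()))` and send an alert whenever the list is not empty. A steadily climbing restart count is often the first sign of a crash loop.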
Remote Access and Monitoring
When your agent runs on a VPS, you need to monitor it without physically accessing the server.
SSH Access
# Connect to your VPS
ssh agent@your-server-ip
# Check agent status
sudo systemctl status openclaw-agent
# View recent logs
sudo journalctl -u openclaw-agent --since "1 hour ago"
Setting Up a Mission Control Dashboard
A mission control dashboard gives you a visual overview of your agent's activity. You can build one using a simple web interface that displays:
- Agent status — running, stopped, or error state
- Recent actions — what has the agent done in the last hour?
- Message counts — how many messages processed across channels
- Error log — any failures or exceptions
- Resource usage — CPU, memory, and API call counts
# Dashboard configuration
dashboard:
  enabled: true
  port: 3001
  auth:
    username: ${DASHBOARD_USER}
    password: ${DASHBOARD_PASS}
  metrics:
    - agent_status
    - messages_processed
    - api_calls_count
    - error_count
    - uptime
Protect your dashboard with authentication — it exposes operational details about your agent that should not be publicly accessible.
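To make the authentication requirement concrete, here is a minimal Python sketch of a metrics endpoint guarded by HTTP Basic auth. It is an illustration of the pattern, not OpenClaw's dashboard: `is_authorized`, `make_handler`, and `serve` are hypothetical names, and `get_metrics` stands in for whatever function collects your agent's numbers. The credentials come from the same DASHBOARD_USER and DASHBOARD_PASS variables used in the configuration above.

```python
import base64
import hmac
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Optional


def is_authorized(header: Optional[str], user: str, password: str) -> bool:
    """Validate an HTTP Basic Authorization header against expected credentials."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    supplied_user, _, supplied_pass = decoded.partition(":")
    # compare_digest avoids leaking how many characters matched via timing
    user_ok = hmac.compare_digest(supplied_user.encode(), user.encode())
    pass_ok = hmac.compare_digest(supplied_pass.encode(), password.encode())
    return user_ok and pass_ok


def make_handler(user: str, password: str, get_metrics: Callable[[], dict]):
    """Build a request handler that serves metrics as JSON to authorized clients."""

    class DashboardHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if not is_authorized(self.headers.get("Authorization"), user, password):
                self.send_response(401)
                self.send_header("WWW-Authenticate", 'Basic realm="dashboard"')
                self.end_headers()
                return
            body = json.dumps(get_metrics()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return DashboardHandler


def serve(get_metrics: Callable[[], dict], port: int = 3001) -> None:
    """Serve on localhost only; reach it remotely through an SSH tunnel."""
    handler = make_handler(
        os.environ["DASHBOARD_USER"], os.environ["DASHBOARD_PASS"], get_metrics
    )
    HTTPServer(("127.0.0.1", port), handler).serve_forever()
```

Basic auth sends credentials with every request, so this sketch binds to 127.0.0.1 rather than a public interface. From your laptop, `ssh -L 3001:localhost:3001 agent@your-server-ip` forwards the port over the encrypted SSH connection.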
Mobile Notifications
Monitoring from a laptop is good. Monitoring from your wrist is better. When your agent encounters errors, completes critical tasks, or needs human input, you want to know immediately.
Apple Watch Notifications via TGWatch
TGWatch is a Telegram client for Apple Watch. Since your agent already communicates through Telegram, this creates a natural notification pipeline:
- Your agent sends a Telegram message about a completed task or an error
- The Telegram notification appears on your phone
- TGWatch mirrors it to your Apple Watch
- You can read the notification and even reply with quick responses
This means your agent can tap you on the wrist when it needs attention — no need to check a dashboard or open your laptop.
For critical alerts, configure your agent to send messages to a dedicated Telegram alert channel:
# Alert configuration
alerts:
  channel: telegram
  chat_id: ${ALERT_CHAT_ID}
  triggers:
    - event: error
      message: "Agent encountered an error: {error_details}"
    - event: task_complete
      message: "Task completed: {task_summary}"
    - event: approval_needed
      message: "Human approval needed: {action_description}"
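Under the hood, an alert like this amounts to filling in a message template and calling the Telegram Bot API's sendMessage method. The Python sketch below shows that flow using the three templates from the configuration above. The function names and the TELEGRAM_BOT_TOKEN variable are assumptions made for illustration; how OpenClaw itself dispatches alerts is not shown here.

```python
import json
import os
import urllib.parse
import urllib.request

# Message templates copied from the alert configuration above
TEMPLATES = {
    "error": "Agent encountered an error: {error_details}",
    "task_complete": "Task completed: {task_summary}",
    "approval_needed": "Human approval needed: {action_description}",
}


def render_alert(event: str, **fields: str) -> str:
    """Fill the template for an event; raises KeyError for unknown events or fields."""
    return TEMPLATES[event].format(**fields)


def build_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build the POST request for the Bot API's sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def send_alert(event: str, **fields: str) -> bool:
    """Send an alert to the dedicated chat; returns Telegram's 'ok' flag."""
    request = build_request(
        os.environ["TELEGRAM_BOT_TOKEN"],  # hypothetical variable name
        os.environ["ALERT_CHAT_ID"],
        render_alert(event, **fields),
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return bool(json.load(response).get("ok", False))
```

For example, `send_alert("error", error_details="model API timed out")` would post "Agent encountered an error: model API timed out" to the alert chat, which then reaches your phone and watch through the pipeline described above.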
Log Monitoring and Health Checks
Logs are your agent's flight recorder. When something goes wrong, logs tell you what happened and why.
Structured Logging
Configure your agent to write structured logs that are easy to search and filter:
# Logging configuration
logging:
  level: info
  format: json
  output:
    - file: /var/log/openclaw/agent.log
    - stdout
  rotation:
    max_size: 50MB
    max_files: 10
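If you write helper scripts or tools alongside your agent, they can follow the same logging conventions. The Python sketch below mirrors the configuration above using only the standard library: JSON lines, output to both a file and stdout, and rotation at 50 MB with 10 backups. `JsonFormatter`, `build_logger`, and the `context` field are hypothetical names chosen for this example.

```python
import json
import logging
import sys
from logging.handlers import RotatingFileHandler


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Optional structured fields passed via extra={"context": {...}}
        context = getattr(record, "context", None)
        if context:
            entry["context"] = context
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(path: str, name: str = "agent") -> logging.Logger:
    """Create a logger that writes JSON to a rotating file and to stdout."""
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured; avoid duplicate output
        return logger
    logger.setLevel(logging.INFO)
    handlers = [
        # Rotate at 50 MB and keep 10 old files, matching the config above
        RotatingFileHandler(path, maxBytes=50 * 1024 * 1024, backupCount=10),
        logging.StreamHandler(sys.stdout),
    ]
    for handler in handlers:
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A call such as `logger.info("message processed", extra={"context": {"channel": "telegram"}})` produces one JSON line that tools like jq or grep can filter by level, logger, or channel.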
Health Check Endpoint
Set up a simple health check that external monitoring tools can ping:
# Health check configuration
health_check:
  enabled: true
  port: 3002
  path: /health
  checks:
    - name: model_connection
      timeout: 5s
    - name: telegram_connection
      timeout: 5s
    - name: memory_access
      timeout: 3s
# Test health check
curl http://localhost:3002/health
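A health check endpoint returns a simple status response, conventionally HTTP 200 when everything is healthy and an error status such as 503 when a check fails. External monitoring services can ping this endpoint regularly and alert you if it reports a failure or stops responding.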
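The idea behind such an endpoint is simple: run each named check with its own timeout, then report 200 if all pass and 503 otherwise. The Python sketch below illustrates that logic with the standard library. It is not OpenClaw's implementation; `run_checks`, `make_handler`, and the probe functions you would pass in are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict, Tuple

# Each check is (probe function returning True/False, timeout in seconds)
Check = Tuple[Callable[[], bool], float]


def run_checks(checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Run all probes in parallel and return (HTTP status, JSON-ready body)."""
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(len(checks), 1))
    try:
        futures = {name: pool.submit(probe) for name, (probe, _) in checks.items()}
        for name, future in futures.items():
            try:
                # The timeout counts from the moment we start waiting on this result
                passed = future.result(timeout=checks[name][1])
                results[name] = "ok" if passed else "fail"
            except FutureTimeout:
                results[name] = "timeout"
            except Exception as exc:  # a probe that raises counts as a failure
                results[name] = f"error: {exc}"
    finally:
        pool.shutdown(wait=False)  # do not block the response on a hung probe
    healthy = all(value == "ok" for value in results.values())
    body = {"status": "ok" if healthy else "degraded", "checks": results}
    return (200 if healthy else 503), body


def make_handler(checks: Dict[str, Check], path: str = "/health"):
    """Build a handler that answers GET <path> with the check results."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != path:
                self.send_error(404)
                return
            status, body = run_checks(checks)
            payload = json.dumps(body).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler


# Usage (blocks forever), with probes you define yourself:
# HTTPServer(("127.0.0.1", 3002), make_handler(checks)).serve_forever()
```

Returning a non-200 status on failure matters because most uptime monitors only look at the status code. With this shape, `curl -f http://localhost:3002/health` exits non-zero as soon as any check fails.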
Practical Reliability Tips
Start with simplicity: Use pm2 or systemd, not a complex container orchestration system. You are running a single agent process, not a distributed system.
Monitor API costs: Set up daily cost alerts with your model provider. An agent stuck in a retry loop can burn through API credits quickly.
Implement graceful shutdown: When your agent receives a stop signal, it should finish current tasks before shutting down, not cut off mid-action.
Test failure recovery: Deliberately crash your agent and verify it restarts correctly. Check that memory is preserved, channel connections are re-established, and in-progress tasks are handled.
Keep logs rotated: Unrotated logs will eventually fill your disk. Configure log rotation from day one.
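The graceful shutdown tip above can be sketched in a few lines. systemd sends SIGTERM when you stop a service, and pm2 sends SIGINT; in both cases a SIGKILL follows if the process has not exited within the manager's timeout. The Python example below is an illustration of the pattern with a hypothetical `GracefulWorker` class: the signal handler only sets a flag, and the work loop checks that flag between tasks so the current task always finishes.

```python
import signal
import threading
from typing import Callable, Iterable


class GracefulWorker:
    """Runs tasks one at a time and stops cleanly between tasks on request."""

    def __init__(self) -> None:
        self._stop = threading.Event()

    def request_stop(self, signum=None, frame=None) -> None:
        """Signal handler: record the request, but do not interrupt the current task."""
        self._stop.set()

    def install_signal_handlers(self) -> None:
        """Route SIGTERM (systemd) and SIGINT (pm2, Ctrl+C) to request_stop."""
        signal.signal(signal.SIGTERM, self.request_stop)
        signal.signal(signal.SIGINT, self.request_stop)

    def run(self, tasks: Iterable[Callable[[], None]]) -> int:
        """Execute tasks in order; return how many completed before stopping."""
        completed = 0
        for task in tasks:
            if self._stop.is_set():
                break  # stop requested: leave remaining tasks for the next start
            task()
            completed += 1
        return completed
```

Keep individual tasks short enough to finish inside the process manager's stop timeout. Otherwise the follow-up SIGKILL will cut the task off anyway.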
Key takeaway: Always-on operation requires process management, remote monitoring, and mobile notifications. Use systemd or pm2 to keep your agent running, set up a dashboard for visibility, and route critical alerts to your phone or watch. Reliability is not about preventing all failures — it is about detecting and recovering from them automatically.
Next module: Building your agent's skills and tool integrations for real-world task execution.