Published dispatch Apr 11, 2026

Tailscale Is the AI Control Plane

Every service in my homelab talks over a Tailscale mesh. Every agent gets scoped access via ACL tags. The network layer does what RBAC frameworks wish they could.

For AI agents to be useful on home infrastructure, they need access to many services across multiple hosts, from anywhere, with selective permissions, which is a non-trivial networking problem. Tailscale solves essentially all of it when used properly. This is what (I think) "used properly" looks like after a month and a half of running agentic workloads on a homelab.

Filed: April 11, 2026
Published: galexc.me/dispatches
Read time: 23 min (buckle up, this can get in the weeds a bit)
Tags: #infrastructure
Editorial note: co-authored by Gilman and GalexC.

tailnet stats

Some high-level stats before the long-form writeup:

- 17 tailnet nodes (services, workers, agents)
- 17 ACL tags (one per service identity)
- 25 ACL deny tests
- 0 host port bindings

You want AI agents that do real work against your infrastructure - cloning repos, querying databases, searching indexes, sending notifications. That means network access to real services, on real hosts, that are always on.

But wait, there's more! You need selective access. The agent reviewing third-party code should not see your email index. The CI runner should not SSH into production. The dashboard should reach health endpoints but not the internet.

At the end of the day, this is not a simple port forwarding task. It requires RBAC at the network layer, which is something I doubt you are going to want to build with a spiderweb of iptables and nginx configs :)

Without Tailscale

- Worker VM: postgres :5432, meilisearch :7700, grafana :3000, ntfy :80, dispatch API :8080 (firewall ×5, certs ×5, DNS ×5)
- Bare metal: forgejo :3000, forgejo SSH :222 (firewall ×2, cert, DNS)
- NAS appliance: admin UI :9443, NFS :2049 (no remote access, LAN only)

8 services. 3 hosts. Every cross-host call is a port, a firewall rule, a cert (and have fun when running multiple instances of the same service that want the same port).

The fix: Tailscale as the control plane for a multi-host homelab running agentic AI workloads.

Done properly, every service gets a sidecar container on the mesh, every identity gets a tag, and every tag gets scoped access via an ACL policy committed to the repo (IaC!). Zero host port bindings. Secure access from anywhere. Granular agent permissions. One tool.

With Tailscale

- postgres → postgres.crested-altair.ts.net (tag:postgres)
- meilisearch → search.crested-altair.ts.net (tag:meilisearch)
- forgejo → git.crested-altair.ts.net (tag:forgejo)
- grafana → grafana.crested-altair.ts.net (tag:monitoring)
- NAS admin → galexc-nas.crested-altair.ts.net (tag:nas-admin)
- ext. agent → postgres.crested-altair.ts.net: denied

Same services. Named, tagged, policy-controlled.

Let's dive in...

Every Service Is a Node

Every service gets a Tailscale sidecar container and the application container shares the sidecar’s network namespace. From the tailnet’s perspective, each service is its own node with its own hostname and its own tags.

Service = Name = Identity

- meilisearch → search.crested-altair.ts.net (tag:meilisearch)
- forgejo → git.crested-altair.ts.net (tag:forgejo)
- postgres → postgres.crested-altair.ts.net (tag:postgres)
- ntfy → ntfy.crested-altair.ts.net (tag:ntfy)
- grafana → grafana.crested-altair.ts.net (tag:monitoring)
- hub → hub.crested-altair.ts.net (tag:hub-web)

Zero host port bindings! Meilisearch is not listening on 192.168.1.x:7700; it’s listening on search.crested-altair.ts.net, and the only way to reach it is through the tailnet.

services:
  tailscale-sidecar:
    image: tailscale/tailscale
    hostname: search            # becomes search.crested-altair.ts.net via MagicDNS
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}
      - TS_EXTRA_ARGS=--advertise-tags=tag:meilisearch
      - TS_STATE_DIR=/var/lib/tailscale   # persist node identity across restarts
    volumes:
      - tailscale-state:/var/lib/tailscale

  meilisearch:
    image: getmeili/meilisearch
    network_mode: service:tailscale-sidecar   # shares the sidecar's network namespace

volumes:
  tailscale-state:

network_mode: service:tailscale-sidecar is the whole trick: the application container has no network identity of its own and inherits the sidecar’s instead. Move the stack to a different host, a different VM, a different continent - the MagicDNS name stays the same, the tag stays the same, the ACL rules still apply.

Case in point: I moved services between a bare-metal Beelink and a Proxmox VM during buildout and didn’t even have to think about networking.

Three Tiers of Agent Access

This is where Tailscale stops being “just a VPN” and becomes a control plane.

Every dispatched agent job runs in a Docker container with its own Tailscale sidecar, tagged based on the repo’s trust tier - and that tag determines what the agent can reach. Three tiers, default-deny:

- tag:agent-admin (infrastructure repo): forgejo, meilisearch, postgres, ntfy, grafana, prometheus, loki, SSH, proxmox, internet
- tag:agent (trusted repos): forgejo, meilisearch, postgres, ntfy, internet
- tag:agent-external (third-party repos): forgejo, internet

An agent running code from a third-party repo physically cannot reach internal services - not at the application layer, not at the container layer, but at the network layer. The dispatch system assigns the tag at container launch time, and the ACL denies everything that tag doesn’t explicitly allow.

// Trusted: repos, search, notifications, database, internet
{"action": "accept", "src": ["tag:agent"],
 "dst": ["tag:forgejo:443", "tag:meilisearch:443",
         "tag:postgres:443", "tag:ntfy:443",
         "autogroup:internet:*"]}

// External: repos and internet only. Nothing else.
{"action": "accept", "src": ["tag:agent-external"],
 "dst": ["tag:forgejo:443", "autogroup:internet:*"]}
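On the dispatch side, the tier-to-tag mapping is just the sidecar’s advertise-tags flag, set when the job container launches. A minimal sketch, assuming the dispatcher exports AGENT_TIER and JOB_ID (both names and the agent-runner image are my inventions; the pattern is the same sidecar compose as above):

```yaml
# Hypothetical agent-job stack; the dispatcher sets AGENT_TIER to
# agent, agent-admin, or agent-external before bringing it up.
services:
  agent-sidecar:
    image: tailscale/tailscale
    hostname: agent-job-${JOB_ID}
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}
      - TS_EXTRA_ARGS=--advertise-tags=tag:${AGENT_TIER}

  agent:
    image: agent-runner                      # hypothetical job image
    network_mode: service:agent-sidecar      # inherits the tagged identity
```

The auth key has to be permitted to mint all three tags, but it’s the ACL policy, not the key, that decides what each tag can reach.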

The ACL file has 25 explicit deny tests. They outnumber the allow rules. Most of them verify what agents cannot do - cannot SSH to workers, cannot reach Proxmox, cannot touch services outside their tier. If someone adds a rule that accidentally opens a path from tag:agent-external to Postgres, the test fails on push.

The Tests Are the Spec

The policy file has built-in tests, and tailscale acl-push runs them on every push; if any test fails, the policy is rejected before it takes effect.

ACL deny tests - what matters is what's blocked:

- tag:agent-external cannot reach tag:postgres:5432
- tag:agent-external cannot reach tag:meilisearch:443
- tag:agent cannot reach tag:worker:22
- tag:hub-web cannot reach autogroup:internet
- tag:forgejo-runner cannot reach tag:postgres:5432
- tag:syncthing cannot reach tag:worker:22
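The rows above live in the policy file’s "tests" array. A sketch of the external-tier entries, following Tailscale’s ACL test format (the real file has 25 of these):

```
// Runs on every policy push; a failed test rejects the whole file.
"tests": [
  {"src": "tag:agent-external",
   "accept": ["tag:forgejo:443"],        // the one path it should have
   "deny":   ["tag:postgres:5432",       // everything else stays shut
              "tag:meilisearch:443",
              "tag:ntfy:443"]},
]
```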

When I added NAS admin access this week, the first thing I wrote was the deny tests - agents cannot reach it, Syncthing cannot reach it, only humans can. Then I wrote the allow rule and pushed. If I’d gotten the rule wrong, the deny tests would have caught it before it went live.
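If you want the same guarantee in CI, the push is two calls against Tailscale’s v2 ACL API: validate (which runs the built-in tests server-side), then apply. A sketch as a Forgejo Actions workflow - the workflow path, secret name, and policy filename are my inventions; "-" in the URL means the default tailnet:

```yaml
# .forgejo/workflows/acl-push.yml (hypothetical)
on:
  push:
    paths: ["policy.hujson"]
jobs:
  acl-push:
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - name: Validate policy (runs the ACL tests server-side)
        run: |
          curl -sf -u "${{ secrets.TS_API_KEY }}:" \
            --data-binary @policy.hujson \
            "https://api.tailscale.com/api/v2/tailnet/-/acl/validate"
      - name: Apply policy (only reached if validation passed)
        run: |
          curl -sf -u "${{ secrets.TS_API_KEY }}:" \
            --data-binary @policy.hujson \
            "https://api.tailscale.com/api/v2/tailnet/-/acl"
```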

Dumb Appliances, Smart Access

Not every device runs Tailscale. My NAS is a LAN-only storage appliance with a web admin UI - it’s not a Tailscale node and never will be.

NAS proxy pattern - the appliance stays dumb, the tailnet handles access:

- data plane: pve-worker-01 mounts 192.168.1.251:/volume1/agent-data (NFS, LAN only)
- admin plane: galexc-nas.crested-altair.ts.net → reverse proxy → :9443 (tag:nas-admin)

A managed reverse proxy on pve-worker-01 publishes the admin UI to the tailnet - storage traffic stays on the LAN via NFS while admin access goes through the tailnet with its own tag and ACL rules. I can reach the NAS admin panel from a laptop on airport wifi, and the NAS has no idea what Tailscale is.

This pattern generalizes to any appliance with a web UI and no Tailscale support: reverse proxy with a tagged sidecar, and let the tailnet handle identity and access.
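A sketch of that proxy stack, with Caddy standing in for "managed reverse proxy" (the NAS IP and port come from the diagram above; the tls_insecure_skip_verify line assumes the appliance serves a self-signed cert, and Tailscale Serve fronts Caddy for HTTPS):

```yaml
# On pve-worker-01: tagged sidecar + reverse proxy to the LAN-only admin UI.
services:
  tailscale-sidecar:
    image: tailscale/tailscale
    hostname: galexc-nas               # galexc-nas.crested-altair.ts.net
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}
      - TS_EXTRA_ARGS=--advertise-tags=tag:nas-admin

  proxy:
    image: caddy
    network_mode: service:tailscale-sidecar
    # Caddyfile (mounted separately), with `tailscale serve --bg 80`
    # in the sidecar providing HTTPS and certs:
    #   :80 {
    #     reverse_proxy https://192.168.1.251:9443 {
    #       transport http {
    #         tls_insecure_skip_verify
    #       }
    #     }
    #   }
```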

Service-to-Service Is Just ACL Rules

When I built the Gmail sync pipeline, every service-to-service hop was a Tailscale connection.

One external hop, then tailnet fan-out:

- internet: Gmail API (OAuth2 message fetch) → gmail-sync job container on the tailnet
- tailnet:
  1. store messages → postgres.ts.net
  2. full-text index → search.ts.net
  3. push alerts → ntfy.ts.net

One public API call in. Everything else stays on the tailnet under one policy boundary.

There’s no separate service mesh to operate, no mTLS certificates to manage, no Consul or Envoy - the ACL policy is the service mesh, and each service-to-service path is an explicit line in that file. Tailscale Serve provides automatic HTTPS with valid certificates for every MagicDNS name - I didn’t configure a single one.
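One way to wire that up in the sidecar image is a mounted Serve config pointed at by the TS_SERVE_CONFIG environment variable. A sketch for the Meilisearch node above (the file path is arbitrary; ${TS_CERT_DOMAIN} is substituted by the container at start):

```
{
  "TCP": { "443": { "HTTPS": true } },
  "Web": {
    "${TS_CERT_DOMAIN}:443": {
      "Handlers": { "/": { "Proxy": "http://127.0.0.1:7700" } }
    }
  }
}
```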

The hub dashboard is a good example of the opposite constraint. It can reach health endpoints on every service but cannot reach the internet:

// Hub can health-check services
{"action": "accept", "src": ["tag:hub-web"],
 "dst": ["tag:monitoring:443", "tag:forgejo:443",
         "tag:meilisearch:443", "tag:ntfy:443",
         "tag:galexc-api:443", "tag:postgres:443"]}

// Deny test: hub cannot reach internet
{"src": "tag:hub-web", "deny": ["1.2.3.4:443"]}

That is not a feature of the dashboard. It is a feature of the network.

Infrastructure Testing Over the Overlay

I have a service called “chaos monkey” (I’ll likely write about it in a future dispatch). It probes services over their Tailscale endpoints - no shelling into containers, no querying Docker directly. For now it just hits MagicDNS names and checks what comes back, though I’ll be expanding its destructiveness over time.

chaos monkey - first real run caught a live bug:

- probe postgres.crested-altair.ts.net → unhealthy ×20,050
- docker health check: CMD-SHELL in a distroless image - no shell
- actual API: healthy - the overlay told the real story

The first run caught PostgREST with a failing healthcheck streak of 20,050. The service itself was healthy - the Docker health check was trying to run a shell command inside a distroless image with no shell. The chaos monkey reached PostgREST over the tailnet, confirmed the API was fine, and the health check metadata told the real story. A container can show as “up” in docker ps while its Tailscale sidecar has quietly lost connectivity - the only way to catch that is to test from outside the host, over the tailnet, the way real clients actually reach those services.
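The interesting part is the comparison, not the probe. A minimal sketch of the logic in Python (the function names are mine; the real chaos monkey also records streaks and health-check metadata):

```python
# Sketch of the chaos-monkey check: probe a service over the tailnet,
# then compare what the overlay sees with what Docker claims.
import urllib.error
import urllib.request

def probe(url: str, timeout: float = 5.0) -> bool:
    """True if the MagicDNS endpoint answers successfully over the tailnet."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

def diagnose(docker_healthy: bool, overlay_reachable: bool) -> str:
    """The overlay is ground truth - it's the path real clients take."""
    if overlay_reachable and not docker_healthy:
        return "bad health check"        # e.g. CMD-SHELL in a distroless image
    if docker_healthy and not overlay_reachable:
        return "sidecar lost connectivity"
    return "healthy" if overlay_reachable else "down"
```

The PostgREST incident above is the first branch: the overlay said the API was fine, so the failing signal had to be the health check itself.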

What This Actually Feels Like

I work from two machines (a Mac Mini at home and a MacBook Pro on the go) along with my phone, and all of them are on the tailnet by default. When I dispatch an agent job from the MacBook at a coffee shop, that job runs on pve-worker-01 at home. The agent clones from git.crested-altair.ts.net, queries search.crested-altair.ts.net, writes results to postgres.crested-altair.ts.net, and sends me a notification via ntfy.crested-altair.ts.net. I see the notification on my phone. None of this required opening a port on my router.

When I add a new service, the workflow is: write a Docker Compose stack with a Tailscale sidecar, add ACL rules, add deny tests, push the policy, deploy with Ansible. The new service appears on the tailnet within seconds with a MagicDNS name. Every service that should reach it already can, and every service that shouldn’t already can’t - because the default is deny.

When something breaks, I check the hub dashboard from wherever I am. If a health check fails, the problem is either the service or the overlay - and the chaos monkey can distinguish between the two. I don’t need a VPN client, I don’t need to remember port numbers - I just open hub.crested-altair.ts.net and everything is there.

The Pattern

Tailscale isn’t remote access for your homelab - it’s the control plane.

Every service gets a sidecar and a tag - the tag is its identity. The ACL policy becomes your service mesh, your RBAC system, and your certificate authority, all in one file, version-controlled, with tests. Default-deny means new services start isolated, explicit rules open exactly the paths you need, and deny tests catch regressions before they go live.

For agentic workloads this matters more than it does for traditional homelabs. Agents need access to many services, selectively, and you need to be able to reason about what each agent can reach. An agent in the external tier physically cannot reach your database - that’s not a configuration choice at runtime, it’s a network-layer guarantee enforced before the agent’s first packet leaves its container.

If you are building infrastructure that AI agents will operate on, solve the networking layer first. Everything else gets simpler.