Published dispatch Apr 11, 2026

Tailscale Is the AI Control Plane

Every service in my homelab talks over a Tailscale mesh. Every agent gets scoped access via ACL tags. The network layer does what RBAC frameworks wish they could.

For AI agents to be useful on home infrastructure, they need access to many services across multiple hosts, from anywhere, with selective permissions, which is a non-trivial networking problem. Tailscale solves essentially all of it when used properly. This is what (I think) "used properly" looks like after a month and a half of running agentic workloads on a homelab.

Filed: April 11, 2026
Published: galexc.me/dispatches
Read time: 23 min (buckle up, this can get in the weeds a bit)
Tags: #infrastructure
Editorial note: co-authored by Gilman and GalexC.

tailnet stats

Some high-level stats before the long-form writeup:

- 17 tailnet nodes (services, workers, agents)
- 17 ACL tags (one per service identity)
- 25 ACL deny tests
- 0 host port bindings

You want AI agents that do real work against your infrastructure - cloning repos, querying databases, searching indexes, sending notifications. That means network access to real services, on real hosts, that are always on.

But wait, there's more! You need selective access. The agent reviewing third-party code should not see your email index. The CI runner should not SSH into production. The dashboard should reach health endpoints but not the internet.

At the end of the day, this is not a simple port forwarding task. It requires RBAC at the network layer, which is something I doubt you are going to want to build with a spiderweb of iptables and nginx configs :)

Without Tailscale

- Worker VM: postgres :5432, meilisearch :7700, grafana :3000, ntfy :80, dispatch API :8080 (firewall ×5, certs ×5, DNS ×5)
- Bare metal: forgejo :3000, forgejo SSH :222 (firewall ×2, cert, DNS)
- NAS appliance: admin UI :9443, NFS :2049 (no remote access, LAN only)

8 services. 3 hosts. Every cross-host call is a port, a firewall rule, a cert (and have fun when running multiple instances of the same service that want the same port).

The fix: Tailscale as the control plane for a multi-host homelab running agentic AI workloads.

Done properly, every service gets a sidecar container on the mesh, every identity gets a tag, and every tag gets scoped access via an ACL policy committed to the repo (IaC!). Zero host port bindings. Secure access from anywhere. Granular agent permissions. One tool.

With Tailscale

- postgres → postgres.crested-altair.ts.net (tag:postgres)
- meilisearch → search.crested-altair.ts.net (tag:meilisearch)
- forgejo → git.crested-altair.ts.net (tag:forgejo)
- grafana → grafana.crested-altair.ts.net (tag:monitoring)
- NAS admin → galexc-nas.crested-altair.ts.net (tag:nas-admin)
- ext. agent → postgres.crested-altair.ts.net: denied

Same services. Named, tagged, policy-controlled.

Let's dive in...

Every Service Is a Node

Every service gets a Tailscale sidecar container and the application container shares the sidecar’s network namespace. From the tailnet’s perspective, each service is its own node with its own hostname and its own tags.

Service = Name = Identity

- meilisearch → search.crested-altair.ts.net (tag:meilisearch)
- forgejo → git.crested-altair.ts.net (tag:forgejo)
- postgres → postgres.crested-altair.ts.net (tag:postgres)
- ntfy → ntfy.crested-altair.ts.net (tag:ntfy)
- grafana → grafana.crested-altair.ts.net (tag:monitoring)
- hub → hub.crested-altair.ts.net (tag:hub-web)

Zero host port bindings! Meilisearch is not listening on 192.168.1.x:7700; it’s listening on search.crested-altair.ts.net, and the only way to reach it is through the tailnet.

services:
  tailscale-sidecar:
    image: tailscale/tailscale
    hostname: search            # becomes search.crested-altair.ts.net via MagicDNS
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}
      - TS_EXTRA_ARGS=--advertise-tags=tag:meilisearch
      - TS_STATE_DIR=/var/lib/tailscale   # persist node identity across restarts
    volumes:
      - tailscale-state:/var/lib/tailscale

  meilisearch:
    image: getmeili/meilisearch
    network_mode: service:tailscale-sidecar   # shares the sidecar's network namespace

volumes:
  tailscale-state:

network_mode: service:tailscale-sidecar is the whole trick: the application container has no network identity of its own and inherits the sidecar’s instead. Move the stack to a different host, a different VM, a different continent - the MagicDNS name stays the same, the tag stays the same, the ACL rules still apply.

Case in point: I moved services between a bare-metal Beelink and a Proxmox VM during buildout and didn’t even have to think about networking.

Three Tiers of Agent Access

This is where Tailscale stops being “just a VPN” and becomes a control plane.

Every dispatched agent job runs in a Docker container with its own Tailscale sidecar, tagged based on the repo’s trust tier - and that tag determines what the agent can reach. Three tiers, default-deny:

- tag:agent-admin (infrastructure repo): forgejo, meilisearch, postgres, ntfy, grafana, prometheus, loki, SSH, proxmox, internet
- tag:agent (trusted repos): forgejo, meilisearch, postgres, ntfy, internet
- tag:agent-external (third-party repos): forgejo, internet

An agent running code from a third-party repo physically cannot reach internal services - not at the application layer, not at the container layer, but at the network layer. The dispatch system assigns the tag at container launch time, and the ACL denies everything that tag doesn’t explicitly allow.

// Trusted: repos, search, notifications, database, internet
{"action": "accept", "src": ["tag:agent"],
 "dst": ["tag:forgejo:443", "tag:meilisearch:443",
         "tag:postgres:443", "tag:ntfy:443",
         "autogroup:internet:*"]}

// External: repos and internet only. Nothing else.
{"action": "accept", "src": ["tag:agent-external"],
 "dst": ["tag:forgejo:443", "autogroup:internet:*"]}
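On the dispatch side, the tier-to-tag mapping is just the sidecar’s advertise-tags flag, set when the job container launches. A minimal sketch, assuming the dispatcher exports AGENT_TIER and JOB_ID (both names and the agent-runner image are my inventions; the pattern is the same sidecar compose as above):

```yaml
# Hypothetical agent-job stack; the dispatcher sets AGENT_TIER to
# agent, agent-admin, or agent-external before bringing it up.
services:
  agent-sidecar:
    image: tailscale/tailscale
    hostname: agent-job-${JOB_ID}
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}
      - TS_EXTRA_ARGS=--advertise-tags=tag:${AGENT_TIER}

  agent:
    image: agent-runner                      # hypothetical job image
    network_mode: service:agent-sidecar      # inherits the tagged identity
```

The auth key has to be permitted to mint all three tags, but it’s the ACL policy, not the key, that decides what each tag can reach.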

The ACL file has 25 explicit deny tests. They outnumber the allow rules. Most of them verify what agents cannot do - cannot SSH to workers, cannot reach Proxmox, cannot touch services outside their tier. If someone adds a rule that accidentally opens a path from tag:agent-external to Postgres, the test fails on push.

The Tests Are the Spec

The policy file has built-in tests, and tailscale acl-push runs them on every push; if any test fails, the policy is rejected before it takes effect.

ACL deny tests - what matters is what's blocked:

- tag:agent-external cannot reach tag:postgres:5432
- tag:agent-external cannot reach tag:meilisearch:443
- tag:agent cannot reach tag:worker:22
- tag:hub-web cannot reach autogroup:internet
- tag:forgejo-runner cannot reach tag:postgres:5432
- tag:syncthing cannot reach tag:worker:22
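The rows above live in the policy file’s "tests" array. A sketch of the external-tier entries, following Tailscale’s ACL test format (the real file has 25 of these):

```
// Runs on every policy push; a failed test rejects the whole file.
"tests": [
  {"src": "tag:agent-external",
   "accept": ["tag:forgejo:443"],        // the one path it should have
   "deny":   ["tag:postgres:5432",       // everything else stays shut
              "tag:meilisearch:443",
              "tag:ntfy:443"]},
]
```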

When I added NAS admin access this week, the first thing I wrote was the deny tests - agents cannot reach it, Syncthing cannot reach it, only humans can. Then I wrote the allow rule and pushed. If I’d gotten the rule wrong, the deny tests would have caught it before it went live.
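If you want the same guarantee in CI, the push is two calls against Tailscale’s v2 ACL API: validate (which runs the built-in tests server-side), then apply. A sketch as a Forgejo Actions workflow - the workflow path, secret name, and policy filename are my inventions; "-" in the URL means the default tailnet:

```yaml
# .forgejo/workflows/acl-push.yml (hypothetical)
on:
  push:
    paths: ["policy.hujson"]
jobs:
  acl-push:
    runs-on: docker
    steps:
      - uses: actions/checkout@v4
      - name: Validate policy (runs the ACL tests server-side)
        run: |
          curl -sf -u "${{ secrets.TS_API_KEY }}:" \
            --data-binary @policy.hujson \
            "https://api.tailscale.com/api/v2/tailnet/-/acl/validate"
      - name: Apply policy (only reached if validation passed)
        run: |
          curl -sf -u "${{ secrets.TS_API_KEY }}:" \
            --data-binary @policy.hujson \
            "https://api.tailscale.com/api/v2/tailnet/-/acl"
```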

Dumb Appliances, Smart Access

Not every device runs Tailscale. My NAS is a LAN-only storage appliance with a web admin UI - it’s not a Tailscale node and never will be.

NAS proxy pattern - the appliance stays dumb, the tailnet handles access:

- data plane: pve-worker-01 mounts 192.168.1.251:/volume1/agent-data (NFS, LAN only)
- admin plane: galexc-nas.crested-altair.ts.net → reverse proxy → :9443 (tag:nas-admin)

A managed reverse proxy on pve-worker-01 publishes the admin UI to the tailnet - storage traffic stays on the LAN via NFS while admin access goes through the tailnet with its own tag and ACL rules. I can reach the NAS admin panel from a laptop on airport wifi, and the NAS has no idea what Tailscale is.

This pattern generalizes to any appliance with a web UI and no Tailscale support: reverse proxy with a tagged sidecar, and let the tailnet handle identity and access.
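A sketch of that proxy stack, with Caddy standing in for "managed reverse proxy" (the NAS IP and port come from the diagram above; the tls_insecure_skip_verify line assumes the appliance serves a self-signed cert, and Tailscale Serve fronts Caddy for HTTPS):

```yaml
# On pve-worker-01: tagged sidecar + reverse proxy to the LAN-only admin UI.
services:
  tailscale-sidecar:
    image: tailscale/tailscale
    hostname: galexc-nas               # galexc-nas.crested-altair.ts.net
    environment:
      - TS_AUTHKEY=${TS_AUTHKEY}
      - TS_EXTRA_ARGS=--advertise-tags=tag:nas-admin

  proxy:
    image: caddy
    network_mode: service:tailscale-sidecar
    # Caddyfile (mounted separately), with `tailscale serve --bg 80`
    # in the sidecar providing HTTPS and certs:
    #   :80 {
    #     reverse_proxy https://192.168.1.251:9443 {
    #       transport http {
    #         tls_insecure_skip_verify
    #       }
    #     }
    #   }
```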

Service-to-Service Is Just ACL Rules

When I built the Gmail sync pipeline, every service-to-service hop was a Tailscale connection.

One external hop, then tailnet fan-out:

- internet: Gmail API (OAuth2 message fetch) → gmail-sync job container on the tailnet
- tailnet:
  1. store messages → postgres.ts.net
  2. full-text index → search.ts.net
  3. push alerts → ntfy.ts.net

One public API call in. Everything else stays on the tailnet under one policy boundary.

There’s no separate service mesh to operate, no mTLS certificates to manage, no Consul or Envoy - the ACL policy is the service mesh, and each service-to-service path is an explicit line in that file. Tailscale Serve provides automatic HTTPS with valid certificates for every MagicDNS name - I didn’t configure a single one.
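One way to wire that up in the sidecar image is a mounted Serve config pointed at by the TS_SERVE_CONFIG environment variable. A sketch for the Meilisearch node above (the file path is arbitrary; ${TS_CERT_DOMAIN} is substituted by the container at start):

```
{
  "TCP": { "443": { "HTTPS": true } },
  "Web": {
    "${TS_CERT_DOMAIN}:443": {
      "Handlers": { "/": { "Proxy": "http://127.0.0.1:7700" } }
    }
  }
}
```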

The hub dashboard is a good example of the opposite constraint. It can reach health endpoints on every service but cannot reach the internet:

// Hub can health-check services
{"action": "accept", "src": ["tag:hub-web"],
 "dst": ["tag:monitoring:443", "tag:forgejo:443",
         "tag:meilisearch:443", "tag:ntfy:443",
         "tag:galexc-api:443", "tag:postgres:443"]}

// Deny test: hub cannot reach internet
{"src": "tag:hub-web", "deny": ["1.2.3.4:443"]}

That is not a feature of the dashboard. It is a feature of the network.

Infrastructure Testing Over the Overlay

I have a service called “chaos monkey” (I’ll likely write about it in a future dispatch). It probes services over their Tailscale endpoints - no shelling into containers, no querying Docker directly. For now it just hits MagicDNS names and checks what comes back, though I’ll be expanding its destructiveness over time.

chaos monkey - first real run caught a live bug:

- probe postgres.crested-altair.ts.net → unhealthy ×20,050
- docker health check: CMD-SHELL in a distroless image - no shell
- actual API: healthy - the overlay told the real story

The first run caught PostgREST with a failing healthcheck streak of 20,050. The service itself was healthy - the Docker health check was trying to run a shell command inside a distroless image with no shell. The chaos monkey reached PostgREST over the tailnet, confirmed the API was fine, and the health check metadata told the real story. A container can show as “up” in docker ps while its Tailscale sidecar has quietly lost connectivity - the only way to catch that is to test from outside the host, over the tailnet, the way real clients actually reach those services.
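The interesting part is the comparison, not the probe. A minimal sketch of the logic in Python (the function names are mine; the real chaos monkey also records streaks and health-check metadata):

```python
# Sketch of the chaos-monkey check: probe a service over the tailnet,
# then compare what the overlay sees with what Docker claims.
import urllib.error
import urllib.request

def probe(url: str, timeout: float = 5.0) -> bool:
    """True if the MagicDNS endpoint answers successfully over the tailnet."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False

def diagnose(docker_healthy: bool, overlay_reachable: bool) -> str:
    """The overlay is ground truth - it's the path real clients take."""
    if overlay_reachable and not docker_healthy:
        return "bad health check"        # e.g. CMD-SHELL in a distroless image
    if docker_healthy and not overlay_reachable:
        return "sidecar lost connectivity"
    return "healthy" if overlay_reachable else "down"
```

The PostgREST incident above is the first branch: the overlay said the API was fine, so the failing signal had to be the health check itself.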

What This Actually Feels Like

I work from two machines (a Mac Mini at home and a MacBook Pro on the go) along with my phone, and all of them are on the tailnet by default. When I dispatch an agent job from the MacBook at a coffee shop, that job runs on pve-worker-01 at home. The agent clones from git.crested-altair.ts.net, queries search.crested-altair.ts.net, writes results to postgres.crested-altair.ts.net, and sends me a notification via ntfy.crested-altair.ts.net. I see the notification on my phone. None of this required opening a port on my router.

When I add a new service, the workflow is: write a Docker Compose stack with a Tailscale sidecar, add ACL rules, add deny tests, push the policy, deploy with Ansible. The new service appears on the tailnet within seconds with a MagicDNS name. Every service that should reach it already can, and every service that shouldn’t already can’t - because the default is deny.

When something breaks, I check the hub dashboard from wherever I am. If a health check fails, the problem is either the service or the overlay - and the chaos monkey can distinguish between the two. I don’t need a VPN client, I don’t need to remember port numbers - I just open hub.crested-altair.ts.net and everything is there.

The Pattern

Tailscale isn’t remote access for your homelab - it’s the control plane.

Every service gets a sidecar and a tag - the tag is its identity. The ACL policy becomes your service mesh, your RBAC system, and your certificate authority, all in one file, version-controlled, with tests. Default-deny means new services start isolated, explicit rules open exactly the paths you need, and deny tests catch regressions before they go live.

For agentic workloads this matters more than it does for traditional homelabs. Agents need access to many services, selectively, and you need to be able to reason about what each agent can reach. An agent in the external tier physically cannot reach your database - that’s not a configuration choice at runtime, it’s a network-layer guarantee enforced before the agent’s first packet leaves its container.

If you are building infrastructure that AI agents will operate on, solve the networking layer first. Everything else gets simpler.