
How to Configure Multi-Model Gateway Failover on Hetzner

A practical guide to running a multi-model gateway on Hetzner with fewer failover surprises. It covers node layout, health checks, private networking, and what to test before calling the setup reliable.

By Sophie Hart · May 25, 2026 · 6 min read

How should you set up gateway failover on Hetzner?

The safest Hetzner setup for a multi-model OpenClaw gateway is two gateway nodes on a private Hetzner Network, spread across a placement group, with a single public entry point in front of them and explicit provider failover rules behind them. Hetzner gives you the infrastructure pieces for this, especially private Networks, Load Balancers, placement groups, Floating IPs, backups, and volumes (Networks, Load Balancers, placement groups, Floating IPs).

The rest is on you: route model traffic intentionally, keep secrets and config consistent, and test whether failover works when a provider breaks in an ugly way instead of a neat one.

Quick answer

If you want the short version:

  1. run at least two gateway nodes
  2. attach them to one private Hetzner Network
  3. spread them with a placement group
  4. put one public entry point in front, usually a Load Balancer
  5. configure provider failover rules separately from instance failover
  6. test node loss, provider timeouts, bad keys, and exhausted quotas

Most teams do the first four and skip the last two. That is where the trouble starts.

Why instance failover and provider failover are different

It helps to separate two problems:

  • "Can traffic reach a healthy gateway node?"
  • "Can the gateway reach a healthy model provider?"

Hetzner can help with the first one. Its Load Balancers support target checks and can steer traffic away from dead nodes (Load Balancers). That does not solve the second problem. A healthy gateway process can still be pointing at a provider that is timing out, rate-limited, or rejecting auth.

So you need two failover layers:

  • infrastructure failover between gateway nodes
  • provider failover between model vendors or model classes
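Monitoring code can keep the two layers apart. The sketch below assumes you run two probes per node: the gateway's own health endpoint, and a tiny test call to the upstream model provider. `Layer`, `ProbeResult`, and `failing_layer` are hypothetical names used only for illustration. They are not part of OpenClaw or any Hetzner API.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    NONE = "no failover needed"
    INSTANCE = "instance failover: route around the node"
    PROVIDER = "provider failover: route around the model vendor"


@dataclass
class ProbeResult:
    node_reachable: bool  # did the gateway node's own health endpoint answer?
    provider_ok: bool     # did a tiny model call through that node succeed?


def failing_layer(probe: ProbeResult) -> Layer:
    """Decide which failover layer should react to one probe result."""
    if not probe.node_reachable:
        # The node itself is gone; only the load balancer can route around it.
        return Layer.INSTANCE
    if not probe.provider_ok:
        # The node is up, but its upstream is timing out, rate-limited, or rejecting auth.
        return Layer.PROVIDER
    return Layer.NONE


print(failing_layer(ProbeResult(node_reachable=True, provider_ok=False)).value)
```

A node can pass its load balancer health check and still land in the `PROVIDER` bucket. The balancer cannot see that case.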

What is the best Hetzner layout?

For most OpenClaw teams, the practical layout looks like this:

            Client traffic
                  |
                  v
        Hetzner Load Balancer
                  |
         +--------+--------+
         |                 |
         v                 v
  Gateway node A    Gateway node B
         |                 |
         +--------+--------+
                  |
      (private Hetzner Network)
                  |
                  v
       Provider routing policy
         -> primary provider
         -> fallback provider

The reason to use a private Network is simple. Hetzner Networks let your cloud servers talk over private IP space instead of spraying internal traffic across public addresses (Networks).

The reason to use a placement group is just as practical. Hetzner's spread placement groups put each server in the group on a different physical host. That reduces the chance that one hardware failure takes out both gateway nodes (placement groups).
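You can check a planned layout against these two rules before creating any servers. The sketch below is a minimal example, and everything in it is hypothetical: the node names, the `10.0.0.0/16` range, and the placement group name `gw-spread`. It uses only Python's standard `ipaddress` module and does not call Hetzner.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class GatewayNode:
    name: str
    private_ip: str       # address the node will get on the Hetzner Network
    placement_group: str  # name of the spread placement group it joins


def layout_problems(nodes: list[GatewayNode], network_cidr: str,
                    placement_group: str) -> list[str]:
    """Return human-readable problems with a planned layout (empty list means OK)."""
    problems: list[str] = []
    network = ipaddress.ip_network(network_cidr)
    if len(nodes) < 2:
        problems.append("fewer than two gateway nodes: no instance failover")
    for node in nodes:
        if ipaddress.ip_address(node.private_ip) not in network:
            problems.append(f"{node.name}: {node.private_ip} is outside {network_cidr}")
        if node.placement_group != placement_group:
            problems.append(f"{node.name}: not in placement group {placement_group!r}")
    return problems
```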

Should you use a Load Balancer or a Floating IP?

Usually a Load Balancer.

Hetzner Floating IPs are useful when you want to move one public address from one server to another (Floating IP FAQ). That is fine for active-passive setups. It is less attractive when you want automatic traffic distribution and health checks across multiple gateway nodes.

A Load Balancer is the better default when:

  • you want active health checks
  • you want more than one node live at once
  • you want cleaner cutover during a server failure

A Floating IP still makes sense for simpler active-passive designs or for services that you do not want behind a balancer. For most OpenClaw gateway cases, the Load Balancer is the safer default.
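A simplified model shows what health-checked load balancing adds. A target is dropped after a run of failed checks, and traffic keeps flowing to whatever targets remain. A Floating IP, by contrast, points at one server until something reassigns it.

This is an illustration only, not Hetzner's implementation. `Target`, `TargetPool`, and the `retries=3` threshold are hypothetical. Real Hetzner health checks have their own interval, timeout, retry, and recovery behavior.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Target:
    name: str
    consecutive_failures: int = 0
    healthy: bool = True


@dataclass
class TargetPool:
    targets: list[Target]
    retries: int = 3  # failed checks in a row before a target is taken out
    _rr: itertools.count = field(default_factory=itertools.count, repr=False)

    def record_check(self, name: str, ok: bool) -> None:
        """Apply one health-check result to the named target."""
        target = next(t for t in self.targets if t.name == name)
        if ok:
            target.consecutive_failures = 0
            target.healthy = True
        else:
            target.consecutive_failures += 1
            if target.consecutive_failures >= self.retries:
                target.healthy = False

    def pick(self) -> str:
        """Round-robin across healthy targets only."""
        healthy = [t for t in self.targets if t.healthy]
        if not healthy:
            raise RuntimeError("no healthy gateway nodes")
        return healthy[next(self._rr) % len(healthy)].name
```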

Which Load Balancer mode should you use?

Hetzner Load Balancers support TCP, HTTP, and HTTPS services (Load Balancers overview). OpenClaw's gateway uses one multiplexed port for real-time traffic and HTTP-style control paths, so TCP is often the safer first choice if you do not want a layer-7 proxy making assumptions about the gateway protocol. That is an implementation recommendation, not a Hetzner requirement.

If your gateway surface is strictly HTTP and you want path-aware behavior, HTTP or HTTPS can be reasonable. Most teams should start simpler.
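The helper below builds a TCP service with a TCP health check. Its shape is modeled on the Load Balancer service object in the Hetzner Cloud API, but verify the field names against the current API reference before using it.

Several values are assumptions:

  • `GATEWAY_PORT = 8443` is a hypothetical placeholder for whatever port your gateway listens on
  • the interval, timeout, and retry numbers are example values
  • the sketch assumes TLS, if any, terminates on the gateway nodes

```python
import json

GATEWAY_PORT = 8443  # hypothetical placeholder: use your gateway's real listen port


def tcp_gateway_service(listen_port: int = 443, gateway_port: int = GATEWAY_PORT) -> dict:
    """Build a TCP (layer 4) Load Balancer service definition as a plain dict."""
    return {
        "protocol": "tcp",  # the balancer forwards bytes without parsing gateway traffic
        "listen_port": listen_port,
        "destination_port": gateway_port,
        "proxyprotocol": False,
        "health_check": {
            "protocol": "tcp",  # passes if a TCP connection to the port succeeds
            "port": gateway_port,
            "interval": 15,     # seconds between checks (example value)
            "timeout": 10,      # seconds allowed per check (example value)
            "retries": 3,       # failed checks before the target is taken out (example value)
        },
    }


print(json.dumps(tcp_gateway_service(), indent=2))
```

A TCP health check only proves that the port accepts connections. It says nothing about whether the node's upstream provider works. Provider failover still needs its own rules.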

How should config and secrets be shared?

Do not hand-edit two gateway nodes and hope they stay the same.

At minimum, keep:

  • the same routing policy on both nodes
  • the same provider key inventory on both nodes
  • the same health endpoint behavior on both nodes
  • the same deploy process on both nodes

If you need persistent state or shared assets, Hetzner Volumes can help, but try not to couple failover to a fragile shared disk unless you really need it (Volumes). A Volume attaches to only one server at a time, so it cannot back both gateway nodes at once.

The more reliable pattern is stateless gateway nodes plus a clean config distribution path.
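A cheap way to catch drift is to fingerprint each node's effective config and compare it with the version in your source of truth. This minimal sketch makes two assumptions. First, each node can export its config as a JSON-compatible dict. Second, that dict lists key *names*, never secret values. `config_fingerprint` and `drifted_nodes` are hypothetical helpers.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """SHA-256 over canonical JSON, so key order and whitespace do not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_nodes(expected: dict, configs_by_node: dict[str, dict]) -> list[str]:
    """Return nodes whose live config does not match the expected config."""
    want = config_fingerprint(expected)
    return sorted(node for node, cfg in configs_by_node.items()
                  if config_fingerprint(cfg) != want)
```

Run it after every deploy. An empty list means both nodes carry the same routing policy and key inventory.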

What should the provider failover policy look like?

Keep it boring. Boring is good here.

Start with:

  • one primary provider per workload class
  • one fallback provider per workload class
  • one rule for retryable failures
  • one rule for fail-closed workflows

Example:

  • code tasks -> Provider A, fallback Provider B
  • long-context tasks -> Provider B, fallback Provider A
  • cost-sensitive batch tasks -> cheaper model first, more capable model second

What you do not want is a fuzzy policy that changes by instinct during an incident.
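Writing the policy down as data keeps it from changing by instinct. The sketch below encodes the example above, plus one hypothetical fail-closed workload. The provider names, the `FAILOVER_ON` set, and `next_provider` are all illustrative. This is not OpenClaw's configuration format.

```python
from dataclasses import dataclass
from typing import Optional

# Failure kinds where trying the other provider makes sense (illustrative set).
# A malformed request ("bad_request") is excluded: it would fail on the fallback too.
FAILOVER_ON = {"timeout", "rate_limited", "server_error", "auth_error", "quota_exhausted"}


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: str
    fail_closed: bool = False  # True: surface the error instead of switching providers


POLICY = {
    "code": Route("provider-a", "provider-b"),
    "long_context": Route("provider-b", "provider-a"),
    "batch": Route("cheaper-model", "more-capable-model"),
    # Hypothetical workload that must never silently change providers.
    "compliance_review": Route("provider-a", "provider-b", fail_closed=True),
}


def next_provider(workload: str, failed: str, failure: str) -> Optional[str]:
    """Return the provider to try after `failed` errored with `failure`, or None to stop."""
    route = POLICY[workload]
    if route.fail_closed or failure not in FAILOVER_ON:
        return None
    if failed == route.primary:
        return route.fallback
    return None  # the fallback already failed; stop rather than loop between providers
```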

What should you test on Hetzner before going live?

Test these on purpose:

Kill one gateway node

Confirm the Load Balancer removes it and traffic continues through the other node.
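The drill can be scripted so that "traffic continues" becomes a number rather than a feeling. The runner below takes two hypothetical hooks. In practice, `send` might make one request to the Load Balancer's public address, and `kill_node` might power off a server from the Hetzner Console or API. The failure budget of 5 is an arbitrary example. A few errors are normal while health checks notice the dead node.

```python
import time
from typing import Callable


def run_node_loss_drill(send: Callable[[], bool],
                        kill_node: Callable[[], None],
                        requests: int = 200,
                        kill_at: int = 50,
                        max_failures: int = 5,
                        pause_s: float = 0.0) -> int:
    """Send traffic through the public entry point, kill one node partway, count failures."""
    failures = 0
    for i in range(requests):
        if i == kill_at:
            kill_node()  # hypothetical hook, e.g. power off gateway node A
        if not send():   # hypothetical hook: one request via the Load Balancer
            failures += 1
        time.sleep(pause_s)
    if failures > max_failures:
        raise AssertionError(
            f"{failures} failed requests during node loss (budget {max_failures})")
    return failures
```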

Break the primary provider

Do not just stop the node. Break the upstream model path and confirm fallback actually happens.
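On real nodes, you might break the path by temporarily blocking outbound traffic from one node to the primary provider's API, or by pointing the primary at a wrong endpoint. Those are suggestions, not a prescribed method.

The in-process version below shows the shape of the check: inject a failure into the primary and confirm the answer came from the fallback. `ProviderDown` and `call_with_fallback` are hypothetical and are not OpenClaw's routing code.

```python
from typing import Callable


class ProviderDown(Exception):
    """Raised by a provider client when an upstream call fails in a retryable way."""


def call_with_fallback(prompt: str,
                       primary: Callable[[str], str],
                       fallback: Callable[[str], str]) -> tuple[str, str]:
    """Try the primary provider, then the fallback; return (provider_used, answer)."""
    try:
        return "primary", primary(prompt)
    except ProviderDown:
        return "fallback", fallback(prompt)


def broken_primary(prompt: str) -> str:
    # Fault injection: behave like an upstream that timed out.
    raise ProviderDown("simulated timeout")


def healthy_fallback(prompt: str) -> str:
    return f"answer to {prompt!r}"


used, answer = call_with_fallback("ping", broken_primary, healthy_fallback)
print(used, answer)
```

Returning which provider answered makes the drill observable. A request that "succeeded" but never touched the fallback has not tested anything.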

Expire or revoke the fallback key

If the secondary path is never tested, it will eventually disappoint you at the worst time.
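A small key inventory check can flag a fallback key before it quietly dies. The sketch makes three assumptions:

  • you track each key's expiry date and last successful call somewhere
  • the record holds a label, never the secret
  • the 14-day and 7-day thresholds are example values

`ProviderKey` and `key_warnings` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ProviderKey:
    name: str                             # a label, never the secret value itself
    role: str                             # "primary" or "fallback"
    expires: Optional[date]               # None if the provider does not expire keys
    last_successful_call: Optional[date]  # last real call that used this key


def key_warnings(keys: list[ProviderKey], today: date,
                 expiry_window_days: int = 14, max_untested_days: int = 7) -> list[str]:
    """Flag keys that are expired, expire soon, or have not been exercised recently."""
    warnings = []
    for key in keys:
        if key.expires is not None:
            days_left = (key.expires - today).days
            if days_left < 0:
                warnings.append(f"{key.name} ({key.role}): expired")
            elif days_left <= expiry_window_days:
                warnings.append(f"{key.name} ({key.role}): expires in {days_left} days")
        if key.last_successful_call is None:
            warnings.append(f"{key.name} ({key.role}): never exercised")
        else:
            idle = (today - key.last_successful_call).days
            if idle > max_untested_days:
                warnings.append(f"{key.name} ({key.role}): untested for {idle} days")
    return warnings
```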

Reboot both nodes one at a time

This is the basic maintenance test. If you cannot patch a node without drama, the setup is not done.
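A rolling reboot needs one guard: never touch a node unless another one is serving. The sketch below takes two hypothetical hooks. `reboot` might use the Hetzner Console, the API, or SSH. `is_healthy` might probe the gateway's health endpoint over the private network. Timeouts and polling intervals are example values.

```python
import time
from typing import Callable


def rolling_reboot(nodes: list[str],
                   reboot: Callable[[str], None],
                   is_healthy: Callable[[str], bool],
                   timeout_s: float = 300.0,
                   poll_s: float = 5.0) -> None:
    """Reboot nodes one at a time; never start on a node unless another one is serving."""
    for node in nodes:
        others = [n for n in nodes if n != node]
        if not any(is_healthy(n) for n in others):
            raise RuntimeError(f"refusing to reboot {node}: no other healthy node")
        reboot(node)
        deadline = time.monotonic() + timeout_s
        # Wait until the rebooted node reports healthy before moving on.
        while not is_healthy(node):
            if time.monotonic() > deadline:
                raise RuntimeError(f"{node} not healthy {timeout_s}s after reboot")
            time.sleep(poll_s)
```

This loop trusts the first healthy answer after `reboot` returns. A real version should also confirm that the node actually went down, or that the Load Balancer marked it unhealthy and then healthy again.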

Verify your backup and restore assumptions

Hetzner supports backups and snapshots, but you should still verify what "restore" means for your actual gateway config and secrets path (Backups and snapshots).
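One way to make "restore" concrete is to restore a backup or snapshot to a scratch server. Then check that the files your gateway needs are present and non-empty. The paths below are hypothetical placeholders. If secrets are injected from somewhere else at boot, the check belongs there instead. `missing_after_restore` is a hypothetical helper.

```python
from pathlib import Path

# Hypothetical layout: adjust to wherever your gateway keeps config and secret references.
REQUIRED_AFTER_RESTORE = (
    "etc/gateway/config.json",
    "etc/gateway/routing-policy.json",
    "etc/gateway/secrets.env",
)


def missing_after_restore(root: Path, required=REQUIRED_AFTER_RESTORE) -> list[str]:
    """List required files that are absent or empty under a restored filesystem root."""
    missing = []
    for rel in required:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing
```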

When is a single-node Hetzner gateway still acceptable?

Sometimes the honest answer is "right now we can only run one node."

That can still be acceptable if you keep the scope narrow and do the basics well:

  • private network discipline
  • clean secret handling
  • secondary provider already configured
  • written restore process
  • tested provider failover

That is not high availability. It is just a much less fragile single-node deployment.

FAQ

Do I need both a Load Balancer and a Floating IP?

Usually no. For most multi-node gateway setups, start with the Load Balancer. Add a Floating IP only if you have a separate active-passive need.

Can Hetzner handle private traffic between gateway nodes?

Yes. Hetzner Networks are built for private connectivity between your cloud servers.

Is node failover enough?

No. Instance failover does not guarantee provider failover. You still need explicit model-routing rules.

Sources and notes
