How to Configure Multi-Model Gateway Failover on Hetzner

How should you set up gateway failover on Hetzner?

The safest Hetzner setup for a multi-model OpenClaw gateway is two gateway nodes on a private Hetzner Network, placed in a spread placement group, with a single public entry point in front of them and explicit provider failover rules behind them. Hetzner gives you the infrastructure pieces for this: private Networks, Load Balancers, placement groups, Floating IPs, backups, and Volumes (Networks, Load Balancers, placement groups, Floating IPs).

The rest is on you: route model traffic intentionally, keep secrets and config consistent, and test whether failover works when a provider breaks in an ugly way instead of a neat one.

Quick answer

If you want the short version:

run at least two gateway nodes
attach them to one private Hetzner Network
spread them with a placement group
put one public entry point in front, usually a Load Balancer
configure provider failover rules separately from instance failover
test node loss, provider timeouts, bad keys, and exhausted quotas

Most teams do the first four and skip the last two. That is where the trouble starts.

Why instance failover and provider failover are different

It helps to separate two problems:

"Can traffic reach a healthy gateway node?"
"Can the gateway reach a healthy model provider?"

Hetzner can help with the first one. Its Load Balancers run health checks against their targets and can steer traffic away from dead nodes (Load Balancers). That does not solve the second problem. A healthy gateway process can still be pointing at a provider that is timing out, rate-limited, or rejecting auth.

So you need two failover layers:

infrastructure failover between gateway nodes
provider failover between model vendors or model classes
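One way to keep the two layers apart is to report them separately in the gateway's health output. The sketch below is a minimal Python illustration, not OpenClaw's actual health endpoint: `ProviderState`, `GatewayHealth`, `node_health_status`, and `failing_layer` are hypothetical names. It assumes the Load Balancer check should reflect node liveness only, so that a provider outage does not pull every node out of rotation at once.

```python
from dataclasses import dataclass
from enum import Enum


class ProviderState(Enum):
    OK = "ok"
    TIMEOUT = "timeout"
    RATE_LIMITED = "rate_limited"
    AUTH_REJECTED = "auth_rejected"


@dataclass(frozen=True)
class GatewayHealth:
    process_up: bool                      # can this node serve requests at all?
    providers: dict[str, ProviderState]   # last known state per provider (hypothetical names)


def node_health_status(health: GatewayHealth) -> int:
    """HTTP status for the load balancer check: node liveness only."""
    return 200 if health.process_up else 503


def failing_layer(health: GatewayHealth, primary: str) -> str:
    """Name the failover layer that has to react, if any."""
    if not health.process_up:
        return "infrastructure"   # the load balancer should route around this node
    if health.providers.get(primary) is not ProviderState.OK:
        return "provider"         # the gateway should switch to its fallback provider
    return "none"
```

With this split, a node whose primary provider is timing out still answers the balancer check with 200, while `failing_layer` reports that the provider layer is the one that needs to act.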

What is the best Hetzner layout?

For most OpenClaw teams, the practical layout looks like this:

            Client traffic
                  |
                  v
        Hetzner Load Balancer
                  |
         +--------+--------+
         |                 |
         v                 v
   Gateway node A    Gateway node B
         |                 |
         +--------+--------+   (private Hetzner Network)
                  |
                  v
       Provider routing policy
         -> primary provider
         -> fallback provider

The reason to use a private Network is simple. Hetzner Networks let your cloud servers talk over private IP space instead of spraying internal traffic across public addresses (Networks).

The reason to use a placement group is just as practical. Hetzner's spread placement groups are designed to reduce the chance that related servers land on the same underlying hardware fault domain (placement groups).

Should you use a Load Balancer or a Floating IP?

Usually a Load Balancer.

Hetzner Floating IPs are useful when you want to move one public address from one server to another (Floating IP FAQ). That is fine for active-passive setups. It is less attractive when you want automatic traffic distribution and health checks across multiple gateway nodes.

A Load Balancer is the better default when:

you want active health checks
you want more than one node live at once
you want cleaner cutover during a server failure

A Floating IP still makes sense for simpler active-passive designs or for services that you do not want behind a balancer. For most OpenClaw gateway cases, the Load Balancer is the safer default.

Which Load Balancer mode should you use?

Hetzner Load Balancers support TCP, HTTP, and HTTPS services (Load Balancers overview). OpenClaw's gateway uses one multiplexed port for real-time traffic and HTTP-style control paths, so TCP is often the safer first choice if you do not want a layer-7 proxy making assumptions about the gateway protocol. That is an implementation recommendation, not a Hetzner requirement.

If your gateway surface is strictly HTTP and you want path-aware behavior, HTTP or HTTPS can be reasonable. Most teams should start simpler.

How should config and secrets be shared?

Do not hand-edit two gateway nodes and hope they stay the same.

At minimum, keep:

the same routing policy on both nodes
the same provider key inventory on both nodes
the same health endpoint behavior on both nodes
the same deploy process on both nodes

If you need persistent state or shared assets, Hetzner Volumes can help, but try not to couple failover to a fragile shared disk unless you really need it (Volumes).

The more reliable pattern is stateless gateway nodes plus a clean config distribution path.
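A simple way to catch drift between nodes is to fingerprint the config each node is actually running and compare it with what the deploy process intended. The following is a sketch under assumptions: the config is representable as a JSON-serializable dict, the key inventory is listed as key identifiers rather than secret values, and the function names and node names are hypothetical.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Hash a config dict in canonical form so key order does not matter."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drifted_nodes(expected: dict, configs_by_node: dict[str, dict]) -> list[str]:
    """Return the nodes whose running config differs from the expected one."""
    reference = config_fingerprint(expected)
    return sorted(
        node
        for node, config in configs_by_node.items()
        if config_fingerprint(config) != reference
    )


if __name__ == "__main__":
    expected = {"routing": {"code": ["provider_a", "provider_b"]}, "key_ids": ["a-1", "b-1"]}
    running = {
        "gateway-a": {"key_ids": ["a-1", "b-1"], "routing": {"code": ["provider_a", "provider_b"]}},
        "gateway-b": {"key_ids": ["a-1"], "routing": {"code": ["provider_a"]}},
    }
    print(drifted_nodes(expected, running))
```

Running a check like this after every deploy turns "hope they stay the same" into something you can alert on.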

What should the provider failover policy look like?

Keep it boring. Boring is good here.

Start with:

one primary provider per workload class
one fallback provider per workload class
one rule for retryable failures
one rule for fail-closed workflows

Example:

code tasks -> Provider A, fallback Provider B
long-context tasks -> Provider B, fallback Provider A
cost-sensitive batch tasks -> cheaper model first, more capable model second

What you do not want is a fuzzy policy that changes by instinct during an incident.
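A policy like the one above can be written down as plain data, which makes it hard to improvise during an incident. The sketch below is illustrative only and is not OpenClaw's configuration format: the `Route` type, the exception classes, the `dispatch` function, and the choice of which failures count as retryable are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Base class for upstream failures seen by the gateway."""


class ProviderTimeout(ProviderError):
    """The provider did not answer in time."""


class ProviderRateLimited(ProviderError):
    """The provider rejected the call because of rate or quota limits."""


class ProviderAuthRejected(ProviderError):
    """The provider rejected the API key."""


# Assumption: these failures are safe to retry on a different provider.
RETRYABLE = (ProviderTimeout, ProviderRateLimited, ProviderAuthRejected)


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: str
    fail_closed: bool = False   # True: never send this workload to the fallback


POLICY = {
    "code": Route(primary="provider_a", fallback="provider_b"),
    "long_context": Route(primary="provider_b", fallback="provider_a"),
    "batch": Route(primary="cheaper_model", fallback="more_capable_model"),
}


def dispatch(
    policy: dict[str, Route],
    workload: str,
    prompt: str,
    call: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the primary, then the fallback on a retryable failure.

    Returns (provider_used, response). `call(provider, prompt)` is supplied
    by the caller and raises a ProviderError subclass on failure.
    """
    route = policy[workload]
    try:
        return route.primary, call(route.primary, prompt)
    except RETRYABLE:
        if route.fail_closed:
            raise   # fail closed: surface the error instead of rerouting
        return route.fallback, call(route.fallback, prompt)
```

Errors outside `RETRYABLE`, such as a malformed request, propagate unchanged, since sending the same bad request to a second provider would not help.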

What should you test on Hetzner before going live?

Test these on purpose:

Kill one gateway node

Confirm the Load Balancer removes it and traffic continues through the other node.

Break the primary provider

Do not just stop the node. Break the upstream model path and confirm fallback actually happens.

Expire or revoke the fallback key

If the secondary path is never tested, it will eventually disappoint you at the worst time.
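The two provider tests above can be rehearsed offline before touching real keys. This is a hedged sketch using stand-in providers rather than real vendor APIs: `make_fake_provider`, `send_with_fallback`, and `run_drill` are hypothetical helpers, and the failure modes are simulated, not actual provider responses.

```python
from typing import Callable, Optional


class UpstreamFailure(Exception):
    """Simulated provider failure carrying the provider name and failure mode."""

    def __init__(self, provider: str, mode: str):
        super().__init__(f"{provider} failed: {mode}")
        self.provider = provider
        self.mode = mode


FAILURE_MODES = ("timeout", "bad_key", "quota")


def make_fake_provider(name: str, broken_mode: Optional[str] = None) -> Callable[[str], str]:
    """Build a stand-in provider that either answers or fails in one mode."""
    def call(prompt: str) -> str:
        if broken_mode is not None:
            raise UpstreamFailure(name, broken_mode)
        return f"{name} answered"
    return call


def send_with_fallback(primary, fallback, prompt: str) -> str:
    """Simplified gateway path: primary first, fallback on any upstream failure."""
    try:
        return primary(prompt)
    except UpstreamFailure:
        return fallback(prompt)


def run_drill() -> dict[str, str]:
    """Break the primary in each mode, then also break the fallback key."""
    results = {}
    for mode in FAILURE_MODES:
        primary = make_fake_provider("primary", broken_mode=mode)
        fallback = make_fake_provider("fallback")
        results[f"primary_{mode}"] = send_with_fallback(primary, fallback, "ping")
    try:
        send_with_fallback(
            make_fake_provider("primary", broken_mode="timeout"),
            make_fake_provider("fallback", broken_mode="bad_key"),
            "ping",
        )
        results["fallback_bad_key"] = "unexpected success"
    except UpstreamFailure as exc:
        # With both paths broken, the drill should report which one failed last.
        results["fallback_bad_key"] = f"failed at {exc.provider} ({exc.mode})"
    return results


if __name__ == "__main__":
    for case, outcome in run_drill().items():
        print(f"{case}: {outcome}")
```

A drill like this only proves the routing logic. It does not replace breaking the real upstream path and revoking a real fallback key in a staging environment.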

Reboot both nodes one at a time

This is the basic maintenance test. If you cannot patch a node without drama, the setup is not done.

Verify your backup restore assumptions

Hetzner supports backups and snapshots, but you should still verify what "restore" means for your actual gateway config and secrets path (Backups and snapshots).

When is a single-node Hetzner gateway still acceptable?

Sometimes the honest answer is "right now we can only run one node."

That can still be acceptable if you keep the scope narrow and do the basics well:

private network discipline
clean secret handling
secondary provider already configured
written restore process
tested provider failover

That is not high availability. It is just a much less fragile single-node deployment.

Hetzner references: Networks overview, Load Balancers overview, Floating IP FAQ, placement groups, Volumes overview, Backups and snapshots
OpenClaw references: documentation home, Gateway protocol, Gateway FAQ
Related internal reading: Understanding Multi-Model AI Gateways, How to Run OpenClaw on a Private VPS

