This post is about Caddy, an HTTP server that’s easy to get up and running, is lightweight, and has a module for exposing metrics in the native Prometheus format, so we like it a lot. In this case we’re using Caddy to host a small static gallery generated by Lightroom, the image post-processing suite.
Caddy runs in a FreeBSD jail (OS-level virtualisation), which is hosted on a fairly powerful physical machine. This machine hosts a bunch of different jails and also serves as a NAS.
The host has a nullfs mount into the Caddy jail, so the images appear to be on the jail’s local file system. This makes the photographer’s workflow easy: an exported gallery is automatically present on the webserver, and therefore on the public Internet, without any additional action. Great, everyone can now browse the gallery and it all just works.
However, since the audience for the gallery is distributed across the world, I also set up the Cloudflare CDN in front of the site. The idea is to cache content on the CDN so that it’s closer to the visitors, no matter where they are located geographically.
Now, if you have ever run an HTTP server behind a proxy, you know that the HTTP client making requests to your webserver is now the proxy and not the end user. This is also reflected in your webserver logs, where suddenly all requests appear to have been made by the proxy.
Squid cache, one of the early open source HTTP proxies, introduced a new HTTP request header to address this problem: X-Forwarded-For. The format of the header is simple and is intended to preserve the IP address of not only the client but also of any intermediate proxies. Each proxy along the way appends the address it received the request from, so the list grows from left to right:
X-Forwarded-For: client, proxy1, proxy2
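To make the format concrete, here is a minimal Python sketch that splits a header value into its hops. The function name and the sample addresses (taken from the RFC 5737 documentation ranges) are made up for illustration; nothing here is part of Caddy.

```python
from typing import List


def parse_x_forwarded_for(value: str) -> List[str]:
    """Split an X-Forwarded-For value into hops, original client first.

    Hypothetical helper for illustration only.
    """
    # Entries are comma-separated; surrounding whitespace is not significant.
    return [hop.strip() for hop in value.split(",") if hop.strip()]


# Sample value using documentation-only addresses (an assumption).
hops = parse_x_forwarded_for("203.0.113.7, 198.51.100.10, 192.0.2.1")
client, proxies = hops[0], hops[1:]
print(client, proxies)
```

Keep in mind that the header is supplied by whoever sends the request, so the leftmost entry is only as trustworthy as the proxies that passed it along.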
(Actually, there’s a newer standard header aimed at replacing X-Forwarded-For. It is called Forwarded and is described in RFC 7239, but Caddy doesn’t support it yet, so we won’t talk about it any more in this post.)
So, back to the problem at hand. We can simply update our log format to use the address in X-Forwarded-For instead of the client IP. An alternative solution available with Caddy is to use the third-party middleware http.realip, which, when enabled, restores the original client IP.
Make sure your Caddy is built with this extension and then enable it with a single line in the Caddyfile:
realip cloudflare
With this in place, the value of the remote variable in the logs is replaced with the value of X-Forwarded-For. Problem solved.
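The general idea behind this kind of real-IP restoration can be sketched in a few lines of Python. This is not the plugin’s actual code, only an illustration under stated assumptions: the trusted range is a documentation network standing in for the CDN’s published address list, the function name is made up, and taking the leftmost header entry is a simplification.

```python
import ipaddress

# Hypothetical trusted proxy range (an RFC 5737 documentation network),
# standing in for the CDN's published address list.
TRUSTED_PROXIES = [ipaddress.ip_network("198.51.100.0/24")]


def effective_client_ip(remote_addr, x_forwarded_for):
    """Return the address to log as the client (illustrative sketch)."""
    peer = ipaddress.ip_address(remote_addr)
    if not any(peer in net for net in TRUSTED_PROXIES):
        # Direct connection or unknown proxy: do not trust the header.
        return remote_addr
    hops = [h.strip() for h in (x_forwarded_for or "").split(",") if h.strip()]
    # Simplification: take the leftmost entry when the header is present.
    return hops[0] if hops else remote_addr


print(effective_client_ip("198.51.100.5", "203.0.113.7"))
```

The important part is the check on the connecting address: the header is only honoured when the request really arrived from a known proxy, otherwise anyone could spoof their address in the logs.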
This is all well and good, but I didn’t like losing the Cloudflare proxy addresses from the logs. The Cloudflare IPs carry useful information: they show which Cloudflare edge locations are connecting to Caddy, and they let me measure latency back to those locations when I feel like running some probes.
So what can we do to have both? Build a custom log format of course!
Here’s what we came up with, which also includes the request headers that Cloudflare sends us, like Cf-Ipcountry and Cf-Visitor:
log / access.log "{>X-Forwarded-For} - {user} [{when}] \"{method} {uri} {proto}\"
{status} {size} \"{>Referer}\" \"{>User-Agent}\" \"{tls_version}\"
\"{tls_cipher}\" \"{>Cf-Ipcountry}\" \"{>Cf-Ray}\" \"{>Cf-Visitor}\"
\"{>X-Forwarded-Proto}\" {remote}"
Note that this needs to be specified as a single line; I’ve split it into several lines only to make it more readable.
Put that in your Caddyfile and reload Caddy with killall -SIGUSR1 caddy to make it pick up the changes, and you’re ready to go.
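If you want to feed these logs into a processing pipeline, a line in the custom format above can be picked apart with a regular expression. The Python sketch below is one possible approach, not something Caddy ships. The sample line is invented for illustration (its addresses, TLS values and ray ID are made up), and the pattern assumes that Cf-Visitor carries a small JSON object, which is why that field is matched more loosely than the others.

```python
import json
import re

# One named group per placeholder in the custom log format.
LOG_RE = re.compile(
    r'(?P<xff>.*?) - (?P<user>\S+) \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<tls_version>[^"]*)" "(?P<tls_cipher>[^"]*)" '
    r'"(?P<country>[^"]*)" "(?P<ray>[^"]*)" '
    # Cf-Visitor is assumed to be JSON and so contains quotes itself.
    r'"(?P<visitor>.*)" "(?P<xfp>[^"]*)" (?P<remote>\S+)$'
)

# Invented sample line; every value here is an assumption.
SAMPLE = (
    '203.0.113.7 - - [02/Jan/2018:15:04:05 +0000] "GET /gallery/ HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0" "tls1.2" "ECDHE-RSA-AES128-GCM-SHA256" '
    '"DE" "3f1c2b4a5d6e7f80-FRA" "{"scheme":"https"}" "https" 198.51.100.10'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


fields = parse_line(SAMPLE)
print(fields["xff"], fields["remote"], json.loads(fields["visitor"])["scheme"])
```

With both the visitor address and the connecting Cloudflare address in separate fields, it is easy to group requests by edge location or by visitor country later on.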
At this point you might be thinking that the X-Forwarded-For header can contain multiple addresses if the request passed through several proxies (e.g. I could be running a Varnish caching server in front of Caddy). If that were the case and we cared (say, because our log processing pipeline expects each log line to start with a single IP address), we could configure Varnish to preserve the original connecting IP in a separate, custom header and use that in our Caddy logs. Implementation of this, as always, is left as an exercise for the reader.