Dynamic Nginx configuration for Docker with Python
In my previous post, I wrote about my multi-container setup with docker-compose.
One of the benefits of docker-compose
is that it makes it easy to scale; for example, to scale the web
service, you can simply run:
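With the docker-compose CLI of that era (the `scale` command, since folded into `up --scale`), that looks something like:

```shell
# Scale the "web" service to 2 containers
docker-compose scale web=2
```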
If the web
service is currently running 1 container, docker-compose
will spin up an additional one. If you have nginx fronting the web
service (like in my setup), we’d expect nginx to round-robin requests between the two web
containers, right?
Let’s check it for ourselves. Recall what the nginx config looks like:
server {
    listen 80;
    server_name localhost;

    location / {
        ....
        proxy_pass http://flaskapp:5090;
    }
}
where flaskapp is the service name in docker-compose.yml. Docker’s embedded DNS server resolves the service name to the actual container IPs.
It implements DNS round-robin, so a client sees the list of IPs shuffled each time it resolves the service name. Let’s confirm this.
Assuming 11d3838afca6c
is the nginx container id:
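One way to check, assuming the nginx image has `getent` available:

```shell
# Resolve the service name from inside the nginx container;
# the ordering of the returned IPs should change between invocations
docker exec 11d3838afca6c getent hosts flaskapp
```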
Cool, that works. Now, let’s run some curl requests to make sure that the HTTP requests made via nginx are indeed round-robined across the 2 containers:
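Something like this, assuming nginx is published on the host’s port 80:

```shell
# Fire a handful of requests through nginx
for i in $(seq 1 5); do
  curl -s http://localhost/ > /dev/null
done
```

while watching `docker-compose logs` to see which container serves each request.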
Wait, what!? All requests seem to be going to flaskapp_1! It turns out nginx caches DNS resolutions until the next reload. This excellent post goes into detail about how this works and suggests some workarounds.
The Workaround
TL;DR of the blog post mentioned above: if you want to avoid a hefty $2000-per-instance license for NGINX Plus, write your configuration like this:
resolver 127.0.0.11;
set $backends flaskapp;

location / {
    ....
    proxy_pass http://$backends:5090;
}
Note that 127.0.0.11
is the IP of Docker’s embedded DNS server. Using variables forces resolution through a resolver
which we point at the embedded
DNS server.
The resulting request distribution looks much more equitable now:
flaskapp_1 | 172.21.0.3 - - [27/Sep/2017 04:52:08] "GET / HTTP/1.0" 200 -
flaskapp_2 | 172.21.0.3 - - [27/Sep/2017 04:52:08] "GET / HTTP/1.0" 200 -
flaskapp_1 | 172.21.0.3 - - [27/Sep/2017 04:52:08] "GET / HTTP/1.0" 200 -
flaskapp_2 | 172.21.0.3 - - [27/Sep/2017 04:52:08] "GET / HTTP/1.0" 200 -
flaskapp_1 | 172.21.0.3 - - [27/Sep/2017 04:52:08] "GET / HTTP/1.0" 200 -
Can we do better?
Like all workarounds, there are drawbacks to the above approach. Since we can’t specify an upstream block, we lose all the nice features provided by nginx’s upstream module, like load-balancing policies, weights, and health checks.
If we could somehow update the upstream
list dynamically as docker adds/removes containers, then reload nginx on the fly,
we could have the best of both worlds. Docker provides a handy event stream
that we can hook into and listen for container lifecycle events. There are tools out there that already do this, which leads me to my
disclaimer:
DISCLAIMER: The following section talks about a Python script I wrote merely for learning purposes. It is not production-quality code, and serious deployments should use widely-used tools like traefik and nginx-proxy.
With that out of the way, let’s start by installing the docker python sdk:
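The SDK ships as the `docker` package on PyPI:

```shell
pip install docker
```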
We want to listen for container lifecycle events, so let’s start with a skeleton to capture those:
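A minimal sketch using the SDK’s event stream (the fields printed here are my choice; real handlers would do more):

```python
import docker

client = docker.from_env()

# Subscribe to the daemon's event stream, restricted to container events
event_filters = {'type': 'container'}
for event in client.events(decode=True, filters=event_filters):
    print(event.get('status'), event.get('id'))
```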
But this captures events for all containers managed by the Docker daemon. We only want to capture container events for our web service.
One way could be to filter by container name: as of version 1.14.0 of docker-compose, container names seem to follow the format $project_$service_$index, so if we have 2 containers of the flaskapp service in a project called myproj, they would be named myproj_flaskapp_1 and myproj_flaskapp_2. However, relying on implementation details seems wrong; there must be a better way.
Introducing labels
Labels are key-value metadata that can be attached to Docker objects including containers, images and even networks. If we annotate
services with custom labels, we can reliably identify them from the event stream. Let’s add a label to our nginx
and flaskapp
services in docker-compose.yml (for the full docker-compose.yml, refer to my previous post):
flaskapp:
  ...
  labels:
    com.ameyalokare.type: web

nginx:
  ...
  labels:
    com.ameyalokare.type: nginx
Now we can update the event_filters
to include a label:
event_filters = {'type': 'container', 'label': 'com.ameyalokare.type=web'}
Finding the list of upstream container IPs
Our python script needs to maintain a list of currently running web
containers, and if this list changes, we’ll reload nginx.
Let’s use a dict mapping container_ids to container_objects for this:
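A sketch of the bookkeeping, written as a pure helper so it’s easy to test: `get_container` stands in for `client.containers.get`, and the particular statuses handled are my assumption about which lifecycle events matter.

```python
def update_containers(containers, event, get_container):
    """Keep a dict of container_id -> container object in sync with one event.

    Returns True if the set of containers changed (i.e. nginx needs a reload).
    """
    status = event.get('status')
    cid = event.get('id')
    if status == 'start':
        containers[cid] = get_container(cid)
        return True
    if status in ('die', 'destroy') and cid in containers:
        del containers[cid]
        return True
    return False
```

In the real event loop you’d pass `client.containers.get` and re-render the config whenever this returns True.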
Great, so now all we need to do is render the list of container IPs into the nginx upstream block and reload nginx.
It turns out that getting the IP from a Container object is not exactly intuitive. There is no ip property or getIP() method, and even if there were, a container could have multiple IPs since it can be connected to multiple networks. The best way I could find was to traverse the attrs property:
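That lookup might be sketched like this (helper name is mine):

```python
def get_container_ip(container, network='web_nw'):
    # container.attrs holds the raw "docker inspect" output; the IP for a
    # specific user-defined network lives under NetworkSettings -> Networks
    return container.attrs['NetworkSettings']['Networks'][network]['IPAddress']
```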
where web_nw
is the user-defined network that both nginx and the web containers are connected to.
We can now render these IPs into an nginx config using our favorite templating framework, like jinja2. I’m lazy, so I just rolled my own with str.replace() 😁
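A low-tech sketch of that rendering; the template string and placeholder are my invention, not the post’s actual file:

```python
# Assumed template with a placeholder where the server lines go
NGINX_TEMPLATE = """\
upstream flaskapp {
{{UPSTREAM_SERVERS}}
}
"""

def render_upstream(ips, port=5090):
    # Build one "server ip:port;" line per container IP and splice them in
    servers = '\n'.join('    server {}:{};'.format(ip, port) for ip in ips)
    return NGINX_TEMPLATE.replace('{{UPSTREAM_SERVERS}}', servers)
```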
Reloading nginx
Reloading nginx after changing the config turns out to be trivial. We simply need to send it a SIGHUP:
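With the SDK this can be done via `kill()` with a signal argument (the helper name is mine):

```python
def reload_nginx(nginx_container):
    # SIGHUP makes the nginx master process re-read its configuration
    # and gracefully replace its workers, without dropping connections
    nginx_container.kill(signal='SIGHUP')
```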
Summary
Deploying nginx in a dynamic container environment takes a little work, especially if you don’t want to pay the big bucks for NGINX Plus. I wrote a low-tech Python script to learn how things work under the hood; find it on my GitHub. There are open-source reverse-proxy solutions built specifically for container environments, like traefik. Traefik obviates the need for nginx altogether, but if you still want to run nginx, consider nginx-proxy.