The gevent project has allowed us to transparently bolt on asynchronous functionality to otherwise synchronous python WSGI-based web frameworks and servers. However, piecing this puzzle together yourself can be inconvenient, since libraries and tutorials remain incomplete and skip over many pertinent details. To clear up the confusion, we'll put together a starter project combining gevent, django and socket.io to build a simple asynchronous application, pausing along the way to discuss the relevant details. But first we'll take a detour through event loops, and in particular how gevent's implementation works.

Demystifying the Event Loop

It's easy to get caught up in the buzz about event loops. Though certain implementations, like node.js, have attracted a lot of attention in recent years, event loops have actually been present in web development for a long time. For example, python's veteran event-loop-driven framework, Twisted, predates django and has been in the wild since 2002. Whatever the event loop, they're all more or less coroutine based. I'm not talking specifically about concrete implementations of coroutines (which I touched on in a previous post), but simply about the defining qualities of coroutines: they're stateful (capable of being suspended and resumed) and cooperative (each one voluntarily yields control instead of being preempted).
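To make "stateful and cooperative" concrete, here's a small illustration using a plain Python generator rather than gevent. The generator suspends at each `yield`, keeps its local variables while suspended, and only resumes when the caller explicitly hands control back with `send()`. The `running_average` name is just for illustration.

```python
def running_average():
    """A generator-based coroutine: it keeps its state (total, count)
    between suspensions and only runs when the caller resumes it."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Suspend here, handing the current average back to the caller;
        # execution picks up at this exact point on the next .send().
        value = yield average
        total += value
        count += 1
        average = total / count


coro = running_average()
next(coro)            # prime the coroutine: run it up to the first yield
print(coro.send(10))  # 10.0
print(coro.send(20))  # 15.0
print(coro.send(30))  # 20.0
```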

This cooperation is typically achieved either explicitly, using things such as callbacks and special return values (as in Twisted), or implicitly, letting you write what appears to be synchronous code while shielding you from the heavy lifting going on behind the scenes. Gevent falls into the latter category, and accomplishes this through clever "monkey patching" of the python standard library. I personally prefer gevent's approach, as it eliminates a lot of unnecessary code complexity (like unwieldy callback pyramids) and lets you focus on developing.

The magic behind gevent lies primarily in lightweight pseudo-threads (coroutines) called greenlets, and in libevent's high-performance event loop, which wraps platform-specific event mechanisms. A web application running on gevent will typically use a WSGI server derived from gevent's StreamServer class. This server spawns a greenlet for every HTTP request it receives, and that greenlet is tossed into the event loop to await its turn to run. These request greenlets are monitored by a parent greenlet called the hub. Once a request greenlet has run, it passes control back to the hub, either by performing a blocking operation (which has been monkey patched) or simply by returning. The hub continually defers to the event loop to decide which waiting greenlet to switch to next. Thousands of these greenlets can potentially exist at once, but only one is actually running at any given time.
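The scheduling described above can be modeled with a toy "hub" that runs generator-based tasks one at a time. This is a deliberately simplified sketch, not how gevent is implemented: real greenlets are switched when libevent reports that the I/O they're waiting on is ready, whereas this hub just cycles round-robin. Each `yield` stands in for a monkey patched blocking call that hands control back.

```python
from collections import deque


def request_handler(name, steps, trace):
    """Stand-in for a request greenlet. Each `yield` marks a point where the
    real code would hit a (monkey patched) blocking call and switch back
    to the hub."""
    for step in range(steps):
        trace.append("%s:%d" % (name, step))
        yield  # "blocking" I/O: give control back to the hub


def hub(tasks):
    """Run tasks one at a time, round-robin, until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)       # run this task until it yields control
        except StopIteration:
            continue         # the task returned: drop it
        ready.append(task)   # still alive: queue it for another turn


trace = []
hub([request_handler("a", 2, trace), request_handler("b", 3, trace)])
print(trace)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Only one task ever runs at a time; interleaving happens purely because each task cooperatively gives up control.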

There's little benefit to using gevent unless your application specifically requires websocket support or spends much of its time waiting on blocking network operations (database queries, calls to other services, and so on). For heavy CPU-bound work (like data processing), you might be better off exploring task queues such as Celery, since a greenlet that's busy computing never yields control back to the hub.

Project Setup

I'm starting from a fresh Ubuntu VM created with the Vagrant starter template. Gevent requires libevent, so we'll install that first:

sudo apt-get install libevent-dev

As mentioned above, libevent sits on top of various platform-specific event mechanisms and ultimately drives gevent's event loop. Next, using virtualenv (which you can install via sudo apt-get install python-virtualenv), let's create a new environment and install the essentials:

virtualenv ~/envs/django-socketio
source ~/envs/django-socketio/bin/activate
pip install django psycopg2 psycogreen
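Once everything is installed, a quick sanity check can save some head scratching later. This is a hypothetical helper, not part of any of the packages. It simply tries to import each module. The import names are assumptions worth double-checking: gevent-socketio, for instance, is imported as `socketio`.

```python
import importlib

# Import names (not pip package names) this tutorial relies on.
REQUIRED = ["gevent", "django", "psycopg2", "psycogreen", "socketio"]


def check_modules(names):
    """Return {module_name: True/False} depending on whether importing
    the module succeeds in the current environment."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in sorted(check_modules(REQUIRED).items()):
        print("%-12s %s" % (name, "ok" if found else "MISSING"))
```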

There is one package I've left out of the above pip install command: gevent-socketio. At the time of writing, the version registered on PyPI doesn't properly shut down its flash policy server, whereas the latest version on github does. You can install the package from github with the following:

pip install -e git+https://github.com/abourget/gevent-socketio.git#egg=gevent_socketio-dev

Finally, let's wrap up project setup by creating the django project we'll be working from:

cd /vagrant
mkdir project
cd project
django-admin.py startproject socketioapp

Painting Django Green

Django requires a little finessing to make it "async friendly." Our checklist is relatively short: call gevent.monkey.patch_all() as early as possible, patch our database backend to be async (Postgres in my case), and make running socketio.server.SocketIOServer in place of django's built-in dev server as convenient as possible. These items only require a couple of lines of code; the difficult part is deciding where they belong. Some choices might be convenient for local development but cause problems later when deploying to production.
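"As early as possible" is easy to violate silently, for example when some module pulls in django before the patch runs. One option is a hypothetical guard like the one below, placed just before `monkey.patch_all()`, so that an early import fails loudly at startup rather than surfacing later as an odd exception. The `ensure_not_imported` name is my own; it only inspects `sys.modules`.

```python
import sys


def ensure_not_imported(*module_names):
    """Raise RuntimeError if any of `module_names` is already in sys.modules.

    Intended to run right before gevent's monkey.patch_all(), so that an
    accidental early import fails immediately and visibly.
    """
    early = [name for name in module_names if name in sys.modules]
    if early:
        raise RuntimeError(
            "imported before monkey patching: %s" % ", ".join(early))


# Hypothetical usage at the very top of runserver.py or manage.py:
#
#     ensure_not_imported("django")
#     from gevent import monkey
#     monkey.patch_all()
```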

The following is a bare bones example which I'll put into a file named runserver.py alongside the manage.py module:

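#!/usr/bin/env python

# Patch the standard library first, before django (or anything else)
# gets a chance to import the blocking versions.
from gevent import monkey
monkey.patch_all()

import os
from psycogreen.gevent import patch_psycopg

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "socketioapp.settings")

# Make psycopg2 cooperate with gevent instead of blocking the whole
# process while it waits on Postgres.
patch_psycopg()

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()


if __name__ == '__main__':
    from socketio.server import SocketIOServer
    # Serve the django app on port 8000 on all interfaces; requests under
    # /socket.io are handled by gevent-socketio.
    server = SocketIOServer(('', 8000), application, resource="socket.io")
    server.serve_forever()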

Nothing too special here. We've combined our monkey patching, the psycopg async fix, and the SocketIOServer startup code in one place. Running python runserver.py and pulling up the page on localhost brings up the familiar django welcome page. Sounds like we're done, right? Perhaps for the lazy developer, but one big convenience of django's runserver command is missing here: automatically reloading the dev server when code changes. Who wants to manually restart the server after every update?
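Auto-reloading mostly comes down to noticing that a source file changed. The sketch below polls file modification times, which is similar in spirit to what django.utils.autoreload's `code_changed()` does for the command below. The helper names are hypothetical, and choosing which paths to watch (for example, the `__file__` of each module in `sys.modules`) is left to the caller.

```python
import os
import time


def snapshot_mtimes(paths):
    """Map each existing path to its last-modified time; missing files are skipped."""
    mtimes = {}
    for path in paths:
        try:
            mtimes[path] = os.stat(path).st_mtime
        except OSError:
            pass  # deleted or unreadable: leave it out of the snapshot
    return mtimes


def changed_files(before, after):
    """Paths that were modified, created, or deleted between two snapshots."""
    return sorted(p for p in set(before) | set(after)
                  if before.get(p) != after.get(p))


def wait_for_change(paths, interval=1.0):
    """Poll every `interval` seconds until something changes; return what changed."""
    baseline = snapshot_mtimes(paths)
    while True:
        time.sleep(interval)
        changed = changed_files(baseline, snapshot_mtimes(paths))
        if changed:
            return changed
```

Once `wait_for_change()` returns, the watcher still has to stop the server and restart the process, which is the part the management command below handles.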

Thankfully, if you take a little time to explore the example apps in the gevent-socketio project on github, you'll find a handy management command awaiting you. I've improved their version slightly by integrating additional command line options, explicitly killing running sockets to trigger socket.io namespace disconnects, and adding psycopg monkey patching support. My version looks like the following:

from optparse import make_option
from re import match
from thread import start_new_thread
from time import sleep
from os import getpid, kill, environ
from signal import SIGINT

from django.conf import settings
from django.core.handlers.wsgi import WSGIHandler
from django.core.management.base import BaseCommand, CommandError
from django.core.management.commands.runserver import naiveip_re, DEFAULT_PORT
from django.utils import six
from django.utils.autoreload import code_changed, restart_with_reloader
from socketio.server import SocketIOServer


RELOAD = False


def reload_watcher():
    global RELOAD
    while True:
        RELOAD = code_changed()
        if RELOAD:
            kill(getpid(), SIGINT)
        sleep(1)


class Command(BaseCommand):
    option_list = BaseCommand.option_list + (
        make_option(
            '--nopsyco',
            action='store_false',
            dest='use_psyco',
            default=True,
            help='Do NOT patch psycopg using psycogreen.'),
        make_option(
            '--noreload',
            action='store_false',
            dest='use_reloader',
            default=True,
            help='Do NOT use the auto-reloader.'),
        make_option(
            '--nostatic',
            action='store_false',
            dest='use_static_handler',
            default=True,
            help='Do NOT use staticfiles handler.'),
    )

    def handle(self, addrport='', *args, **options):
        if not addrport:
            self.addr = ''
            self.port = DEFAULT_PORT
        else:
            m = match(naiveip_re, addrport)
            if m is None:
                raise CommandError('"%s" is not a valid port number '
                                   'or address:port pair.' % addrport)
            self.addr, _, _, _, self.port = m.groups()
            # A bare port (e.g. "8001") leaves the address group as None;
            # fall back to binding on all interfaces.
            self.addr = self.addr or ''

        environ['DJANGO_SOCKETIO_PORT'] = str(self.port)

        if options.get('use_psyco'):
            try:
                from psycogreen.gevent import patch_psycopg
            except ImportError:
                raise CommandError(
                    'Could not patch psycopg. '
                    'Is psycogreen installed?')
            patch_psycopg()

        if options.get('use_reloader'):
            start_new_thread(reload_watcher, ())

        server = None
        try:
            bind = (self.addr, int(self.port))
            print 'SocketIOServer running on %s:%s\n\n' % bind
            handler = self.get_handler(*args, **options)
            server = SocketIOServer(
                bind, handler, resource='socket.io', policy_server=True)
            server.serve_forever()
        except KeyboardInterrupt:
            if server is not None:
                # Kill open sockets so namespaces receive their disconnects.
                for key, sock in six.iteritems(server.sockets):
                    sock.kill(detach=True)
                server.stop()
            if RELOAD:
                print 'Reloading...\n\n'
                restart_with_reloader()

    def get_handler(self, *args, **options):
        """
        Returns the django.contrib.staticfiles handler.
        """
        handler = WSGIHandler()
        try:
            from django.contrib.staticfiles.handlers import StaticFilesHandler
        except ImportError:
            return handler
        use_static_handler = options.get('use_static_handler')
        insecure_serving = options.get('insecure_serving', False)
        if (settings.DEBUG and use_static_handler or
                (use_static_handler and insecure_serving)):
            handler = StaticFilesHandler(handler)
        return handler
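The `handle()` method relies on Django's `naiveip_re` to split the optional `addrport` argument. To see what that parsing does in isolation, here's a standalone sketch with a pattern modeled on Django's (recreated so it runs without Django, so treat it as an approximation). It also shows that a bare port leaves the address group as `None`, which needs a fallback before it can be used in a bind tuple. `parse_addrport` and `DEFAULT_PORT = 8000` are assumptions for the example.

```python
import re

# Modeled on the pattern Django's runserver uses. Five groups, in order:
# addr, ipv4, ipv6, fqdn, port.
ADDRPORT_RE = re.compile(r"""^(?:
(?P<addr>
    (?P<ipv4>\d{1,3}(?:\.\d{1,3}){3}) |          # IPv4 address
    (?P<ipv6>\[[a-fA-F0-9:]+\]) |                # IPv6 address in brackets
    (?P<fqdn>[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*)  # hostname
):)?(?P<port>\d+)$""", re.X)

DEFAULT_PORT = 8000


def parse_addrport(addrport=""):
    """Return an (addr, port) tuple suitable for binding a server socket."""
    if not addrport:
        return "", DEFAULT_PORT
    m = ADDRPORT_RE.match(addrport)
    if m is None:
        raise ValueError(
            "%r is not a valid port number or address:port pair." % addrport)
    addr, _, ipv6, _, port = m.groups()
    if ipv6:
        addr = ipv6[1:-1]  # strip the surrounding brackets
    # A bare port leaves addr as None: bind on all interfaces instead.
    return addr or "", int(port)


print(parse_addrport("8001"))          # ('', 8001)
print(parse_addrport("0.0.0.0:9000"))  # ('0.0.0.0', 9000)
```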

You can find my management command above on github inside an otherwise empty django app. Clone it into your project, and add socketio_runserver to your INSTALLED_APPS. When going this route, you should also insert gevent's monkey.patch_all() into the beginning of your manage.py file, as demonstrated in the runserver.py script above. If the monkey patching doesn't occur before django modules are loaded, chances are you'll encounter some bizarre exceptions. You're now ready to run the management command python manage.py socketio_runserver, navigate to localhost, and witness the django welcome page.

Your First Socket.io Django App

Creating and registering socket.io namespaces in your django apps is extremely simple. To demonstrate this, we'll make an app that contains its own namespace which simply echoes messages back to the client. Not the most exciting example, but in a later post we'll build something more elaborate. Begin by running python manage.py startapp echo_server to create an app skeleton, and add echo_server to your INSTALLED_APPS in settings.py.

Inside the new app, make a file named sockets.py. Here we'll create our sole namespace, which is represented by a class that inherits from socketio.namespace.BaseNamespace:

from socketio.namespace import BaseNamespace
from socketio.sdjango import namespace


@namespace('/echo')
class EchoNamespace(BaseNamespace):
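    def on_msg(self, msg):
        # Build an "event" packet addressed to this namespace's endpoint;
        # args is a list of the values passed to the client-side handler.
        pkt = dict(type='event',
                   name='msg',
                   args=['Someone said: {0}'.format(msg)],
                   endpoint=self.ns_name)

        # Send it to every socket the server knows about, not only the
        # ones subscribed to /echo.
        for sessid, socket in self.socket.server.sockets.iteritems():
            socket.send_packet(pkt)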

First, a few things about namespaces and sockets. A namespace in socket.io lets you define granular "channels", all sharing the same socket, which provide meaningful message filtering on both the client and server side. Each user has their own socket, which is accessible through self.socket in any namespace instance. Similarly, the SocketIOServer itself is accessible through self.socket.server, and it holds references to all active sockets for all users. One other item worth noting is the user-specific session dict, accessible through either self.session within a namespace or self.socket.session (both point to the same dict).
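Namespace filtering works because every packet carries its endpoint. Based on my reading of the socket.io 0.9 protocol (the version gevent-socketio speaks), an event packet has the shape `type:id:endpoint:data`. The sketch below encodes and decodes that shape by hand. The helper names are hypothetical and this is not gevent-socketio's encoder, so check the protocol spec before relying on the details.

```python
import json
from collections import OrderedDict

# Packet type codes from the socket.io 0.9 protocol.
PACKET_TYPES = {"disconnect": 0, "connect": 1, "heartbeat": 2,
                "message": 3, "json": 4, "event": 5, "ack": 6,
                "error": 7, "noop": 8}


def encode_event(endpoint, name, args):
    """Encode an event as '<type>:<id>:<endpoint>:<json data>', id left empty."""
    payload = OrderedDict([("name", name), ("args", list(args))])
    data = json.dumps(payload, separators=(",", ":"))
    return "%d::%s:%s" % (PACKET_TYPES["event"], endpoint, data)


def endpoint_of(packet):
    """Pull the endpoint back out, as a client would to route the packet."""
    return packet.split(":", 3)[2]


pkt = encode_event("/echo", "msg", ["Someone said: hi"])
print(pkt)               # 5::/echo:{"name":"msg","args":["Someone said: hi"]}
print(endpoint_of(pkt))  # /echo
```

A client that has no `/echo` namespace open has nowhere to route such a packet, which is why it ends up discarded on the client side.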

The @namespace decorator above registers this class in a global namespace dict. When a user connects to the corresponding endpoint, their socket creates an instance of the matching namespace. The namespace's on_<event name> methods are invoked when messages with that event name arrive from the client. The echo server has only one event, msg, whose handler iterates over all active sockets and sends the received message right back.
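Both mechanisms, registration by decorator and dispatch to `on_<event name>` methods, fit in a few lines. The following is a simplified, hypothetical model meant to show the idea, not gevent-socketio's actual implementation. `NAMESPACES`, `ToyNamespace` and `process_event` are all illustrative names.

```python
# Simplified model: a decorator that records namespace classes by path,
# and dispatch of incoming events to on_<event name> methods.
NAMESPACES = {}


def namespace(path):
    """Class decorator: register the class under `path`, return it unchanged."""
    def register(cls):
        NAMESPACES[path] = cls
        return cls
    return register


class ToyNamespace(object):
    def __init__(self, path):
        self.ns_name = path

    def process_event(self, name, *args):
        """Route an event to the matching on_<name> method, if one exists."""
        handler = getattr(self, "on_" + name, None)
        if handler is None:
            return None  # no handler for this event: ignore it
        return handler(*args)


@namespace("/echo")
class ToyEchoNamespace(ToyNamespace):
    def on_msg(self, msg):
        return "Someone said: %s" % msg


# Roughly what happens when a client connects to /echo and emits "msg":
ns = NAMESPACES["/echo"]("/echo")
print(ns.process_event("msg", "hello"))  # Someone said: hello
```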

Though this is only a simple example, it's worth noting that, security-wise, you shouldn't indiscriminately send messages to every socket. Sockets belonging to users who haven't even subscribed to this particular namespace would still receive the message on the client side, even if it ends up being discarded. If you plan on doing notable amounts of cross-socket messaging, it's worth exploring the ACL system and writing utility methods to filter sockets appropriately.
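One such utility could filter sockets down to those that have actually opened a given namespace. This is a sketch under an assumption: that each server socket exposes an `active_ns` mapping keyed by endpoint. That is how gevent-socketio's `Socket` tracked instantiated namespaces in the versions I've looked at, but verify it against your installed version. `sockets_in_namespace` and `FakeSocket` are hypothetical names.

```python
def sockets_in_namespace(sockets, ns_name):
    """Yield (sessid, socket) pairs for sockets that have `ns_name` open.

    Assumes each socket has an `active_ns` dict keyed by endpoint; sockets
    without that attribute are treated as having no namespaces open.
    """
    for sessid, socket in sockets.items():
        if ns_name in getattr(socket, "active_ns", {}):
            yield sessid, socket


class FakeSocket(object):
    """Stand-in for a server socket that only exposes active_ns."""
    def __init__(self, *endpoints):
        self.active_ns = dict.fromkeys(endpoints)


sockets = {"1": FakeSocket("", "/echo"), "2": FakeSocket(""), "3": FakeSocket("/echo")}
print(sorted(sessid for sessid, _ in sockets_in_namespace(sockets, "/echo")))  # ['1', '3']
```

Inside `on_msg`, you would then loop over `sockets_in_namespace(self.socket.server.sockets, self.ns_name)` instead of every socket on the server.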

Even though we have our namespace created, the module itself is still not being imported anywhere, and thus the namespace is never registered. To remedy this, we can use socketio.sdjango.autodiscover() in our base urls.py like so:

from django.conf import settings
from django.conf.urls import patterns, include, url

from socketio import sdjango
sdjango.autodiscover()


urlpatterns = patterns('',
    url(r'^', include('echo_server.urls')),
    url(r'^socket\.io', include(sdjango.urls)),
)

The autodiscover() function searches every installed app for a sockets.py module and imports it, which in turn causes its namespaces to be registered. The echo_server urls.py included above contains a generic template view:
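The discovery step itself is simple. Here's a sketch of the general technique (Python 3), similar in spirit to Django's admin autodiscovery. It is my own illustration, not sdjango's code: try importing `<app>.sockets` for each app, and skip apps that don't have one. Importing the module is what runs the `@namespace` decorators inside it.

```python
import importlib


def autodiscover(installed_apps, module_name="sockets"):
    """Import `<app>.<module_name>` for every app that has one.

    Returns the dotted names that were imported. Apps without the module
    are skipped; errors raised *inside* an existing module propagate.
    """
    imported = []
    for app in installed_apps:
        dotted = "%s.%s" % (app, module_name)
        try:
            importlib.import_module(dotted)
        except ImportError as exc:
            # Only swallow "this app has no such module"; re-raise anything
            # else, such as a broken import inside an existing sockets.py.
            if getattr(exc, "name", None) != dotted:
                raise
            continue
        imported.append(dotted)
    return imported
```

Checking `exc.name` keeps a typo inside a real `sockets.py` from being silently mistaken for "no sockets module here".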

from django.conf.urls import patterns, include, url
from django.views.generic import TemplateView


urlpatterns = patterns('',
    url(
        r'^$',
        TemplateView.as_view(template_name='echo_server/index.html'),
        name='index'
    ),
)

The index.html template contains a text field and an element for logging messages:

{% load staticfiles %}
<!DOCTYPE html>
<html class="no-js">
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge">
        <title></title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width, initial-scale=1">

        <link rel="stylesheet" href="{% static 'css/normalize.css' %}">
        <style>
            * { box-sizing: border-box;}
            #log { background: #333; bottom: 44px; color: #fefefe; font: 18px/22px "Courier", monospace; left: 0; overflow: auto; padding: 10px; position: fixed; top: 0; width: 100%;}
            .input-echo { border: none; border-top: 1px solid #666; bottom: 0; font-size: 18px; height: 44px; left: 0; line-height: 22px; padding: 10px; position: fixed; width: 100%;}
        </style>
    </head>
    <body>
        <div id="log"></div>
        <form id="form-echo" action="" method="POST">
            <input type="text" id="input-echo" class="input-echo" placeholder="Enter message...">
        </form>

        <script src="//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"></script>
        <script src="{% static 'js/socket.io.js' %}"></script>
        <script src="{% static 'js/main.js' %}"></script>
    </body>
</html>

As this isn't an intro to django, I'm assuming readers know how to configure and manage static files. You can get a copy of the socket.io.js client-side script from the socket.io-client project on github (use a 0.9.x release, since that's the protocol version gevent-socketio speaks). The last piece of the puzzle is main.js, which sets up the socket and handles sending and receiving messages:

var socket = io.connect("/echo"),
    $log = $("#log"),
    $input = $("#input-echo");

socket.on('msg', function (msg) {
    // Insert the message as plain text, not HTML, so users can't inject markup.
    $log.append($("<p>").text(msg));
});

$("#form-echo").on("submit", function(event){
    event.preventDefault();
    var msg = $input.val();
    $input.val("");
    socket.emit("msg", msg);
});

Spin up your development server and navigate to localhost. You should now be able to send messages to the server and see them echoed right back. If you like, open multiple browsers and watch the messages appear in all of them. Websockets rock!

Running Gevent in Production

One of the most popular python WSGI servers is Gunicorn, which happens to have gevent support built in. Its gevent worker class performs monkey patching for you, and its server hooks (such as post_fork) give you a place to patch other libraries, like psycopg. You can install the package with pip install gunicorn. To configure gunicorn, I'll create a python config file named gunicorn_config.py with the following:

from multiprocessing import cpu_count
from os import environ


def max_workers():
    return cpu_count() + 1


bind = '0.0.0.0:8000'
max_requests = 10000
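# gevent-socketio ships a worker built on gunicorn's gevent worker that serves
# requests through its socket.io-aware handler; the stock 'gevent' worker
# doesn't, so the /socket.io endpoints wouldn't work with it.
worker_class = 'socketio.sgunicorn.GeventSocketIOWorker'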
workers = max_workers()


def post_fork(server, worker):
    # Make psycopg2 gevent-friendly in each freshly forked worker.
    from psycogreen.gevent import patch_psycopg
    patch_psycopg()

Next, run gunicorn -c gunicorn_config.py socketioapp.wsgi:application to start the server. It's recommended to run gunicorn behind a web server like NGINX, which has supported proxying websocket connections since version 1.3.13.
