Thursday, June 23, 2011

Act IV, Scene I

- what wizardry is this!? svn supports symlinks?

A dark Filesystem. In the middle, a Repository boiling. Thunder.
                Enter the three Programmers.
       1 PROGRAMMER.  Thrice the padded buf hath oe'r runn'd.
       2 PROGRAMMER Thrice and once, the platter spun.
       3 PROGRAMMER Hexate cries:—'tis time! 'tis time!
       1 PROGRAMMER Round about the repo go;
    In spaghetti'd source code throw.—
    Functors, that on blackest ARM,
    Caused a user grievous harm;
    Refactor'd business logic got,
    Compile first i' the charmed pot!
       ALL.  Double, double toil and trouble;
    Cycles burn, and repo bubble.
       2 PROGRAMMER Mock-up of an inode struct,
    In the repo run amock;
    Superblock, corrupt extent,
    File flat, and file bent,
    Meta data, magic prop,
    B-tree hash, and sign that's dropped,—
    For VCS of powerful trouble,
    Like a hell-broth boil and bubble.
       ALL.  Double, double toil and trouble;
    Cycles burn, and repo bubble.
       3 PROGRAMMER Shard of cluster; meg of RAM;
    ASIC ripped from Don Knuth's pram;
    Recursive matrix transform hack;
    Toggle switch that won't switch back;
    Heap address that ain't been writ,
    Yet has a quine contained in it;
    NSA encryption key;
    Source code for an AI bee;
    Lambda of Alonzo Church
    Found by Turing's A* search;—
    Document this noxious gruel
    With Microsoft's new WinWord tool.
       ALL.  Double, double toil and trouble;
    Cycles burn, and repo bubble.
       2 PROGRAMMER Cool it with a peltier,
    Then the code can ship, hooray!

Sunday, November 14, 2010

Multithreaded Python, extensions, and static data


The GIL and Context Switching

I use Boost.Python to write some C++ extensions for Python. I work on an I/O-bound Python program; Python has a global interpreter lock ("the GIL") which means that in a multi-threaded program, only one thread can be executing code inside the Python interpreter at once. Now, a thread can drop the GIL, and the built-in Python read and write routines do this so that while one thread is doing I/O, another thread can run. However, due to a peculiarity in how the GIL is implemented,1 even though the actual I/O takes place during system calls that drop the GIL, the need to re-acquire the GIL after every I/O operation was killing our performance.

For instance, one of the things that the application does a lot of is logging. Doing the logging synchronously -- as the code is executing and it wants to write something to its log, it needs to wait for the write to the logfile to finish before it can continue going about its business -- turned out to be a bottleneck. My first attempt to do something about that was to spawn off a separate thread for the logfile, and have logging look something like this:

class Logger:
    def __init__(self):
        self._buffer = []
        self._file = open("foo.log", "w")
        self._hasMessages = threading.Condition()
        self._lock = threading.Lock()

    def writeEntriesForever(self):
        while True:
            with self._lock:
                while len(self._buffer) == 0:
                messages = self._buffer
                self._buffer = []


    def log(self, message):
        with self._lock:

def main():
    logger = Logger()
    t = threading.Thread(target = logger.writeEntriesForever)


That did get the I/O off of the main thread. However, while the write call in Logger.writeEntriesForever would make the logging thread drop the GIL, allowing the main thread to continue executing, the logging thread would need to reacquire the GIL when write returned. Now, it'd drop the GIL again when while waiting, which is where the thread would spend most of its time, but then it'd need to acquire the GIL again between the end of wait and the start of write. All of these context switches were almost completely negating any performance win from offloading the actual I/O to a separate thread.


Enter Boost.Python. The GIL is only needed when using the Python interpreter, so if the entire body of writeEntriesForever doesn't need the interpreter, the thread can drop the GIL as soon as it enters that method and never reacquire it. This means writing that method in some language other than Python, which is what Boost.Python makes it easy to do.

The way Boost.Python works, you wind up compiling and linking your C++ code into a dynamic library, and that library is a Python extension. In the example above, the new Python code would look like:

from logging_extension import Logger

def main():
    logger = Logger()
    t = threading.Thread(targer=logger.writeEntriesForever)


and then you'd have C++ code that would look something like:

class ScopedGILRelease {
// The GIL will be released when an instance of this class goes in-scope
// and reacquired when it goes out of scope.
    inline ScopedGILRelease() { m_thread_state = PyEval_SaveThread(); }
    inline ~ScopedGILRelease() {
        m_thread_state = NULL;
    PyThreadState *m_thread_state;

class Logger {
   void log(boost::python::str message);
   void writeEntriesForever();

    typedef std:list LogBuffer;
    std::auto_ptr buffer;
    std::filebuf fb;
    std::ostream file;
    boost::condition hasMessages;
    boost::mutex mutex;

Logger::Logger() {"foo.log", ios::out);
    file = std::ostream(&fb);
    buffer = std::auto_ptr(new LogBuffer);

void Logger::log(boost::python::str message) {
    ScopedGILRelease noGIL; //Drop GIL before acquiring mutex to avoid deadlock.
    boost::scoped_lock lock(mutex);

void Logger::writeEntriesForever() {
    ScopedGILRelease noGIL;
    while(true) {
        std::auto_ptr messages;
            boost::scoped_lock lock(mutex);
            while(!buffer.size()) hasMessages.wait();
            messages = buffer;
        for(LogBuffer::iterator i = messages->begin(); i != messages->end(); i++) {
            file << *i;

    using namespace boost::python;
        .def("log", &Logger::log)
        .def("writeEntriesForever", &Logger::writeEntriesForever)

Python Extensions and Static Data

So, that all worked fine and dandy until I did three things.

Multiple Extensions

The first thing that caused a problem is I also decided to move the code for interacting with Oracle into a Boost.Python extension. For the usual reasons, I didn't want to have that code and the logging code in one big honking library of doom, so I put it in its own extension; there was now and

Static Data

The second thing that caused a problem is that our logging code is actually more complicated above. We have a syslog-like framework where there are different categories of log message, and the app can be configured so that different categories have different log levels. There are a lot of LOG_DEBUG statements in the application, but if none of the logging categories are configured to be that verbose, those statements will never actually make it into the log.

Since logging settings are application-wide, and it'd be ugly to have to pass around a "logging state" object everywhere (or for that matter, an instance of Logger), I used static data for that:

    static std::map theLoggers; //Map logging destination to the logger object.
    static std::map theSettings; //Map logging category to its log level.

Using One Extension From Another

The third thing was that I wanted to actually log things from inside the database interaction code. Simply including the logging headers from inside the DB sources didn't work:<

#include "../logging/logging.hpp"
logging::doLog(logging::LEVEL_DEBUG, logging::CATEGORY_DATABASE, "Hello, world!");

It would compile, but it wouldn't run because the symbols from were unresolved. Okay, easy enough to fix. I added to the link line for and went about my merry business.

Symbol Visibility

This looked like it worked, but none of the messages from were actually making it into the log! I thought I must be doing something threading-related incorrectly, or something. But, eventually, while debugging in GDB I noticed an odd message.

(gdb) b
Breakpoint 1 at 0x1234567: file logging.cpp, line 6. (2 locations)

2 locations? Oh. Well, that was the problem. To load the dynamic library at runtime on Linux, Python uses the dlopen function. The documentation for dlopen mentions, in the description of the flag argument, that the RLD_LOCAL flag (the default) means that "symbols defined in this library are not made available to resolve references in subsequently loaded libraries." This meant that when Python loaded, would map in a new copy of (because was linked to it), ignore the copy pulled in when Python did dlopen("", RTLD_LOCAL); . This meant that when Python routines called logging functions, they got one copy of the static data, while when oracle routines called logging functions, they got their own copy of the static data! So, the database code wasn't seeing any of the logging settings changes made from the Python code.

My solution was to stop linking against, and to create a new, with a dire warning in a comment that before importing any Boost.Python extensions, one must import this file:

###---*** IMPORTANT! Before importing any Boost.Python modules, you *must*
###---*** import this, e.g.:
###---***    import pre_c_import
###---***    import c_extensions.logging
###---***    import
###---*** If you don't, your import will fail with unresolved symbol errors.
###---*** Whenever you add a new extension module, you should add it to
###---*** the import below.
import sys, DLFCN
current = sys.getdlopenflags()
sys.setdlopenflags(current | DLFCN.RTLD_GLOBAL)

import c_extensions.logging


RTLD_GLOBAL is the opposite of RLTD_LOCAL, so now the extensions were able to see each other's symbols, and everything was happy.

  1. Dave Beazley has a good explanation and hour-long video describing the issues.

Wednesday, September 16, 2009

Sloppy Graph, Sloppy Design

I was spending time trying to fine-tune a graphviz file documenting the call graph of a piece of code and describing some of the critical functions. graphviz isn't really designed for the kind of long node labels I wanted to give it, so it would do things like put nodes in places which made it have to draw arrows reaching clear across the page.

Finally I realized that rather than trying to talk graphviz into reordering its nodes, I could just refactor the thing I was graphing so that the flow wasn't so darned convoluted in the first place.

Before (image links to full size version):


After (image links to full size version):


Corollary? If it's hard to get your call flow graph to look pretty, well, the graph isn't the only thing that's ugly...

Friday, January 9, 2009

Monday, January 5, 2009

The Best Game Ever

Improv Everywhere ratchets up the excitement at a little league baseball game with bleachers full of screaming fans, NBC Sports on the jumbotron, and the Goodyear Blimp. (Via AgBlog.)

Monday, December 22, 2008

An Avocado in the Snow

An avocado
An avocado, found in the snow near Portland St and Broadway, Cambridge, MA

An avocado in the snow.
Who left it there? I do not know.
Not Father, Son, nor Ghost so holy,
Rebirths you into guacamole.
Did leaping from some wretched fate
Allow you to feel special, great?
Or did, cast down like ancient foe,
You weep from terror, weep from woe?
But lie here now, near Portland Street,
And rest, green flesh and tasty meat.