Saturday, May 19, 2007

Playing with co-routines

I've been experimenting with various co-routine implementations. I've been trying out a "switch(...)" based approach, the COROUTINE C++ library, uC++ (all C++ based), as well as Python, Lua and Io.

My test cases are programs that constructs 20000 counters and then call each one 300 times.

First, here is a plain C++ program that I used as a reference:

#include <stdio.h>

class Counter {

public:

int i;

Counter() { i = 0; }

void run() {

// printf( "%d\n", i );
i++;
}
};

int main( int argc, char** argv ) {

Counter counters[ 20000 ];

for( int i = 0; i < 60 * 5; i++ ) {
for( int j = 0; j < 20000; j++ ) {

counters[i].run();
}
}

return 0;
}


This is the C++ program using the "switch" statements via preprocessor macros:

#include <stdio.h>

// new macro's
#define cr_context int __s;
#define cr_init() __s = 0;
#define cr_start() switch (__s) { case 0:
#define cr_return(x) { __s = __LINE__; return x; case __LINE__: ; }
#define cr_end() { break; default: for (;;) ; } } __s = 0;

class Counter {

public:

cr_context;
int i;

Counter() { cr_init(); }

void operator() () {

cr_start();

while( 1 ) {

// printf( "%d\n", i );
i++;

cr_return();
}

cr_end();
}
};

int main( int argc, char** argv ) {

Counter counters[ 20000 ];

for( int i = 0; i < 60 * 5; i++ ) {
for( int j = 0; j < 20000; j++ ) {

counters[ i ]();
}
}

return 0;
}

The benefits of this approach are the very easy implementation as well as the performance. The main drawback is that you cannot use the stack, so the variables and objects the co-routine uses have to be declared as class members ("i" in this example).


The third C++ program uses the COROUTINE library:

#include <stdio.h>
#include "coroutine.h"

class Counter : public Coroutine {

void Routine() {

int i;

while( 1 ) {

// printf( "%d\n", i );
i++;

Detach();
}
}
};

Coroutine* counters[ 20000 ];

void DoMain() {

for( int i = 0; i < 20000; i++ )
counters[i] = new Counter;

for( int i = 0; i < 60 * 5; i++ ) {
for( int j = 0; j < 20000; j++ ) {

Resume( counters[i] );
}
}

}

int main( int argc, char** argv )
Sequencing( DoMain() )

The COROUTINE library comes with two implementation, copy-stack and share-stack. The copy-stack implementation is way slower than the share-stack one for this example (see below).


The fourth program uses uC++:

#include <u++/uC++.h>

_Coroutine Counter {

void main() {

int i = 0;

while( 1 ) {

// printf( "%d\n", i );
i++;

uSuspend;
}
}

public:

Counter() {}

void run() {

uResume;
}
};

Counter counters[ 20000 ];

void uMain::main() {

for( int j = 0; j < 60 * 5; j++ )
for( int i = 0; i < 20000; i++ )
counters[i].run();
}

uC++ translates the code above into a C++ program and provides a runtime library to link to. The implementation seems to be similar to the COROUTINE library's copy-stack implementation, resulting in the same poor performance for this example.


The Python program:

#!/usr/bin/env python

def Counter():
i = 0
while( True ):
i += 1
yield()

counters = []

for i in range(0,20000):
counters.append( Counter() )

for j in range( 0, 60 * 5 ):
for i in range(0,20000):
counters[i].next()

The Python example uses generators and was tested with standard Python 2.4.4. It would be interesting to see how the performance compares to other implementations such as Stackless Python. I might check it out later :-)


The Lua program:

function Counter()
local i = 0
while 1 > 0 do
-- print( i )
i = i + 1
coroutine.yield()
end
end

counters = {}
for i = 0, 19999 do
counters[ i ] = coroutine.create( Counter )
end

for i = 1, 60 * 5 do
for j = 0, 19999 do
coroutine.resume( counters[ j ] )
end
end


And finally the Io program:

Counter := Object clone do (

newSlot( "coroutine" )
newSlot( "nextCoroutine" )

init := method(

coroutine = coroDo (

pause
i := 0
while( 1 > 0,
// i println
i = i + 1

nextCoroutine resume
)
)
)

run := method(

nextCoroutine = Coroutine currentCoro
coroutine resume
)
)

counters := List clone
20000 repeat(
counters append( Counter clone )
)

( 60 * 5 ) repeat (
counters foreach( c,
c run
)
)

Unfortunately, the Io interpreter/VM encountered a massive memory leak or garbage collection failure with this example. I had to Ctrl-C out of it after it ate all of my 1GB RAM and the computer started swapping manically :-( But Io seems to be a very interesting Language, so I will try again once that problem gets solved.


Performance:

Implementation Time (s)
===========================================
Function calls (for reference) 0.03
Switch macros 0.06
Library (shared stack) 0.75
Python 2.4.4 (generators) 4.01
Lua 5 4.07
uC++ 4.82
Library (copied stack) 4.87
Io Memory leak


2 comments:

Anonymous said...

Here's an Io implementation:

coroWith := method(
coro := Coroutine clone
coro setRunTarget(self)
coro setRunLocals(call sender)
coro setRunMessage(call argAt(0))
coro
)

mainCoro := Coroutine currentCoro
counter := method(j, loop(mainCoro resume))
counters := List clone
for(j, 0, 19999, counters append(coroWith(counter)))

counters foreach(run)
writeln("running...")
(60 * 5) repeat(counters foreach(resume))

henzenmann said...

Thank you very much! This example also doesn't run properly. I have now ulimit-ed io to 512MB, and it segfaults.