Parallel Remote Tasks

14 June 2008 10:52

I've been trying to write a function that will provide a queue that tasks can be added to. These tasks are command-line programs which will run on remote servers. There is some significant configuration to do on these remote machines, so it makes sense to start a Python process on each of them to do the work. This has proved harder than I expected, since I want my program to be resilient to network failure or to failure of the process being run.

My first idea was to start Python instances remotely using py.execnet. But py.execnet blocks on a request to read data where none is present, which is a fundamental flaw. Holger Krekel suggested that an asynchronous API would be possible, but neither that nor a polling API exists at present. This makes it pretty much useless to me, since I need to be able to check for new tasks without losing the ability to see whether the existing ones have finished. The only simple workaround is for any side that sometimes needs to send data to send something periodically, whether it has anything to say or not.
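That workaround might look something like the sketch below. It assumes the py.execnet API as I understand it (SshGateway, remote_exec and the channel's send()/receive()); the host name and the one-second interval are made up.

import py

gw = py.execnet.SshGateway('worker1.example.com')   # hypothetical host

# The remote loop sends *something* every second, even if it's only a
# heartbeat, so the reading side never sits in receive() indefinitely.
channel = gw.remote_exec("""
from time import sleep
while True:
    # do a little work here, then report in
    channel.send('heartbeat')   # or a real result when one is ready
    sleep(1)
""")

while True:
    message = channel.receive()   # now returns within about a second
    if message != 'heartbeat':
        print "Result from remote:", message
    # safe to check the local queue for new tasks here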

Subprocesses

Periodically reading the new tasks that have been added to its queue requires the server to start its subprocesses in a non-blocking way. Starting a subprocess whose output you want to see was traditionally done in Python using os.popen(), but since version 2.4 this has been replaced by the less than brilliantly documented subprocess module. Instances of subprocess.Popen can be polled to see whether they have finished:

>>> from subprocess import Popen, PIPE, check_call
>>> from time import sleep
>>> exe = open('slow.py', 'w')
>>> exe.write("""#!/usr/bin/env python2.5
... from time import sleep
... sleep(4.5)
... print "Just finished, sorry for the wait"
... """)
>>> exe.close()
>>>
>>> check_call('chmod +x slow.py', shell=True)
0
>>>
>>> task = Popen('./slow.py', shell=True, stdout=PIPE)
>>> while task.poll() is None:
...     print "Not done yet..."
...     sleep(1)
Not done yet...
Not done yet...
Not done yet...
Not done yet...
Not done yet...
>>> print task.stdout.read()
Just finished, sorry for the wait
<BLANKLINE>

Which is useful. I think it's also possible to poll the pipes for output, but I don't currently have a need for that so haven't tested it. So it is possible for a Python process to control multiple system processes. But killing them if they don't return is a bit trickier. Under Unix you have to use os.kill; if you want a cross-platform solution then look at killableprocess.py.
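Something along these lines is what I mean. It's only a sketch, Unix-only because of os.kill, and the ten-second timeout is arbitrary:

import os
import signal
from time import sleep, time
from subprocess import Popen, PIPE

task = Popen(['./slow.py'], stdout=PIPE)   # no shell, so task.pid is slow.py itself
deadline = time() + 10                     # arbitrary timeout

while task.poll() is None:
    if time() > deadline:
        os.kill(task.pid, signal.SIGTERM)  # Unix-only; see killableprocess.py otherwise
        break
    sleep(1)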

Inter-system communication

But the only way for the server to avoid blocking whilst reading data from the remote process using py.execnet is for the remote process to continually send data regardless of whether it's required. One nicer solution might be to start the process using py.execnet but then communicate using Twisted's Perspective Broker, though the dirty solution above is probably preferable, especially since the server has to poll its subprocesses anyway.
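For the record, the Perspective Broker version would look roughly like the sketch below. The port number, host name and the remote_run method are my own inventions, but pb.Root, PBServerFactory, PBClientFactory, getRootObject and callRemote are ordinary PB usage.

# On the remote machine (started by py.execnet or by hand):
from subprocess import Popen, PIPE
from twisted.internet import reactor
from twisted.spread import pb

class TaskRunner(pb.Root):
    def remote_run(self, command):
        # naive: blocks the reactor while the command runs
        task = Popen(command, shell=True, stdout=PIPE)
        return task.stdout.read()

reactor.listenTCP(8789, pb.PBServerFactory(TaskRunner()))
reactor.run()

# On the client:
from twisted.internet import reactor
from twisted.spread import pb

factory = pb.PBClientFactory()
reactor.connectTCP('worker1.example.com', 8789, factory)

def run_task(root):
    return root.callRemote('run', './slow.py')

def show_result(output):
    print "Remote output:", output
    reactor.stop()

factory.getRootObject().addCallback(run_task).addCallback(show_result)
reactor.run()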

Returning to the client, which is going to have to send jobs out to multiple servers and take new jobs from a queue: the main problems now all seem to be solved by polling py.execnet, with no excuse for using Twisted, which is a shame. The system isn't, however, resilient to failure of one of the servers or of the network. Some work would need to be done on py.execnet to make that possible.

I had imagined the clients being Twisted processes, each started in its own thread (since they are only part of a library, and the user wouldn't necessarily know about asynchronous programming). This might have been a problem anyway, since having multiple reactors in one process doesn't necessarily work yet. And using threading, Twisted and polling py.execnet in one application might have been slightly too much anyway!
