Friday, February 17, 2017

Sharing a Python generator across multiple multiprocessing processes.

Sometimes I need to do some work on a seemingly endless set of data.  I often use generators to create the endless data.  Say, for example, I were trying to brute-force crack a zip file's password.  I'd create a generator that methodically and continually creates new passwords for the cracker to use in attempting to unzip the file.
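As a rough illustration of what I mean by such a generator, here's a minimal sketch.  The name password_candidates and its parameters are made up for this example; it just walks through every string over an alphabet, shortest first, and never stops unless you give it a max_length.

```python
import itertools
import string


def password_candidates(alphabet=string.ascii_lowercase, max_length=None):
    """Yield every string over `alphabet`, shortest first.

    Hypothetical helper for illustration only. With max_length=None the
    generator never ends, which is the "endless data" case.
    """
    length = 1
    while max_length is None or length <= max_length:
        # product() yields tuples of characters; join them into a string.
        for combo in itertools.product(alphabet, repeat=length):
            yield "".join(combo)
        length += 1


if __name__ == "__main__":
    # Peek at the first few candidates without exhausting anything.
    for candidate in itertools.islice(password_candidates("ab"), 6):
        print(candidate)
```

Because it's lazy, nothing is computed until something asks for the next value, which is exactly why I reach for generators here.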

Once I have my generator in place, I have a problem.  What if I want to spread this brute-force attack across all the cores in my system to speed up this slow, endless effort?  One would think that multiprocessing would solve the problem.

Well, no.  Generators aren't natively shared across processes in Python.  They're copied to each process.  This means that if you try to use the generator with multiprocessing, each process will get its own copy of the generator, and each process will get the same values from its copy.
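Here's a small sketch that shows the problem.  It assumes a Unix-like system and explicitly asks for the "fork" start method, since that's where the copying behavior shows up (with the "spawn" start method the generator would have to be pickled, and generators can't be).  The names counter, worker and values_seen_by_workers are just made up for the demonstration.

```python
import multiprocessing as mp


def counter():
    """A simple endless generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


def worker(gen, count, results):
    # Each forked worker pulls from its OWN copy of the generator.
    results.put([next(gen) for _ in range(count)])


def values_seen_by_workers(n_workers=3, count=4):
    """Return the values each worker pulled from the 'shared' generator."""
    ctx = mp.get_context("fork")  # assumption: Unix-only start method
    gen = counter()
    results = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(gen, count, results))
        for _ in range(n_workers)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining so no worker blocks on put().
    seen = [results.get() for _ in procs]
    for p in procs:
        p.join()
    return seen


if __name__ == "__main__":
    for values in values_seen_by_workers():
        print(values)
```

Every worker reports the same list, starting from zero, because each one advanced its own private copy rather than a shared generator.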

To solve that, I devised a quick and dirty solution:  Place the generator in its own process and then have each worker process request the next value from it via inter-process communication, using Pipe from the multiprocessing module.

To that end, here's an example.  I post this here mainly to jog my memory next time I need to do this, but if you find it useful, great!