Thursday, February 23, 2017

Python streams

I've become really comfortable with Java 8's streams over the past couple of years.  As a result, when I go back and do Python work, I get stream envy.  I looked around and found a couple of options, but none of them was really what I was after: something quick and dirty that would just make my syntax a bit more readable.

So I threw this together and find it quite handy.  I hope you find it helpful as well.  If you save this to a module and name it 'lazy_streams.py', you can use it like the following:

    from lazy_streams import stream

    S = stream(range(250)) \
        .filter(lambda x: x % 2 == 0)
    print(S.size())
    print(S.take(10).to_string())
    print(S.reverse().take(10).to_list())
    S1 = S.map(lambda x: "Item %d" % x)
    print(S1.first_or_else('Nothing to see'))
    print(stream(['Patty Cake', 'Jim Shoe', 'Justin Case'])
          .sort(lambda x: x.split(' ')[1])
          .to_list())
    print(stream([[1, 2], 3, [4, 5, 6], 'seven'])
          .flatten()
          .to_list())

I've tested it on lists as big as 2.5 million items and the lazy evaluation works really well!

Here's the gist:
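
In rough outline, the idea looks like the sketch below. It is not the full module, only a pared-down illustration: the class name `Stream` and the handful of methods shown are assumptions chosen to match the usage above. Each operation wraps a factory for a fresh iterator, so nothing is computed until a terminal call such as `to_list()` asks for values, and a stream can be reused.

```python
import itertools


class Stream(object):
    """A tiny lazy wrapper around an iterable (illustrative sketch only)."""

    def __init__(self, make_iter):
        # make_iter is a zero-argument callable returning a fresh iterator,
        # which lets the same stream be consumed more than once.
        self._make_iter = make_iter

    def __iter__(self):
        return self._make_iter()

    def filter(self, predicate):
        # The generator expression is only built when someone iterates.
        return Stream(lambda: (x for x in self if predicate(x)))

    def map(self, func):
        return Stream(lambda: (func(x) for x in self))

    def take(self, count):
        # islice stops pulling from the source after `count` items.
        return Stream(lambda: itertools.islice(iter(self), count))

    def size(self):
        return sum(1 for _ in self)

    def first_or_else(self, default):
        return next(iter(self), default)

    def to_list(self):
        return list(self)


def stream(iterable):
    """Entry point: wrap any re-iterable object (list, range, ...)."""
    return Stream(lambda: iter(iterable))
```

With this sketch, `stream(range(2500000)).map(f).take(10).to_list()` calls `f` only ten times, which is the whole point of evaluating lazily.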

Friday, February 17, 2017

Sharing a Python generator across multiple multiprocessing processes.

Sometimes I need to do some work on a seemingly endless set of data.  I often use generators to create the endless data.  Say, for example, I were trying to brute-force crack a zip file's password.  I'd create a generator that methodically and continually creates new passwords for the cracking process to use in attempting to unzip the file.

Once I have my generator in place, I have a problem.  What if I want to spread this brute-force attack across all the cores in my system to speed up this slow, endless effort?  One would think that multiprocessing would solve the problem.

Well, no.  Generators aren't natively shared across processes in Python.  They're copied to each process.  This means that if you try to use the generator with multiprocessing, each process gets its own copy of the generator, and every process gets the same values from its copy.
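
To see the copying in action, here is a small sketch in Python 3. It assumes the "fork" start method (Unix only), where child processes inherit a snapshot of the parent's memory; the names `counter`, `take_three` and `demo_copies` are made up for the illustration.

```python
import multiprocessing

# Assumption: "fork" start method, so each child inherits a copy of the
# parent's objects, including the generator created below.
ctx = multiprocessing.get_context("fork")


def counter():
    """Hypothetical endless generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1


def take_three(gen, results):
    """Pull three values from this process's copy of the generator."""
    results.put([next(gen) for _ in range(3)])


def demo_copies(num_procs=2):
    gen = counter()  # created once, in the parent
    results = ctx.Queue()
    procs = [ctx.Process(target=take_three, args=(gen, results))
             for _ in range(num_procs)]
    for p in procs:
        p.start()
    batches = [results.get() for _ in procs]
    for p in procs:
        p.join()
    return batches
```

Every batch that comes back is `[0, 1, 2]`: each process advanced its own private copy, so no work was actually divided up.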

To solve that, I devised a quick and dirty solution:  Place the generator in its own process and then have each worker process request the next value from it via inter-process communication using multiprocessing's Pipe.

To that end, here's an example.  I post this here mainly to jog my memory next time I need to do this, but if you find it useful, great!
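
What follows is a minimal Python 3 sketch of that arrangement, not a polished implementation. It assumes the "fork" start method (Unix only) so the workers can simply inherit one end of the pipe, and it guards that shared end with a lock so each request/reply pair stays together. The names `candidates`, `serve`, `worker` and `run`, and the "next"/"stop" messages, are assumptions made up for the illustration.

```python
import multiprocessing

# Assumption: "fork" start method, so children inherit the pipe and lock.
ctx = multiprocessing.get_context("fork")


def candidates():
    """Hypothetical endless generator standing in for a password source."""
    n = 0
    while True:
        yield "guess-%d" % n
        n += 1


def serve(conn):
    """Own the one and only generator; hand out one value per request."""
    gen = candidates()
    while conn.recv() == "next":  # any other message means "stop"
        conn.send(next(gen))


def worker(conn, lock, results, how_many):
    """Ask the generator process for values, one round trip at a time."""
    for _ in range(how_many):
        with lock:  # keep each request/reply pair atomic
            conn.send("next")
            value = conn.recv()
        results.put(value)  # a real worker would try the password here


def run(num_workers=4, per_worker=5):
    server_end, client_end = ctx.Pipe()
    lock = ctx.Lock()
    results = ctx.Queue()
    server = ctx.Process(target=serve, args=(server_end,))
    server.start()
    workers = [ctx.Process(target=worker,
                           args=(client_end, lock, results, per_worker))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    # Drain the queue before joining so no worker blocks on a full queue.
    values = [results.get() for _ in range(num_workers * per_worker)]
    for w in workers:
        w.join()
    client_end.send("stop")
    server.join()
    return values
```

Calling `run()` returns twenty distinct guesses, in whatever order the workers happened to finish. The round trip through the pipe costs something on every value, so this only pays off when the work done per value is much heavier than the request itself.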


Wednesday, February 8, 2017

Non-freeze, single-file distribution of your Python project

In my job, I often have to share/distribute Python code to others on my development team or to others in sibling development teams.  These folks are technical and I can easily rely on them to have a reasonably contemporary version of Python already installed.  Thus I don't really want to completely freeze my Python project (via cxFreeze, py2app, py2exe, etc.).  Nor do I want to send my coworkers through the 'create a virtualenv, pip install -r, etc.' routine.

I want a middle-ground.  I want to just send them a file, knowing that it contains all the required dependencies and will just run, assuming they have Python in their path.

I've known for a while that if you zip up a directory full of Python files, and the root of the zip file contains a __main__.py file, you can run the zip file directly via python {filename.zip}.
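
As a quick illustration, the sketch below builds such a zip from Python and runs it. The archive name `hello.zip` and the one-line `__main__.py` are placeholders invented for the example.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def build_and_run_zip():
    """Zip a one-file 'project' and run the archive with the interpreter."""
    workdir = tempfile.mkdtemp()
    archive = os.path.join(workdir, "hello.zip")  # hypothetical file name
    with zipfile.ZipFile(archive, "w") as zf:
        # __main__.py must sit at the root of the archive.
        zf.writestr("__main__.py", "print('hello from inside the zip')\n")
    # Equivalent to typing: python hello.zip
    output = subprocess.check_output([sys.executable, archive])
    return output.decode().strip()
```

The interpreter notices that the argument is a zip archive, puts it on the import path, and executes its `__main__.py`.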

What I didn't know is that you can concatenate a text file containing the Python shebang ('#!/usr/bin/env python') and a python zip file and the resulting file is runnable.

#!/usr/bin/env python
PK^C^D^T^@^B^@^H^@...__main__.py... (binary zip data continues)
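
The same concatenation can be sketched in Python. This is only an illustration under assumed names: `build_shebang_zip`, `run_archive` and the one-line `__main__.py` are made up for the example.

```python
import io
import os
import stat
import subprocess
import sys
import tempfile
import zipfile


def build_shebang_zip(target):
    """Write a shebang line followed by raw zip bytes into `target`."""
    payload = io.BytesIO()
    with zipfile.ZipFile(payload, "w") as zf:
        zf.writestr("__main__.py", "print('hello from a shebang zip')\n")
    with open(target, "wb") as out:
        out.write(b"#!/usr/bin/env python\n")  # the text part
        out.write(payload.getvalue())          # the zip part
    # Same effect as chmod +x
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


def run_archive(target):
    """Run the file with the current interpreter and return its output."""
    return subprocess.check_output([sys.executable, target]).decode().strip()
```

This works because the zip format keeps its directory at the end of the file, so leading bytes such as the shebang line are tolerated when Python locates the archive contents.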

I stumbled upon a tool called pex which will automate generating these shebang/zip files.  While I found pex useful, it seemed like too much for my needs.  So I started cobbling together my own shebang/zip files by hand as I needed them.

Ultimately, I wanted to automate this somewhat, so I ended up creating a simple bash script (which I named 'compile.sh') to speed things up.  Here's that script.  Hope you find this useful.

#!/bin/bash
#
# compile.sh

# Creates a self-runnable, single-file deployable for your python project.
# Your python project's main entry point must be in a file named __main__.py.
# It will embed in the deployable every dependency you've pip-installed in
# your virtualenv .env directory.
#
# The deployable still requires that python is installed on the target system.

if [[ ! -d .env/lib/python2.7/site-packages ]]; then
    echo "This script must be run from a virtualenv root."
    exit 1
fi

if [[ ! -f __main__.py ]]; then
    echo "This will only work if you have a __main__.py file."
    exit 1
fi

if [[ -z "$1" ]]; then
    echo "Usage: $(basename "$0") {output_filename}"
    exit 1
fi

# Use -d so mktemp creates the directory itself instead of only printing a name
TMPDIR="$(mktemp -d /tmp/python.compile.XXXXXXX)"
CURDIR="$(pwd)"
TARGET="$1"

# Create payload zip with our code (remove extra junk)
mkdir -p "$TMPDIR/deps"
zip -9 "$TMPDIR/payload.zip" *
zip -d "$TMPDIR/payload.zip" compile.sh
zip -d "$TMPDIR/payload.zip" requirements.txt
zip -d "$TMPDIR/payload.zip" "$TARGET" > /dev/null

# Gather the virtualenv packages and clean them up
cp -R .env/lib/python2.7/site-packages/* "$TMPDIR/deps"
find "$TMPDIR/deps" -iname '*.pyc' -delete
find "$TMPDIR/deps" -ipath '*pip*' -delete
find "$TMPDIR/deps" -ipath '*easy_install*' -delete
find "$TMPDIR/deps" -ipath '*dist-info*' -delete

# Add the virtualenv packages to the payload
cd "$TMPDIR/deps" || exit 1
zip -9 -r ../payload.zip *
cd "$CURDIR"

# Assemble the payload into a runnable
echo '#!/usr/bin/env python' | cat - "$TMPDIR/payload.zip" > "$TARGET"
chmod +x "$TARGET"

# Cleanup
rm -r "$TMPDIR"
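
For comparison, Python 3.5 and later ship a standard-library module, zipapp, that handles the final assembly step (shebang line plus zip plus executable bit). The sketch below is a swapped-in alternative to the last part of the script, not a replacement for all of it: it does not gather virtualenv packages, so those would still have to be copied next to `__main__.py` first. The function names `package` and `demo` are made up for the illustration.

```python
import os
import subprocess
import sys
import tempfile
import zipapp


def package(source_dir, target):
    """Bundle source_dir (it must contain __main__.py) into one runnable file."""
    # zipapp writes the shebang line, appends the zip, and marks it executable.
    zipapp.create_archive(source_dir, target,
                          interpreter="/usr/bin/env python")
    return target


def demo():
    """Package a throwaway one-file project and run the result."""
    source_dir = tempfile.mkdtemp()
    with open(os.path.join(source_dir, "__main__.py"), "w") as f:
        f.write("print('packaged with zipapp')\n")
    # Keep the target outside source_dir so it is not swept into the archive.
    target = package(source_dir, os.path.join(tempfile.mkdtemp(), "app"))
    return subprocess.check_output([sys.executable, target]).decode().strip()
```

The same thing is available from the command line as `python -m zipapp`, with `-p` to set the interpreter line.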