Monday, December 22, 2008

Solving the Zlib Problem – IronPython.Zlib

IronPython 2.0 does not include an implementation of the zlib module, which handles data compression. In CPython the zlib module is a C module, which it can't be used from IronPython. It is also required by several standard Python modules, include tarfile, gzip, and zipfile, all of which handle compressed archives. Thus, for setuptools - particularly, easy_install and zipimport (Eggs) - to work, there must be a fully working zlib module.

FePy includes a partial implementation built on .NET's DeflateStream, but it is missing support for the decompressobj API that is used by gzip and zipfile. Supporting this API using DeflateStream appears to be impossible, as it expects a few implementation details of the zlib implementation.

That left two options for implementing zlib: P/Invoke, and ComponentAce's zlib.net. P/Invoke had a couple of issues: first, native-managed interop is hard; second, handling 32-bit and 64-bit systems seamlessly adds a lot of extra code (I use Vista x64, so I ran into this almost immediately). Zlib.net is an open source (BSD-style license), pure .NET implementation of zlib; this avoids both native-managed interop and 32/64-bit issues. It has an interface nearly identical to zlib, which means that it could be used to mimic the Python C library, which is exactly what I've done.

IronPython.Zlib is an implementation of the zlib module for IronPython using zlib.net. Both binary and source are available under the MS-PL license. To use it create a "DLLs" directory under your IronPython implementation and drop IronPython.Zlib.dll into it.

It passes most of the Python zlib tests and all of the gzip tests. There are a few issues with the CRC-32 implementation that the tests didn't catch but real-world usage did, but I haven't figured out how to fix them yet.

There are still some issues with setuptools (the zipfile module uses binascii.crc32, which is not yet implemented) that I'll detail in a later post, but it almost works.

Update June 27, 2009: Uploaded a new version that fixes crc32 bugs by forwarding to the now-implemented but 2.6-only binascii.crc32.