Making EPUBs in Python is a non-trivial task. Grabbing a handful of HTML documents and running them through the zipfile module looks straightforward, but there are things you must consider. There are a number of ‘rules’; these are the two that gave me the most trouble, and they are what separates an EPUB from just zipping up some files.
- There must be a mimetype file (stored uncompressed, containing the text application/epub+zip), and it must be the first file in the archive.
- There must be an index file (ending in .opf, which the spec calls the package document) that lists every file in the EPUB.
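To make those two rules concrete, here is a minimal sketch using only the standard library. The function name build_epub, the file name content.opf, and the placeholder identifier are my own assumptions for illustration, and the result is not a complete, validated EPUB 3 (for instance, it leaves out the navigation document the spec requires). It only shows the first-entry rule and the .opf manifest.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# container.xml tells the reader where the .opf file lives inside the archive.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(chapters, title="Untitled"):
    """Hypothetical helper: chapters maps file name -> XHTML string.

    Returns the bytes of a bare-bones EPUB archive.
    """
    # Rule 2: the .opf manifest lists every content file in the book.
    manifest = "\n".join(
        f'    <item id="ch{i}" href="{name}" media-type="application/xhtml+xml"/>'
        for i, name in enumerate(chapters)
    )
    # The spine gives the reading order, referring to manifest ids.
    spine = "\n".join(f'    <itemref idref="ch{i}"/>' for i in range(len(chapters)))
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <!-- Placeholder identifier; a real book needs its own unique value. -->
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
{manifest}
  </manifest>
  <spine>
{spine}
  </spine>
</package>
"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Rule 1: mimetype goes in first, and it is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("content.opf", opf)
        for name, body in chapters.items():
            zf.writestr(name, body)
    return buf.getvalue()


page = ('<html xmlns="http://www.w3.org/1999/xhtml"><head><title>One</title></head>'
        '<body><p>Hello</p></body></html>')
epub_bytes = build_epub({"chapter1.xhtml": page}, title="Demo")
with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
    print(zf.namelist())
```

Writing the archive to an in-memory buffer keeps the sketch free of side effects; to save a real file, write the returned bytes to a path ending in .epub.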
It is also a good idea to read the EPUB 3.0 spec before proceeding, so that you understand the format it expects.
Let’s look at some code.