For my web site I need to dynamically create pdf files that contain the users arabic vocabulary that they wish to revise.

This was done using python and some libraries that I created some time ago to output unicode arabic to postscript. I dynamically create the postscript file (postscript is a language of its own and it’s well worth spending a few hours reading the official developers documentation from Adobe) and then convert it to pdf using ps2pdf. I instruct ps2pdf to embed the font ( an old free type1 arabic font) so that the pdf is readable by anyone even if they do not have that font available.

The code to convert unicode arabic to postscript involves shaping the arabic (getting the correct glyph for a character according to the characters before and after it) and creating an encoding that maps the glyphs in the font to the character codes that I output. It also calculates the length of the resulting shaped text so that it can be placed, right-to-left and aligned to the right, at the correct x,y coordinate.

Also, because learners of arabic need to learn the tashkeel, the algorithm also places the harakaat nicely above and below the glyphs according to the size of the glyph. Without this the harakaat tend to overlay the glyph and are unreadable. I do this by stripping the harakaat out of the word before outputing it, and then adding each harakat in a second sweep, taking into account the dimensions of the glyph it is over.

I will be releasing the code as open-source but it’s not yet published. Contact me if you need it now and I’ll email it to you.