
That's correct. As a sibling has said, there are other ways to do it, but most of the PDFs I need to work with simply remap characters in order of occurrence (e.g., if an X is the first char in the doc, it's referenced as \1). You can spot subset fonts because they're named RANDPREFIX+fontname, so different subset fonts from the same base font won't collide.
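To make that concrete, here's a rough Python sketch of the "remap in order of occurrence" idea plus a check for the subset prefix. The function names are made up for illustration, and I'm assuming the prefix is six uppercase letters followed by "+", which is how I understand the PDF spec describes subset tags. Real producers may assign codes differently.

```python
import re

def subset_encode(text):
    """Give each distinct character a code in order of first occurrence (starting at 1).

    Toy model only: real PDF producers may number glyphs differently.
    """
    code_for = {}
    codes = []
    for ch in text:
        if ch not in code_for:
            code_for[ch] = len(code_for) + 1
        codes.append(code_for[ch])
    return codes, code_for

# Assumption: a subset tag is six uppercase letters followed by "+".
SUBSET_NAME = re.compile(r"^([A-Z]{6})\+(.+)$")

def split_subset_name(font_name):
    """Return (prefix, base_name); prefix is None if the name has no subset tag."""
    match = SUBSET_NAME.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name
```

So in a document that starts with "XAXB", X would be code 1, A code 2 and B code 3, and two subsets of the same base font would come out with different prefixes.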

You can get a good overview of the state of the fonts in your PDF using:

    pdffonts file.pdf

There's a column (uni) which tells you if there's a Unicode map available for the font. That's important. Because PDF is just rendering glyphs at positions, it doesn't even know what the character names are. To allow you to copy and paste, most fonts in most PDFs will have a Unicode map from the glyph ID to the Unicode symbol.
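If you want to check that column from a script, something like the following might do. It's a minimal sketch that assumes the usual pdffonts layout: a header row, a ruler row of dashes, then one row per font, with columns lined up under the ruler. The function names are mine, and a very long font name can overflow its column and throw the slicing off, so treat the result as a hint rather than gospel.

```python
import re

def parse_pdffonts(output):
    """Parse pdffonts-style text into a list of dicts keyed by column header.

    Assumes the ruler line of dashes marks the start and end of every column.
    """
    lines = output.splitlines()
    ruler_idx = None
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped and set(stripped) <= {"-", " "}:
            ruler_idx = i
            break
    if ruler_idx is None or ruler_idx == 0:
        raise ValueError("no header/ruler pair found in pdffonts output")

    header, ruler = lines[ruler_idx - 1], lines[ruler_idx]
    spans = [m.span() for m in re.finditer(r"-+", ruler)]
    names = [header[start:end].strip() for start, end in spans]

    rows = []
    for line in lines[ruler_idx + 1:]:
        if line.strip():
            rows.append({name: line[start:end].strip()
                         for name, (start, end) in zip(names, spans)})
    return rows

def fonts_without_unicode_map(output):
    """Names of fonts whose 'uni' column says 'no'."""
    return [row["name"] for row in parse_pdffonts(output) if row.get("uni") == "no"]
```

To feed it real output you could use `subprocess.run(["pdffonts", "file.pdf"], capture_output=True, text=True, check=True).stdout`, assuming pdffonts is on your PATH.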

If that's not available, in some cases you can rebuild it yourself by looking at the character encodings and substitutions.
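One way to think about the rebuild is as a substitution cipher: if you have the glyph codes for a run of text and you know what that text says (e.g. you read it off the rendered page), you can pair them up. Below is a minimal sketch of that idea. The names are hypothetical, it assumes one glyph code per character (so ligatures would break it), and pulling the codes out of the PDF in the first place is left out entirely.

```python
def rebuild_unicode_map(glyph_codes, known_text):
    """Pair glyph codes with the characters they are known to render as.

    Assumes a strict one-code-per-character correspondence.
    """
    if len(glyph_codes) != len(known_text):
        raise ValueError("glyph codes and known text must be the same length")
    mapping = {}
    for code, ch in zip(glyph_codes, known_text):
        previous = mapping.setdefault(code, ch)
        if previous != ch:
            raise ValueError(
                f"code {code} maps to both {previous!r} and {ch!r}")
    return mapping

def decode(glyph_codes, mapping, unknown="\ufffd"):
    """Decode glyph codes with the rebuilt map; unseen codes become U+FFFD."""
    return "".join(mapping.get(code, unknown) for code in glyph_codes)
```

Once a few known passages have covered most of the codes in the font, the same map can be used to decode the rest of the text that uses that font.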

On the book, do you have any examples? I'll probably never get around to writing anything down, but if it looks easy enough it's probably worth having a stab at.

Also, large caveat: I'm not a PDF or font expert. I've probably butchered the terminology here, but hopefully it gives you a rough idea.


