Generating PDF Files From man Pages

I’ve been writing documentation for some of my published scripts in the form of man pages. The troff/groff markup is not hard to get to grips with: the man pages for the relevant tools have a lot of useful info. And it’s easy to view the results without any special installation step, since the man(1) command includes the “-l” option to treat its argument as a file name. But if the topic name has any slashes in it, such treatment is automatic without even needing this option. Thus, in my project source directory, I can type something as simple as

    man ./render-batch.1

to view the man page I wrote for the “render-batch” command.

Conversion to PDF is easy, too, using the underlying groff command that man itself uses to format the page, just specifying a different output “device” (i.e. format), e.g.

    groff -man -Tpdf «troff-file» >«output-file»

Or, if I don’t want to save the output PDF file, but just pipe it straight to a viewer like Okular, this works:

    groff -man -Tpdf «troff-file» | okular -

This is particularly useful, for example, for debugging the difference between dashes and hyphens. You are supposed to use “\-” for the hyphen-minus character (as used to prefix command-line options, for example), while hyphens in regular text should be written as “-”. The two look the same in a normal man-page view, but they get rendered differently in PDF files.

However, I found that groff does not correctly render various Unicode UTF-8 characters when generating PDF output. For example, I like to use curly quotes “”. These look OK in regular man output, but get turned into junk in PDF output.

After some searching online, it appears this is a known limitation of groff, because it uses various 8-bit byte codes internally for its own purposes, precluding their use as part of UTF-8 encoding. But there is a tool called “preconv” that can convert various encodings into a more groff-friendly text representation.
For example:

    preconv -e utf-8 «troff-file» | groff -man -Tpdf | okular -

Since my locale is set to a UTF-8 one by default, I don’t even need the “-e” option:

    preconv «troff-file» | groff -man -Tpdf | okular -

Or, if you want to save the output:

    preconv «troff-file» | groff -man -Tpdf >«output-file»

Going back to the example filename I used before, these translate to something like

    preconv render-batch.1 | groff -man -Tpdf | okular -
    preconv render-batch.1 | groff -man -Tpdf >render-batch.pdf

Another variation is to use

    preconv «troff-file» | groff -man -Tpdf | { okular - & }

This spawns the Okular part of the pipeline in the background, so control comes back to the shell prompt as soon as Okular starts running. This is convenient with a GUI tool like Okular, because I can run other commands while continuing to view the output PDF file.

So how is it that groff is capable of handling these characters just fine when invoked via man? That I haven’t figured out yet ...
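As a rough complement to eyeballing the PDF for the hyphen problem described above, a grep can flag lines where an option-like word starts with a bare “-” instead of “\-”. This is only a sketch: the function name find_bare_option_hyphens and the regular expression are my own assumptions, not anything provided by groff or man, and the pattern is a heuristic that will miss some cases and wrongly flag others.

```shell
# Hypothetical helper (not part of groff): list lines of a man page source
# where a word begins with an unescaped "-" or "--" followed by a letter or
# digit, which usually means an option that should have been written "\-".
find_bare_option_hyphens() {
    # -n prefixes each match with its line number; -E selects extended regexes.
    # The hyphen must follow start-of-line or whitespace, so "\-v" (preceded
    # by a backslash) and "render-batch" (preceded by a letter) are skipped.
    grep -nE '(^|[[:space:]])-{1,2}[[:alnum:]]' "$1"
}
```

Usage would be something like find_bare_option_hyphens render-batch.1, after which each reported line can be checked by hand.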

I wrote:
preconv «troff-file» | groff -man -Tpdf | { okular - & } [etc]
Wouldn’t you know it, another reading of the man pages for preconv and groff reveals that groff has the “-k” option to invoke preconv itself, precisely for this purpose. E.g. the above becomes the simpler

    groff -k -man -Tpdf «troff-file» | { okular - & }

And similarly for the other cases. You can even be slightly tricky and make it just that little bit shorter:

    groff -kman -Tpdf «troff-file» | { okular - & }

On Wed, 13 Dec 2023 12:26:45 +1300, I wrote:
groff -kman -Tpdf «troff-file» | { okular - & }
I came across one of my own man pages that has a table in it, so I need to include an invocation of tbl, which the “-t” option does:

    groff -ktman -Tpdf «troff-file» | { okular - & }

You can omit the curly braces to run the whole pipeline in the background.
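The options collected in this thread could be wrapped up in a couple of small shell functions. This is a minimal sketch, not a polished tool: the names groff_pdf_flags and man2pdf are made up for illustration, and deciding whether tbl is needed by looking for a line starting with “.TS” (the request that opens a tbl table) is my own assumption about a reasonable heuristic.

```shell
# Hypothetical helper: print the groff options for rendering a man page
# source file to PDF. "-k" runs preconv and "-man -Tpdf" selects the man
# macros and the PDF output device, as in the commands above.
groff_pdf_flags() {
    flags="-k"
    # Assumption: a line beginning with ".TS" means the page contains a
    # tbl table, so add "-t" to run the tbl preprocessor as well.
    if grep -q '^\.TS' "$1"; then
        flags="$flags -t"
    fi
    printf '%s\n' "$flags -man -Tpdf"
}

# Hypothetical wrapper: convert the man page source $1 into the PDF file $2.
man2pdf() {
    # The command substitution is deliberately unquoted so that the
    # option string is split into separate words.
    groff $(groff_pdf_flags "$1") "$1" > "$2"
}
```

With these defined, man2pdf render-batch.1 render-batch.pdf should be equivalent to typing out the groff command by hand, with “-t” added only when the page has a table.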
participants (1)
-
Lawrence D'Oliveiro