Text-KnuthPlass

Text::KnuthPlass is a Perl and XS (C) implementation of the well-known Knuth-Plass TeX paragraph-shaping (a.k.a. line-breaking) algorithm, as created by Donald E. Knuth and Michael F. Plass in 1981.

Given a long string containing the text of a paragraph, this module decides where to split a line (possibly hyphenating a word in the process), while attempting to:

What is a stated objective of Knuth-Plass but I don't think this implementation directly does:

What it definitely doesn't do:

In spite of these limitations, the Knuth-Plass ("TeX line splitting") algorithm is still pretty much the gold standard for paragraph shaping.

The Knuth-Plass algorithm does this by defining "boxes", "glue", and "penalties" for the paragraph text, and fiddling with line break points to minimize the overall sum of demerits (a penalty value for various "bad typesetting" gaffes). This can result in the "breaking" of one or more of the listed rules, if it results in an overall better scoring ("better looking") layout.

Text::KnuthPlass handles word widths by either character count, or a user- supplied width function (such as based on the current font and font size). It can also handle varying-length lines, if your column is not a perfect rectangle (see examples).

Installation

perl Build.PL
./Build
./Build test
./Build install

Note that if the XS (C) code fails to build and install for some reason, or you enjoy watching paint dry, you can still run "pure Perl" code -- it's much slower, but will always run. In lib/Text/KnuthPlass.pm, look for the flag setting use constant purePerl => 0; and change it to a value of 1.

Documentation

After installation, documentation can be found via

perldoc Text::KnuthPlass

or

pod2html lib/Text/KnuthPlass.pm > KnuthPlass.html

Support

Bug tracking is via

"https://github.com/PhilterPaper/Text-KnuthPlass/issues?q=is%3Aissue+sort%3Aupdated-desc+is%3Aopen"

(you will need a GitHub account to create or contribute to a discussion, but anyone can read tickets.) The old RT ticket system is closed.

Do NOT under ANY circumstances open a PR (Pull Request) to report a bug. It is a waste of both your and our time and effort. Open a regular ticket (issue), and attach a Perl (.pl) program illustrating the problem, if possible. If you believe that you have a program patch, and offer to share it as a PR, we may give the go-ahead. Unsolicited PRs may be closed without further action.

License

This product is licensed under the Perl license. You may redistribute under the GPL license, if desired, but you will have to add a copy of that license to your distribution, per its terms.

(c)copyright 2020-2022 by Phil M Perry; earlier copyrights held by Simon Cozens

History

Around 2009, Bram Stein wrote a Javascript implementation of the Knuth-Plass paragraph fitting algorithm named typeset (not to be confused with the language typescript, nor the publishing system Typeset). It may be found on GitHub in bramstein/typeset, and does not appear to be maintained (last update 2017). In 2011, Simon Cozens ported typeset to Perl, and called it Text::KnuthPlass, maintaining it for only a short time. In 2020, Phil Perry took over maintenance of this package.

Note: gitpan/Text-KnuthPlass (on GitHub) appears to be a Read-Only archive of Text::KnuthPlass from before Perry took over maintenance. It is old, and thus not very useful.

There are many copies of the Knuth-Plass paper/thesis, as well as discussions and explanations of the algorithm, floating around on the Web, so I will leave it to you to find some examples. Just the keywords Knuth and Plass should get you there.

There is also a refactored (still Javascript) version of typeset, intended for use as a library, in frobnitzem/typeset, which should be looked at, as it was maintained through 2017. Finally, there are a number of Knuth-Plass implementations in other languages, such as Python (akuchling/texlib) and typescript (avery-laird/breaker) that should be studied. And of course, there is the original Knuth-Plass paper and the annotated listing in TeX: The Program. It's just a matter of finding the time to go through all these sources and fix up Text::KnuthPlass, and then extend it in various ways.

An Example

Find an example of using Text::KnuthPlass in examples/PDF/Flatland.pl. It assumes that Text::Hyphen and PDF::Builder are installed. You can easily substitute PDF::API2 and change the PDF::Builder references in the code. You can change many settings, such as the font, font size, indentation amount, leading, line length (in Points), and whether output is flush right or ragged right. The output file is Flatland.pdf.

There are more examples, including KP.pl and Triangle.pl, both giving some usage examples to get various effects, for a variety of input texts. Both PDF and text file outputs are produced.