* By default, only n-grams are printed (without backoff <epsilon> transitions), in the same format as discussed above for reading in n-gram counts: w1 ... wk score, where the score is either the n-gram count or the n-gram probability, depending on whether the model has been normalized. By default, scores are converted from the internal negative log representation to real semiring counts or probabilities.
- By using the flag --ARPA, the n-gram model is printed in the well-known ARPA format.
- By using the flag --backoff, backoff <epsilon> transitions are printed along with the n-grams.
- By using the flag --negativelogs, scores are shown as negative logs, rather than being converted to the real semiring.
- By using the flag --integers, scores are converted to the real semiring and rounded to integers.
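
The relationship between the --negativelogs output and the default real-semiring output can be illustrated with a small sketch. It assumes the scores are negative natural logarithms (as in OpenFst's log semiring) and that the score is the last whitespace-separated field on each line. The helper name neglog_to_prob and the sample line are hypothetical and not part of the OpenGrm tools.

```shell
# Hypothetical helper: convert the score (last field) of each
# "w1 ... wk score" line from a negative natural log to a real value.
# Assumption: score = -ln(p), so p = exp(-score).
neglog_to_prob() {
  awk '{
    score = $NF                         # last field holds the score
    $NF = sprintf("%.4f", exp(-score))  # replace it with the real value
    print                               # fields are re-joined with single spaces
  }'
}

# Made-up sample line; real input would come from something like
#   ngramprint --negativelogs in.mod
printf 'a b 0.693147\n' | neglog_to_prob
```

Because awk rebuilds the line when a field is assigned, the output is space-separated regardless of how the input fields were delimited.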
For writing n-gram counts and ARPA format models, the tokens <s> and </s> are used to represent start-of-sequence and end-of-sequence, respectively. Neither of these symbols is used in our automaton format. For the precise details of the n-gram format, see here
ngramprint [--options] [in.fst [out.txt]]
--ARPA: type = bool, default = false
--backoff: type = bool, default = false
--integers: type = bool, default = false
--negativelogs: type = bool, default = false
$ ngramprint --ARPA in.mod >out.ARPA-format.txt