This section describes functions for symbol table operations.
Each FST arc has an input (ilabel) and output (olabel) label. Symbol tables
can be used to map between these labels and actual strings (which may be bytes,
Unicode codepoints, phones, words, etc.). See the
symbol table documentation for more information.
fst::MergeSymbols
The function fst::MergeSymbols takes two mutable FST arguments and an enum
specifying how the tables are to be merged:
MERGE_INPUT_SYMBOLS: merges the input tables of the input FSTs.
MERGE_OUTPUT_SYMBOLS: merges the output tables of the input FSTs.
MERGE_INPUT_AND_OUTPUT_SYMBOLS: merges both input and output tables of the input FSTs (e.g., for intersection, union, etc.).
MERGE_LEFT_OUTPUT_AND_RIGHT_INPUT_SYMBOLS: merges the left-hand side's output symbols with the right-hand side's input symbols (e.g., for composition).
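To make the four modes concrete, the following Python sketch models which tables each mode pairs up, following the enum names. It is an illustration only: the MergeMode class mirrors the names above, but tables_to_merge and the ("input"/"output") string representation are hypothetical and not part of the library API.

```python
from enum import Enum, auto


class MergeMode(Enum):
    """Mirrors the merge modes listed above (illustrative only)."""
    MERGE_INPUT_SYMBOLS = auto()
    MERGE_OUTPUT_SYMBOLS = auto()
    MERGE_INPUT_AND_OUTPUT_SYMBOLS = auto()
    MERGE_LEFT_OUTPUT_AND_RIGHT_INPUT_SYMBOLS = auto()


def tables_to_merge(mode):
    """Returns (left FST table, right FST table) pairs selected by a mode.

    Hypothetical helper: each string names which table of that FST is used.
    """
    if mode is MergeMode.MERGE_INPUT_SYMBOLS:
        return [("input", "input")]
    if mode is MergeMode.MERGE_OUTPUT_SYMBOLS:
        return [("output", "output")]
    if mode is MergeMode.MERGE_INPUT_AND_OUTPUT_SYMBOLS:
        return [("input", "input"), ("output", "output")]
    if mode is MergeMode.MERGE_LEFT_OUTPUT_AND_RIGHT_INPUT_SYMBOLS:
        # Composition matches the left FST's output side against the
        # right FST's input side.
        return [("output", "input")]
    raise ValueError(f"unknown mode: {mode!r}")
```

For composition, only the tables on the shared "inner" side need to agree, which is why the last mode pairs the left output table with the right input table.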
Asked to merge two tables (which themselves may be null), the algorithm proceeds as follows:
If --fst_compat_symbols=false is set, do no work.
Only in the last case is there the possibility of a labeling conflict (i.e., the
two tables map separate labels to the same symbol, or separate symbols to the
same label). In the case of conflict, the second FST may require relabeling. The
fst::MergeSymbols function does this automatically so long as the flag
--fst_relabel_symbol_conflicts is set to true (the default). However, if
relabeling is required to resolve a conflict but this flag is set to false,
fst::MergeSymbols logs a warning and returns false to indicate failure.
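The two kinds of conflict, and the effect of the relabeling flag, can be illustrated with plain dictionaries. This is a minimal sketch, not the library's implementation: tables are modeled as label-to-symbol dicts, and find_conflicts, merge_tables, and the relabel_conflicts argument (standing in for --fst_relabel_symbol_conflicts) are hypothetical names. The relabeling policy shown (keep the first table's labels and move the second table's symbols onto them) is an assumption chosen for simplicity.

```python
def find_conflicts(left, right):
    """Finds conflicts between two label->symbol tables.

    Returns the labels mapped to different symbols and the symbols
    mapped to different labels.
    """
    label_conflicts = sorted(
        label for label in left.keys() & right.keys()
        if left[label] != right[label])
    left_inv = {sym: label for label, sym in left.items()}
    right_inv = {sym: label for label, sym in right.items()}
    symbol_conflicts = sorted(
        sym for sym in left_inv.keys() & right_inv.keys()
        if left_inv[sym] != right_inv[sym])
    return label_conflicts, symbol_conflicts


def merge_tables(left, right, relabel_conflicts=True):
    """Merges two tables; returns (ok, merged, relabeling).

    relabeling maps labels of `right` to their new labels in `merged`.
    """
    label_conflicts, symbol_conflicts = find_conflicts(left, right)
    if (label_conflicts or symbol_conflicts) and not relabel_conflicts:
        # Relabeling is needed but not permitted: report failure.
        return False, None, None
    merged = dict(left)
    index = {sym: label for label, sym in merged.items()}
    next_label = max(merged, default=-1) + 1
    relabeling = {}
    for old_label, sym in sorted(right.items()):
        if sym not in index:
            # Keep the old label if it is still free; otherwise use a fresh one.
            new_label = next_label if old_label in merged else old_label
            merged[new_label] = sym
            index[sym] = new_label
            next_label = max(next_label, new_label + 1)
        if index[sym] != old_label:
            relabeling[old_label] = index[sym]
    return True, merged, relabeling
```

For instance, with left = {0: "<eps>", 1: "a", 2: "b"} and right = {0: "<eps>", 1: "b", 2: "c"}, labels 1 and 2 conflict, so the merge succeeds only when relabeling is permitted; the second table's arcs would then need their labels rewritten according to the returned relabeling.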
The above function is used extensively in Pynini to ensure symbol table compatibility for core rational operations like composition, intersection, and union. This is done automatically, except for the following special cases:
If --fst_compat_symbols=false is set, then symbol tables are simply assumed to be compatible.
If --fst_relabel_symbol_conflicts=false is set, then symbol tables are merged unless there is a conflict, in which case the higher-level operation will fail and raise FstOpError.