SFst is a library for normalizing, sampling, combining, and approximating stochastic (or probabilistic) finite-state transducers. These are weighted finite-state transducers, represented in OpenFst library format, that have the following two properties:
For example, an n-gram model produced by the OpenGrm NGram Library is a stochastic FST (provided the phi_label
is specified to match the backoff label, typically 0, of the n-gram model), but many other topologies are possible.
The following operations are provided for SFSTs:
Operation | Usage | Description | Complexity |
---|---|---|---|
Condition | Condition(fst, phi_label, delta) | modifies input moving towards a globally normalizable FST, controlled by delta >= 0 | Time, Space: O(V + E) |
Approx | Approx(ifst, &ofst, phi_label, backoff_label, delta) | approximates a normalized stochastic FST (having phi_labels ) wrt a provided backoff model (having backoff_labels ) |
same as ShortestDistance on the intersection of the input and output FSTs |
sfstapprox[--phi_label=$l][--backoff_label=$j][--delta=$d] in.fst backoff.fst out.fst | |||
IsCanonical | IsCanonical(fst, phi_label) | checks the second property above holds for a weighted FST | Time, Space: O(V + E) |
IsNormalized | IsNormalized(fst, phi_label, delta) | checks the above two properties hold for a weighted FST | Time, Space: O(V + E) |
GlobalNormalize | GlobalNormalize(&fst, phi_label, delta) | globally normalizes, when possible^{2}, a canonical weighted FST preserving total path weights (up to a global constant) | same as ShortestDistance |
sfstnormalize [--method=global] [--phi_label=$l][--delta=$d] in.fst out.fst | |||
LocalNormalize | LocalNormalize(&fst) | locally normalizes, when possible, a canonical weighted FST preserving each state's out-going arc weights up to a state-specific constant | Time, Space: O(V + E) |
sfstnormalize -method=local in.fst out.fst | |||
NGramApprox | NGramApprox(ifst, &ofst, order, phi_label, backoff_label, delta) | approximates a normalized stochastic FST (having phi_labels ) as an n-gram model (having backoff_labels in OpenGrm NGram format) |
same as ShortestDistance on the intersection of the input and output FSTs |
sfstngramapprox [--order=$o][--phi_label=$l][--backoff_label=$j][--delta=$d] in.fst out.fst | |||
Perplexity | Perplexity(fst, phi_label, unknown_label, unknown_class_size) | computes perplexity for a stochastic FST | |
sfstperplexity [--phi_label=$l] [-unknown_label=$u][--unknown_class_size=$s] in.fst test.far | (test sentences are in FST archive format) | ||
PhiNormalize | PhiNormalize(&fst, phi_label) | normalizes, when possible, a canonical weighted FST by only modifying the failure transitions | Time, Space: O(V + E) |
sfstnormalize --method=phi [-phi_label=$l][--delta=$d] in.fst out.fst | |||
RandGen | fst::RandGen(ifst, &ofst, fst::RandGenOptions<SFstArcSelector |
randomly generates paths in a stochastic FST (correctly dealing with failure transitions) | see RandGen |
sfstrandgen [--phi_label=$l] [--max_length=$l] [--npath=$n] [--seed=$s] in.fst out.fst | |||
Trim | Trim(&fst, phi_label) | Removes useless states and transitions in stochastic automata (irrespective of weights) | Time, Space: O(V + E * max-phi-order) |
sfsttrim -phi_label=$l in.fst out.fst |
^{1}Computation is done internally with Log64Weight. Conversion from the input weight type is done using a WeightConvert
functor, pre-defined for common weight types like TropicalWeight
and LogWeight
.
^{2}Possible when the sum of weight of all successful paths from the initial state is finite (and the input is trim).