OpenGrm SFst Library: Stochastic Finite-State Transducer Library

SFst is a library for normalizing, sampling, combining, and approximating stochastic (or probabilistic) finite-state transducers. These are weighted finite-state transducers, represented in OpenFst library format, that have the following two properties:

  1. normalized: the weights of the transitions (and any final weight) from each state sum to Weight::One()1
  2. canonical topology:
    • the states are sorted by input label
    • there may be failure transitions (see phi-labeled transitions) but
      • there is at most one such transition per state
      • there are no failure-transition (and/or epsilon-transition) cycles
      • the summation to admit normalized follows the failure transitions through to actual transitions accumulating the weight
    • no assumption is made of general determinism or what transitions must be present on failure (unlike in a canonical n-gram model).

For example, an n-gram model produced by the OpenGrm NGram Library is a stochastic FST (provided the phi_label is specified to match the backoff label, typically 0, of the n-gram model), but many other topologies are possible.

The following operations are provided for SFSTs:

Operation Usage Description Complexity
Condition Condition(fst, phi_label, delta) modifies input moving towards a globally normalizable FST, controlled by delta >= 0 Time, Space: O(V + E)
  sfstnormalize -condition=$d ... in.fst out.fst (as a pre-conditioning step in normalization)  
IsCanonical IsCanonical(fst, phi_label) checks the second property above holds for a weighted FST Time, Space: O(V + E)
IsNormalized IsNormalized(fst, phi_label, delta) checks the above two properties hold for a weighted FST Time, Space: O(V + E)
LocalNormalize LocalNormalize(&fst) locally normalizes, when possible, a canonical weighted FST preserving each state's out-going arc weights up to a state-specific constant Time, Space: O(V + E)
  sfstnormalize -method=local in.fst out.fst    
NGramApprox NGramApprox(ifst, &ofst, phi_label, backoff_label, delta) approximates a normalized stochastic FST (having phi_labels) as an n-gram model (having backoff_labels in OpenGrm NGram format) same as ShortestDistance on the intersection of the input and output FSTs
  sfstngramapprox [--order=$o][--phi_label=$l][--backoff_label=$j][--delta=$d] in.fst out.fst    
Normalize Normalize(&fst, phi_label, delta) globally normalizes, when possible2, a canonical weighted FST preserving total path weights (up to a global constant) same as ShortestDistance
  sfstnormalize [--phi_label=$l][--delta=$d] in.fst out.fst    
Perplexity Perplexity(fst, phi_label, unknown_label, unknown_class_size) computes perplexity for a stochastic FST  
  sfstperplexity [--phi_label=$l] [-unknown_label=$u][--unknown_class_size=$s] in.fst test.far (test sentences are in FST archive format)  
PhiNormalize PhiNormalize(&fst, phi_label) normalizes, when possible, a canonical weighted FST by only modifying the failure transitions Time, Space: O(V + E)
  sfstnormalize --method=phi [-phi_label=$l][--delta=$d] in.fst out.fst    
RandGen fst::RandGen(ifst, &ofst, fst::RandGenOptions<SFstArcSelector>(...)) randomly generates paths in a stochastic FST (correctly dealing with failure transitions) see RandGen
  sfstrandgen [--phi_label=$l] [--max_length=$l] [--npath=$n] [--seed=$s] in.fst out.fst    

1Computation is done internally with Log64Weight. Conversion from the input weight type is done using a WeightConvert functor, pre-defined for common weight types like TropicalWeight and LogWeight.
2Possible when the sum of weight of all successful paths from the initial state is finite (and the input is trim).

Topic revision: r11 - 2016-12-23 - MichaelRiley
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2017 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback