OpenFst Forum

You need to be a registered user to participate in the discussions.
Log In or Register

You can start a new discussion here:

help You can use the formatting commands describes in TextFormattingRules in your comment.
tip If you want to post some code, surround it with <verbatim> and </verbatim> tags.
warning Auto-linking of WikiWords is now disabled in comments, so you can type VectorFst and it won't result in a broken link.
warning You now need to use <br> to force new lines in your comment (unless inside verbatim tags). However, a blank line will automatically create a new paragraph.

Subject
Comment
Log In

Compatibility issues with new C++0x (C++11) and gcc 4.6

VicentAlabau - 20 Feb 2012 - 06:05

I am trying to compile a c++11 program that uses openfst but I obtain the following error:

/usr/include/fst/interval-set.h: In member function 'void fst::IntervalSet::Intersect(const fst::IntervalSet&, fst::IntervalSet*) const': /usr/include/fst/interval-set.h:221:16: error: parse error in template argument list /usr/include/fst/interval-set.h: In member function 'void fst::IntervalSet::Complement(T, fst::IntervalSet*) const': /usr/include/fst/interval-set.h:243:18: error: parse error in template argument list /usr/include/fst/interval-set.h:250:16: error: parse error in template argument list /usr/include/fst/interval-set.h: In member function 'bool fst::IntervalSet::Contains(const fst::IntervalSet&) const': /usr/include/fst/interval-set.h:351:54: error: expected ')' before 'it1'

I seems that gcc gets confused with the member variables 'begin' and 'end' in 'struct Interval', probably because they interfere with the new c++11 feature Range-based for-loop.

The problem can be easily solved by renaming such variables for instance to 'begin_' and 'end_', respectively.

If it does not break compatibility with older code or libraries, would you mind considering to update your source code?

Thank you in advance.

VicentAlabau - 20 Feb 2012 - 06:24

I have attached an updated interval-set.h. openfst compiles and passes the checks, though they seem not to make use of InvervalSet. I hope not to have introduced any bugs. I look forward for your opinion.

VicentAlabau - 20 Feb 2012 - 06:25

By the way, you can obtain the error by configuring openfst like this:

./configure CXXFLAGS=-std=c++0x

Thanks

 
Log In

Cannot figure how to get reference to pass to SingleShortestPath?

NitinMadnani - 19 Feb 2012 - 16:16

I have the following code:

...

StdVectorFst* m_transA2 = new StdVectorFst();
Map(m_transA, m_transA2, LogToStdMapper());

// .. some more processing with m_transA2

StdVectorFst* m_transA3 = new StdVectorFst();
vector<Weight> *distance;


SingleShortestPath(*m_transA2, m_transA3, distance)
...

However, when trying to compile this code, I get an error that this signature does not match the defined SingleShortestPath signature. What am I doing wrong?

However, w

PaulDixon - 20 Feb 2012 - 02:16

PaulDixon - 20 Feb 2012 - 02:17

typedef  StdArc::Weight Weight;
  StdVectorFst* m_transA2 = new StdVectorFst();  
  StdVectorFst* m_transA3 = new StdVectorFst();
  AnyArcFilter<StdArc> filter;
  vector<Weight> distance;
  AutoQueue<StdArc::StateId> Q(*m_transA2, &distance, filter);
  ShortestPathOptions<StdArc, AutoQueue<StdArc::StateId>, AnyArcFilter<StdArc> > opts(&Q, filter);
  SingleShortestPath(*m_transA2, m_transA3, &distance, opts);
  //This should do the same
  ShortestPath(*m_transA2, m_transA3);
 
Log In

Converting from Tropical Weight to Log Weight

NitinMadnani - 16 Feb 2012 - 18:33

I am an OpenFST newbie and I am really confused. If I convert a standard FST to the log semiring using fstmap --map_type=to_log, shouldn't that also actually change the weights in the FST. Right now, it seems like all it does is change the arc type but keeps the actual weights the same. Does that mean I need to convert the weights myself?

Also, would using Map(ifst, &ofst, StdToLogMapper()) have the same behavior or would that actually do the weighty conversion?

Tnx!

PaulDixon - 17 Feb 2012 - 20:25

Values in the tropical (standard) and log semirings are the same, the difference is the Plus operator

NitinMadnani - 19 Feb 2012 - 15:21

I think I understand now. I have to do the weight conversion myself.
 
Log In

Thrax newbie question

PierreIsabelle - 16 Feb 2012 - 11:04

I have started playing with the Thrax grammar compiler and I have a very simple question.

Is there any ready made tool that I can use to transform a text file into a basic fsa?

-- Pierre

PierreIsabelle - 16 Feb 2012 - 14:01

PierreIsabelle - 16 Feb 2012 - 14:07

Just in case the above request wasn't clear enough: what I mean by "basic fsa" there is an fsa that encodes the text simply as a sequence of characters. Such an fsa will then become the input to the tokenizing transducer I am developing.

-- Pierre

AlexZatv - 22 Feb 2012 - 09:51

it is very easy operation. In case if you need it, I have python script which can make such operation for any sequence of words, or for HTK MLF files (it make an fst-file for each sub-file in mlf). I can send it to you.

AlexZatv - 22 Feb 2012 - 09:51

it is very easy operation. In case if you need it, I have python script which can make such operation for any sequence of words, or for HTK MLF files (it make an fst-file for each sub-file in mlf). I can send it to you.
 
Log In

far weirdness strikes again

DoganCan - 13 Feb 2012 - 06:02

Hi everyone,

I am having a problem similar to this post with the latest release (1.2.10) on a 64-bit Ubuntu 11.10 machine. Here is what I get when I try to use the far binaries to read a far file created by another far binary:

echo "0 1" | farcompilestrings | farinfo -
ERROR: Error reading FAR: -
ERROR: GenericRegister::GetEntry : -arc.so: cannot open shared object file: No such file or directory
FATAL: No operation found for "FarInfo" on arc type

Has anyone come across this problem?

Cheers,

Dogan

DoganCan - 14 Feb 2012 - 23:38

Never mind, I figured out the problem. It seems like far binaries do not work with the stdio streams.

Cheers, Dogan

 
Log In

Minimizing an unweighted transducer is slow

ErikAxelson - 09 Feb 2012 - 09:08

Hi all,

I'm using OpenFst for compilation of natural language morphologies. Minimization of big unweighted transducers (~ 100 000 states) seems to be about four times slower than with other freely available finite-state transducer libraries. I'm using StdVectorFst with weight values TropicalWeight::Zero() and TropicalWeight::One(), so i guess that weight handling shouldn't be an issue?

Minimization is carried out for an unweighted transducer t as follows:

    RmEpsilon<StdArc>(t);
    EncodeMapper<StdArc> encode_mapper(kEncodeLabels,ENCODE);
    Encode(t, &encode_mapper);
    StdVectorFst * det = new StdVectorFst();
    Determinize<StdArc>(*t, det);
    Minimize<StdArc>(det);
    Decode(det, encode_mapper);
    return det;

Is there something in the above code that could be optimized? I already tried (kEncodeLabels | kEncodeWeights) instead of (kEncodeLabels), but it didn't have any effect.

regards, Erik

MichaelRiley - 10 Feb 2012 - 14:26

The encode and decode steps are unnecessary if the input is unweighted (i.e restricted to Zero() and One()). Also be sure you have latest version, since there was a suboptimal hash function for determinization in earlier versions. the latest version

ErikAxelson - 13 Feb 2012 - 06:25

It seems that my comment got for some reason under another topic, so here it is again:

I tried omitting the encode and decode steps, but it made the compilation just a couple of seconds faster, so it still takes about seven minutes to compile the whole morphology. I'm using the newest version of OpenFst, 1.2.10. Thanks for the tip, anyway.

- Erik

ErikAxelson - 20 Feb 2012 - 09:02

Hi.

Could it be that OpenFst's minimization algorithm is not optimized well enough? Minimization used to be slow in SFST library as well, but now it is faster because of the changes made in SFST's code based on our benchmarking tests in the HFST project.

If I have understood correctly, it requires some work to implement Hopcroft's algorithm so that its performance will actually be the theoretical O(n log (n)). Of course, it is also possible that we are not using the OpenFst functions right.

- Erik

 
Log In

Using feature vector as input

DominicHowell - 13 Jan 2012 - 11:54

Hi Has anyone used this toolbox with feature vector inputs instead of symbols in the application of speech recognition?

 
Log In

Segmentation Fault at initialization (Static Order Initialization Fiasco instance?)

AntoineRaux - 10 Jan 2012 - 21:17

I have a program that links to libfst.so (openfst-1.2.10, under Ubuntu 11.10, gcc 4.6.1) and it dies at start up. Other programs such as the openfst command line tools work fine. Also the problem does not occur on a different machine...

Starting up in a debugger shows that the crash happens in float_weight.h, specifically at the following line (which is the initialization of a static class member):

template <class T>
const T FloatLimits<T>::kPosInfinity = numeric_limits<T>::infinity();
I am now wondering if it could not be an issue with static order initialization (see e.g. http://yosefk.com/c++fqa/ctors.html#fqa-10.12), i.e. that the templated class numeric_limits is not yet initialized when that line is called and thus the call to the static member function infinity() crashes. Does this make sense? This problem is intrinsically hard to reproduce since it depends on compiler optimization, runtime issues, ... Has anyone else encountered this? If so, how can that be fixed? If not, any idea what might be happening?

Thanks.

AntoineRaux - 11 Jan 2012 - 12:33

Actually a better explanation of the static order initialization fiasco is at http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.14 In my copy of float_weight.h, I (temporarily) replaced "numeric_limits::infinity()" with "((T)1)/((T)0)" (and similarly "numeric_limits::bad_number()" with "((T)0))/((T)0)"), thus removing the call to a static member function in kPosInfinity's initialization, and my program now works correctly, which seems to confirm my first impression about the cause of the segfault. Of course, this does not look like a good long term solution...
 
Log In

Problem with PhiMatcher? : "ComposeFst: 1st argument cannot match on output labels and 2nd argument cannot match on input labels (sort?)."

JuliusGoth - 08 Jan 2012 - 10:14

I was able to figure out the answer to my previous question regarding matching on failure arcs. Now, I'm at another stumbling block. I have a language model FST composed onto another FST, where I am trying to use Phi matchers to recursively search non-matched symbols from the 2nd FST.

First, I rewrite the symbol id 0 I've allocated to "" in the textual representation of the FST to -2:

vector<pair<LogArc::Label, LogArc::Label> >* ipairs;
vector<pair<LogArc::Label, LogArc::Label> >* opairs;
ipairs = new vector<pair<LogArc::Label, LogArc::Label> >;
opairs = new vector<pair<LogArc::Label, LogArc::Label> >;
pair<LogArc::Label,LogArc::Label> mypair(0,-2);

Now I declare the matchers in the ComposeFst options object:

opts.gc_limit = 0;
opts.matcher1 = new PM(*lm, MATCH_OUTPUT, -2);   
opts.matcher2 = new PM(*noise, MATCH_NONE, kNoLabel);

Now, I try to compose the two FSTs:

composition = new ComposeFst<LogArc>(*lm, *fst2, opts);

This is where I get the error. I stepped into the compose code and noticed that while I declared the matchers as MATCH_OUPUT/MATCH_NONE, the match_type variable is set to MATCH_NONE for both FSTs and thus failing to set a match_type.

JuliusGoth - 08 Jan 2012 - 10:16

Pardon me, noise == fst2 in my previous example. Sorry for the confusion.

PaulDixon - 08 Jan 2012 - 17:13

Try sorting the fsts before composition

ArcSort(lm, OLabelCompare<LogArc>());
ArcSort(fst2, ILabelCompare<LogArc>());
 
Log In

NumArcs? (stateId) in a delayed FST without expanding

LexOr - 08 Jan 2012 - 06:26

line1: ComposeFst C( A, B );
line2: C.NumArcs( C.Start() );

Why is C expanded in line2?
How can I check whether C.Start() has no arcs without expanding C?
In other words, how can I check that C.Start() is not connected to any other state without expanding C?
Best Regards
LexOr

PaulDixon - 08 Jan 2012 - 16:40

You have to expand C.Start() to count the arcs or even know if C.Start() has any arcs at all, see the background papers on composition, One expection where you could check if C.Start() is connected to any other states without expanding would be if A and/or B didn't have a valid start state in which case C wouldn't not have start state. You could check this by A.Start() == kNoStateId....

 
Log In

Can I use Rho/Phi matchers together with LookAheadMatcher? ?

LexOr - 05 Jan 2012 - 12:06

 
Log In

Update on composition with failure transitions?

JuliusGoth - 04 Jan 2012 - 16:14

Relating to the older posting from 21 Jul 2008:

"There is an undocumented support for failure transitions in composition. This feature is not documented and the handling of failure transitions will change significantly in the next version of the library. "

What is the current status of dealing with failure transitions? I was interested in composing FSTs where the transitions in one FST do not always match with the LM FST. I understand a PhiMatcher exists in the framework. Is this still the optimal way of performing this operation?

Need an example for using LookAheadComposeFilter? with LookAheadMatcher? .

LexOr - 04 Jan 2012 - 09:59

What is the problem with this:
....
VectorFst<StdArc> A, B;

typedef ArcLookAheadMatcher< SortedMatcher<StdVectorFst> > LAM;
typedef SequenceComposeFilter< LAM, LAM > SCF;
typedef LookAheadComposeFilter< SCF, LAM, LAM, MATCH_BOTH> LCF;

LAM* lam1 = new LAM(A, MATCH_OUTPUT);
LAM* lam2 = new LAM(B, MATCH_INPUT);

LCF* laf1 = new LCF( A, B, lam1, lam2, MATCH_BOTH ); //here is the compilation problem
...
How do I use LookAheadComposeFilter?
Thank You

PaulDixon - 04 Jan 2012 - 17:47

Try changing the last line to LCF* laf1 = new LCF( A, B, lam1, lam2);

But if you want to use the LookAheadComposeFilter for composition there are already definitions and conversion function in the library.

laf1 will take ownership of lam1,2 so there is no need to call delete on them.

LexOr - 05 Jan 2012 - 06:21

LexOr - 05 Jan 2012 - 06:25

Paul, Thank You. It is OK now.
What conversion functions do you mean?
Can I use Rho/Phi Matcher together with LookAheadMatcher?

Regards,
LexOr

PaulDixon - 06 Jan 2012 - 05:33

StdFst* C = Convert(A, "arc_lookahead"); 

Not sure if the arc_lookahead type needs B to be relabelled.

 
Log In

Can ComposeFst? "shift" input labels?

AlexZatv - 14 Dec 2011 - 13:05

Can ComposeFst "shift" input labels relatively output labels? Let say, I have T1: state1 state2 X1 Y1 state2 state3 X2 Y2 ....etc

And transducer T2: state1 state2 Y1 state2 state3 Y2 Z2 ....etc

Can I be sure that in T1oT2 I will see arc X2:Z2, (not X1:Z2 and X2:eps?) I understand, that theoretically it is not required. And I know that lookahead composition definitely will move this labels. But, can I force usual composition to not make such moves?

LexOr - 04 Jan 2012 - 09:52

 
Log In

Can you please explain why ProjectFst? < StdArc? >(C, PROJECT_OUTPUT) does not work, when C is of the type ComposeFst? ?

LexOr - 11 Dec 2011 - 10:47

Here is a code:

 line1: ComposeFst<StdArc> C(A, B, opts); 

 line2: ProjectFst< StdArc >(C, PROJECT_INPUT);

I printed A, B and C after line1, and they are correct.
When I print C after line2, it is not become an acceptor, rather it is still a transducer.

Thank You For Your Help,

LexOr

LexOr - 11 Dec 2011 - 12:05

My fault... I found the bug.

LexOr

 
Log In

reduce memory consumption of fstcompile

JaroslavDobrek - 18 Nov 2011 - 04:02

Hello,

is there a way to tell fstcompile not to use the main memory, but the hard disk while compiling?

I am unable to compile really large transducers, because fstcompile consumes so much memory.

Jaroslav

AlexZatv - 29 Nov 2011 - 03:24

I don't know good answer,but there is one bad:) someone in this forum was curious about binary fst format. If you will find it, you can make your own "fstcompile". Or, you can make your (partially) own fst format (not VectorFst or ConstFst), with routines to read and write it to disk, and register it with openfst.
 
Log In

google-like autocomplete and openfst

CarlosGonzalezCadenas - 14 Nov 2011 - 06:20

Hello,

We'd like to implement a Google-like autocomplete system. Given that we have 100s of millions of queries, for us memory footprint is as important as runtime input matching speed. Our queries have different probabilities and therefore for us it's critical some support for weighting. We think that using FSTs would be a good match for this problem.

Do you know if there is someone that actually has implemented an autocomplete system like the one we describe using OpenFST?.

If not, it would be great if you can provide some insights / guidelines about how to approach the problem.

Thanks a lot for your help. Carlos

PaulDixon - 22 Nov 2011 - 06:46

Something along these lines this should work. First, Build a weighted deterministic Fst from the queries. Given an input match through the Fst as far as possible, then run a visit to get the possible completions.
 
Log In

Bit of confusion on usage of ExpectationArc?

JuliusGoth - 08 Nov 2011 - 17:10

I was looking to implement an FST with arcs storing tropical weights as well as a vector of floating points, to hold expectation values for E-M, and it looked like ExpectationArc held the key to accomplish this. The documentation states the following on ExpectationWeight:

X1 is usually a probability weight like LogWeight X2 is usually a random variable or vector see SignedLogWeight or SparsePowerWeight

I tried several combination to no effect, including ExpecationArc<StdArc, vector>, ExpectationArc<StdArc, StdArc>, etc. with no success. I think that perhaps the second sentence in the documented source code:

If X1 is distinct from X2, it is required that there is an external product between X1 and X2 and if both semriring are commutative, or left or right semirings, then result must have those properties. 

Addresses my problem, but can someone tell me what I must do to satisfy this requirement? Do I need to implement another Arc subclass? Thanks in advance!

JuliusGoth - 08 Nov 2011 - 17:21

To add, the expectation values would consist of a set/vector of two floating point weights.
 
Log In

Nice way of computing total cost of all paths in FST?

JuliusGoth - 02 Nov 2011 - 14:33

Anyone know of an easy way to do this? I tried one method involving converting all arc input and outputs to Epsilons (Relabel), then running RmEpsilon. The latter produces the following error twice:

ERROR: SymbolTable::Write: write failed

I'm sure there's a nice recursive algorithm for computing this, but is there some way to do this via some command line operations? Thanks!

PaulDixon - 02 Nov 2011 - 22:29

Maybe ShortestDistance in the log semiring.

AlexZatv - 11 Nov 2011 - 05:43

Theoretically, wfst may give the weight only to input string (acceptor) or to mapping from input string to output string. So, theoretically, RmEpsilon don't have to work in your case.

I'm not sure, but may be this will help: 1)change all symbols to epsilon, as you do 2)add only one arc from new state to previous start-state, with some non-epsilon-symbol (let say, A). And make that new state as start state ( like superfinal - "superstart":) ). 3)RmEpsilon in log semiring 4)determinize (this will not needed if you add "super-superfinal" arc, not "superstart") After that, you will have only that your arc with some weight, and final state with some weight. Make Times(arcW,finalW) - and, I think, you will get what you need.

I think this will work.Am I right?

MichaelRiley - 29 Dec 2011 - 22:52

ShortestDistance as Paul says.
 
Log In

Link errors in ubuntu linux

NewGuy - 02 Nov 2011 - 09:42

I got the same error, ../script/.libs/libfstscript.so: undefined reference to `dlopen' ../script/.libs/libfstscript.so: undefined reference to `dlerror'

can someone help me ?

AlexZatv - 02 Nov 2011 - 11:16

Hello! It's something strange. I have openfst on ubuntu linux 10.10, and it's work fine. I don't know that's the problem, but may be I'll give someone a hint.

dlopen and dlerror are located in libdl.so. Bug like this sometimes happen in my programs if I try to link with openfst libraries, but forget to link with libdl.so. To link with it, you need to pass -ldl to compiler. But, I see that this parameter is passed to libtool... May be, if you will pass --enable-static to configure-script,it will work?

SamHardwick - 15 Nov 2011 - 10:51

At least in some cases this is due to the linking directives -lm and -ldl ending up in the middle of the compilation line. This can be remedied by replacing in src/bin/Makefile.am the line

AM_LDFLAGS = -lm -ldl

with

AM_LDFLAGS = ../script/libfstscript.la ../lib/libfst.la -lm -ldl

and omitting all the tool-specific LDADD-lines (ones like fstarcsort_LDADD = ../script/libfstscript.la ../lib/libfst.la).

KevinUnhammer - 26 Dec 2011 - 12:51

Just to give another data-point, this change made 1.2.9 compile for me on xubuntu 10.10 oneiric

MichaelRiley - 29 Dec 2011 - 22:54

This is fixed I believe in just uploaded 1.2.10. Used above approach but with LDADD not AM_LDFLAGS.
 
Log In

Link errors in ubuntu linux

AnakreontasMentis - 25 Oct 2011 - 07:32

The linker reports errors when compiling the sources. The reported error is:
Making all in bin
make[3]: Entering directory `/home/anakreon/tmp/d/openfst-1.2.7/src/bin'
g++ -DHAVE_CONFIG_H   -I./../include -I./../script    -g -O2 -MT fstarcsort.o -MD -MP -MF .deps/fstarcsort.Tpo -c -o fstarcsort.o fstarcsort.cc
mv -f .deps/fstarcsort.Tpo .deps/fstarcsort.Po
/bin/bash ../../libtool --tag=CXX   --mode=link g++  -g -O2 -lm -ldl  -o fstarcsort fstarcsort.o ../script/libfstscript.la ../lib/libfst.la
libtool: link: g++ -g -O2 -o .libs/fstarcsort fstarcsort.o  -lm -ldl ../script/.libs/libfstscript.so ../lib/.libs/libfst.so
../script/.libs/libfstscript.so: undefined reference to `dlopen'
../script/.libs/libfstscript.so: undefined reference to `dlerror'
collect2: ld returned 1 exit status
make[3]: *** [fstarcsort] Error 1
make[3]: Leaving directory `/home/anakreon/tmp/d/openfst-1.2.7/src/bin'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/anakreon/tmp/d/openfst-1.2.7/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/anakreon/tmp/d/openfst-1.2.7'
make: *** [all] Error 2

HistogramPruneQueue? ? or, ConstantSizeQueue? ?

AlexZatv - 14 Oct 2011 - 07:32

Are there any state queues which guarantee upper size limit? I want a queue which will contain, let say, 2000 items. And, if someone will try to add one more item, item with the worsest weight will be deleted from the queue, and new one will be added.

There is NaturalPruneQueue - it is almost like I want, but it is not have the size limit smile

 
Log In

Modifying an arc in constant time

HasanAkram - 13 Oct 2011 - 04:50

Is it possible to do the following task in constant time?

 
 for(MutableArcIterator<CustomFST> aiter(&fst, state_id);!aiter.Done(); aiter.Next())
      {
         GallicArc<LogArc, STRING_LEFT> arc = aiter.Value();  
         if(arc.ilabel==input_symbol)
         {
            arc.ilabel = label;
            aiter.SetValue(arc);
            break;

         }
      }

All i need to do here is to modify one single arc, and my code should work in constant time. I am sure openfst has better ways that I am missing.

AlexZatv - 14 Oct 2011 - 07:26

As far as I understand, it will be O(n) from number of outgoing arcs. Also you can sort your fst (ArcSort), sorting arcs by input labels , and use matchers to find ilabels that you want - it will be O(log(n)).

Also, in KALDI http://kaldi.sourceforge.net/index.html they implement and use TableMatcher. I don't ever use it, but as far as I understand, it will give you constant time. But it will use more memory.

May be there is something better then this.

 
Log In

fstshortestpath gives reversed output?

DavidZhang - 09 Oct 2011 - 23:24

Hi there. I have composed two fsts
fstcompose A.fst B.fst out.fst

and run a fstshortestpath on the output

fstshortestpath out.fst out.shortest.fst

But the final fat (out.shortest.fst) outputs a reversed string as it should be.

Any ideas? Thanks!

CyrilAllauzen - 12 Oct 2011 - 15:03

Hi, are you sure about this? The state ordering will be reverse topological and fstprint will print the arcs in reverse but the machine should be well formed.
 
Log In

Odd behavior for RmEpsilon? for Fst with LogArc?

JuliusGoth - 05 Oct 2011 - 21:45

I have the result of an Fst composition with epsilon arcs (both in and out labels), with some self loops. Some of the weights are not provided and the default 0. I read another message regarding zero weights and undefined behavior but Verify() returns true so it seems like a valid FST. Part of the printed FST looks like this:

0     1     <epsilon>     <epsilon>     2.29587674
1     1     <epsilon>     <epsilon>
1     2     that     that     4.43530607
2     3     <epsilon>     <epsilon>     1.30640042
2     4     <epsilon>     <epsilon>
2     5     is     is     5.14776754
3     3     <epsilon>     <epsilon>
3     6     is     is     6.12910652

When I run RmEpsilon in code and generate the FST again, I get something like this:

0   1   that   that   -0.200286627
1   3   is   is   0.504037499
1   2   is   is   -1.78467798

What am I missing? Why are negative weights being produced in this case?

JuliusGoth - 09 Oct 2011 - 08:48

After some tweaking, I did notice that removing self-loops does eliminate the odd negative weights in the post-RmEpsilon fst.

Another concern was raised, however. I tried the following example FST:

0   1   <epsilon>   <epsilon>   2.29587674
0   2   <epsilon>   <epsilon>
0   3   But   But   3.66515589
1   4   But   But   6.58336163
2   3   But   But   3.66515589
3   4   <epsilon>   <epsilon>   1.03407383
4   5   <epsilon>   <epsilon>   1.04145384
5   6   through   through   6.26490784
6   7   <epsilon>   <epsilon>   1.45167708
7   8   I   I   3.89830112
8   9   <epsilon>   <epsilon>   1.6464082
8   10   <epsilon>   <epsilon>
8   4.20359564
9   3.0302887
10   4.20359564

Running RmEpsilon I came up with the following FST:

0   2   But   But   8.87923813
0   1   But   But   2.97200871
1   3   through   through   8.34043503
2   3   through   through   7.30636168
3   4   I   I   5.34997845
4   3.23925138

I traced this on paper to verify that the weights were being consolidated correctly (from the elimination of epsilon transitions), and noticed that state 0 -> 1 on the new FST is approximately 0.7 less than it should be. State 0 -> 3 (or 0 -> 2 -> 3) is 3.66 pre-RmEpsilon and is 2.97 in the post-RmEpsilon FST. First, have I missed something in manually verifying this? Second, is there an easy way to eliminate self-loops programmatically to prepare an FST for running RmEpsilon, or is it just a matter of walking through the arcs and somehow removing them from the FST?

AlexZatv - 11 Oct 2011 - 11:03

Hello! I don't know about programs to eliminate the self-loops, but it is very easy to do. Just put the output of "fstprint" to some script. In this script you read the input string, split on parts. If number of parts>2 and part1=part2, it's a self loop, and you can just skip it.

CyrilAllauzen - 12 Oct 2011 - 14:24

Hi, I'm not sure why you're surprised by the presence of negative numbers. Adding two numbers less than 1 can lead to something greater than 1 hence +-log adding to positive number can lead to a negative number.

As for your second example, -l(e(-3.66515589) + e(-3.66515589)) is 2.97200870944005469070 so the output of RmEpsilon looks fine to me.

Cyril

 
Log In

Bug in VectorFst? ?

AlexZatv - 05 Oct 2011 - 10:14

Hello!

VectorFst may fail, if after the creation of ArcIterator, user will add some arc with AddArc function, or call ReserveArcs(). In such case, in initArcIterator, we will store the pointer to the first arc. But, AddArc or ReserveArc can relocate array of arcs, so pointer will point to nothing (and,probably, nArcs will be wrong number).

Is it a bug, or it is so "by design" ?

CyrilAllauzen - 12 Oct 2011 - 14:27

Hi, this is by design and similar what STL does with vectors. Mutating a vector invalidates the opened iterators, mutating the arcs at a given state invalidates the arc iterator opened for that state.

AlexZatv - 13 Oct 2011 - 03:18

Thank you for explanations!
 
Log In

Minimize gives different results on different systems

TrevorJim - 04 Oct 2011 - 11:45

Hi. I've run into an FST that minimizes differently on different operating systems. Is this expected behavior?

Both systems are running openfst 1.2.7 with the default configuration.

The resulting FSTs are equivalent according to fstequivalent, however, they are not identical, which is inconvenient for some of my regression tests.

System 1 (a 64-bit Linux):

    $ uname -a
    Linux ono 3.0-ARCH #1 SMP PREEMPT Tue Aug 30 08:53:25 CEST 2011 x86_64 Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz GenuineIntel GNU/Linux

    $ g++ --version
    g++ (GCC) 4.6.1 20110819 (prerelease)

System 2 (Mac OS X):

    $ uname -a
    Darwin -------------.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386

    $ g++ --version
    i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)

Input FST (a deterministic acceptor).

Output of fstminimize on system 1.

Output of fstminimize on system 2.

CyrilAllauzen - 12 Oct 2011 - 14:34

Hi, Many things could affect the state ordering hence leading to not identical machines. You could change your regression to test that the machines are equivalent and have the same number of states.

TrevorJim - 19 Oct 2011 - 09:19

Thanks, but the transducer is an intermediate result and not easily checked in our workflow.

I should have mentioned that we are using unweighted acceptors; I'm surprised that minimize is not deterministic on these.

 
Log In

Help with setting up openfst on windows7

SophieChan - 29 Sep 2011 - 08:08

Hi, I have to use openfst to calculate the edit distance between a group of strings with a fixed finite state acceptor. My friend suggested me to use the C++ version of openfst. But I am using 64 bit windows7 and visual studio 2010 express. I tried to load the 64 bit file of openfst but it did not work on my computer. Is there a way around it? For example, can I use the linux version or the command line version to calculate the edit distance?

Thanks a lot for all your help in advance!

Best Sophie

PaulDixon - 29 Sep 2011 - 18:12

What exactly didn't work with 64bit on Windows? Compiling the solution or running an exe? Unless you need more than 2GB of memory for your operations you could just run the pre-built 32bit binaries on 64bit Windows.
 
Log In

not-integer StateId? 's - is it possible?

AlexZatv - 29 Sep 2011 - 05:22

I'm making Fst class, with states stored in list, not vector, to speedup adding and deleting of states. So, it is natural for me to declare StateId as pointer to internal state structure. But, to define my ListFst as child of MutableFst, I have to override pure virtual function StateId NumStates() const;

Here NumStates must return the number, but my StateId's are pointers! How can I deal with this? May be it's simply impossible with OpenFst to have non-integer StateId's?

PaulDixon - 29 Sep 2011 - 19:39

Even if you could get the NumStates to return a pointer I think it will cause problems for many of the algorithms. As buffers are frequently created based on the range of stateids (e.g. shortestpath/distance).

It's seem you are trying convert graph that uses pointers to OpenFst format. How you considered performing the conversion to OpenFst once no more edits are required, that way you can easily use hash_map to convert pointers to stateids and use the VectorFst

AlexZatv - 30 Sep 2011 - 05:14

Paul, Thank you for answer! I make NumStates() return NULL, just to try it. I see many compile-time errors, exactly in shortestpath (in queues specifically).

I can convert my graph to VectorFst after making all edits, or I can write my own implementation of some fst algorithms that I use. But I think that it will be better, and will work faster, if I will do my own lattice-class under OpenFst's interface. If it's possible,of course. If it is not, then why they introduce StateId type, not just us integers?

Now I want to try it this way. In internal state structure, I will store the number of state. I will make the class "StateId" with pointer to state, and operator of conversion to integer. It is fast, just dereferencing the pointer. If it will work,it will be fine.

If not, I will have to make backward conversion, from integer ids to state's pointers. It have to be the map or something else, big and/or slow:(

Or I will stop this attempts, and start to make my own ShortestPath and determinization for my specific structure:(

AlexZatv - 30 Sep 2011 - 10:18

Hmm, may be std::deque ...

I compile with StateId as wrapper around the pointer to the lists, fst::ShortestPath needs for operation++, and sometimes for "StateId i = 0". It is possible to do with lists, but this: StateId nscc = *max_element(scc_.begin(), scc_.end()) + 1; is something I don't know how to implement.

 
Log In

Running stream of input through FST and obtaining output and total weight

JuliusGoth - 28 Sep 2011 - 22:14

If I have made an fst using fstcompile, how would I process an array of input tokens and obtain the output as well as the weight? I was writing an implementation that traversed the fst arcs with the input but thought maybe there was an easier way to perform the same task, especially since I have to deal with eps arcs.

PaulDixon - 29 Sep 2011 - 02:10

Create a chain Fst that represents your input. One arc per token left-to-right. Then compose this with the other Fst. If there is only one valid output sequence you can get the output and weight using fstprint and fstshortestdistance. If there is more than one path you will need to enumerate them (there is old post by Roger Levy with code to do this)

There are probably some similar examples in the tutorials here: http://www.openfst.org/twiki/bin/view/FST/FstBackground

 
Log In

More updated documentation on OpenFst? internals?

AlexZatv - 27 Sep 2011 - 08:13

Are there more fresh papers about OpenFst internals and organization, except "OpenFst: A General and Efficient Weighted Finite-State Transducer LIbrary", 2007, mentioned in FstBackground (in this wiki)? Looks like the library was changed from the state described in that paper.

I want to implement smth like VectorFst, but with states and arcs stored in the lists,and I'm not sure what must be the base class for me - MutableFst? How should I use FstImpl ? What if I don't want to waste memory storing NumInputEpsilons and NumOutputEpsilons in every state? etc.

 
Log In

Customized composition

JuliusGoth - 20 Sep 2011 - 23:26

Wanted to do the following:

FST1 transduces string x to y with weight a FST2 transduces x to y with weight b Composition transduces string x to y with weight (a x b)

Since this isn't the default behavior of Compose(), what is the best way of going about this? Do I need to define a custom matcher and composition filter?

MarkusD - 27 Sep 2011 - 13:04

You could map the input and output symbols to new pairwise symbols. So your FST1 is just an acceptor that accepts the symbol x|y with weight a, for example. FST2 accepts x|y with weight b. (You could use functions from encode.h to do that.) You can just intersect these two acceptors. In the result, you can map the symbol x|y back to input x and output y, so you can use it as a standard finite-state transducer.

MarkusD - 03 Oct 2011 - 14:16

(Of course, the epsilon symbol loses its special status in that case; your FST1 and FST2 must already specify certain fixed alignments between input and output.)
 
Log In

Modifying arc using matcher

HasanAkram - 17 Sep 2011 - 07:55

I am using the following piece of code to modify an arc weight. The problem is I have to iterate all the arcs to find the right arc. I was wondering if there is a
 Matcher 
equivalent as mutable arc, where I could find the mutable arc more efficiently?
 
    for(MutableArcIterator<CustomFST> aiter_r(&fst, red);!aiter_r.Done();aiter_r.Next())
    {
      GallicArc<LogArc, STRING_LEFT> arc_r = aiter_r.Value();
       if(arc_r.ilabel==e_red.ilabel)
      {
         GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw_r = GallicWeight<StdArc::Label, LogWeight, STRING_LEFT>(lcp, arc_r.weight.Value2());     
         arc_r.weight = gw_r;
         aiter_r.SetValue(arc_r);
      }
    }

PaulDixon - 25 Sep 2011 - 02:20

If your arcs are input label sorted you could try replacing the linear search in your code with a binary search, this is what a the SorterMarcher is doing. The implementation is in SortedMatcher::Find().

Modifying or using a Matcher might not give you the exact behavior you want because it implements an implicit self loop which is used when the searching for an epsilon label.

 
Log In

Bug in RmEpsilon? and arcs with a weight of Zero() ?

RomainLerallut - 09 Sep 2011 - 11:15

Using RmEpsilon on a StdVectorFst with epsilon:epsilon arcs with a weight of Weight::Zero treats those arcs as if they had a weight of One.

Input FST:

  • 3 states: 0,1, and 2, 0 is start, 2 is final
  • arc 0->1 with label epsilon and a weight of Zero
  • arc 0->1 with label 2 and a weight of 5.67
  • arc 1->2 with label 1 and a weight of 1.234.
(0) --0:0/inf-->(1)--1:1/1.234-->((2))
   \__2:2/5.67__/

Expected result:

(0) --2:2/5.67-->(1)--1:1/1.234-->((2))
   \_____________1:1/inf___________/
Actual result:
(O) --2:2/5.67-->(1)--1:1/1.234-->((2))
  \___________1:1/1.234____________/     <= PROBLEM !!
The infinite weight has not been taken into account when adding the 0->2 arc.

I digged into the code and it may be linked to stale data in ShortestDistanceState::distance_ that is not reset when encountering an arc with infinite weight, but if a more knowledgeable person could take a look that would be great.

I am using an older version of OpenFst (1.1.something, I think) but I didn't see any major difference in the code. FYI, I'm calling directly the C++ functions.

Thanks for any help, Romain

PaulDixon - 12 Sep 2011 - 19:45

Not sure if this helps, but I think the library doesn't consider arcs with Weight:Zero as well formed. fstcompile will not accept them and if you run Verify on this Fst you will get an error message.

RomainLerallut - 14 Sep 2011 - 10:54

Indeed, I hadn't thought of that. We use Weight::Zero in some degenerate cases to preserve the FSTs' topologies but I guess we'll have to lift that requirement and just not add any arc in that case.

Many thanks for the answer, Paul.

MarkusD - 27 Sep 2011 - 13:18

Connect() should get rid of those arcs.
 
Log In

Why doesn't LogArc? work for shortest path algorithm?

JuliusGoth - 05 Sep 2011 - 11:10

I read in a comment from 2010 concerning Log weighs not working (only Tropical is implemented). Are Log weights not right distributive? I would like to use either this or the Real arc implementation for maximum likelihood estimation. Any help is appreciated.

JuliusGoth - 05 Sep 2011 - 11:20

To clarify, I've received the following error when trying to run ShortestPath in code:

FATAL: SingleShortestPath: Weight needs to have the path property and be right distributive: log

It seems this issue was addressed but as I mentioned above, the reason for it doesn't seem clear to me.

PaulDixon - 05 Sep 2011 - 23:01

The log semiring doesn't have the path property (a+b = b or a+b =a) this is why you can't compute the shortest paths. Because the log semiring doesn't have the path property if you were to compute the potential for some state, there might not be a path to that state that has this value.

In the log semiring it is possible to compute the shortest distance instead.

JuliusGoth - 09 Sep 2011 - 09:58

Thanks for clearing that up. After some examination, it makes sense.

To follow up on this: if I used log weights to produce a log probability, What calculation do I use to get the real probability value? I'm aware of logging probabilities to avoid underflow, but usually the procedure is logging, summing, then taking the "anti-log". Is it just that simple? Doesn't seem like I'm getting probabilities less than 1 in this case.

 
Log In

1.1 Version source?

JuliusGoth - 21 Aug 2011 - 09:24

I was trying to integrate the expectation semiring implementation, fstrain, with OpenFST and noticed fstrain had dependencies on an older include file, mains.h, that no longer exists in the 1.2 distribution. Was this file in the 1.1 version? If so, what was the purpose of this file? Is the 1.1 source of OpenFST available anywhere so that I can modify fstrain to work with the 1.2 version? Thanks very much.

PaulDixon - 22 Aug 2011 - 04:49

mains.h was part of the old weight registration mechanism. In version 1.2 the weight registration has changed. See Cyril's post from 19 Nov 2010 for the procedure in version 1.2

You can download older versions from the attachments section of the download page

JuliusGoth - 01 Sep 2011 - 14:19

Thanks. I was also wondering if documentation is available for this version of OpenFST? It doesn't seem to be in the package.

PaulDixon - 05 Sep 2011 - 23:02

You could try running Doxygen on the source directory.
 
Log In

List-based Fst ?

AlexZatv - 04 Aug 2011 - 04:18

Hello! I want to make WFST lattice generation as described in “Efficient General Lattice Generation and Rescoring”. It requires adding many states during decoding, while VectorFst and ConstFst stores all the states in vectors (as far as i understand).

Are there OpenFst implementations which store the states in the list? Are such implementations possible/effective in conjunction with projection and pruning algorithms? Thank you in advance!

 
Log In

StringWeight? and epsilon

HasanAkram - 19 Jul 2011 - 08:20

I have defined a StringWeight as the following:
 StringWeight<StdArc::Label, STRING_LEFT> w1;

when ever i enter an epsilon or 0, using: w1.PushBack(0), the String is an epsilon, no matter how many times I call w1.PushBack(0);. However, when I do it on the right side of another input symbol, for example:

 w1.PushBack(1);
  w1.PushBack(0);

the output is 1_0. My question is, epsilon is the multiplicative identity of String semiring, so no matter how many times i call w1.PushBack(0); the output should be 1 and not 1_0. Can anybody explain me if I am misunderstanding anything here?

 
Log In

FSTs with Predicates

DamienDaspit - 18 Jul 2011 - 03:49

Is it possible for OpenFst to support input and output predicates as opposed to symbols, like what was outlined by Noord and Gerdemann in "Finite State Transducers with Predicates and Identities"? If you do not take in to consideration that new implementations of certain operations would be required, such as determinization and composition, would something like this be possible?

 
Log In

How to populate GallicArc? ?

HasanAkram - 03 Jul 2011 - 08:14

Hi,

I have defined a custom fst using GallicArc and now want to populate the values dynamically. Now, my question is concerning the GallicWeight format. How exactly should I build a gallic weight. E.g, if its a

 StdAcr 
we simply do the following:
StdVectorFst fst;
fst.AddState();  
fst.SetStart(0);  
fst.AddArc(0, StdArc(1, 1, 0.5, 1));  
fst.AddArc(0, StdArc(2, 2, 1.5, 1)); 

How should the same thing be done in case of the following:

 
typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST;
void MakeGallicFST()
{

   CustomFST fst;
   fst.AddState();

   GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw;

   GallicArc<LogWeight, STRING_LEFT> ga;
}
 

HasanAkram - 03 Jul 2011 - 08:34

To be more precise with my question, what I need is probably the following:
 ga.weight = ??? 
So, the questions are: 1. What should be the format of the weight? 2. I can certainly put a GallicWeight object here, now the question is how to build a GallicWeight. For example, in the following code:
 gw.Read(???); 
What should be the format of the input file? 3. Can we manipulate
 gw.Value1(); 
or
 gw.Value2(); 
? If yes, how?

HasanAkram - 03 Jul 2011 - 11:25

I came up with the following solution, but i am not sure if this is the most elegant way of doing it.

 
void MakeGallicFST()
{

   CustomFST fst;
   fst.AddState();
   fst.SetStart(0);

   fst::StringWeight<StdArc::Label, STRING_LEFT> sw;
   
   sw.PushFront(1);
   sw.PushFront(2);
   sw.PushFront(3);
   sw.PushFront(4);

   LogWeight lw = LogWeight(0.5);

   GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw = GallicWeight<StdArc::Label, LogWeight, STRING_LEFT>(sw, lw);

   GallicArc<LogArc, STRING_LEFT> ga;
   
   ga.ilabel = 1;
   ga.olabel = 1;
   ga.nextstate = 0;
   ga.weight = gw;

   fst.AddArc(0, ga);

   fst.AddState();
   ga.ilabel = 1;
   ga.olabel = 1;
   ga.nextstate = 1;
   ga.weight = gw;

   fst.AddArc(0, ga);
}

PaulDixon - 15 Jul 2011 - 07:06

Read() is for reading in binary format. If you need to read text format use the extraction operator >>. The text format for weight in your example would look like this 1_2_2_3,0.5. Also, the text format of the empty string would be Epsilon,0.5 not 0,0.5. regardless of how many times you called sw.PushFront(0);

HasanAkram - 15 Jul 2011 - 09:13

thanks! should something like this work with command line fstcompile if i put the following as text input file:
0 1 a 1_1_1,0.5
1 0 b 2_2_2,0.5
if not, what would be the right format?

PaulDixon - 16 Jul 2011 - 06:37

PaulDixon - 16 Jul 2011 - 06:46

No it won't work. The shell tools can only handle the log or tropical semirings. It is possible to build a shared object that will make the shell tools able to handle a user defined semiring as described here

The below code fails to compile because the GallicWeight does not implement all the required members. If you add the missing members it might work, however, there could be other reasons not to do this.

#include <fst/const-fst.h>
#include <fst/edit-fst.h>
#include <fst/vector-fst.h>
#include <fst/script/register.h>
#include <fst/script/fstscript.h>

using namespace fst;
namespace fst {
namespace script {
  typedef GallicArc<LogArc> LogGallicArc;
  REGISTER_FST(VectorFst, LogGallicArc);
  REGISTER_FST(ConstFst, LogGallicArc);
  REGISTER_FST(EditFst, LogGallicArc);
  REGISTER_FST_CLASSES(LogGallicArc);
  REGISTER_FST_OPERATIONS(LogGallicArc);
}
}

Compile and make sure the .so is on LD_LIBRARY_PATH

g++ -fPIC gallic-arc.cc -o gallic-arc.so -shared -O2

 
Log In

Transducer learning algorithm (e.g. OSTIA) in openfst

HasanAkram - 03 Jul 2011 - 07:26

I was wondering if there is an existing implemented OSTIA algorithm (the transducer learning algorithm by Oncina el al.) using openfst? Or do you know anyone who implemented such thing?

JuliusGoth - 01 Sep 2011 - 15:18

fstrain (Dreyer et al.) exists which uses expectation semiring.
 
Log In

why log/tropical semiring is -log(p), not just log?

AlexZatv - 24 Jun 2011 - 04:39

Sorry for possibly stupid question, but why is it so? Looks like basic algorithms will not change if it will be log(p), may be I overlook something?

AMorenoDaniel - 24 Jun 2011 - 13:41

legitimate question and not stupid at all. I guess some (including myself) like to think in terms of 'costs' rather than 'scores'. Since it makes no difference, it is best to stick to one of them.

AlexZatv - 27 Jun 2011 - 04:05

I am experimenting with wfst decoders, and this thing make my brain exploding:) I mean, writing Weight::One() than I want 0, ">" than I want "<",etc. It's really easier to think in term of costs with wfst, thank you for the tip!
 
Log In

compare weights

AlexZatv - 22 Jun 2011 - 08:26

How can I compare weights? I want to drop some weights which are out of the beam. Now I use (Times(w,Beam).Value() >= (wBest).Value()), for tropical semiring, but I'm sure there is some better way.

CyrilAllauzen - 22 Jun 2011 - 12:21

The "proper" way to compare weights is to use NaturalLess, it is defined in weight.h.
 
Log In

using ToGallicMapper? for alternative transition representation with ProductWeight? or GallicWeight?

HasanAkram - 18 Jun 2011 - 15:32

Hi all,

I am trying to represent the transitions as Q \times (\Sigma \cup {\epsilon}) \times \Delta^* \time K \times Q. To do that I am suing String semiring as output and LogWeights and trying to map them to ProductWeight representation using ToGallicMapper. My code is the following: Map(*fst, &gfst, ToGallicMapper<GallicArc<LogArc, STRING_LEFT>>, STRING_LEFT>()); However, its not working. It seems like the parameter types are not right. Can anybody tell me what am I probably missing here?

CyrilAllauzen - 20 Jun 2011 - 16:53

To convert from LogArc to GallicArc<LogArc, STRING_LEFT> , you need to use ToGallicMapper<LogArc, STRING_LEFT> .

HasanAkram - 22 Jun 2011 - 06:04

Thanks! I am still having some error with my code:
 typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST;
  typedef VectorFst<LogArc> LogVectorFst;
  CustomFST gfst;
  StdVectorFst *fst = StdVectorFst::Read("fst.fst");

  Map(fst, &gfst, ToGallicMapper<LogArc, STRING_LEFT>());

the error msg is the following:

 Error   1   error C2040: 'fst' : 'fst::StdVectorFst *' differs in levels of indirection from 'fst::StdVectorFst' 

Is it also possible to push the string weights along the arc?

HasanAkram - 22 Jun 2011 - 06:49

actually the eoor msg should be:
Error   1   error C2664: 'fst::GallicArc<A,S> fst::ToGallicMapper<A,STRING_LEFT>::operator ()(const A &) const' : cannot convert parameter 1 from 'const fst::ArcTpl<W>' to 'const fst::LogArc &' 

CyrilAllauzen - 22 Jun 2011 - 12:19

The problem is that fst is a StdArc Fst. As I said, ToGallicMapper<LogArc, STRING_LEFT> converts a LogArc Fst into a GallicArc<LogArc, STRING_LEFT>.

HasanAkram - 22 Jun 2011 - 14:36

thanks! it worked finally. Now I have a question regarding the internal representation of string semiring. If my output alpabet is {x, y} and the output symbol file is the following:
<eps> 0
x 1
y 2
as far as undersatnd, the internal representation of a label x is 1. However, now i can have output label xx, xyx, xxx etc. How will these labels be internally represented? Is there any normalization function to bring the labels as close to the initial state as possible? Its not only the normalization. The labels should be close to the initial state with the longest common prefix operation in the string semiring.

CyrilAllauzen - 23 Jun 2011 - 16:05

The string weights are represented as a list of integer labels, the sting weight xx will be the list 1,1. The operation you want is pushing toward the initial state (FST.PushDoc), pushing the weights in the gallic semiring. If you are going to the gallic just for that, you could have simply used label pushing instead.

HasanAkram - 23 Jun 2011 - 18:44

The reason I am going gallic is my transitions are Q \times (\Sigma \cup {\epsilon}) \times \Delta^* \time K \times Q. Now the questions are the following: 1. Is that a good reason to use gallic or is there any better way to do that? 2. The operation you mentioned, would it push the gallic weights, i.e., the labels and the log weights? What I want is only to push the output labels. Is there any way to treat the logweight and the output string semiring differently? 3. Would it be possible to give a short example, showing if I define a
 typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST; 
how exactly am i supposed to add the arc, e.g., how can you give xx as a parameter? How to define x as a symbol, and not xx. In the example that I am working on, all the output label are
 -1 
when I display them using an iterator. Any hint why is that happening? 4. Another problem is, I cant draw the fst using fstdraw when i map the fst to Gallic. However, it works with StdVectorFst.

CyrilAllauzen - 24 Jun 2011 - 11:15

1. Using the gallic is a good way to do that.

2. Indeed, but it is possible to push only the string part of the weight. Look at fst/push.h when pushing labels only.

3. In your case, you want to set the olabel to be the same as the ilabel. The string in \Delta^* should be part of the weight. Look at the fst/string-weight.h and fst/pair-product.h to see how to build a GallicWeight.

4. The GallicArcs are not registered by default to avoid code bloat. If you want the command line utilities to work you need to create a DSO file as described here. In you case MyArc is GallicArc<LogArc, STRING_LEFT> and my_arc is gallic_log.

HasanAkram - 28 Jun 2011 - 12:06

Thanks Cyril. btw, is there an existing implemented OSTIA algorithm (the transducer learning algorithm by Oncina el al.) using openfst? Or do you know anyone who implemented such thing?
 
Log In

Trying to minimize an automaton leaves the automaton unchanged

ErikAxelson - 15 Jun 2011 - 08:07

Hi all,

I'm trying to minimize the non-minimal automaton

0       1       1       1       200
0       100
1       1       1       1       100
1

and expect to get as a result the minimal automaton

0       0       1       1       100
0       100

However, calling fstminimize produces the same non-minimal automaton. I also get a negative result from fstequivalent if I compare the non-minimal automaton and the equivalent minimal automaton. I tried to modify the non-minimal automaton with

fstpush --push_weights and --to_final

to see if it would produce the correct output for fstminimize or fstequivalence but the results where the same as earlier.

I'm using OpenFst version 1.2.7.

Regards, Erik

ErikAxelson - 15 Jun 2011 - 08:09

Should be

fstpush --push_weights --to_final

without the word "and".

CyrilAllauzen - 20 Jun 2011 - 16:54

These two machines are not equivalent.

ErikAxelson - 21 Jun 2011 - 05:05

Could you give me an example? I think both automata accept the empty string with weight 100, "1" with weight 200, "11" with weight 300, etc.

best regards

CyrilAllauzen - 21 Jun 2011 - 12:34

I'm sorry, I misread your message yesterday. The machines are indeed equivalent. I should have read your post more carefully.

Minimize and Equivalent work by first normalizing the machine(s) using pushing and then applying the unweighted minimization/equivalence algorithms.

The problem is that this is a case where for the output of fstpush --push_weights to have the required normalized property (the ⊕-sum of the outgoing transition weight and final weight at each state is 1) would require the use of initial weight, which the library does not support. This leads to the creation of an additional state to represent the initial weight. Hence in both machines, calling minimize lead to a machine with 2 states (instead of 1). We should update the documentation to make this clear.

The fact that equivalent returns that the machines are not equivalent is a bug due to the way we deal with the initial weight. Since the first machine is acyclic at the initial state, the initial weight is added to all the outgoing transition of the initial state. The second machine is cyclic at the initial state, so we cannot do the same thing. Instead a new initial state is created with an epsilon transition to the old initial state with weight the initial weight. Equivalent then treats this epsilon transition as a regular symbol and conclude that the machine are not equivalent. We will fix this bug in the next release of the library.

Best, Cyril

 
Log In

Openkernel question

KonstantinSelivanov - 12 Jun 2011 - 10:02

Hi,

I'm trying to use openkernel for my task. I managed to compile n-gram based kernel and calculate svm model. Prediction tests are fine.

But I can't really understand how can I use the kernel and svm model to predict my working set. My intuition is that for each string I have to compile an automaton and use it as input to svm predict function. But that function expects an svd vector.

So my question is how can I use openkernel (and svm) tools to predict a class of the single string (that is not in the training/test corpus).

Thanks! Kostya.

 
Log In

SegFault?

GellingD - 31 May 2011 - 13:20

Hi,

I've built a wFST from a language model using the commandline tools. Now, I load this model in my c++ program, and want to compose it with some other FST, that I've just manually created in c++ using the same symboltable.

However, when sorting the FST that I created in advance, a segmentation fault occurs. The same thing happens, if I just compose the two FSTs without sorting the loaded FST. Counting the amount of states in the FST is no problem, though, neither is sorting the manually created FST.

Strangely, when doing something similar with the commandline tools, no segfault occurs either, though I do not sort the FSTs then. Any idea where I might look to find the cause of this segfault?

example code:


ArcSort(&fst, StdOLabelCompare()); // sort locally created StdVectorFst
cout << "number of states in model: " << (*model).NumStates() << endl; // runs fine
ArcSort(model, StdILabelCompare()); // generates segfault
cout << "check" << endl; // doesn't get printed

StdVectorFst result; // create local FST to store result
Compose(fst, *model, &result); // if second ArcSort is not performed, this generates segfault

CyrilAllauzen - 20 Jun 2011 - 16:56

Hi,

I thing you should try to run Verify(*model) to make sure the model Fst is well-formed.

 
Log In

What is Segmentation fault?

JoachimChau - 26 May 2011 - 04:09

When I run the following command: fstencode --encode_labels model.fst encoder|fstdeterminize -|fstminimize -|fstencode --decode - encoder model_min.fst The model.fst is about 1.3G, after about 10min, the shell command exits and says: Segmentation fault

JoachimChau - 26 May 2011 - 11:06

I run it step by step, the error occurred in the encoding step after minimized.

JoachimChau - 26 May 2011 - 17:02

Hi, all
I tried to minimize the determinized Fst got in the fisrt step, in shell, it didnot stop for hours or just got a segmentation fault in less than 30min; in VS2008 with C++, it didnot stop. The Fst is about 6.1G and converage over ShortestDistence. My C++ code is as follows, did I got something wrong?
   StdVectorFst *ifst = StdVectorFst::Read(iPath);
   EncodeMapper<StdArc> encoder(kEncodeLabels|kEncodeWeights, ENCODE);
   Encode(ifst, &encoder);
   Minimize(ifst);
   Decode(ifst, encoder);
   ifst->Write(oPath);

Any help appreciated

JoachimChau - 30 May 2011 - 11:28

At last I got the result by runing the C++ code to minimize the determinized wfst. But if I run the Determinize and Minimize togther, it never stops(the determinize step stops in about 40min at linux shell, also the minimize step, but gets "Segmentation fault" when decoding). I don't know why.

PaulDixon - 02 Jun 2011 - 19:49

What optimization flags are using when you compile the C++ code?

Have you tried compiling the Decode tool (or a simple standalone decoder) with debugging information and running the process in a debugger?

 
Log In

wFSTs as language model

GellingD - 24 May 2011 - 13:42

Hi,

I built a small unigram lm using the openFST commandline tools. it seems to be working fine for tokenizing sentences, but I'm wondering if composing could fail. Would it just return an empty fst? I want to use the models in c++ code, but the Compose function doesn't seem to have any error flags or the like.

The reason I'm asking, is that I want the model to be able to fail certain sentences, as they cannot form a valid english sentence (right now it still accepts everything, due to the inclusion of single-character words, but I will change this)

Any help appreciated

PaulDixon - 24 May 2011 - 19:37

It depends on the type of the composition you are performing and the options set in ComposeOptions.

If you use the Compose method and leave the ComposeOptions.as default, it the composition is not able to produce a valid output fst with no dead-end states the output fst should have NumStates() = 0 and Start() = -1

If you the ComposeFst or the Compose method with connect set to false. You Fst may have dead-end states. In this case you will need to look for paths that have final states or run the Connect algorithm and see if the WFST has NumStates() = 0 and Start() = -1.

 
Log In

fst determinization and minimization problem

MabS - 13 May 2011 - 11:42

Hi All,

I am trying to determize and minimize the non-functional transducer as: fstencode --encode_labels G.fst enc.dat G.enc.fst fstdeterminize G.enc.fst G.det.fst fstencode --decode G.det.fst enc.dat G1.fst fstminimize G1.fst GM.fst

where G.fst is the language model text fst

But fstdeterminize takes a very long time and does not complete.

Reply is highly appreciated. Thank you. Cheers Mabs

 
Log In

rho and sigma in openfst

BlueSky - 22 Mar 2011 - 12:58

Hi Guys, Does anyone know how to represent the rho and sigm in the openfst(python openfst). for example for epsilon it is "openfst.epsilon" I am trying to find the same for "rho" Cheers,

 
Log In

Cache usage in delayed composition of cached ComposeFsts?

MaciekSk - 13 Mar 2011 - 09:07

Hi Everyone, in my program I create one instance of a delayed composition of two large Fsts: (don't mind my pseudo C++/python language)

opts.gc_limit = 4096*2^20 delayedWithCacheFst = ComposeFst(fst1,fst2,opts)

Next, I use delayedWithCacheFst many times in composition with other different fsts:

for fst3 in manyCasesOfFst3: opts.gc_limit=0 finalComposedFst = ComposeFst(fst3, delayedWithCacheFst, opts) ShortestPahts(finalComposedFst, outFst, npaths, sp_opts) delete finalComposedFst

The behavior I hoped for is, that the cache of delayedWithCacheFst is reused in all further compositions, and speeds up the ShorthestPath alg. (since fst3 are different but from a set of similar problems).

The construction of all finalComposedFst should be fast (in this composition I use input matcher from delayed fst, the gc_limit is set to 0).

However, the behavior I observe suggests that with each loop revolution the cache of delayedWithCacheFst is copied and deleted. Is there any clear reason for that?

Both, the composition "ComposeFst(fst3,delayedWithCacheFst,...)" as well as freeing the memory "delete finalComposedFst" take seconds to run. It seems that the cache is created, used and deleted for each fst3 separately.

Why is this the case? How to avoid that?

I'm using OpenFst Version 1.2.6. I also use SWIG interface for many C++ classes, I hope this is not the reason for my problems.

MaciekSk - 13 Mar 2011 - 14:49

ooops, sorry for the mess with newlines.

I did some experiments and I think I almost solved the problem. This has something to do with InputMatchers reusage.

Is it the case that composition cache is kept in the input matcher?

 
Log In

how to use compose to reverse an fst?

CordeliaZhang - 11 Mar 2011 - 20:17

Hi, is there a way to use compose to reverse an FST? That is, design b.fst such that c.fst is the reverse of a.fst after the operation of compose: fstcompose a.fst b.fst >c.fst Thanks, Cordelia

 
Log In

Converting a grammar to an wfst

ChristopherStanley? - 10 Mar 2011 - 19:01

Hi, I was wondering if there is some kind of openfst library support for converting a grammar, like JFGS grammars to an FST? Or maybe this is not even possible? I'm pretty new to openfst and finite state stuff in general, at the moment I'm working on my bachelor's degree. Actually I read that the JSGF grammar is not just regular but includes some 'context-free' grammar features. But the finite state machines are regular, right? So i guess this means that there are some things that the JSGF can do that we cannot transform into an FST? I guess that is kind of rambling, but I heard that there was something like this and wanted to find out for sure.

MaciekSk - 14 Mar 2011 - 17:02

Hi Chirstopher, OpenFST is a library of templates and interfaces, so nothing prevents you from implementing some kind of pushdown automaton (~= context-free grammar) behind the typical FST interface. As long as you don't try to Expand all of its (infinite number of) states, everything should be ok.

Now, afaik, translating grammars to OpenFST is not supported, but it shouldn't be too difficult with some simple parser, like Python PLY+SWIG, as long as the grammar is in some normal form. Example, if the grammar is in http://en.wikipedia.org/wiki/Greibach_normal_form then you could easily implement 1-1 correspondence between states and non-terminals, with additional rules that push further non-terminals from each transition on the stack, and whenever an epsilon trans. is met, the current state is replaced by the one popped from the stack, and edges follow from there.

MaciekSykulski - 16 Mar 2011 - 07:10

...acttually a type of PDT is already implemented <fst/extensions/pdt/pdtlib.h>. http://openfst.cs.nyu.edu/doxygen/fst/html/pdtlib_8h_source.html

AlexZatv - 31 May 2011 - 12:44

Looks like it is implemented in transducersaurus.

AlexZatv - 31 May 2011 - 12:44

Looks like it is implemented in transducersaurus.

AlexZatv - 31 May 2011 - 12:44

Looks like it is implemented in transducersaurus http://code.google.com/p/transducersaurus/

AlexZatv - 31 May 2011 - 12:45

sorry for duplication of the message:)
 
Log In

Epsilon Normalization and its effect on Weight Pushing

EdWhittaker - 10 Mar 2011 - 09:53

Hi, I have a couple of large-ish, stochastic, integrated (CLGT) WFSTs. All operations have been performed in the log semi-ring. When I weight push and minimize these WFSTs after the final determinization it works fine e.g.:

fstpush --push_weights detCLG | fstminimize - MindetCLG

completes successfully in a couple of minutes.

If I try a similar operation, but after performing epsilon normalization e.g.

fstpush --push_weights EpsdetCLG | fstminimize - MinEpsdetCLG

sometimes it's pushable, sometimes it's not, and by not I mean that it takes more than a few days not to complete before I kill it.

(If I do epsilon normalization after weight pushing and minimization this invariably works but I'm not convinced the resulting WFST is stochastic, based on the above observation.)

The only difference between pushable and not was the original LM I used to construct G, but I think this difference is spurious.

So my first question is: is there any reason why epsilon normalization might sometimes render a stochastic (or at least pushable) WFST un-pushable? Looking at the function description I don't see that this should be the case.

My second question is: given that epsilon normalization is obviously useful, in terms of reducing the size of the final WFST, is it then better/safer/more reliable to run it as the final operation after weight pushing or can this negate the putative benefits of weight pushing for decoding?

EdWhittaker - 17 Apr 2011 - 08:44

Eventually solved this by tweaking the LM parameters slightly. Now all networks can be pushed reliably in the log semiring irrespective of the fst operations applied or their order of application.

P.S. The weight push in the commands above is redundant (as others have pointed out) since fstminimize appears to implicitly calls this prior to performing minimization on the encoded automaton.

 
Log In

fstdeterminize vs. Determinize()

JosefNovak - 09 Mar 2011 - 01:29

Hi, I've noticed in some cases it seems that fstdeterminize (the command line tool) is significantly faster than running Determinize() in a C++ program on the same input. Of course the result is exactly the same, but Determinize() seems to take noticeably longer. I'm curious as to why this might be the case? Are there perhaps some default properties that are set in the fstdeterminize command line tool, that I should setup when I use Determinize()?

JosefNovak - 09 Mar 2011 - 03:58

Nevermind, a look at fstdeterminize.cc cleared things up.
 
Log In

Determinizing cascade for ASR

NikolayShmyrev - 05 Mar 2011 - 11:26

Hi all

The following issue. I'm using transducersaurus scripts to build cascade. code.google.com/p/transducersaurus/

It builds C*det(L*G) and doesn't determinize the output. There are even duplicated transitions. I drop duplicated transitions from the result and it seems to decode well.

Now, following the Paul's tutorial

http://www.furui.cs.titech.ac.jp/~dixonp/pubs/apsipa_09_tutorial_dixon_furui.pdf

I'm trying to determinize the cascade to get det(C*det(L*G)) and indeed it determinized properly. I only had to change scripts to keep input aux symbols which are mapped to epsilon by default scripts. However, decoding with the new determinized cascade is worse. It's slower and less accurate.

I see that with determinized decoding there are more states reached from single one by epsilon-in transitions. It's significantly better with non-deterministic cascade. I suppose this because determinization moved the labels in cascade around.

I wonder if there is some issue with determinization or I just need more clever pruning in the decoder to get results comparable to non-determinized cascade?

NikolayShmyrev - 05 Mar 2011 - 19:53

Interesting that if I do last det over standard semiring, not over log, it gets better.

PaulDixon - 07 Mar 2011 - 06:59

You mean det(Cdet(LG)) with the final determinization in the tropical semiring gives better ASR characteristics than Cdet(LG) all performed in log semiring?

NikolayShmyrev - 07 Mar 2011 - 20:23

Hi Paul

Yes, that's what I mean. But I'm more interested why it doesn't improve over Cdet(LG) without final determinization

NikolayShmyrev - 07 Mar 2011 - 20:25

I'm also interested why transducersaurus doesn't normalize Cdet(LG). It doesn't seem like juicer will do any online normalization during decoding.

JosefNovak - 09 Mar 2011 - 00:43

Hi, Are you referring to 'epsilonnormalization'? I don't think this really makes much difference. At the moment, the python scripts in the project are designed to be the absolute simplest you can get away with, and still build a high performance LVCSR cascade (this is also why the default C transducer is just generating input epsilons on the explicit aux arcs - otherwise we have to remove them afterwards). They are mainly intended as learning tools. I'm planning the C++ build tools to be much more flexible.

I'd like to also point out that the absolute accuracy really shouldn't drop as a result of those optimization commands, it should just shift the accuracy versus real-time factor curves one way or the other (i.e. it requires a higher or lower beam to achieve the same accuracy, and the search is slower).

I'll try and generate some comparison curves for Juicer later in the week to see whether there are still issues there.

Also, I don't know if these questions are directly relevant to OpenFST though, maybe they would best just be directed to the project?

NikolayShmyrev - 09 Mar 2011 - 19:57

Hi Josef

> Are you referring to 'epsilonnormalization'

Sorry, that was a typo. I meant final determinization as described in Paul's tutorial. I understand it's a simplies way so as long as it works it's ok. But don't you remove duplicate edges before decoding?

>I'd like to also point out that the absolute accuracy really shouldn't drop as a result of those optimization commands

Maybe, it shifted to some unexplorable area in my case > 20xRT smile

>I'll try and generate some comparison curves for Juicer later in the week to see whether there are still issues there.

That would be interesting to look on, thanks a lot.

> Also, I don't know if these questions are directly relevant to OpenFST though, maybe they would best just be directed to the project?

I would love to discuss it in other place. Honest this forum is very uncomfortable, no notification, no edits. But is there any more suitable discussion place for transducersaurus?

JosefNovak - 10 Mar 2011 - 19:31

i made a google group which ive listed on the project web page. i doubt it will get much activity, but it is probably a more appropriate place for discussing questions specific to the project. hopefully we can build something nice that will foster further learning and be useful to others.
 
Log In

Bash Completion Script

PaulDixon - 28 Feb 2011 - 06:57

I uploaded a bash completion script to the contrib section. The script will add tab completion support for the flags of the OpenFst commands.

http://openfst.cs.nyu.edu/twiki/bin/view/Contrib/OpenFstBashComp

 
Log In

Building OpenFST? with unicode support on Mac OSX 10.6.

JiaPu - 25 Feb 2011 - 17:51

Hello,

When I run "./configure --with-icu" on Mac OS 10.6.6, I got following error messages: * The icu-config script could not be found. Make sure it is * in your path, and that taglib is properly installed. * Or see http://ibm.com/software/globalization/icu/ configure: error: in `/Volumes/Data/private_code/openfst-1.2.7': configure: error: --with-icu was given, but ICU Library v. 4.2 or newer not found

After a little poking around, I found that the configure script is looking for a "icu-config" file. Does anyone know if there's such file on Mac OS 10.6, and if yes, its location?

Thanks. jia

JiaPu - 25 Feb 2011 - 18:14

Never mind. Just realized that this is a well-known issue. ICU is shipped without header on mac.
 
Log In

FST Minimization

BlueSky - 08 Feb 2011 - 05:09

Hi All,

I would like to ask about 2 things. First does anyone know a method in openfst which prints the fst with the characters not ascii ?

I have a problem with the minimization function, i created an fst with 60,000 words. The fst is created and determinized, but then the minimize function does not finish??

I appreciate the help :), Cheers Meme

DoganCan - 10 Feb 2011 - 18:53

Hey,

1. I believe if you add the --with-icu flag during installation, you get unicode support. (Well, assuming you have ICU installed on your system, etc.)

2. You need to be more specific about the fst you are trying to minimize. Try to give a toy example for us to understand what it looks like.

Best, Dogan

BlueSky - 11 Feb 2011 - 03:56

Thanks Dogan, I am representing each word in the fst by: for example: "car", each transition contains: c:c/0.0 a:a/0.0 r:r/0.0 all words end in a final state. cheers, Meme

NikolayShmyrev - 22 Feb 2011 - 06:40

Hi I also have such issue combining lexicon and n-gram model. Here is the FST you can try

http://www.mediafire.com/download.php?8xbknah20xcu4dq

It's relatively small but it's minimization hangs.

PaulDixon - 22 Feb 2011 - 20:52

Hi,

When you run minimization it will first weight push the machine and in some semirings this might not terminate (See Mohri 2002 Semiring frameworks and algorithms for shortest-distance problem. Michael Riley pointed this issue out to me).

You can verify this by converting to tropical semiring and see that it terminates when running.

fstprint lg.bfsm | fstcompile | fstminimize | fstinfo

But this will give poor ASR performance. If you really want to minimize the machine another solution is to first the encode weights and labels and then run minimization.

NikolayShmyrev - 24 Feb 2011 - 07:07

Yes, this way it works. Thanks for the explanation.
 
Log In

how to use fst to input and output?

MengyunKong - 04 Feb 2011 - 19:57

Hi all,

After fstcompile and some other operations, say I now have a.fst. How can I "input some word into" the transducer and "get the output string" that I want?

(using python / shell)

Thanks.

DoganCan - 10 Feb 2011 - 18:59

Hey,

I believe you are looking for fstcompose.

fstcompile a.txt > a.fst
fstcompile b.txt > b.fst (this is the fst representing the "some input word" you are referring to)
fstcompose b.fst a.fst > c.fst (this is the fst representing the "output string" you are looking for)
fstprint c.fst

Best, Dogan

MengyunKong - 12 Feb 2011 - 20:01

Yes, I've figured it out. Thanks:)

HasanAkram - 24 Jun 2011 - 07:31

Could you please who an example, how to to in in C++ code?
 
Log In

Memory leaks after deleting transducers

JosepMCrego - 03 Feb 2011 - 06:39

Hello,
Did anyone experience memory leaks after using 'delete' to delete a transducer created through 'Read(string)'? An example:

#include <fst/fstlib.h>
int main(int argc, char** argv){
  string lm=argv[1];
  fst::StdVectorFst *fstlm=fst::StdVectorFst::Read(lm);
  delete fstlm;
  cout << "Done!" << endl;
  while (true) {}
  return 0;
}

When reading a large (~1Gb) transducer, the delete operation produces memory leaks that I noticed using 'top'.
I'm using openfst-1.2.5 on a 64bits linux machine.
Thanks

CyrilAllauzen - 03 Feb 2011 - 18:15

Hi,

I don't think there is a memory leak in VectorFst (we do run memory leak checks) and nothing in your example proves otherwise. Who knows how freed memory is reclaimed by the system and what top is reporting? You might want to run a memory leak checker (such as valgrind) if you still have doubts.

#include <fst/fstlib.h>
#define N 2
int main(int argc, char** argv){
  string lm=argv[1];
  fst::StdVectorFst *fstlm;
  for (int i = 0; i < N; i++) {
    fstlm=fst::StdVectorFst::Read(lm);
    delete fstlm;
    cout << "Done!" << endl;
  }
  while (true) {}
  return 0;
}

On my 64-bit linux machine, top reports much more memory used during the while(true) {} loop for N=1 than for N=2.

Best, Cyril

 
Log In

Has anyone used OpenFST? for morphological analysis?

RandyWest - 02 Feb 2011 - 16:57

Hi all,

I have a task similar to the following that I'd like to use OpenFST for: From some input word w, I would like to create an FST with output O corresponding to the result of a thesaurus lookup for w. The difficulty is that I would like all o in O to be of the same word form as w. I have no a priori knowledge of w's form other than the word itself, e.g., I cannot distinguish between the noun "home" and the verb "home," but when I see "homed," I would like to only output past tense verb synonyms. Likewise, I would like to use this type of word form analysis to cull any entries in the thesaurus whose possible forms do not match any of those for the input word.

I am considering using the CELEX database for this analysis, but it is as yet unclear to me how feasible that would be. Has anyone in the OpenFST user community performed similar analysis and could point me in the right direction?

Thanks very much in advance.

Best, Randy

KonstantinSelivanov - 03 Feb 2011 - 06:35

I've used OpenFST for morphological analysis of Russian. Here is example http://delinguis.com/morph Here is a shallow description http://delinguis.com/art/morph

But I didn't catch the problem you are solving. You don't have to disambiguate on the morphological level. Your analyzer's task is to produce all possible morph labels for a given word.

 
Log In

On the fly composition

GavrilovichSimonov - 29 Dec 2010 - 09:39

Hi, I use OpenFst to build an OCR system In the workflow, I compose two FSTs (one representing the optical model, another the observations) and get the shortest path (best recognized word).

The whole process is quite slow (10s) and I notice that the composition took about 5s.

I try to use on the fly composition. But then the composition is very fast, but the shortest path become even slower : 19.63s !! I use the default options.

Did I miss something ? How can I optimize my workflow ?

Thanks, Sergei

KonstantinSelivanov - 25 Jan 2011 - 07:29

I think you have to use 'NaturalShortestFirstQueue' queue discipline. Otherwise your automaton would be expanded.

You can try something like this:

// suppose you have StdVectorFst fst_a, fst_b

StdVectorFst fst_c; // to store result vector<StdArc::Weight> distance; int nshortest = 1;

NaturalShortestFirstQueue<StdArc::StateId, StdArc::Weight> state_queue(distance);

ShortestPathOptions<StdArc, NaturalShortestFirstQueue<StdArc::StateId, StdArc::Weight>, AnyArcFilter > opts(&state_queue, AnyArcFilter(), nshortest);

ComposeFst cfst(fst_a, fst_b); opts.first_path = true; ShortestPath(cfst, &fst_c, &distance, opts);

 
Log In

About FST recognition and transduction algorithm

JoachimChau - 29 Dec 2010 - 04:10

Hi,
I need to do some text normalization work, eg, transforms "a 10% rise" to "a ten precent rise", I have two problems:
1) Mark the start and end: Given the sentence (eg. "a 10% rise") and a FST (eg. FST recognizes percentage), how can I get the sub-sentence (mark the start and end in the original sentence)the FST accepts?
2) transduction algorithm: Given an input string, provide the output string(s) as defined by the regular relation provided by an FST.

Thanks,
Joachim

 
Log In

LoG? composition eps normalization runs out of memory

JagadeeshB - 29 Dec 2010 - 02:43

Hi, In the SLT 2010 slides, it is mentioned (slide #49) that standard composition for the Broadcast News task requires a RAM of 5.3GB. I am trying a simple det(LoG) experiment with a 2-gram language model (ngram 1=36345, ngram 2=730914) and a dictionary L with 25K words.

The grammar FST has : # of states 23558 # of arcs 600034

The lexicon FST has : # of states 183757 # of arcs 210297

I am doing fstarcsort and fstdeterminize on G and only fstclosure on L.

When I do fstcompose L.fst G.fst |fstepsnormalize|fstdeterminize, the fstepsnormalize step eatsup 16GB of memory and very quickly and it just fails after sometime. When I reduce the dictionary size by about half, everything runs successfully. Is there a reason why I am using up so much memory compared what is given in the tutorial for what seems to be bigger task ? Please point me to anything I might be doing wrong here.

Thanks, Jagadeesh

PaulDixon - 09 Jan 2011 - 02:52

Did you label the backoff arcs in G with epsilons or an axillary symbol? If you used epsilons this could cause epsnormalize to blow up.

Did you try omitting the epsilon normalization step?

Also you will probably find the composition of L and G is faster if you sort the output of L instead of (or as well as) the input of G.

JosefRNovak - 18 Feb 2011 - 06:45

As Paul mentions there is probably no need to perform epsilon normalization at any point in your build process. Also, assuming you are using a standard ARPA→WFST transformation for your G, you should not need to determinize it prior to composition with your lexicon.
 
Log In

N-gram model computation

KonstantinSelivanov - 19 Dec 2010 - 10:17

Hi,

is it possible to compute fst n-gram representation using openfst (or openkernel)?

It's not so hard to compute n-gram model using openfst without smoothing. But I need a smoothing technique so it's a problem for me.

Thanks.

JosefRNovak - 18 Feb 2011 - 02:26

Why don't you just use your favorite LM training module and then convert the result to a WFST?
 
Log In

Thread safety

MarkusD - 01 Dec 2010 - 23:47

If you want to run OpenFst functions in parallel using multiple threads, you should make the following small code change in fst/lock.h to make it thread-safe and prevent crashes:

  //int Incr() const { return ++count_; }                                                                                                  
  //int Decr() const {  return --count_; }                                                                                                 
  int Incr() const { return __sync_add_and_fetch(&count_, 1); }
  int Decr() const { return __sync_sub_and_fetch(&count_, 1); }

These functions are available in GCC 4.1 or higher. Otherwise use mutex locks there.

Just wanted to share this info, in case other people needs threads too. Mike tells me something like that is used internally at Google but it's not included in the tarball for better code portability.

JosefNovak - 14 Jun 2011 - 18:19

this was a big help, thanks!
 
Log In

Shell command: Print all words / create FST from word list

GeorgJaehnig - 01 Dec 2010 - 14:06

Hi, is there a shell command that prints all words recognized by an FST?

And is there an easy way on the shell for the other way around: To create an FST from a list of words?

KonstantinSelivanov - 19 Dec 2010 - 10:33

I don't know any shell commands for that. But you can use depth-first search to collect a labels along the paths.

Something like the following (for acyclic only!): ... vector traverse(StdVectorFst& fst, int state, string str=string()) { vector strs; if (fst.Final(state) = StdArc::Weight::Zero()) strs.push_back(str); for (ArcIterator > aiter(fst, state); aiter.Done(); aiter.Next()) { const StdArc &arc = aiter.Value(); vector r = traverse(fst, arc.nextstate, str + st->Find(arc.olabel)); vector::iterator i; for(i=r.begin(); i!=r.end(); i++) strs.push_back(*i); } return strs; } ...

And the second question. I don't know easy way either. But you can construct it "by hand".

GeorgJaehnig - 28 Apr 2011 - 09:48

It seems problem number one is solved: someone started openfst-tools: http://code.google.com/p/openfst-tools/ containing the command "fstprintpaths"

JosefRNovak - 12 May 2011 - 02:48

I made some modifications to this printpaths class a while back and included it in another small OS project. It returns a vector of the paths rather than just printing them out directly, and makes it a bit easier to filter unwanted tokens from the output. In case you're still looking around:

http://code.google.com/p/phonetisaurus/source/browse/

see FstPathFinder.cpp and FstPathFinder.hpp .

 
Log In

Edit Distance- compose 3 fst

BlueSky - 23 Nov 2010 - 12:55

Please I need support, I am trying to compose 3 FSTs (fst-string1, edit-distance fst, fst-string2). Similar in the paper"Linear Space Computation of the Edit Distance between a String and a Finite Automaton"by Allauzen, Mohri. I am getting many arcs with 0:0/1 in the composed FST which should not appear. Cheers, Memo

CyrilAllauzen - 24 Nov 2010 - 11:01

Could you call fstinfo your_fst | grep epsilons on each of your 3 Fsts and post the output below (between verbatim tags preferably)?

Cyril

BlueSky - 25 Nov 2010 - 07:15

I appreciate your support, Best Regards

Begin fst1 fstinfo fst-in.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 0 input/output epsilons n input epsilons n output epsilons n End fst1

Begin fst2 fstinfo ed-loop.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 3 # of output epsilons 3 input/output epsilons n input epsilons y output epsilons y End fst2

Begin fst3 fstinfo fst-corr.fst | grep epsilons # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 0 input/output epsilons n input epsilons n output epsilons n End fst3

Begin fst-composition-3fsts fstinfo fst_ed_comp2.fst | grep epsilons # of input/output epsilons 8 # of input epsilons 8 # of output epsilons 8 input/output epsilons y input epsilons y output epsilons y End fst-composition-3fsts

CyrilAllauzen - 29 Nov 2010 - 13:48

Everything seems fine. Could you give me the exact sequence of operations used to generate fst_ed_comp2.fst ?

BlueSky - 30 Nov 2010 - 10:10

Thanks a lot! step 1) compose(fst1,fst2) ArcSortOutput(fst1) ArcSortInput(fst2) Compose(fst1,fst2,fst_ed_comp1) ProjectOutput(fst_ed_comp1) return fst_ed_comp1

step 2)compose(fst_ed_comp1,fst3) repeat the same sequence of operations return fst_ed_comp2

CyrilAllauzen - 30 Nov 2010 - 10:56

I think this explains everything. You project the results of composition on their output. This turns transitions of the form a:epsilon corresponding to deletions into input/output epsilon transitions.
 
Log In

When is it "safe" to call Synchronize() ?

KennethRBeesley - 19 Nov 2010 - 17:11

Can you test an Fst to determine if it's safe to call Synchronize() on it? E.g. can you always call Synchronize() safely on an acylic Fst?

CyrilAllauzen - 22 Nov 2010 - 13:17

You can always synchronize an acyclic an acyclic transducer.

As mentionned in FST.SynchronizeDoc, the condition for Synchronize to terminate is that the transducer has bounded delay, which is always the case for acyclic (the delay being at most the number of states in that case).

In the cyclic case, the bounded delay condition is equivalent to the condition that the delay of every cycle has to be zero, i.e. for every cycle c, the length of the input of c is equal to the length of the output of c.

BlueSky - 08 Feb 2011 - 05:06

Hi All,

I would like to ask about 2 things. First does anyone know a method in openfst which prints the fst with the characters not ascii ?

I have a problem with the minimization function, i created an fst with 60,000 words. The fst is created and determinized, but then the minimize function does not finish??

I appreciate the help :), Cheers Meme

 
Log In

Unable to minimize already determinized transducer

DanielBolanos - 16 Nov 2010 - 12:20

Hello,

I determinized successfully my transducer (HC o L o G) by first transforming it to an acceptor, and then tried to minimize it, however the minimization process never ends (I stop it after about an hour, note that the determinization of this transducer takes about a second). The transducer is not that big, about 50k states and 95k transitions. It is a weighted and cyclic acceptor, does anybody know what is going on?

thanks so much for your help

CyrilAllauzen - 17 Nov 2010 - 11:39

Hi,

Which version of the library are you using? There was a hashing bug in older versions that could lead to some inefficiency in Minimize.
Which semiring is the automaton over? Does fstshortestdistance terminate on this machine?

Best, Cyril

DanielBolanos - 17 Nov 2010 - 17:26

Hi Cyril,

I'm using version: 1.2.4, should I try the latest one? The automaton is over the log semiring (I specify --arc_type=log on fstcompile, which I believe is the way to do it) No, fstshortestdistance does not finish, and fstpush does not finish either.

thanks so much for your help Cyril, you are providing a great service

Daniel

CyrilAllauzen - 17 Nov 2010 - 19:39

Hi Daniel,

The issue is that the shortest distance does not converge in the log semiring when the automaton is not stochastic. You need to convert you machine to the tropical semiring, apply minimization and then convert back to the log.

Cyril

DanielBolanos - 18 Nov 2010 - 12:55

Thanks for your help Cyril, I moved to the tropical semiring (--arc_type=standard) and ran fstshortestdistance, but it still does not finish (if I run it on verbose mode it prints lines on the form: "INFO: AutoQueue: SCC #XXX: using trivial discipline", where XXX is each of the strongly connected components in the machine). fstdeterminize does not finish either.

thanks so much for your help again

Daniel

DanielBolanos - 18 Nov 2010 - 17:45

Cyril, I found the problem, it was a silly mistake, I had negative weights. I replaced them by positive weights and everything works fine.

thanks so much for your help, this is a great library

Daniel

 
Log In

Pre-Determinization Algorithm

HasimSak - 16 Nov 2010 - 12:09

Hi,

Does anybody know if there is an implementation (in openfst or other library) of the pre-determinization algorithm described in the paper "An efficient pre-determinization algorithm" by Allauzen and Mohri? I would really appreciate if someone has already implemented this algorithm and can share with us.

Please note that "fsmdeterminize -l" and "fstdeterminize --subsequential_label" don't implement this.

Thanks

CyrilAllauzen - 17 Nov 2010 - 11:20

I don't know of any available implementation of this algorithm.

Best, Cyril

 
Log In

FST Determinization Question

HasimSak - 14 Nov 2010 - 16:19

Hi,

Does fstdeterminize expect the input transducer to be epsilon normalized? When I give a specific functional transducer (not epsilon normalized), fstdeterminize terminates but the resulting transducer is not deterministic (due to two input epsilon arcs leaving the same state). When I determinize it twice, the result is then deterministic. An example functional fst is:

0 1 a x
0 2 a y
1 3 b eps
2 3 c z
2 3 d w
3 4 eps a
4 5 s eps
3
5

The resulting transducer after determinization is:

0       1       a       eps
1       2       b       x
1       3       c       y
1       4       d       y
2       5       eps   a
2
3       6       eps   z
3       7       eps   z
4       6       eps   w
4       7       eps   w
5       8       s       eps
6       8       s       a
7
8
Specifically in this case, I don't understand why the determinization algorithm creates an extra final state (7), while it can just make the state 6 final. Is this an expected behaviour?

Thanks

CyrilAllauzen - 17 Nov 2010 - 11:16

This is the expected behavior. Determinization of transducers is implemented as determinization of weighted automata over the string semiring. This lead to an automaton over the string semiring where 3 is final with final weight "z". When converting this automaton into a transducer leads to the creation of the state 7 and the transition from 3 to 7 with input epsilon and output z. You can use the --subsequential_label option to specify an input label other than epsilon for the transition form 3 to 7 leading to a machine that is indeed deterministic.

In your example, it would indeed be enough to make 6 final but in general this is difficult to determine since there might be other paths to 6 that would be affected by making 6 final.

If you do not want to use the --subsequential_label, an other solution to obtain a deterministic transducer would be to subsequently apply to the output of determinization: Encode (labels), Determinize, Decode.

Best, Cyril

HasimSak - 17 Nov 2010 - 15:23

I see, thank you very much Cyril.

Hasim

 
Log In

compiler warning comparison signed and unsigned

KeithPonting - 12 Nov 2010 - 06:01

Hi,
Thankyou for making OpenFst available - I have happily downloaded and built it, but using from my own code with gcc (4.3.2 or 4.1.2) and -Wall (which is our standard) I get many warnings about signed/unsigned comparisons. I know I can eliminate these with an extra -Wno-sign-compare switch, but I prefer code to compile as cleanly as possible (and need it to do so in order to persuade others in the group to adopt the library). Have you any plans to make OpenFst signed-ness clean?
To reproduce just add -Wall to compilation of fst_test
cd openfst-1.2.5/src/test
g++ -DHAVE_CONFIG_H   -I./../include -D_REENTRANT  -g -O2 -MT fst_test.o -MD -MP -MF .deps/fst_test.Tpo -c -o fst_test.o fst_test.cc -Wall

 
Log In

Determinize problem and Composition filters

HyungBaeJeon - 02 Nov 2010 - 20:45

Hi,

I want to do composition of L, G. But Determnize is not terminate.

My shell command is like below; fstcompose L.fst G.fst LG.com.fst fstepsnormalize LG.com.fst LG.epn.fst fstdeterminize LG.epn.fst LG.det.fst fstencode --encode_labels LG.det.fst codex LG.encode.fst fstminimize LG.encode.fst LG.min.fst fstencode --decode LG.min.fst codex LG.decode.fst fstarcsort LG.decode.fst LG.final.fst

I did arcsort input lable of G.fst, output label of L.fst. And G was generated from 1.3M entry trigrams.

Is there any mistake in my script?

When i use openfst v1.0, determinize is complete. And, If i use at&t fsmdeterminize, it is also working well. Do I need additional argument or optional parameter?

Below is my fst size.

L.fst # of states 43484 # of arcs 50264

G.fst # of states 540723 # of arcs 2421839

LG.com.fst (fstcompose L.fst G.fst LG.com.fst) # of states 8040320 # of arcs 11991899

LG.epn.fst (fstepsnormalize LG.com.fst LG.epn.fst) # of states 7502435 # of arcs 12709467

Another question is, Can I use lookahead composition filter using shell command?

Thanks,

 
Log In

Shortest Path with constraints

RolandSchwarz - 01 Nov 2010 - 11:53

Hiya,

I was wondering if there is a way to compute a shortest path closest to a certain weight over the tropical sr. So instead of getting the absolute shortest path with the lowest total score I'd be interested in getting the path that is closest to a total score of e.g. 5.

Related to that is there a tropical semiring variant which orders paths according to their absolute values instead of their raw values and that is still distributive (i.e. shortest path still works)? So if I have paths with total weights -7, 1, 4, 6, 8 I'd like to have them ordered as 1, 4, 6, -7, 8, i.e. the path with a score of 1 should be the shortest path. Thanks for your help!

Cheers,

Roland

 
Log In

Compose FST

BlueSky - 29 Oct 2010 - 07:56

Hi, I am new with fst, I have the same problem. I am trying to compose 2fsts, the fist one is "cot" and the second one {cat, car,come}. I could not get any composition's output. I used the fstinfo, the input/output symbol table are none. Although, some arcs should be composed. Thanks a lot, Memo

CyrilAllauzen - 29 Oct 2010 - 11:22

Your description seems vague but it seems that the composition should be empty. fstcompose automatically connects the output so you should use --connect=false if you want to see the non coaccessible paths.

Cyril

BlueSky - 01 Nov 2010 - 12:41

Hi Cyril, thanks a lot i tried it and it works by using the fstcompose in the shell. I got only the connected arcs. If i want to use it inside my code is the function "Connect" has the same option? I used "Connect(fst)" after the Compose method but i could not see the output when i tried to drew the fst as i did before in the shell.

Does it work fine if i use such a fst(the result of the first composition) in further operations, such as compose it with other fst? because i am unable to draw or print it to make sure! I appreciate your support. Memo

IvanProvalov - 03 Feb 2011 - 12:54

I am trying to test a simple string input-output with FST. Is there an equivalent of lexfsmstrings to replace the last part of the command below:

lexcompre -l example.lab -s "test" | fsmcompose - test.fst | lexfsmstrings -l example.lab

IvanProvalov - 03 Feb 2011 - 20:31

I found what I was looking for: fstprint --acceptor --isymbols=example.lab --osymbols=example.lab | awk 'BEGIN {printf("\n")} {printf("%s ",$4)} END {printf("\n")}
 
Log In

Compact fst

KonstantinSelivanov - 22 Oct 2010 - 06:20

Hi,

Could compact fsts be composed, intersected etc? I couldn't find related methods. Did I missed something?

And I came across a strange behavior. When start state isn't defined it causes segfaults with some operations (concatenation, union).

PaulDixon - 22 Oct 2010 - 09:19

I've composed a compact_acceptor type fst with a olabel_lookahead fst using the shell tools and lookahead composition.

KonstantinSelivanov - 29 Oct 2010 - 12:49

Thanks Paul. I managed to do what I need.
 
Log In

Incremental redundancy reduction

SimonDobrisek - 19 Oct 2010 - 15:00

Please allow me to draw your attention to the paper http://uni-lj.academia.edu/documents/0152/7609/tsd200a.pdf

It seems that the incremental FST redundancy reduction considerably outperforms the usual batch optimization approach.

The demonstration implementation of the presented algorithm is available at http://luks.fe.uni-lj.si/spider/

We believe that the incremental algorithm could be generalized and implemented also as part of the OpenFST toolkit.

Please let me know if you need any additional explanation regarding this algorithm.

Kind Regards!

Simon Dobrišek

 
Log In

Forward / Backward variables

JasonTuskanov - 13 Oct 2010 - 09:49

Hi. I would like to compute the forward and backward variables (see L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 1989) for each state of a FST. I'm surprised I have found nothing relevant so far. Did I miss something existing in the library, or shall I just write the piece of code on my own ?

CyrilAllauzen - 13 Oct 2010 - 18:45

You missed something: shortest distance

MichaelRiley - 20 Oct 2010 - 17:57

Use ShortestDistance
 
Log In

'fst::Cast' : ambiguous call to overloaded function

HonzaSilovsky - 12 Oct 2010 - 04:43

Compiling a project that refers to openfst library (I'm using v. 1.2.3) in Visual Studio (2010 express) I was getting the following error: error C2668: 'fst::Cast' : ambiguous call to overloaded function

It turned out that the problem is the incorrect order of keywords in the function declaration; the original declaration in vector-fst.h looks like template void friend Cast(const F &, G *); while I guess it should look like template friend void Cast(const F &, G *);

after this change everything compiles allright.

CyrilAllauzen - 13 Oct 2010 - 14:38

Indeed, you're right. This will be fixed in the next release. Thanks.

 
Log In

Compiling openfst-1.2.3 with Visual C++ 2008 Express

HenrikKinnemann - 01 Oct 2010 - 07:02

Does anybody has tried to compile openfst-1.2.3 with Visual C++ 2008 Express yet? For me it's not possible to install Visual C++ 2010.

How can I compile under MS Windows with the following options "--enable-bin=yes --enable-compact-fsts=no --enable-const-fsts=no --enable-far=no --enable-lookahead-fsts=no --enable-pdt=no --with-icu=yes" ?

 
Log In

Lookahead Composition Filter with RhoMatcher?

SashaRush - 22 Sep 2010 - 12:46

Is it possible to use one of the new lookahead filters on a composition with a Rho or other special matcher? It looks like the lookahead filter takes a lookahead matcher as an argument, but I'm guessing this is different than the matcher passed to ComposeFST.

 
Log In

unexpected composition result

CameronSmith - 22 Sep 2010 - 12:02

When I compose two cyclic, single-state, unweighted transducers I get an unexpected result. The composition is TA o TF = TB; where TA: http://j.mp/d1fDP9 , TF: http://j.mp/ddSOPy , TB: http://j.mp/amYjNt . However I would expect the output to be TBc: http://j.mp/b5dnNo . The code I have used to define and compose these transducers is copied below. Is there some mistake I have made in defining the transducers for this purpose or is there a problem with the way in which I have used the Compose() or other functions?

#include <iostream>
#include <fstream>
#include <stdlib.h>
#include <string>
#include <cstring>

#include <fst/fstlib.h>
#include <fst/compact-fst.h>
#include <fst/extensions/far/farlib.h>



using namespace fst;


int main()
{
int fstnum = 3;
string fstnames[fstnum];

fstnames[0]="TA";
fstnames[1]="TF";
fstnames[2]="TB";
//------------------Generate [population] of FSTs--------------------//

// A vector FST is a general mutable FST
StdVectorFst TA;

// Adds state 0 to the initially empty FST and make it the start state.
TA.AddState();   // 1st state will be state 0 (returned by AddState)
TA.SetStart(0);  // arg is state ID

// Adds two arcs exiting state 0.
// Arc constructor args: ilabel, olabel, weight, dest state ID.
TA.AddArc(0, StdArc(0, 0, 0, 0));  // 1st arg is src state ID


TA.SetFinal(0,0); // SetFinal (StateID, weight)

//We can save this FST to a file with:
TA.Write("onestate/TA.fst");

StdVectorFst TF;

// Adds state 0 to the initially empty FST and make it the start state.
TF.AddState();   // 1st state will be state 0 (returned by AddState)
TF.SetStart(0);  // arg is state ID

// Adds two arcs exiting state 0.
// Arc constructor args: ilabel, olabel, weight, dest state ID.
TF.AddArc(0, StdArc(0, 1, 0, 0));  // 1st arg is src state ID
TF.AddArc(0, StdArc(1, 0, 0, 0));  // 1st arg is src state ID


TF.SetFinal(0,0); // SetFinal (StateID, weight)

//We can save this FST to a file with:
TF.Write("onestate/TF.fst");


//------------------randomly select 2 FSTs----------//
//--------------------Compose 2 FSTs----------------//

// The FSTs must be sorted along the dimensions they will be joined.
// In fact, only one needs to be so sorted.
// This could have instead been done for "model.fst" when it was created.
ArcSort(&TA, StdOLabelCompare());
ArcSort(&TF, StdILabelCompare());

// Container for composition result.
StdVectorFst result;

// Create the composed FST.
Compose(TA, TF, &result);

result.Write("onestate/TB.fst");


//---------compute and store interaction network complexity-------//

//---------print interaction network---------------//
int cmdtst = system(NULL);
if (cmdtst != 0 )
{
    for (int i=0; i<fstnum; i++)
    {

        string sh1="fstdraw onestate/.fst onestate/.dot";
        string sh2="dot -Tps onestate/.dot > onestate/.ps";
        string sh3="evince onestate/.ps &";
        string fstname = fstnames[i];
        sh1.insert(17,fstname); sh1.insert(33,fstname);
        sh2.insert(18,fstname); sh2.insert(36,fstname);
        sh3.insert(16,fstname);
        system(sh1.c_str()); system(sh2.c_str()); system(sh3.c_str());

    }

} else { cout << "No command interpreter available \n"; };

return 0;

}

CameronSmith - 22 Sep 2010 - 15:19

update: I apologize I see from the "conventions" section that I am using the epsilon transition label 0 when I do not intend to indicate an epsilon transition. Is there a way to redefine the epsilon symbol so that I can use the binary alphabet in my state transition labels?

MichaelRiley - 20 Oct 2010 - 17:56

no.
 
Log In

far weirdness

RichardSproat - 15 Sep 2010 - 21:44

This ought to be simple and just work out of the box, so why am I getting the following behavior:

rocotillo:testdata sproatr$ fstinfo TOKENIZER | sed 3q fst type vector arc type standard input symbol table none rocotillo:testdata sproatr$ farcreate TOKENIZER tokenizer.far rocotillo:testdata sproatr$ farinfo tokenizer.far ERROR: FstHeader::Read: Bad FST header: tokenizer.far: ERROR: ReadSTTableHeader: error reading file: tokenizer.far ERROR: Error reading FAR: tokenizer.far ERROR: GenericRegister::GetEntry : dlopen(-arc.so, 1): image not found FATAL: No operation found for "FarInfo" on arc type

So, the fst TOKENIZER is well-formed, "farcreate" apparently works, but whatever it's creating ain't happy.

CyrilAllauzen - 16 Sep 2010 - 14:38

There is a bug in FAR that only manifest itself on 32-bit machines. This can be fixed by changing line 79 of src/include/extensions/far/sttable.h to:

    WriteType(stream_, static_cast<int64>(positions_.size()));

This will be fixed in the next release. Thanks to Richard for his help on this!

 
Log In

unweighted transducers

CameronSmith - 15 Sep 2010 - 17:09

How would one construct and compose unweighted transducers? Is it necessary to write a BoolWeight class to do this and then create new type definitions:

typedef ArcTpl WeightlessArc; typedef VectorFst WlsVectorFst;

as respective analogs to:

typedef ArcTpl StdArc; typedef VectorFst StdVectorFst;

Is it possible and reasonable to set the kNotWeighted property to True for each transducer?

Is there a simpler and/or better way of accomplishing this?

As a disclaimer, I may have a fundamental misunderstanding of the weights themselves (or of other essential concepts involved for that matter).

Thanks.

CameronSmith - 21 Sep 2010 - 16:15

update: sorry for this comment i figured out what was happening
 
Log In

compilation needs -ldl

ChrDr - 15 Sep 2010 - 07:08

On my Arch Linux System the compilation only works if I specify LIBS="-ldl" ./configure --prefix=/usr but fails with ./configure --prefix=/usr

Shouldn't it compile also in the second case?

 
Log In

Random FST generation

CameronSmith - 14 Sep 2010 - 12:30

Is it possible to generate a random FST from the space of all possible (e.g. two-state) transducers given a particular input and output alphabet (e.g. {0,1}) and a constant weight (e.g. 1 or unweighted)?

If anyone could provide a brief skeleton for accomplishing this I would appreciate it.

Thanks.

 
Log In

Convert machine from FSM to FST

KonstantinSelivanov - 10 Sep 2010 - 12:29

I wonder if there is a way to convert FSM to FST format?

Thanks.

KonstantinSelivanov - 10 Sep 2010 - 12:33

upd: I mean from ATT FSM library format to OpenFST format.

AMorenoDaniel - 10 Sep 2010 - 14:32

They share the text representation. fsmprint your.fsm | fstcompile > your.fst

KonstantinSelivanov - 11 Sep 2010 - 08:58

Thanks Daniel.

But it seems they have some distinctions. E.g. when I create language model using grmcount/grmake there are some arcs with weight "Infinity". I didn't see such weights in openfst library. Or maybe I'm doing something wrong?

MichaelRiley - 20 Oct 2010 - 18:00

Should be the same. Post small example if not.
 
Log In

FAST_LOG_PROB_ARC_SELECTOR

AMorenoDaniel - 08 Sep 2010 - 13:01

Hi, I'm taking a look at the version 1.2.2, and I wonder if
FAST_LOG_PROB_ARC_SELECTOR
was accidentally included as option for fstrandgen.cc. It is not defined anywhere else as far as I see.

My first compilation succeeded on Cygwin after I commented out that option. Cheers

BusyBeaver - 08 Sep 2010 - 13:27

Had the same issue here. The macro is not defined anywhere in the tar ball

CyrilAllauzen - 09 Sep 2010 - 11:40

My bad. Last minute changes before the release not done carefully. FAST_LOG_PROB_ARC_SELECTOR should have been defined in the enum in script/randgen.h.

MichaelRiley - 09 Sep 2010 - 17:11

Fixed in version 1.2.3
 
Log In

Compiling 1.2.1 on Mac OSX

RichardSproat - 02 Sep 2010 - 17:09

Has anyone run into the following problem when trying to compile 1.2.1 out of the box on OSX?

g++ -DHAVE_CONFIG_H -I./../include -g -O2 -MT replace.lo -MD -MP -MF .deps/replace.Tpo -c replace.cc -fno-common -DPIC -o .libs/replace.o /usr/include/c++/4.0.0/tr1/hashtable: In member function 'typename std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::const_iterator std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::find(const Key&) const [with Key = int, Value = std::pair, Allocator = std::allocator<std::pair >, ExtractKey = Internal::extract1st<std::pair >, Equal = std::equal_to, H1 = std::tr1::hash, H2 = Internal::mod_range_hashing, H = Internal::default_ranged_hash, RehashPolicy = Internal::prime_rehash_policy, bool cache_hash_code = false, bool mutable_iterators = true, bool unique_keys = true]': ./../include/fst/replace.h:531: instantiated from 'bool fst::ReplaceFstImpl<A, T>::IsNonTerminal(typename A::Label) const [with A = fst::LogArc, T = fst::DefaultReplaceStateTable<fst::LogArc, ssize_t>]' ./../include/fst/replace.h:560: instantiated from 'size_t fst::ReplaceFstImpl<A, T>::NumInputEpsilons(typename A::StateId) [with A = fst::LogArc, T = fst::DefaultReplaceStateTable<fst::LogArc, ssize_t>]' ./../include/fst/fst.h:732: instantiated from 'size_t fst::ImplToFst<I, F>::NumInputEpsilons(typename I::Arc::StateId) const [with I = fst::ReplaceFstImpl<fst::LogArc, fst::DefaultReplaceStateTable<fst::LogArc, ssize_t> >, F = fst::Fst<fst::LogArc>]' replace.cc:43: instantiated from here /usr/include/c++/4.0.0/tr1/hashtable:1136: error: passing 'const std::tr1::hashtable<int, std::pair, std::allocator<std::pair >, Internal::extract1st<std::pair >, std::equal_to, std::tr1::hash, Internal::mod_range_hashing, Internal::default_ranged_hash, Internal::prime_rehash_policy, false, true, true>' as 'this' argument of 'typename std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::node* std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::find_node(Internal::hash_node<Value, cache_hash_code>*, const Key&, typename std::tr1::hashtable<Key, Value, Allocator, ExtractKey, Equal, H1, H2, H, RehashPolicy, cache_hash_code, mutable_iterators, unique_keys>::hash_code_t) [with Key = int, Value = std::pair, Allocator = std::allocator<std::pair >, ExtractKey = Internal::extract1st<std::pair >, Equal = std::equal_to, H1 = std::tr1::hash, H2 = Internal::mod_range_hashing, H = Internal::default_ranged_hash, RehashPolicy = Internal::prime_rehash_policy, bool cache_hash_code = false, bool mutable_iterators = true, bool unique_keys = true]' discards qualifiers make[3]: * [replace.lo] Error 1

RichardSproat - 02 Sep 2010 - 19:01

BTW, I know about this:

http://openfst.cs.nyu.edu/twiki/bin/view/FST/CompilingOnMacOSX

did the relevant updates, and was fine compiling 1.1

CyrilAllauzen - 03 Sep 2010 - 11:56

I'll look into this but in the meantime I recommend you use the first work-around mentioned on the page: using gcc 4.2 from Fink instead of Apple's old gcc version.

CyrilAllauzen - 03 Sep 2010 - 16:50

Richard and I traced down the issue to an other bug in gcc 4.0 TR1 implementation. An updated hashtable header file is available at FST.CompilingOnMacOSX.
 
Log In

Application of RTNs?

KennethRBeesley - 01 Sep 2010 - 13:16

OpenFst provides the Replace() and ReplaceFst() operations that take an RTN (an Fst wherein some of the output labels are references to subnetworks, i.e. are non-terminals) and replaces each such reference with a copy of the referred-to subnetwork. The replacement operations appear to function well, turning an RTN into a normal Fst. Is there any support for applying an RTN to input without first performing a Replace()ment? Thanks.

CyrilAllauzen - 01 Sep 2010 - 15:38

ReplaceFst is a delayed representation. Copying it into a VectorFst will perform the replacement. Composing your input with the ReplaceFst is what you want.
 
Log In

Mindtuning for PDT?

KennethRBeesley - 01 Sep 2010 - 13:05

A colleague and I are trying to understand the experimental library for Push Down Transducers (PDTs) in OpenFst 1.2. What does this mean: "some transitions are open or close parentheses"? Could some kind soul post a small example showing what the transition labels really look like? Thanks

CyrilAllauzen - 01 Sep 2010 - 15:41

Each pair of opening and closing parentheses represents a stack symbol. The opening parenthesis corresponds to pushing that stack symbol on the stack, the closing parenthesis corresponds to poping that symbol from the stack.
 
Log In

pre-determinization

WeiwenLeung - 30 Aug 2010 - 06:19

Are there any tool in openfst that i can use to predeterminization any fst?

CyrilAllauzen - 01 Sep 2010 - 15:42

Unfortunately, I don't know of any such tool.
 
Log In

fstepsnormalize

WeiwenLeung - 27 Aug 2010 - 01:56

When calling fstepsnormalize for the composed fst for L and G, the memory increases regularly to the available size of my server (48GB) and killed by the Linux system. The sizes of my L and G fsts are 4MB and 31MB respectively.

is there anyone encounter the same problem?

 
Log In

fstcompose output

AliAzimizadeh - 20 Aug 2010 - 16:47

Hi, I have two fst file, both of them are right. when they are composed by fstcompose the output file size is 66k and when i use fstprint and convert it to text mode, there is nothing there and file is empty. can you give me any hint about this problem? Thanks so much Ali

AMorenoDaniel - 21 Aug 2010 - 15:04

You can use 'fstinfo' or traverse the fst using state/arcs iterators using c++ lib. Expose the guts and you'll know whats going on.

AliAzimizadeh - 21 Aug 2010 - 16:34

Thanks i found the problem

BlueSky - 29 Oct 2010 - 05:15

Hi, I am new with fst, I have the same problem. I am trying to compose 2fsts, the fist one is "cot" and the second one {cat, car,come}. I could not get any composition's output. I used the fst info, the input/output symbol table are none. Although, some arcs should be composed. Thanks a lot, Memo
 
Log In

Compiling openfst-1.2 on cygwin

AMorenoDaniel - 18 Aug 2010 - 17:48

Has anyone compiled openfst-1.2 (recently released) on cygwin?
As suggested a year ago for openfst-1.1, I did:
./configure CXX=g++-4.exe CC=gcc-4.exe --prefix=$HOME/local
make
#make install  # <--- I haven't made it here though.
Make aborts with the error message:
g++-4.exe -g -O2 -o fstarcsort.exe fstarcsort.o  -ldl ../lib/.libs/libfst.a ../script/.libs/libfstscript.a /usr/lib/gcc/i686-pc-cygwin/4.3.4/libstdc++.dll.a   -L/usr/lib/gcc/i686-pc-cygwin/4.3.4 -L/usr/lib/gcc/i686-pc-cygwin/4.3.4
../script/.libs/libfstscript.a(fst-class.o):fst-class.cc:(.text$_ZN3fst16CompatPropertiesEyy[fst::CompatProperties(unsigned long long, unsigned long long)]+0x143): undefined reference to `fst::PropertyNames'
~~~----ETCETERA----~~~
collect2: ld returned 1 exit status
make[3]: *** [fstarcsort.exe] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2
so it looks as if the linker doesn't find the definition of CompatProperties.
I must add that the following warning occurs twice during Make:
libtool: link: warning: undefined symbols not allowed in i686-pc-cygwin shared libraries
Once here:
/bin/sh ../../libtool --tag=CXX   --mode=link g++-4.exe  -g -O2 -version-info 0:0:0  -o libfst.la -rpath /usr/local/lib compat.lo flags.lo fst.lo properties.lo symbol-table.lo util.lo
and once again here:
/bin/sh ../../libtool --tag=CXX   --mode=link g++-4.exe  -g -O2 -version-info 0:0:0  -o libfstscript.la -rpath /usr/local/lib arcsort.lo closure.lo compile.lo compose.lo concat.lo connect.lo convert.lo decode.lo determinize.lo difference.lo draw.lo encode.lo epsnormalize.lo equal.lo equivalent.lo fst-class.lo info.lo intersect.lo invert.lo map.lo minimize.lo print.lo project.lo prune.lo push.lo randequivalent.lo randgen.lo relabel.lo replace.lo reverse.lo reweight.lo rmepsilon.lo script-impl.lo shortest-distance.lo shortest-path.lo synchronize.lo text-io.lo topsort.lo union.lo weight-class.lo

DanielPovey - 19 Aug 2010 - 05:43

I am getting this problem too. Currently I am investigating whether adding -no-undefined to all the *_la_LDFLAGS variables in the Makefiles in the subdirectories will fix this problem. Dan

DanielPovey - 19 Aug 2010 - 06:06

BTW, this didn't work.

AMorenoDaniel - 19 Aug 2010 - 11:37

yeah, setting LDFLAGS="-no-undefined" took care of the warnings, but an error of similar nature remains.

DanielPovey - 19 Aug 2010 - 13:26

A co-worker suggested adding the following options to configure: --enable-static --disable-shared

DanielPovey - 20 Aug 2010 - 05:09

BTW, this didn't work. It seems it still tried to build the shared libraries. Somehow configure didn't correctly accept the options.

PaulDixon - 21 Aug 2010 - 00:21

I have most of version 1.2 compiling under VS2010, just the far and pushdown transducers are missing. I put the port so far here:

www.furui.cs.titech.ac.jp/~dixonp/openfstwin-1.2-210810.zip?

It includes prebuilt 32-bit binaries that should also work under cygwin. It's all 1.2 code the 1.1 directory name still needs to be changed.

DanielPovey - 21 Aug 2010 - 05:10

Thanks, Paul. Following up on the cygwin issue: A co-worker was able to work-around this problem as follows by manually changing some of the compile commands. For example, Not sure what it is. It looks like some stuff is missing from the lib archives. But I was able to compile arcsort by hand, the others are probably the same. Instead of (from src/bin): /bin/sh ../../libtool --tag=CXX --mode=link g++ -g -O2 -lm -ldl -o fstarcsort.exe fstarcsort.o ../lib/libfst.la ../script/libfstscript.la I went to that dir and did by hand: /bin/sh ../../libtool --tag=CXX --mode=link g++ -g -O2 -lm -ldl -o fstarcsort.exe fstarcsort.o ../script/libfstscript.la ../lib/*.o ../lib/libfst.la

 
Log In

ReplaceFst? () and testing for CyclicDependencies?

KennethRBeesley - 17 Aug 2010 - 14:21

I've just started experimenting with ReplaceFst(). The following script defines three Fsts, with cyclic dependencies. The line

if (ReplaceFst<StdArc>(pairlabelfsts, slabel).CyclicDependencies()) { ...}

returns true, apparently detecting the cyclic dependencies correctly.

Question: Have I used the proper idioms for testing for cyclic dependencies and doing the actual replacement?

#include <fst/fstlib.h>
#include <iostream>

using namespace fst ;

int main() {
   std::cout << "Hello, world!\n" ;

   int a = 97 ;   // code point values
   int b = 98 ;

   // example with cyclic dependencies

   // StdVectorFst is a typedef for VectorFst<StdArc>
   StdVectorFst fst1, fst2, fst3 ;

   // define fst1, a subnetwork (will be arbitrarily associated with -1)
   fst1.AddState() ;     // return 0
   fst1.SetStart(0) ;
   fst1.AddState() ;   // returns 1
   fst1.AddState() ;   // returns 2
   fst1.SetFinal(2, StdArc::Weight::One()) ;
   fst1.AddArc(0, StdArc(a, a, StdArc::Weight::One(), 1)) ;
   // fst1 refers to fst2 (associated below with -2)
   fst1.AddArc(1, StdArc(0, -2, StdArc::Weight::One(), 2)) ;
   //         src        i  o          w             dest

   // define fst2, a subnetwork (will be arbitrarily associated with -2)
   fst2.AddState() ;   // return 0
   fst2.SetStart(0) ;
   fst2.AddState() ;   // return 1
   fst2.AddState() ;   // return 2
   fst2.SetFinal(2, StdArc::Weight::One()) ;
   fst2.AddArc(0, StdArc(b, b, StdArc::Weight::One(), 1)) ;
   // fst2 refers to fst1 (associated below with -1)
   fst2.AddArc(1, StdArc(0, -1, StdArc::Weight::One(), 2)) ;

   // define fst3, the base network (will be arbitrarily associated with -3)
   fst3.AddState() ;   // return 0
   fst3.SetStart(0) ;
   fst3.AddState() ;   // return 1
   fst3.AddState() ;   // return 2
   fst3.SetFinal(2, StdArc::Weight::One()) ;
   // add arcs with "references" to subnetworks
   // output-side -1 will be associated with fst1
   // output-side -2 will be associated with fst2
   // The label references to subnetworks are on the Output Side.
   // The corresponding Input-Side labels can be anything (here 0 for epsilon).
   fst3.AddArc(0, StdArc(0, -1, StdArc::Weight::One(), 1)) ;
   fst3.AddArc(1, StdArc(0, -2, StdArc::Weight::One(), 2)) ;
   //         src        i   o           w            dest

   // set up the vector associating arbitrary numbers with the three networks
   // -1 refers to fst1
   vector< pair< StdArc::Label, const Fst<StdArc> * > > pairlabelfsts ;
   pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-1, &fst1) ) ;
   // -2 refers to fst2
   pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-2, &fst2) ) ;
   // -3 is the base network
   pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-3, &fst3) ) ;
   
   StdArc::Label slabel = -3 ;      // specifies the "base" Fst, which is fst3

   fst3.Write("before.fst") ;      // the network with references

   // Using delayed ReplaceFst
   if (ReplaceFst<StdArc>(pairlabelfsts, slabel).CyclicDependencies()) {
      std::cout << "Has cyclic dependencies.\n" ;
   } else {
      std::cout << "No cyclic dependencies.\n" ; 
      StdVectorFst expandedFst ;   // the result
      // slabel (here -3) is used to find the base Fst supplied in pairlabelfsts
      expandedFst = ReplaceFst<StdArc>(pairlabelfsts, slabel) ;
      expandedFst.Write("intermediate.fst") ;   // the network just after ReplaceFst()
      RmEpsilon(&expandedFst) ;
      expandedFst.Write("after.fst") ;   // the network after RmEpsilon()
   }

   std::cout << "Good-bye, world!\n" ;
}

CyrilAllauzen - 17 Aug 2010 - 15:52

Yes, this looks correct to me.

Best, Cyril

 
Log In

Multithreading in OpenFST?

AlbertPark - 09 Jul 2010 - 18:30

Hi,

I am currently using the OpenFST library, and I was thinking of multithreading my program, and I was curious as to whether the OpenFST library currently supports multithreading. Thank you!

Albert

AlbertPark - 21 Jul 2010 - 04:13

As I haven't gotten a response yet, this is a self-reply to my question.

I've gone through looking at the implementation and from what I can tell the OpenFST library currently is not multithreading safe. I've resorted to doing a deep copy of FSTs for each thread. Unfortunately, doing a deep copy was not as straightforward as I thought it would be, or at least I couldn't figure out how to do it elegantly and quickly (The FST I am reading in is very large, and trying to read in the FST multiple times takes much longer than I want). I tried doing a deep copy using Map() with the IdentityMapper as mentioned in a previous post, but found that this only does deep copies of the arcs (given the copy constructor of the Arc and Weight do deep copies), and does a shallow copy of the symbol table, which creates problems for multithreading (when multiple threads try to access the symbol table at once).

I'm not sure if there is an easier way to do this, but here is what I did to change the OpenFST code to support deep copying of the symbol table. I added a new enum code in map.h to MAP_SYMBOLS_ACTION as well as a couple of lines of code for supporting symbol table deep copying, added a function to do a deep copy of the symbol table to symbol-table.h. Finally, I added a Mapper almost identical to the identity mapper, but returns the new enum code for InputSymbolsAction() and OutputSymbolsAction().

Hope this helps anybody trying to multithread!

MichaelRiley - 09 Aug 2010 - 20:42

OpenFst is meant to be thread compatible/safe in the sense that if you use the method Copy(true) to copy FSTs then different threads can read the each of those. This is a 'deep' copy only as deep as needed for this to work. NOTE HOWEVER IT IS NECESSARY TO IMPLEMENT the classes in lock.h in a non-trivial way at your site. Do so and the behavior as described will work.

MichaelRiley - 09 Aug 2010 - 20:44

BTW, these comments may apply more to OpenFst 1.2 - very soon to be released. Don't remember how complete 1.1 was in this respect.
 
Log In

memory problem in fstcompose

AliAzimizadeh - 07 Jul 2010 - 09:18

Hi, i use openfst in HTS 2.1.1 . i just compile two fst by a symbol table in AT&T format:

0 #_f0 1 #_m0_s2 2 #_m0_s3 3 #_m0_s4 4 #_m0_s5 5 #_m0_s6 6 e$_f0 7 e$_m1_s2 8 e$_m1_s3 9 e$_m1_s4 10 e$_m1_s5 11 e$_m1_s6 12 . . . ow_f6037 170654 s$_f6037 170655 k_f6037 170656 u_f6037 170657 i_f6037 170658 s_f6037 170659 v_f6037 170660 h_f6037 170661 c$_f6037 170662 j_f6037 170663 z$_f6037 170664

its size is about 2.5Mb. then i run the below command successfully:

fstcompile --isymbols=a.sym --osymbols=a.sym tmp1.fst fstcompile --isymbols=a.sym --osymbols=a.sym tmp2.fst

then i want to compose them and run this:

fstcompose tmp2.fst tmp3.fst tmp4.fst

when the symbol number is little the composition is done but in some symbol files the symbol number is very large, for example 170655. in this case, the memory is fulled rapidly and the CPU doesn't work any more. this is my system properties: CPU: Pentium 4 Dual core 2.26GHz Ram: 4G Swap: 15G OS: CentOS 5.3

could you please give me suggestions about the case? Thanks a lot -Ali

PaulDixon - 08 Jul 2010 - 21:52

Have you tried arc sorting the fsts before composing?

fstarcsort -sort_type=olabel tmp2.fst > sort2.fst

fstarcsort -sort_type=ilabel tmp3.fst > sort3.fst

fstcompose sort2.fst sort3.fst > tmp4.fst

AliAzimizadeh - 09 Jul 2010 - 06:37

I tested your solution, but the problem remains yet. i think my fst model is big. the size of tmp2.fst and tmp3.fst are 147MB and 1.5M. Do you think my Ram memory enough to this process? if your answer is no, approximately, how much Ram does need ? Thanks for your attention -Ali

WeiwenLeung - 27 Aug 2010 - 01:33

i encounter almost the same problem. when fstepsnormalize running, it takes more than 50GB, and killed by the system. the gram fsm 31MB, lexicon fsm 4MB.

did you figure out what the problem was ?

 
Log In

Composition and ShortestPath?

ChristopherKermorvant - 22 Apr 2010 - 12:20

Hi,

I make a composition of 3 transducers F o G o H and I extract the ShortedPath on the resulting transducer. Is there a way to have access to the arcs in F, G and H corresponding to the arcs in the ShortestPath transducer ? I would like to have access to the output symbols of the arcs in F for example.

thank you,

-- Chris

DoganCan - 27 Apr 2010 - 09:59

Hi Chris,

There may be better solutions to your problem provided by the library but a quick solution is:

1. remove the weights of the shortestpath fst, S = shortest(F o G o H), and project it onto the input and output labels

fstmap -map_type=rmweight s.fst | fstproject > s.in.fst
fstmap -map_type=rmweight s.fst | fstproject -project_output > s.out.fst

2. compose shortest input with F and project the result onto the output labels to get the candidate paths in F

compose shortest output with H and project the result onto the input labels to get the candidate paths in H

fstcompose s.in.fst f.fst | fstproject -project_output > f.out.fst
fstcompose h.fst s.out.fst | fstproject > h.in.fst

3. compose the candidate paths with G (F_out o G o H_in) to obtain the paths in G that survive during the composition

fstcompose f.out.fst g.fst | fstcompose - h.in.fst > internal.fst

4. finally apply shortestpath to this fst to obtain the path in G internal to the shortest path in F o G o H

fstshortestpath internal.fst > shortest.internal.fst

This final transducer should provide the internal path you need. Note that the weights of this transducer are those obtained during the composition not the original weights of G.

Best, Dogan

RolandSchwarz - 13 May 2010 - 09:47

Hi, I'm also struggling with this and unfortunately the solution you provided doesn't seem to be working for me. Let's say F and H are strings of length 1000 and G is a simple two-state transducer. I'm interested in the internal path between the two states of the transducer G within S. In your solution f.out.fst is still of length 1000 and so is h.in.fst. It seems natural to me, that the composition in step 3 therefore leads to another large transducer and the shortestpath in step 4 is just as complex as the original transducer S. But I'm not totally sure if this is maybe even a different problem. Best, Roland

DoganCan - 14 May 2010 - 18:05

Hi Roland,

I am not entirely sure that I got your problem. I mean the internal path in G should be as complex as S.

Can you post a simple example where the above procedure fails?

Best, Dogan

RolandSchwarz - 15 May 2010 - 04:49

Hm, I probably misunderstood you. Take my example from above. I compare two sequences using a transducer, so F and H are acceptors, G is a transducer. Now I am not only interested in the shortest path in terms of input and output symbols and the shortest distance, but also in the state path within transducer G that lead to that distance. All the fstshortestpath gives me, is the shortest path with states numbered from 1 to 1000, but I would be interested in which of those steps went through state 1 in transducer G and which through state 2 (or how many states the transducer has). This is not evident from the output of fstshortestpath if the transducer G is not deterministic. I hope that clears things up a bit, I thought this was what you were referring to when talking about "internal paths". Best,

Roland

 
Log In

transducing phoneme lattices using WFSTs

BusyBeaver - 03 Apr 2010 - 11:38

Dear Community,

I'd like to convert phoneme lattices into WFSTs and use a grammar transducer to obtain a word lattice out of it. The thing is, that I want to keep additional information e.g. time stamps, acoustic scores, etc. while doing so. Note, that I do not want to obtain any likelihoods on paths in the first place. Instead, I would traverse the final lattice using those accumulated values to determine a likelihood for each transition and user-defined scaling factors myself.

I see two possibilities to keep track on times/scores - by storing them on arcs or using the string semiring to push just arbitrary information in the weights. When I use the string semiring, I could simply push there as much as I want to,and do a composition with my grammar. But then I am somewhat restricted: composition is not supported, because the string semiring is not commutative. I was thinking to rewrite the weighted composition algorithm from the paper, just change the weighting part, since I do not care about the order of elements within my strings anyways. But maybe I could keep my input fst as log/tropical semiring type and attach my scores/time stamps in the arcs. Would it then be possible to use the existing composition algorithm and still copy my attached information into the resulting FST? Maybe it is all much easier than what I thought of here right now, then I'd be glad if you give me some hints smile Thanks a lot in advance, Stefan

 
Log In

Probabilities as weights

NoChips - 17 Mar 2010 - 16:35

Hi

As far as I know the default arcs StdArc is using Tropical weight. How can I change that into Probabilty weight, so that transition weights are multiplied and not summed up?

I was looking for something predefined, but couldn't find anything...

Thank you in advance!

PaulDixon - 18 Mar 2010 - 04:14

You could use the LogArc type and convert to probabilities when needed.

There is a counting utility that includes a real semiring implementation and arc registration with the standard command line tools [here][1]

[1]: http://www.furui.cs.titech.ac.jp/~dixonp/wfstcount.zip]

 
Log In

Deleting arcs in an Fst

KennethRBeesley - 05 Mar 2010 - 19:37

Assume that I iterate through the states and arcs of an Fst, using the available iterators. Based on values of fields in the arcs (ilabel, olabel) I need to delete some particular arcs. How can I do that safely?

KennethRBeesley - 12 Mar 2010 - 16:46

I tried the following: 1) I called AddState() to add a new "sink" state to the FST, 2) In the Mutable Arc Iterator, for all the arcs to be deleted, I reset the nextstate to the sink state, then 3) I called Connect(). As the sink state is non-final, and has no exit arcs, Connect() deletes the sink state and all the arcs leading to it. It seems to work.

CyrilAllauzen - 17 Aug 2010 - 15:56

Indeed this would work. If you use VectorFst you could also use the DeleteStates method instead of Connect to delete the "sink" state.

JoachimChau - 01 Jun 2011 - 14:58

hi, sorry to bother you, but how to reset a arc's nextstate to the sink state? For example, the original arc is (sate1, state2), how to reset it to (state1, "sink")?
 
Log In

arcsum.h and arcmerge.h minor bug?

PhilSours - 05 Mar 2010 - 17:29

In ArcSumCompare and ArcMergeCompare, the 4th line: if (x.olabel < y.olabel) return false; should probably be: if (x.olabel > y.olabel) return false;

 
Log In

Memory footprint of an Fst

KennethRBeesley - 22 Feb 2010 - 15:16

Given a pointer to an Fst, is there an easy way to calculate its total memory footprint?

 
Log In

hbka reference paper

NewestUser - 26 Jan 2010 - 05:05

Hi, I have been reviewing the hbka paper referenced on the background material section of the site,

http://www.cs.nyu.edu/~mohri/postscript/hbka.pdf

and the figure for the minimization example, pg. 8, fig. 5(c) seems quite wrong. particularly the 'minimized' version is not equivalent to the original transducer. in the other reference

http://www.cs.nyu.edu/~mohri/pub/csl01.pdf

an identical transducer, albeit with different weights on a couple of the arcs, appears very differently in its minimized form.

i verified that the second version referenced in the latter link produces the expected output but it would be nice to confirm this, or alternatively to understand why they are different.

CyrilAllauzen - 26 Jan 2010 - 18:44

There is a "typo" in Fig 5(c). There should have been transitions labeled by d (with weight 4) and e (with weight 5) from state 0 to state 1.

NewestUser - 26 Jan 2010 - 22:28

thanks for the clarification. i was getting worried that i was very confused about something.
 
Log In

Problems with fstshortestpath

JoeWang - 15 Jan 2010 - 03:56

HI, all. I run fstshortestpath on a toy composed WFST, however I got the error message:

ERROR: FstMainRegister::GetMain: log-arc.so: cannot open shared object file: No such file or directory ERROR: fstshortestpath: Bad or unknown arc type "log" for this operation (ShortestPathMain)

I have checked my $LD_LIBRARY_PATH, it already includes /usr/local/lib, how come it still cannot find the shared object? Thanks in advance

CyrilAllauzen - 15 Jan 2010 - 13:41

Hi,

The issue is that the shortest path algorithm does not work over the log semiring.

Hence support for LogArc is not compiled into the fstshortestpath utility and the code tries to locate a shared object containg the definition of that arc type.

Best, Cyril

 
Log In

Is it possible to use fstcompile to construct an acceptor instead of a transducer?

JoeWang - 12 Jan 2010 - 22:09

Hi, all, Is it possible to use fstcompile to construct an acceptor (i.e. symbol/weight) instead of a transducer (i.e. ilabel:olabel/weight)? Thanks in advance.

PaulDixon - 13 Jan 2010 - 00:46

Use the -acceptor=true command line switch

JoeWang - 13 Jan 2010 - 02:43

Thanks, Paul.
 
Log In

Missing installation files for creating a user-defined arc type?

PaulDixon - 12 Jan 2010 - 01:11

When creating a new arc type some of the files (the *-main.h files) needed to compile the shared object don't seem to be copied as part of the installation. A workaround is to add compiler include path to the files in the openfst-1.1/ src/bin/ dir.

For the shared object to be detected and loaded a name of the form arc-type.so worked but arc_type.so did not.

I put sample code for a real semiring arc type here

KeithPonting - 19 Nov 2010 - 07:41

Looks good, but with openfst 1.2.5 there do not seem to be any *-main.h files... it is possible that REGISTER_FST_OPERATIONS (namespace fst::script) has replaced REGISTER_FST_MAINS, but I have so far failed to find a combination which works when using the resultant shared library with fstprint. (Building using gcc 4.1.2 on OpenSusE 10.2)

CyrilAllauzen - 19 Nov 2010 - 11:22

Indeed, the registration mechanism was change in 1.2 and the *-main.h files are gone.

The contain of my-arc.cc (the source for my-arc.so) should look like:

#include "nlp/fst/lib/fst-inl.h"
#include "nlp/fst/lib/const-fst-inl.h"
#include "nlp/fst/script/register.h"
#include "nlp/fst/script/fstscript.h"
#include "my-arc.h"
namespace fst {
namespace script {

REGISTER_FST(VectorFst, MyArc);
REGISTER_FST(ConstFst, MyArc);
REGISTER_FST_CLASSES(MyArc);

REGISTER_FST_OPERATIONS(MyArc);

}
}

Hope this will work for you.

Cyril

KeithPonting - 30 Nov 2010 - 05:17

Thankyou for getting back to me so promptly. I am afraid above does not work for me as I do not seem to have any inl.h files in either the installed directories or the downloaded sources or the fst build directories ...

CyrilAllauzen - 30 Nov 2010 - 15:36

This is a copy paste mistake. You just need to delete the -inl 's, the lib/ 's and the nlp/ 's. This gives:

#include <fst/fst.h>
#include <fst/const-fst.h>
#include <fst/script/register.h>
#include <fst/script/fstscript.h>

KeithPonting - 01 Dec 2010 - 05:39

Eureka! It works. Thankyou.
 
Log In

Problems on Ubuntu Linux 9.10

JoeWang - 11 Jan 2010 - 04:40

Hi, I have successfully maked openfst-1.1 on Ubuntu Linux 9.10. However, when I use fstcompile to run the first example, I got the error message from the shell: "fstcompile: error while loading shared libraries: libfst.so.0: cannot open shared object file: No such file or directory" which shows that libfst.so.0 doesn't exist, I checked under /usr/local/lib, it does exist. I am quite confused, can somebody help me? Thanks very much

CyrilAllauzen - 11 Jan 2010 - 12:05

Try adding /usr/local/lib to your LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
Best, Cyril

JoeWang - 12 Jan 2010 - 02:28

Thanks, Cyril. It works
 
Log In

Relabel() and epsilon Properties

KennethRBeesley - 04 Jan 2010 - 15:50

I think that I've found a problem with Relabel(), when hard symbols (stored on arcs as non-zero integers) are relabeled with epsilons (stored on arcs as zeros). E.g. I tried

vector < pair<StdArc::Label, StdArc::Label> > vect ; vect.push_back(pair<StdArc::Label, StdArc::Label> ( (StdArc::Label) 5, (StdArc::Label) 0 )) ; Relabel(fstp, vect, vect) ;

where fstp is a pointer to a StdVectorFst. ilabels and olabels originally labeled 5 are relabeled correctly as 0, possibly introducing input-side, output-side and double-sided epsilons; but (here's the problem) the Properties kIEpsilons, kNoIEpsilons, kOEpsilons, kNoOEpsilons, kEpsilons and kNoEpsilons are not updated.

At least that's what seems to be happening.

KennethRBeesley - 04 Jan 2010 - 18:03

Please ignore the previous message. The error was mine.
 
Log In

Finding n-shortest paths

NewUser - 29 Dec 2009 - 16:44

I was wondering what is the best way to go about finding n-shortest paths in an fst when the fst is considerably large (the fst in question is 1.3 GB in size; has around 10 million states and three times more number of arcs).

The shortest path algorithm expands the whole transducer and memory usage goes up to 2 GB! (I did see a previous post about setting the option first_path to true, but i need the first 100 or 1000 best paths to be outputted).

It would be of great help if someone had ideas on how to proceed.

Thanks a lot in advance.

 
Log In

Why is 64bit determinization slower 100 times than 32bit determinization?

KyuwoongHwang - 13 Dec 2009 - 21:29

To overcome memory limitation, I compiled the openfst for x64 in visual studio. It seems working well. but the speed is very slow. I suspected garbage collection and set the limit to 10GB to practically disable it. It didn't matter. The in.fst file size is 84MB. Did anyone have same problem? Thanks in advance for any help.

x32/fstdeterminize.exe --fst_default_cache_gc_limit=10000000000 in.fst out.fst ==> 292.40 secs.

x64/fstdeterminize.exe --fst_default_cache_gc_limit=10000000000 in.fst out.fst ==> 39034.45 secs.

NewestUser - 22 Dec 2009 - 04:32

is either one using swap? seems unlikely given the fairly small input size and x64, but i noticed that if the operation starts thrashing the swap it generally takes orders of magnitude longer to complete. also, what is your pointer size?

CyrilAllauzen - 11 Jan 2010 - 12:21

We commonly use 64 bit binaries on Linux and didn't notice any inefficiencies. Swapping could be the issue as pointed out by NewestUser since going 64 bits would increase the memory usage. You should also double-check that you are using the right compiler optimization options.

Best, Cyril

 
Log In

Adding other properties to arcs

AlexeiIvanov - 22 Nov 2009 - 17:00

I wonder what is the best way to augment arcs with properties other then input/output symbols and weights? E.g. in ASR lattice re-scoring sometimes it is beneficial to keep the time stamps of elementary acoustic events and propagate them to the final FST.

Thank you.

AlexeiIvanov - 24 Nov 2009 - 07:01

OK, I have found a way to do it. It works perfectly. It was achieved without any code modification. All necessary information is inserted into arc label.

If anyone needs details please, write me to alexei_v_ivanov@ieee.org

 
Log In

k-closed semirings

RolandSchwarz - 20 Nov 2009 - 08:28

I probably did not totally understand the Mohri papers, but what I got was that the single source shortest distance algorithm only works for locally closed semirings, where in the general case the Floyd-Warshall or Gauss-Jordan all-pairs shortest distance has to be applied. Now the code documentation of the openFST library tells me that the implementation there is the single source one according to Mohri 2002 and that it only works for k-closed semirings. Now here's what I think that means: (i) the code documentation is not totally precise, in that the algorithm also works for locally closed, not just k-closed semirings (ii) it therefore works on the tropical (which is 1-closed) and the log semiring with positive weights (which corresponds to the real semiring with 0 <= weights <=1) up to a certain accuracy (iii) if that is true, it does not work on the log semiring with positive and negative weights. It does not give an error though, and it seems to compute sensible distnances. What happens in that case? Is there an internal check for closedness? Does it fall back to an all-pairs implementation of the Floyd-Warshall-whatever variant?

Or alternatively, am I totally mistaken? Can someone shed some light on my confusion?

Cheers,

Roland

CyrilAllauzen - 11 Jan 2010 - 12:17

As long as there are no cycles with negative weights (in the log semiring), the single source shortest-distance will converge. This is what M. Mohri tried to convey by the notion of being k-closed for a given automaton.

We haven't implemented Floyd-Warshall. So there is no backup in case of a bad cycle, the algorithm would simply not converge.

Best, Cyril

 
Log In

ReWeight? : Working Example

NewestUser - 08 Nov 2009 - 04:11

Would it be possible to get a practical working example of the ReWeight operation, along with an example of the 'potentials.txt' input? At present it is not exactly clear how this operation might be used, or what it might be useful for.

CyrilAllauzen - 09 Nov 2009 - 11:45

An example of potentials.txt input is the ouptut of fstshortestdistance.

fstshortestdistance a.fst > potentials.txt
fstreweight --to_initial a.fst potentials.txt > b.fst
This is equivalent to:
fstpush --push_weights --to_initial a.fst > b.fst
This is actually the way pushing is implemented in the library, shortest-distance followed by reweighting.

 
Log In

read function causes problems

MichaelBerger - 30 Oct 2009 - 16:38

Hi,

I am trying to read a fst into a simple C++ application, however, for some reason I am being told that certain symbols are undefined:

Ld build/Debug/fstprintallpaths normal x86_64
cd /.../fstprintallpaths
setenv MACOSX_DEPLOYMENT_TARGET 10.6
/Developer/usr/bin/g++-4.2 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk -L/Users/michael/code/fstprintallpaths/build/Debug -L/usr/local/lib -L/usr/local/lib/Cabal-1.6.0.2 -L/usr/local/lib/ImageMagick-6.4.1 -L/usr/local/lib/pkgconfig -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4 -L/usr/local/lib/ImageMagick-6.4.1/config -L/usr/local/lib/ImageMagick-6.4.1/modules-Q16 -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Language -L/usr/local/lib/ImageMagick-6.4.1/modules-Q16/coders -L/usr/local/lib/ImageMagick-6.4.1/modules-Q16/filters -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/Compat -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/PackageDescription -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/Simple -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Language/Haskell -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/Simple/Build -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/Simple/GHC -L. -F/Users/michael/research/projects/lattice_parsing/code/fstprintallpaths/build/Debug -filelist /Users/michael/research/projects/lattice_parsing/code/fstprintallpaths/build/fstprintallpaths.build/Debug/fstprintallpaths.build/Objects-normal/x86_64/fstprintallpaths.LinkFileList -mmacosx-version-min=10.6 -o /Users/michael/research/projects/lattice_parsing/code/fstprintallpaths/build/Debug/fstprintallpaths

Undefined symbols:
  "fst::Int64ToStr(long long, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*)", referenced from:
      fst::FloatWeightTpl<float>::GetPrecisionString()  in fstprintallpaths.o
  "fst::FstHeader::Read(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)", referenced from:
      fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::Read(std::basic_istream<char, std::char_traits<char> >&, fst::FstReadOptions const&)in fstprintallpaths.o
ld: symbol(s) not found
collect2: ld returned 1 exit status



-----------------------------------------------------------------
fstprintallpaths.cpp:

#include "fstprintallpaths.h"

#include <iostream>
#include <vector>
#include <fst/fstlib.h>
#include <fst/fst-decl.h>
#include <fst/vector-fst.h>
#include <fst/weight.h>

using namespace fst;

int main (int argc, char * const argv[]) {
    // insert code here...
   std::cout << "Hello, World!\n";
   StdFst *input = StdFst::Read("blah.fst");
   //StdFst *model = StdFst::Read("blah.fst");
   //printAllStrings(&fst);
    return 0;
}

string itos(int i) { // convert int to string; from Bjarne Stroustrup's FAQ
   stringstream s;
   s << i;
   return s.str();
}

string vectorToString(vector<int> v) {
   if(v.size() == 0)
      return "<>";
   string result = "<" + itos(v[0]);
   for(int i = 1; i < v.size(); i++) {
      result + "," + itos(v[i]);
   }
   return result + ">";
}

void printAllStringsHelper(VectorFst<StdArc> &fst, int state, vector<int> str, TropicalWeight cost) {
   if(fst.Final(state) == TropicalWeight::Zero()) 
      cout << vectorToString(str) << " with cost " << (Times(cost,fst.Final(state))) << endl;
   for(ArcIterator <VectorFst <StdArc> > aiter(fst,state); ! aiter.Done(); aiter.Next()) {
      StdArc arc = aiter.Value();
      str.push_back(arc.ilabel);
      printAllStringsHelper(fst, arc.nextstate, str, Times(cost, arc.weight.Value()));
      str.pop_back();
   }
}

// a bad idea if there are loops :)
void printAllStrings(VectorFst<StdArc> &fst) {
   vector<int> str(0);
   TropicalWeight tw(TropicalWeight::One());
   printAllStringsHelper(fst, 0, str, tw);
}

-----------------------------------------------------------------

fstprintallpaths.h

/*
 *  fstprintallpaths.h
 *  fstprintallpaths
 *
 */

#include <iostream>
#include <vector>
#include <fst/fstlib.h>
#include <fst/fst-decl.h>
#include <fst/vector-fst.h>
#include <fst/weight.h>

using namespace fst;

int main(int argc, char * const argv[]);

string itos(int i);

string vectorToString(vector<int> v);

void printAllStringsHelper(VectorFst<StdArc> &fst, int state, vector<int> str, TropicalWeight cost);

void printAllStrings(VectorFst<StdArc> &fst);

MichaelBerger - 30 Oct 2009 - 16:57

Trying it on the commandline gives the same problems:
$ g++ fstprintallpaths.cpp fstprintallpaths.h -I  /usr/local/include 
Undefined symbols:
  "fst::FstHeader::Read(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)", referenced from:
      fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::Read(std::basic_istream<char, std::char_traits<char> >&, fst::FstReadOptions const&)in ccspqZtg.o
  "fst::Int64ToStr(long long, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*)", referenced from:
      fst::FloatWeightTpl<float>::GetPrecisionString()  in ccspqZtg.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

PaulDixon - 31 Oct 2009 - 08:19

Add a link switch -lfst for openfst to the g++ command
 
Log In

Determinization increases size of FST

MichaelBerger - 25 Oct 2009 - 14:36

Hi, I am trying to determinize a FST using fstdeterminize but for some reason the determinized fst is much larger than the original one:

$ fstinfo a_binary.fst
fst type                                          vector
arc type                                          standard
input symbol table                                ia.txt
output symbol table                               oa.txt
# of states                                       24
# of arcs                                         9095
initial state                                     0
# of final states                                 4
# of input/output epsilons                        0
# of input epsilons                               0
# of output epsilons                              9095
# of accessible states                            24
# of coaccessible states                          24
# of connected states                             24
# of strongly conn components                     24
expanded                                          y
mutable                                           y
acceptor                                          n
input deterministic                               n
output deterministic                              n
input/output epsilons                             n
input epsilons                                    n
output epsilons                                   y
input label sorted                                n
output label sorted                               y
weighted                                          y
cyclic                                            n
cyclic at initial state                           n
top sorted                                        y
accessible                                        y
coaccessible                                      y
string                                            n


$ fstinfo a_rmeps.fst
fst type                                          vector
arc type                                          standard
input symbol table                                ia.txt
output symbol table                               oa.txt
# of states                                       24
# of arcs                                         645
initial state                                     0
# of final states                                 4
# of input/output epsilons                        0
# of input epsilons                               0
# of output epsilons                              645
# of accessible states                            24
# of coaccessible states                          24
# of connected states                             24
# of strongly conn components                     24
expanded                                          y
mutable                                           y
acceptor                                          n
input deterministic                               n
output deterministic                              n
input/output epsilons                             n
input epsilons                                    n
output epsilons                                   y
input label sorted                                n
output label sorted                               y
weighted                                          y
cyclic                                            n
cyclic at initial state                           n
top sorted                                        y
accessible                                        y
coaccessible                                      y
string                                            n


$ fstinfo a_determinized.fst 
fst type                                          vector
arc type                                          standard
input symbol table                                ia.txt
output symbol table                               oa.txt
# of states                                       1026
# of arcs                                         22054
initial state                                     0
# of final states                                 906
# of input/output epsilons                        0
# of input epsilons                               0
# of output epsilons                              22054
# of accessible states                            1026
# of coaccessible states                          1026
# of connected states                             1026
# of strongly conn components                     1026
expanded                                          y
mutable                                           y
acceptor                                          n
input deterministic                               y
output deterministic                              n
input/output epsilons                             n
input epsilons                                    n
output epsilons                                   y
input label sorted                                y
output label sorted                               y
weighted                                          y
cyclic                                            n
cyclic at initial state                           n
top sorted                                        y
accessible                                        y
coaccessible                                      y
string                                            n

MichaelBerger - 25 Oct 2009 - 14:38

sorry for the cluttered command line output. The point I would like to get across is that the size of the original FST increases by 1000 states and the number of arcs also more than doubles when determinizing it.

Is this normal?

CyrilAllauzen - 02 Nov 2009 - 14:31

I fixed the formating of your original post. I don't see anything very unusual here. It is possible for determinization to significantly increase the size of the Fst.
 
Log In

ArcSort? error with Mutable FSTs

JonathanB - 16 Oct 2009 - 16:45

I am receiving the compile error "no matching function for call to 'ArcSort(fst::StdVectorFst&, fst::StdOLabelCompare)' " when I attempt to compose two mutable FSTs. Although an answer to an older post with the same error stated that making FSTs mutable would resolve this error, I receive the same error with mutable FSTs. I would really appreciate some help!

My code is of the form:

#include "fst/fstlib.h"

using namespace std; using namespace fst;

int main() { StdVectorFst? fst1; StdVectorFst? fst2; StdVectorFst? fst3;

ArcSort? (fst1, StdOLabelCompare? ()); fst3 = ComposeFst? (fst1, fst2);

return 0; }

I posted this question as part of an old topic, but I am reposting it as a new topic because I am not sure whether old topics are still reviewed.

Thank you very much, Jonathan

JonathanB - 16 Oct 2009 - 16:48

I apologize for the poor formatting. The code should read:

#include "fst/fstlib.h"

using namespace std;
using namespace fst;

int main()
{
StdVectorFst fst1;
StdVectorFst fst2;
StdVectorFst fst3;

ArcSort(fst1, StdOLabelCompare());
fst3 = ComposeFst(fst1, fst2);
return 0;
}

PaulDixon - 16 Oct 2009 - 20:27

Changing to the following compiles for me

 
ArcSort(&fst1, StdOLabelCompare());
fst3 = StdComposeFst(fst1, fst2);

 
Log In

do we have a transition that encodes a fail transition?

CamilleChen - 15 Oct 2009 - 15:37

Hi,

I know that an epsilon transition encodes an empty string. However, I'd like to compose two FSA. In case a label exists in one FSA, but not in another, do we have a fail transition, instead of the epsilon transition, can encode this?

Thanks very much!!

CyrilAllauzen - 15 Oct 2009 - 16:00

Hi Camille,

You want to use PhiMatcher (or maybe RhoMatcher, I'm not completely sure from your message). See the matcher documentation here.

Best, Cyril

CamilleChen - 15 Oct 2009 - 17:35

Hi Cyril, Yeah, these matchers are quite useful! Thanks!

Best, Camille

 
Log In

regular expression compiler

FrancoisBarthelemy - 15 Oct 2009 - 05:10

Is there an available regular expression compiler for OpenFst, namely a compiler which compiles a regular expression into an fst?

Thanks

 
Log In

Traverse the FST to check if it accepts a string or not and give appropriate output

VipulMittal - 15 Oct 2009 - 02:09

I have built an FST for a dictionary and I want to traverse it to check if the given word is a valid dictionary entry. If its a valid word, I'll output its feature values. Can someone please help me with how can I traverse a FST ? My Fst size is around 100MB.

Example FST for word : cat

0 1 c c 1 2 a a 2 3 t n,sg 3

VipulMittal - 15 Oct 2009 - 02:15

dont knw why the FST appeared like this. It is: 0 1 c c 1 2 a a 2 3 t n,sg 3

CyrilAllauzen - 15 Oct 2009 - 14:41

What you type is interpreted as twiki markup language. As mentioned above, you would have need to use the verbatim tags.

As for your original question, you want to use Composition.

 
Log In

compute weight automatically

CamilleChen - 14 Oct 2009 - 23:34

Hi,

I have another question:

if we know the total cost of path1 and we know the total cost of path 2 and path 1 is part of path 2 (say path 1 has 1 arc, and path 2 has 2 arcs, and path2's 1st arc label is the same as the path1's arc's label.)

then, is there a way to automatically combine these two paths in a single fst, by keeping path1's cost as the 1st arc's cost, and subtracting the path1's total cost from path2's total cost, and assign it as the cost of the second arc?

thanks!

CamilleChen - 14 Oct 2009 - 23:36

sorry: let me reorganize the condition a little bit:

if we know the total cost of path1

and we know the total cost of path 2

and path 1 is part of path 2 (say path 1 has 1 arc, and path 2 has 2 arcs, and path2's 1st arc label is the same as the path1's arc's label.)

CyrilAllauzen - 15 Oct 2009 - 16:02

Hi Camille,

This is still not very clear to me. Could you clarify further?

Best, Cyril

CamilleChen - 16 Oct 2009 - 19:18

Hi Cyril,

Now I see how to solve my question. Actually I did not know hot to indicate a path's weight when writing this path into a fst format. After I can do that, I can use the commands to do the calculation, and now, I know how to indicate a path's weight, so, the question is solved! smile

Thanks!

Camille

 
Log In

how to remove a weighted FSA's weights?

CamilleChen - 13 Oct 2009 - 21:50

Hi all, Suppose now I have a weighted FSA ready, and I'd like to rewrite it into another FSA by removing all the weights. Can anybody give me some suggestions? Thanks a lot!

RolandSchwarz - 14 Oct 2009 - 03:49

I don't know if there is a built-in function, but you could always try it with fstprint | cut -f 1-4 | fstcompile ... i.e. you print it, extract the fields which you want to keep and then recompile it or redirect it to a new text file.

RolandSchwarz - 14 Oct 2009 - 03:50

btw, I'm still looking for a solution to my problem below...

PaulDixon - 14 Oct 2009 - 19:00

fstmap with map_type set to rmweight can remove the arc and final weights

fstmap --map_type=rmweight

CamilleChen - 14 Oct 2009 - 23:27

both way works!! Thanks so much!!!

 
Log In

Performance issues?

RolandSchwarz - 09 Oct 2009 - 10:07

Hiya,

I'v been experimenting around with the openfst library lately and have the following problem: I'm trying to align sequences using a transducer (i.e. find the edit distance with specifics weights to indels and mismatches). The transducer has 4 states and some epsilon transitions to model indels (no double epsilon transitions tho). The sequences to be aligned are nucleotide sequences (alphabet of size 4) and about 1000 nucleotides long. Composition of two of these sequences with the alignment transducer (as in I o T o O, where I is the input and O the output sequence/acceptor) results in a 400mb fst file and takes approximately 3-4 minutes. Computation of the shortest path again another 2 minutes or so.

Now here's my question. It is somehow obvious, that the number of states explodes in composition (1000 * 4 * 1000 are at least 4 mill states), so maybe direct composition is not the way to go. But what can I honestly do to improve performance? I switched from the shell versions to the C library and tried lazy composition. Now composition is done in an instant, but the shortest path algorithm still seems to expand the whole transducer, as memory usage goes up to 500mb and it still takes about 4 minutes.

Is it simply such that the transducer approach to alignment is slow? Computing 200 of these pairwise alignments using traditional dynamic programming takes about 20 seconds. Or did I make a mistake somewhere? Does the shortest path algorithm neccessarily have to expand the full transducer, i.e. does lazy composition help at all in this case? I didn't even expect it to be competitive in terms of speed to dedicated tools, but I hoped it to be at least computationally feasible. I would really appreciate any kind of help.

Cheers,

Roland

CyrilAllauzen - 15 Oct 2009 - 15:51

Hi Roland,

The issue is indeed that computing the direct composition is not the way to go since it is rather expensive. Your idea of using lazy composition was good but as you noticed, ShortestPath expands the full machine by default. However, there is a way to avoid this behaviour: by using the shortest-first queue discipline and setting the option first_path to true.

ComposeFstOptions opts();
opts.gc_limit=0;  
// This disable the caching of the result of composition
// and should lower the memory usage at no computational cost since
// shortest-path should visit each state in the composed machine at most once.
ComposeFst<Arc> cfst(fst1, fst2, opt);

NaturalShortestFirstQueue<StateId, Weight> state_queue(distance);
ShortestPathOptions<Arc, NaturalShortestFirstQueue<StateId, Weight>,
AnyArcFilter<Arc> >
opts(&state_queue, AnyArcFilter<Arc>());
opts.first_path = true;
ShortestPath(cfst, ofst, &distance, opts);

This example assume composing two machines. In you case fst1 could be I o T (itself delayed or not, since it is a smaller machine both might be worth tryingboth) and fst2 could be O.

This should improve things a lot although it should still be slower than dedicated tools. I am very interested in this, so let me know how things turn out.

Best, Cyril

RolandSchwarz - 16 Oct 2009 - 05:56

Thanks alot Cyril, this is really working now. For some sequences it still takes 10 seconds or so, but memory consumption is low and most sequences are aligned within 1-2 seconds. Very nice. Does the NaturalShortestFirstQueue impose any restrictions on the type of semiring I chose? Or could I use the same strategy if I'm working on the log semiring and would be interested in the shortest distance?

All the best,

Roland

 
Log In

Label pushing - unexpected behavior?

JohnS - 02 Oct 2009 - 08:52

Hello, I have a couple of questions regarding label pushing in OpenFst.
It seems like a relatively common behavior of the operation Push(), in label pushing, is to generate resulting FSTs that are a lot larger than the original FST.
For fairly large FSTs, this (for me unexpected) growth in number of states and arcs, can be so significant that it makes any further processing impossible due to practical memory limitations.

I have attached the following example that produces the same behavior, using a very small FST.

Assume we have an FST A as depicted below:

By label pushing FST A (to final state), e.g. by the command
fstpush --push_labels --to_final A.fst PA.fst,
one might expect the following FST:

But instead the result FST PA.fst is generated, depicted below:


Question 1. In many cases, as is the example, there is an obvious way of performing pushing of the output labels to a final state, without adding any new states to the FST. Can you explain this growth behavior – under what circumstances it is triggered and, possibly, how it can be avoided?

When an FST that has been label pushed, is label pushed again, one might expect it to stabilize. However, Applying Push() to the already Pushed FST A generates the following output.

Question 2. Why is not the OpenFst implementation of label pushing idempotent, (e.g. like weight pushing is) so that the result of Push(A) is identical to result of Push(Push(A)) ?
Thanks!
-John

CyrilAllauzen - 15 Oct 2009 - 16:29

Hi John,

The reason for this is that label-pushing is implemented by considering the input transducer as a weighted automaton over the string semiring and applying weight-pushing (over the string semiring). The result is then the following weighted automaton over the string semiring:

label-pushing.jpg

The issue is how to convert that weighted automaton over the string into a transducer. Currently, we use the second step of the epsilon-normalization algorithm to do this. This is also what the cause the operation to be non idempotent.

One way to partially avoid the problem, would be to first reverse the transducer, apply label-pushing towards the initial state and reverse again:

fstreverse A.fst | fstpush --push_labels | fstreverse > PA.fst

This would result in:

reverse-label-pushing.jpg

The extra epsilon transitions are due to the use of reverse.

Best, Cyril

JohnS - 23 Oct 2009 - 05:48

Hi Cyril,

Thank you for taking your time to answer this question so thoroughly. I assume there exists, at least theoretically, a generic pushing algorithm to accomplish idempotent and “non-growing” label pushing. Do you agree on this? And if so, would you endorse such an algorithm to be included in any future version of OpenFST?

Best, John

 
Log In

Problem with Determinize and Minimize

VipulMittal - 01 Oct 2009 - 02:17

Hi, I want to build an FST for the words of a language where if the word is accepted, the output will be the features of the word. For this I'm using two FSTs - one for the prefix (root) part and other for the suffix - and I concatenate them to get the fina FST. Both the input files are in AT&T format.

But the problem is when I determinize it, it gives an error that the FST is non-functional i.e. for a particulat input there are more than one possible outputs, which will be there in this case.

So can someone please help me with this ??

Thank you, Vipul

DoganCan - 01 Oct 2009 - 19:55

Current version of OpenFst library does not support determinization of non-functional transducers. You can take the "encode -> determinize -> decode" path to determinize your final fst over (input label,output label) pairs. The result will be non-deterministic over input labels but may satisfy your current needs. Check the answer to the "det(L o G) becomes more complex than L o G" question below for a pointer to the encode/decode mechanism.

Best

Dogan Can

 
Log In

Constrains for the maximum size of the FSTs?

OanaNicolov - 30 Sep 2009 - 19:53

Hi,

Could you please tell me an estimate of the maximum size (in terms of #states and/or #transitions) of the FSTs, the openfst toolkit can work with (let's say on a machine with 8G of memory)? Alternatively, what was the size of your largest FST you worked with?

Thank you. Oana

DoganCan - 01 Oct 2009 - 20:14

Hi Oana,

OpenFst is a very efficient library which scales well to large problems. The answer to your question depends on the properties of the fsts and what you would like to do with them. For instance, you may run the shortest path and pruning algorithms over very large fsts (several millions of states and transitions) while you may not be able to determinize them or compose them with other large machines. What you can do also depends on whether your input fsts are acyclic, epsilon free etc. Using lazy implementations may help if your particular task does nor require complete fsts to be loaded onto the memory upfront. In short, (in my opinion) your best bet is to try and see.

Best,

Dogan

CyrilAllauzen - 15 Oct 2009 - 14:58

Hi Oana,

There are two main concrete class in the library ConstFst and VectorFst.

For ConstFst, the memory requirement is 20 bytes per state. and 16 bytes per arc. For VectorFst, the per-state memory requirement is about twice as much due to the overhead introduced by STL data structures.

Doing the math, you will see that you can easily represent machines with hundred of millions of states and transitions in a few GB of RAM.

If memory usage is critical, the CompactFst class can also be used to provide more efficient memory representations.

This addresses the memory requirement for representing an FST in memory. But, as Dogan pointed out, your memory usage will also depends on which algorithms you want to use.

Best, Cyril

 
Log In

RmEpsilon? , AutoQueue? and Queue Disciplines

NewestUser - 11 Sep 2009 - 11:51

Hi, I would like to know how 'reliable' the AutoQueue associated with RmEpsilon is; that is, how effective is this at selecting the optimal queue discipline given a particular input fst.

Also, are there any examples of employing specific queue disciplines with RmEpsilon? I've not been able to get this to work in practice.

NewestUser - 11 Sep 2009 - 23:59

Nevermind the second part of the above question, I was looking in the wrong place.
 
Log In

a stand in symbol for any symbol; printing out the strings for FSA

AnasElghafari - 28 Aug 2009 - 12:32

Hi there,

Thank you for this great library; I am using it for my B.A. thesis project in C.L. I am using it through the shell from within my Java code.

I have a bit of difficulty with a few things. This is one of them:

- Is there a compact way to specify "any other symbol". As in: in state 0 transduce "a" to "b" and transduce any other symbol to "c"?

The problem specifically is that I am creating acceptors where the symbols are words rather than letters, in some cases I might have 1000 words in my isymbols list. I want to build a rational kernel (transducers to count sequences) to work with these acceptors. At some states I need to be able to map every non-explicitly-mentioned input symbol to epsilon with weight 1. How can I specify that? Do I really need to add a thousand arcs for each state?

AnasElghafari - 28 Aug 2009 - 12:34

The second question (I included it in the title but forgot to include it in the body): is there a way to obtain/print out all the string accepted by some FSA ? (in the case where FSA accepts a finite set, of course)

CyrilAllauzen - 01 Sep 2009 - 11:46

For your first question, you can use RhoMatcher. See the advanced usage section.

For your second question, you will need to write your own function for this but it should be rather easy. Remember that the number of strings accepted by an acyclic automaton is in the worst case exponential in the size of the automaton.

Best, Cyril

 
Log In

det(L o G) becomes more complex than L o G

ZhijianOu - 25 Aug 2009 - 03:45

The experiment setup is as follows.
A Chinese lexicon L:
# of states                                       215371
# of arcs                                         303286

A Chinese bigram G:
# of states                                        87918
# of arcs                                       16822088

Run fstcompose to get
L o G 
# of states                                       391202
# of arcs                                       17125372

After determinization by running fstdeterminize,
det(L o G)
# of states                                      3479987
# of arcs                                       20071667
more complex !

I don't know why. I follows exactly the paper "Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech Recognition with Weighted Finite-State Transducers. In Handbook on Speech Processing and Speech Communication, 2008". The output word label in L is already placed on the initial transitions of the word.

The code is as follows:

fstarcsort.exe --sort_type=olabel  L.fst L.fst.sort
fstarcsort.exe --sort_type=ilabel  G.fst G.fst.sort
fstcompose.exe L.fst.sort G.fst.sort LG.fst
fstdeterminize.exe LG.fstb LG.fst.det
fstminimize.exe LG.fst.det LG.fst.min
Memory allocation failed

I run 64-bit openFST on windows 2003 server 64bit edition. The fstminimize.exe returns "Memory allocation failed" is a surprise. 64bit memory space is not sufficient?

Any suggestion that you could offer would be much appreciated.

Best, Zhijian

CyrilAllauzen - 30 Sep 2009 - 12:43

I'm not sure what the issue is. You might want to try minimizing the transducer as an acceptor: use fstencode to encode the labels, apply minimization and use fstencode to decode (see FST.EncodeDecodeDoc for more details).

Best, Cyril

 
Log In

How to deal with < / s > label in constructing ngram model

ZhijianOu - 23 Aug 2009 - 23:08

Dear all,

As mentioned in the paper "Allauzen, Mohri and Roark, Generalized algorithms for constructing statistical language models, ACL 2003." , in constructing a ngram model, there are transitions labeled with the end symbol < / s >, that lead to the final state.

To compose L with G, I need to add a psuedo word as follows to the L.

r eh d #0 read
r eh d #1 read
...
#0 </s>
 

Am I correct?

(btw: if I replace the < / s > in G with < eps >, the G is not deterministic. This is not a good way. So I propose the above way.)

Thanks for your patience. Your help is greately appreciated.

Best, Zhijian

CyrilAllauzen - 01 Sep 2009 - 12:09

As far as I remember, we were using </s> in G and adding </s> as a word with empty pronunciation (or silence) in L. The GRM library also allowed replace </s> in G by a final weight.

Best, Cyril

 
Log In

Switching weight pushing in the log semiring and in the tropical semiring

ZhijianOu - 23 Aug 2009 - 22:51

dear all,

I built a transducer of a trigram, following the paper "Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech Recognition with Weighted Finite-State Transducers. In Handbook on Speech Processing and Speech Communication, 2008"

As we note, after we introduce the backoff state, the G is only an approximate representation, if there are such seen trigram that have lower probability than its backed-off bigram, i.e. p(z|x,y) < alph(x,y)*p(z|y). To my viewpoints, there are two solutions.

1) One is to splitting states before replacing failure-transition by eps-transition, as proposed in "Allauzen, Mohri and Roark, Generalized algorithms for constructing statistical language models, ACL 2003."

2) The other, as I think, is to create a backoff model from an interpolated algorithm (e.g. modified kn smoothing). Then we have p(z|x,y) > alph(x,y)*p(z|y) for all trigrams. And if we further work in the tropical semiring, the representation will be exact.

So I chose to construct G in the tropical semiring. Now the question comes. As we know, weight pushing in the tropical semiring is inferior to pusing in the log semiring. I'm not sure whether the following procedure is correct, and if it's correct, how to do it? The semiring info is embeded in the arc type. If I use shell command, how can I change the arc type?

1) Switch from representing G (or L o G) in the tropical semiring to representing in the log semiring

2) run fstminimize to get min(L o G)

3) Swtich back to representing min(L o G) in the tropical semiring.

Thanks for your patience. Your help is greately appreciated.

Best, Zhijian

DoganCan - 24 Aug 2009 - 21:08

Current shell-level map command (fstmap) does not support mapping between std and log arcs. A simple workaround is to print the fst and compile the text representation with the appropriate arc type. On the other hand, C++ library includes mappers for conversion between std and log arcs. Following code might help:
VectorFst<StdArc> ifst;
VectorFst<StdArc> ofst;
VectorFst<LogArc> lfst;

Map(ifst,&lfst,StdToLogMapper());
Minimize(&lfst);
Map(lfst,&ofst,LogToStdMapper());

Best, Dogan

ZhijianOu - 25 Aug 2009 - 03:35

Hi, thanks to Dogan.

Sorry for the above lengthy question. To summarize, the main question is: when both lexicon L and grammar G are constructed in tropical semiring. Compared with directly run fstminimize.exe (i.e. in tropical semiring), is it better to switch to log semiring, run fstminimize.exe, and then switch back ?

Best, Zhijian

 
Log In

Compiling openfst-1.1 on cygwin

HenrikKinnemann - 11 Aug 2009 - 03:28

Hi,

is there anybody who has compiled openfst-1.1 (latest version) on Cygwin successfully? Which "3rd-party" libs need to be available on Cygwin in advance?

Best regards, Henrik

PaulDixon - 11 Aug 2009 - 04:08

I have compiled it in cygwin with the gcc-4 package and without any 3rd party libraries ./configure CXX=g++-4.exe CC=gcc-4.exe make install make
 
Log In

build 64bit openFST

ZhijianOu - 06 Aug 2009 - 10:19

hi,

I tried to run fstcompile.exe on a transducer with 23M states and 41M arcs, from AT&T FSM format text file. It terminates, saying that "Memory allocation failed". The 2G address space is not sufficent for compiling such a large transducer. Does the current openFST code support 64bit compile? Could you give me some instructions to build a 64bit openFST?

It is very appreciated that you could favor me with a reply. Thank you.

Zhijian

PaulDixon - 06 Aug 2009 - 23:10

Are you running the Windows version? If so, I can try add a 64bit profile to the Visual Studio solution

ZhijianOu - 14 Aug 2009 - 00:03

Hi, PaulDixon, I have installed Windows server 2003 Enterprise x64 edition.

It's very kind of you if you can add a 64bit profile to the Visual Studio solution.

Best regards, Zhijian

PaulDixon - 17 Aug 2009 - 08:47

Hi I added an updated solution to the contrib section. To build 64bit select the Release64 x64 in your build toolbar. If you are running VS express edition you might also need the platform SDK to get the 64 bit compiler.

ZhijianOu - 23 Aug 2009 - 22:53

PaulDixon, It works! Thank you very much. Best, Zhijian
 
Log In

valgrind complaints with Replace on openfst-1.1

GonzaloIglesias - 02 Aug 2009 - 13:36

Hi again, I have discovered that valgrind starts complaining with the openfst library with many operations, but only after a replace operation is performed. This didn't happen if compiled using openfst beta. I'm using valgrind 3.2.3. Here goes a small example:



    //Building 3 trivial fsts for fst replacement                                                                                                            
    VectorFst<StdArc> fst1,fst2,fst3;
    fst1.AddState();
    fst1.AddState();
    fst1.AddArc(0, StdArc(0, 123456789, StdArc::Weight::One(), 1));
    fst1.SetFinal(1,StdArc::Weight::One());
    fst1.SetStart(0);
    fst2.AddState();
    fst2.AddState();
    fst2.AddArc(0, StdArc(0, 1234567890, StdArc::Weight::One(), 1));
    fst2.SetFinal(1,StdArc::Weight::One());
    fst2.SetStart(0);
    fst3.AddState();
    fst3.AddState();
    fst3.AddState();
    fst3.SetFinal(2,StdArc::Weight::One());
    fst3.SetStart(0);
    fst3.AddArc(0,StdArc(0, 1000, StdArc::Weight::One(), 1));
    fst3.AddArc(1,StdArc(0, 2000, StdArc::Weight::One(), 2));
    vector< pair< StdArc::Label, const Fst<StdArc> * > > pairlabelfsts;
    pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> * >(1000,&fst1) );
    pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> * >(2000,&fst2) );
    pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> * >(3000,&fst3) );
    VectorFst<StdArc> rulefst;
    StdArc::Label slabel=3000;

#ifndef USEOPENFSTBETA
    rulefst=ReplaceFst<StdArc>(pairlabelfsts,ReplaceFstOptions<StdArc>(slabel,false));
#else
    rulefst=ReplaceFst<StdArc>(pairlabelfsts,ReplaceFstOptions(slabel,false));
#endif
    RmEpsilon(&rulefst); //valgrind complains here
    rulefst.Write(some.fst); // and valgrind complains here too
    return 0;
}

(I may submit a link to a full example code)

Cheers!

CyrilAllauzen - 11 Aug 2009 - 10:09

Hi Gonzalo,

Thanks for pointing this out! I'll look more deeply into this.

Best, Cyril

 
Log In

very slow delayed union on openfst 1.1

GonzaloIglesias - 02 Aug 2009 - 11:45

Hi,

Thanks for such a splendid tool as openfst! I'm a very happy user of the beta version for more than a year now.

I'm writing here because I have encountered a specific problem with the delayed union operation using openfst-1.1. library. Basically, I perform a delayed union of many fsts. Afterwards, I update by reinstancing to a VectorFst . I have isolated a particular case in which the version compiled with openfst-beta works things out very quickly (less than one minute), whilst it takes 40 minutes on openfst 1.1 to create the same fst. I have a small but full example including the necessary fsts to read from (~6MB) and perform union with. I don't understand this behaviour and it seems to me something could be wrong within version 1.1

Perhaps this is a known issue? May I post a link to a tar file containing the example? Please let me know how should I proceed.

Thanks in advance!

CyrilAllauzen - 11 Aug 2009 - 10:08

Hi Gonzalo,

I've started to look into this and I think I found the reason for this inefficiency. We'll fix this for the next release.

Thanks a lot for your feedback!

Best, Cyril

CyrilAllauzen - 13 Aug 2009 - 16:11

Update: there is actually no efficiency issue for delayed union. The issue is that Gonzalo was using a sub-optimal approach. The most efficient way of performing a large number of delayed union is to use Union(RationalFst<A> *fst1, const Fst<A> &fst2) as follows:
UnionFst<A> *union = new UnionFst<A>(fsts[0], fsts[1]);
for (int i = 2; i < fsts.size(); ++i)
  Union(union, fsts[i]);
This takes advantage of the fact that UnionFst derives from RationalFst and that delayed rational operations such as union can be applied destructively to a RationalFst.

Cyril

 
Log In

Transforming a collection of strings

MadhuTherani - 29 Jul 2009 - 14:35

I am trying to transduce a set of strings (defined by a Regex) into one final string..I have an FST defining the Regex and another FST defining the final string.. I need the second FST to match to any output label of the first FST in the composition fo the two FSTs..Can anyone suggest how to go about it..? Sigma matcher in the composition.? thanks

madhu

DoganCan - 30 Jul 2009 - 15:47

Depending on what you mean by "I need the second FST to match to any output label of the first FST in the composition of the two FSTs", there might be many different solutions to your problem.

Say you have the string "x y z" in the set defined by the regex and the final string is "a b".

If you want the transduction "x y z" -> "a b a b a b", one possible solution is to create a simple transducer T in the form:

0 1 X a
1 2 <eps> b
2 0 <eps> <eps>
2
where X is any symbol in your input alphabet (You need to add a transition in the form "0 1 X a" for each X in your alphabet) and compose the regex with T.


If you want the transduction "x y z" -> "a b" instead, then one possible solution is to create a transducer T in the form:

0 1 X <eps>
1 0 <eps> <eps>
1 2 <eps> a
2 3 <eps> b
3
where X is any symbol in your input alphabet and compose the regex with T.

You might achieve the same behaviour using sigma matcher in composition (At least I believe it is possible).

You might even try (for "x y z" -> "a b"):

  • replace output labels of the regex with <eps> labels
  • replace input labels of the final string with <eps> labels
  • concat two transducers
  • syncronize
  • rmepsilon

Best, Dogan

 
Log In

AT&T FsmLib? vs openFST for exe ussage

ZhijianOu - 29 Jul 2009 - 11:20

hi,

1) How to easily get the command-line help in openFST's shell-level exe? You know, in FsmLib, an argument "-?" will print help information.

2) If exe is sufficient for my application, which one do you recommend to use? In particular, I have a special emphasis on the performance of acceptor optimization operations, e.g., determinize, minimize, remove-epsilon. As we see, minimizing a transducer is not supported at the command level.

Thanks in advance.

Best regards, ZhijianOu

PaulDixon - 29 Jul 2009 - 23:41

1) fstcommand -help
 
Log In

What semiring is the default in fstminimize, fstdeterminize ?

ZhijianOu - 29 Jul 2009 - 11:07

hi,

I'd like you to point out whether my following understandings are correct.

1) If the input is an acceptor, the implementation of fstminimize is first do weight pushing, and then do (unweighted) acceptor minimization.

2) If 1) is correct, then using different semirings will have different weight pushing results, and different minimization results.

3) In the description of fstminimize in openFST (or fsmminimize in fsmLib), there is no mention of the default semiring used. What semiring is the default?

4) For a deterministic transducer, P(I, x, y, F) contains only one path, using the notation in the paper - Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech Recognition with Weighted Finite-State Transducers. In Handbook on Speech Processing and Speech Communication, Springer-Verlag, 2008.

5) fstdeterminize depends on the semiring ?

6) If we create binary fst from text file using fstcompile or fsmcompile, where to specify the semiring ?

It is very appreciated that you could favor me with a reply. Thank you.

Best regards, ZhijianOu

PaulDixon - 29 Jul 2009 - 20:42

6) For fstcompile use the arc_type switch fstcompile --arc_type=log test.txt > test.fst

For fsmcompile use the -s switch fsmcompile -t -s log test.txt > test.fst

DoganCan - 30 Jul 2009 - 16:22

  1. Correct
  2. Correct
  3. No default semiring. Operation is performed in the semiring given by the arc type of the input fst.
  4. Correct
  5. Yes. Say there are distinct paths with the same labels in the input fst.
    • Tropical semiring: minimum of the path weights is assigned to the output path.
    • Log semiring: sum of the path weights is assigned to the output path.

ZhijianOu - 30 Jul 2009 - 22:04

Thank PaulDixon and DoganCan.

The -s switch for fsmcompile is to give the states textual names.

The output from "fsmcompile -?" is
FSM Version 3.7
Copyright (C) 1998 AT&T Corp. All rights reserved.
Usage: fsmcompile [-opts] [fsm]
Options:
 -i symbols       input arc symbols
 -o symbols       output arc symbols
 -s symbols       state symbols
 -t               transducer
 -V file          input potentials file
 -F file          output FSM file
 -?               info/options

Continue with my question 3. 
In openFST, we can use --arc_type to specify the semiring. But we can run fstcompile without --arc_type, such as
fstcompile --isymbols=isyms.txt --osymbols=osyms.txt text.fst binary.fst
The default --arc_type is "standard", which means tropical semiring. Am I right?

But in fsmLib, if we run fsmcompile as follow:
fsmcompile -t -i i.syms -o o.syms < T.txt > T.fst
It is not clear what semiring is used.

Thanks for your attention.

Best, ZhijianOu

PaulDixon - 30 Jul 2009 - 23:06

It seems the semiring option was added to fsmtoolkit 4.0.

Yes, standard means tropical semiring (Explained in previous post "Why_is_TropicalWeight_considered_standard?")

 
Log In

problem with compiling openfst on Windows with Visual Studio 2008

ZhijianOu - 27 Jul 2009 - 10:49

hi,

As said in the openfstwin-1.1.zip's readme.txt, I open the solution file openfstwin.sln in Visual Studio 2008 and compile. But there are errors as follows:

1>Compiling... 1>fst.cc 1>d:\openfstwin-1.1\openfst-1.1\src\include\fst\unordered_set(19) : fatal error C1083: Cannot open include file: 'unordered_set': No such file or directory
1>properties.cc
1>d:\openfstwin-1.1\openfst-1.1\src\include\fst\unordered_map(19) : fatal error C1083: Cannot open include file: 'unordered_map': No such file or directory
1>Generating Code...
1>Build log was saved at "file://d:\openfstwin-1.1\openfstwin\openfstwinlib\Debug\BuildLog.htm"
1>openfstwinlib - 2 error(s), 0 warning(s)

It is very appreciated that you could favor me with a reply. Thank you.

Best regards, ZhijianOu

CyrilAllauzen - 27 Jul 2009 - 12:31

Hi Zhijian,

First, I'll make sure that you are using Visual Studio 2008 Service Pack 1 as mentioned here.

This looks like a missing TR1 issue. You might not have a working version of TR1 currently installed. As far as I know (I have never used VS2008), an implementation TR1 is included with SP1.

Best, Cyril

ZhijianOu - 29 Jul 2009 - 11:03

Hi, Cyril,

It works. Thank you. After installing VS2008 Service Pack1, I compiled openfstwin-1.1 successfully, though still with warnings, either warning C4396 or warning C4800. For example,

d:\openfstwin-1.1\openfst-1.1\src\include\fst/pair-weight.h(195) : warning C4396: 'fst::operator >>' : the inline specifier cannot be used when a friend declaration refers to a specialization of a function template
d:\openfstwin-1.1\openfst-1.1\src\include\fst/matcher.h(317) : warning C4800: 'uint64' : forcing value to bool 'true' or 'false' (performance warning)

Best regards, ZhijianOu

PaulDixon - 29 Jul 2009 - 20:40

changing fst.Properties(kAcceptor, true) to fst.Properties(kAcceptor, true)!=0 Should stop the C4800 warning.

If you think any of the warnings are bogus you can the warning number to the pragma statement at the top of config.h

HenrikKinnemann - 25 Aug 2009 - 05:02

Hello Paul,

my build of OpenFST-1.1 with Visual Studio 2008 SP1 was successful.

Now I would like to compile/link an example test application which is shipped together with OpenFST in file 'openfst-1.1\src\test\fst_test.cc'.

Visual Studio reports the following compile errors for fst_tes.cc:

1>------ Rebuild All started: Project: TestFst, Configuration: Debug Win32 ------ 1>Deleting intermediate and output files for project 'TestFst', configuration 'Debug|Win32' 1>Compiling... 1>stdafx.cpp 1>Compiling... 1>TestFst.cpp 1>d:\usr\local\include\fst\test-properties.h(24) : error C3083: 'tr1': the symbol to the left of a '::' must be a type 1>d:\usr\local\include\fst\test-properties.h(24) : error C2039: 'unordered_set' : is not a member of 'std' 1>d:\usr\local\include\fst\test-properties.h(24) : error C2873: 'unordered_set' : symbol cannot be used in a using-declaration 1>TestFst - 3 error(s), 1 warning(s)

What do I have to do in order to use 'unordered_set'?

Thanks in advance, Henrik

RolandZimbel - 01 Feb 2010 - 02:31

Hi Paul, I had a problem to compile OpenFst with VS2008 SP1: In fst/compat.h a WIN32 macro is used in the beginning, but it is defined later on by including fst/config.h. The compilation works when I include fst/config.h before using WIN32. As the windows compiler defines the macro _WIN32 I would propose to use this macro to get code that works on all platforms.
 
Log In

OpenFst? and AT&T FSM support UTF8 or Unicode Encoding?

AbbasMalik - 24 Jul 2009 - 15:23

Dear All I have question regarding OpenFst and AT&T FSM.

Do OpenFst and AT&T FSM support UTF8 or Unicode Encoding?

Thank you in advance,

Best

CyrilAllauzen - 24 Jul 2009 - 16:45

Hi Abbas,

This question was discussed previously in the forum: link.

Let me know if that answer your questions.

Best, Cyril

 
Log In

Pattern Matching using Mohri 1997 algorithm

PhilSours - 16 Jul 2009 - 16:58

Hi,

I'm interested in pattern matching, i.e. finding occurrences of a set of strings within a text.

M. Mohri [1997] has described an algorithm to generate an efficient pattern search automaton based on failure functions in this paper:

Mohri, Mehryar. String-Matching with Automata. Nordic Journal of Computing, 4(2):217-231, Summer 1997. http://www.cs.nyu.edu/~mohri/postscript/njc.ps

This method has also been referred to by W. Skut [FSMNLP 2008].

Is there anyone who has submitted (or could submit) code implementing this algorithm that would be free for commercial use?

Thank you, Phil Sours

NewestUser - 18 Jul 2009 - 11:11

it's not all that clear what you are aiming to do, but there are quite a few generic string matching, or regular expression matching solutions out there. the following link is to a great tutorial on the subject, which presents a well known solution which is quite easy to implement using the openfst library,

http://swtch.com/~rsc/regexp/regexp1.html

although it is very simple to implement, this algorithm generates transducers which are, in some cases, not terribly well-suited to further optimization or post-processing, especially epsilon removal. the following paper provides an overview of some more efficient ( but more complex ) alternatives which may be better suited to general or more complex inputs,

www.cs.nyu.edu/~mohri/postscript/glush.pdf

dunno if that is what you are looking for, and perhaps it's overkill, but depending on what you mean by 'match strings in a text' it seems like openfst itself might be overkill.

CyrilAllauzen - 20 Jul 2009 - 15:09

Hi Phil,

A version of this algorithm is implemented in the GRM library. The binaries are available for free for non-commercial use, no source code however.

Cyril

 
Log In

Pruning without changing state ID's.

DoganCan - 12 Jul 2009 - 13:32

Hi,

I need to prune an fst while keeping the state ID's unchanged. I need this behaviour since I keep auxiliary information (i.e potentials) about the states of the input fst. Is there a convenient way of doing this?

Thanks, Dogan Can

NewestUser - 12 Jul 2009 - 21:08

can you use the 'keep_state_numbering' flag associated with fstcompile? i have not tested it in this particular situation, but it seems like it might suffice.

DoganCan - 13 Jul 2009 - 16:55

'keep_state_numbering' flag only works when compiling a binary fst from its text representation. it does not help when an fst operation like pruning is applied on that same fst.

CyrilAllauzen - 20 Jul 2009 - 15:19

Unfortunately, there is no way to do that in the current version of the library. A workaround would be "encode" the origin state of each transition in its labels.

Cyril

 
Log In

User-defined Flags

DoganCan - 12 Jul 2009 - 07:01

Hi,

I noticed the following comment in the advanced usage section.

In a user-defined binary, the command line options processing will all also work if the user calls:

SetFlags(usage, &argc, &argv, true);
In that case, the user can set his own flags as well, following the conventions in <fst/flags.h>.

My problem is that when I add a new flag to a custom fst binary, I need to put its definition along with other flag definitions in <src/bin/main.cc> or in <src/lib/flags.cc> and rebuild the library, otherwise SetFlags method does not recognize it.

I have the generic OpenFst binary setup:

fstbinary.cc -- includes binary-main.h, calls SetFlags and Arc templated BinaryMain function

binary-main.h -- includes <fst/main.h>, declares a new flag using DECLARE_type(newflag) and defines BinaryMain function

Defining the new flag using DEFINE_type(newflag,default,help_message) in fstbinary.cc or binary-main.h allows me to use the newflag with its default value, however i cannot pass it from the command line.

Is it possible to to define and use new flags in a user-defined binary without modifying the library source code? If so, could you provide an example?

Many thanks, Dogan Can

PaulDixon - 13 Jul 2009 - 06:36

I've managed to add and use flags without recompiling everything. No 100% sure if this does what you want as it is very similar to what you already tried, but here are minimalistic header and cpp files which allow the value of myflag to be changed. Compiled and tested on a standard OpenFst 1.1 install under Linux using g++ fsttest.cpp -lfstmain -lfst -ldl -o fsttest

#include "test-main.h"

namespace fst 
{
        REGISTER_FST_MAIN(TestMain, StdArc);    
        REGISTER_FST_MAIN(TestMain, LogArc);
}  

DEFINE_double(myflag, 1234, "My Flag");
using fst::CallFstMain;
int main(int argc, char **argv) 
{
  string usage = "Test flags \n\n  Usage: ";
  usage += argv[0];
  usage += "  Flags: myflag\n";
  std::set_new_handler(FailedNewHandler);
  SetFlags(usage.c_str(), &argc, &argv, true);
  if (argc !=1)
  {
    ShowUsage();
    return 1;
  }
  return CallFstMain("TestMain", argc, argv, FLAGS_arc_type);
}

#ifndef TEST_MAIN_H__
#define TEST_MAIN_H__

#include <fst/main.h>
DECLARE_double(myflag);
DECLARE_string(arc_type);

namespace fst 
{

   template <class Arc>
   int TestMain(int argc, char **argv, istream&, const FstReadOptions&)
   {
     
      cerr << "myflag=" << FLAGS_myflag << endl;
        return 0;
   }
}  
#endif  

DoganCan - 13 Jul 2009 - 17:32

Hi Paul,

This is exactly what I tried before. Seems like there is a problem with my build settings. Thanks for your help.

DoganCan - 13 Jul 2009 - 19:21

Hi,

After going through my build settings, I discovered that GCC_SYMBOLS_PRIVATE_EXTERN compiler flag was set to YES in default Release settings of XCode, and changing it to NO solved the problem. This flag sets symbol visibility and I suppose it makes flag definitions invisible to the flagregister.

 
Log In

auxiliary symbols and a speech recognition cascade

NewestUser - 10 Jul 2009 - 04:35

Hi, I'm trying to understand the proper usage of auxiliary symbols in the construction of a recognition cascade using.

unfortunately, although I've now read through,

  • "Speech Recognition with Weighted Finite State Transducers",
  • "A Generalized Construction of Integrated Speech Recognition Transducers",
  • "Generalized Optimization Algorithm for Speech Recognition Transducers", and
  • "Integrated Context-Dependent Networks in Very Large Vocabulary Speech Recognition",

this one point still eludes me, and does not seem to be described or illustrated in any of these papers ( or any others that I was able to find ). Almost every other aspect of the transducer construction and cascade composition seems to be very clearly explained and usually illustrated with simple examples, but the trick with auxiliary phones and composition continues to elude me.

At present my approach consists of the following steps,

  • construct G, a grammar transducer, which in my case happens to be determinizable. thus i perform no augmentation.
  • construct L, the closure of a lexicon transducer derived from a pronunciation dictionary.
    • i automatically supplement this transducer with auxiliary symbols/phones as necessary, as described in the papers mentioned above
    • e.g., red r eh d #1, read r eh d #2, etc.
  • construct C, an inverted, context-dependent, deterministic triphone transducer.
    • at present i am generating this transducer from a phoneme list which also includes the auxiliary symbols/phones so as to guarantee proper composition. i generate all possible logical triphones and rely on the composition process to weed out the superfluous combinations.
  • perform the composition and optimization of the cascade ( i am not dealing with the H-level at present )
    •  min( det( *C* o det( *L* o *G* ) ) ) 
  • replace all auxiliary triphones with the empty string, thus
     eh+d-#1 
    will be replaced with '-'

there are clearly a couple of problems with the last stage of my process. first, each auxiliary symbol at the context-independent level is being mapped to multiple different different auxiliary symbols at the context-dependent level. nevertheless the literature implies that this should not in fact be the case,

For further determinizations at the context-dependent phone level and distribution level, each auxiliary phone must be mapped to a distinct context-dependent phone. Thus, self-loops are added at each state of C mapping each auxiliary phone to a new auxiliary context-dependent phone. The augmented context-dependency transducer is denoted by C.

so i suppose that the mapping should be one-to-one, rather that one-to-many, which is what my current approach yields.

another clearly negative consequence of my current botched approach is that the final auxiliary-symbol removal step results in lost context wherever such auxiliary symbols appear. for example, the following transducer

0 1 SIL+r-eh red
1 2 r+eh-d -
2 3 eh+d-#2 -
3 4 d+#2-b -
4 5 #2+b-ah ball
5 6 b+ah-l -
...
...

defaults to,

0 1 SIL+r-eh red
1 2 r+eh-d -
2 3 - -
3 4 - -
4 5 - ball
5 6 b+ah-l -
...
...

which, although i can actually use it for recognition, is obviously wrong. so finally my question: are there any other papers which provide either diagrams of an augmented component context-dependent triphone transducer, or a more detailed description of the I(M) operation described in "Generalized Optimization...", For any transducer M, we also denote by I(M) the transducer augmented with extra transitions such that each new symbol on its input side is mapped to some new and distinct output symbol.

try as I might still don't quite get this. sorry if this seems a foolish question but im teaching this to myself and dont really know where else i might ask.

NewestUser - 10 Jul 2009 - 04:42

another, perhaps more obvious way of asking this might be, what exactly is a self-loop in the context of this discussion, and what does it look like in practice?

NewestUser - 11 Jul 2009 - 10:34

I believe I figured out a more appropriate solution, which seems to work, but perhaps Im still missing something,

The most trivial, canonical example of a deterministic context-dependent triphone transducer.

The same example, inverted, and augmented with 2 auxiliary phones, #0, and #1

Assuming we have an example grammar transducer of the following form,

0 1 a testa
1 2 b -
2 3 #0 -
3 4 a testb
4 5 b -
5 6 #1 -
6

Then composition with the CD transducer yields,

where the auxiliary triphone symbols can now be safely replaced with epsilons, or short-pauses, or removed without any loss of information.

CyrilAllauzen - 14 Jul 2009 - 18:33

This is indeed the correct construction. In this context, a self-loop is a transition whose destination state is the origin state.

NewestUser - 14 Jul 2009 - 23:25

thanks for the confirmation! i think there may also be at least one more valid construction, this involves augmenting the monophone symbols as needed with auxiliary symbols, then treating these as normal monophones during construction of the context-dependency transducer.

so, in contrast to the above examples where we have a list of monophones which includes auxiliary phones, 'a, b, #0, #1', which are all treated the same. and further in contrast to the self-loop approach, where we have two separate classes of symbols, monophones: 'a, b', and aux. symbols: '#0, #1' we instead of a selective combination depending on the structure of the lexicon transducer, e.g., monophones: 'a#0, b, a#1' which are all treated as normal monophones. this approach has an advantage in that there is no need to treat anything specially, and at the C-level ( assuming we've no need to worry about anything further down in the cascade ), there is no loss of information. instead of 'eps-a+b, #0c:, a-b+a' we get 'epse-a#0-b, a#0-b+a#1'. and instead of epsilon replacement, we have a simple symbol replacement.

this actually seems like a reasonable solution so long as, a. the degree of homophony is small, and b. there is no need to interact with lower levels in the cascade. if either a. or b. do not hold this alternative rapidly becomes intractable due to the increase in monophones and the increase in the size of the resulting C-level transducer.

CyrilAllauzen - 20 Jul 2009 - 15:16

This alternative approach would also work indeed.

However, I want to point out that the first approach does not really require to physically add the self-loops and can be achieved without increasing the size of C by using custom matchers? in composition.

 
Log In

Converting a "linearized transducer" into a true transducer FST

KennethRBeesley - 09 Jul 2009 - 12:17

Assume that you have an OpenFst Acceptor in which each string of the encoded Language has an even number of symbols, e.g.

a b c d e f

in which each even-numbered symbol (counting from 0) represents an Input symbol and each odd-numbered symbol represents an Output symbol. (This is the "linearized transducer".)

Question: Is there an easy way to turn this Acceptor (a kind of linearized transducer) into a real Transducer where each path like the one above is converted into a path of half the length, with these labels:

a:b c:d e:f

CyrilAllauzen - 20 Jul 2009 - 15:04

I would do it in two steps:

1) Use composition to turn a b c d e f into a:eps eps:b c:eps eps:d. This can easily be done using two two-state transducers T_1 and T_2 and computing T_1 o A o T_2. T_2 is the transducer erasing any symbol in odd position: for all a in the alphabet,

0 1 a eps
1 0 a a
1

T_1 is the inverse of the reverse of T_2

2) Apply Synchronize() followed by RmEpsilon().

Cyril

KennethRBeesley - 17 Aug 2010 - 12:27

Thanks for this solution. In the general case, for an arbitrary Fst, is there a way to test if Synchronize() will terminate?
 
Log In

A few small bugs

DoganCan - 08 Jul 2009 - 09:08

Hi,

I noticed two small bugs/problems in the library and here are my fixes for them.

1. fstcompile does not give any error message when no input file is specified.

Solution: change line 00218 in <compile-main.h>

< if (!istrm) { 
> if (!*istrm) { 

2. <lexicographic-weight.h> defines pair weight type using "_<_" symbol between component weight types. However "<" symbol creates problems say when you want to register LexicographicArc for fst binaries since you can neither supply type1_<_type2 as an arc type nor create a dynamic shared object named type1_<_type2-arc.so.

Solution: change line 00075 in <lexicographic-weight.h>

< static const string type = W1::Type() + "_<_" + W2::Type(); 
> static const string type = W1::Type() + "_L_" + W2::Type(); 

Best, Dogan Can

CyrilAllauzen - 09 Jul 2009 - 11:00

Thanks a lot. We will fix this in the next release.

Best, Cyril

 
Log In

OpenFST? vs. BOOST Graph Library for operations on language models

HenrikKinnemann - 18 Jun 2009 - 08:01

Hello,

as a software developer of an OCR team I'm looking for a library which can

(a) represent certain kind of language models (e.g., OCR error correction rules, n-grams, word dictionaries, morphology and grammar rules for German language) in a unique way, for example, as weighted finite automata

and

(b) has some high-level operations already implemented to process the language models, e.g. functions like 'compose', 'minimize' and 'shortest-path-search'.

The BOOST Graph Library (http://www.boost.org/doc/libs/1_39_0/libs/graph/doc/index.html) does also have weighted finite automata which could be used to represent language models and BOOST Graph offers lot's of functions on them.

My question: Where are the supposed differences between BOOST Graph and OpenFST in relation to my OCR application, and, what are the strenghts and weaknesses of both of them?

Best regards, Henrik

CyrilAllauzen - 19 Jun 2009 - 12:09

Hi Henrik,

The Boost Graph Library focuses on directed and undirected graphs. As far as we know, it does not provide specific operations for weighted automata and transducers such as determinization, minimization or composition. You can probably use the BGL to represent language models, but you'll have to re-implement these algorithms. If you know otherwise, please point us to the relevant documents as there is no mention of automata-specific algorithms at the URL you provided.

The OpenFst library was especially designed for handling weighted automata and transducers. It provides a large selection of algorithms (including minimization and composition), some of them being implemented as on-the-fly (lazy) operations. The library also includes a general semiring framework. OpenFst provides some of the basic graph algorithms offered by the BGL such as generic visitors (depth-first, breadth-first, shortest-first, ...), generic shortest-path algorithms (Dijkstra, Bellman-Ford, ...), topological sort, strongly-connected components, ... However, OpenFst does not offer some other algorithms provided by BGL (spamming tree algorithms for instance).

The OpenFst library has been successful used by several natural language processing applications: speech recognition, speech synthesis, machine translation and OCR among others. It is for instance one of the core components of the speech recognition and speech synthesis efforts at Google. In that particular example, it is used among other things to represent (and combine) n-gram language models, word dictionaries and regular grammars.

To conclude, the Boost Graph library could provide a) but not b). The OpenFst library provides a) and b).

Best regards, Cyril

 
Log In

N-way Composition Support

GrahamNeubig - 25 May 2009 - 08:06

Hello,

Are there any plans to support the three-way composition mentioned in the technical report "N-way composition of weighted finite-state transducers" (by Cyril Allauzen and Mehryar Mohri) in the near future? It would come in handy for one of my projects and I would be willing to help with the implementation if no-one is working on it, but if it's already being worked on I'll wait for the next release.

GregDhuse - 07 Jul 2010 - 12:27

Hello,

Does anyone know of an existing implementation of the generalized n-way composition algorithm from that paper? I'm considering working on my own implementation if not.

Best regards, Greg

 
Log In

New behaviour of SetInputSymbols? /SetOutputSymbols ?

ChristopherKermorvant - 07 May 2009 - 08:04

Hi,

I am working with the 1.1 version and it seems that there is a change in behavior of SetInputSymbols/SetOutputSymbols. As from version 1.1, it seems that these two functions make a deep copy of the SymbolTable given in argument. This is due to the addition of a copy constructor in symbol-table.h :

explicit SymbolTableImpl(const SymbolTableImpl& impl)
      : name_(impl.name_),
        available_key_(0),
        dense_key_limit_(0),
        check_sum_finalized_(false) {
    for (size_t i = 0; i < impl.symbols_.size(); ++i) {
      AddSymbol(impl.symbols_[i], impl.Find(impl.symbols_[i]));
    }
  } 
* Am I right ?

* if yes, i don't see the use of having a reference counting on a deep copy

* if yes, the following comment in symbol-table.h is not true anymore :

// SymbolTables are reference counted and can therefore be shared across
// multiple machines. For example a language model grammar G, with a
// SymbolTable for the words in the language model can share this symbol
// table with the lexical representation L o G.
 
Thank you for your answer and for this great library !

CyrilAllauzen - 07 May 2009 - 11:14

Hi,

This is inexact. SetInputSymbols and SetOutputSymbols still do a shallow copy of the SymbolTable in 1.1. They both call SymbolTable::Copy that in turn calls the copy constructor of SymbolTable:

  SymbolTable(const SymbolTable& table) : impl_(table.impl_) {
    impl_->IncrRefCount();
  }
The copy constructor of SymbolTableImpl is not called since impl_ is a pointer to a SymbolTableImpl.

Best, Cyril

 
Log In

Sat vocabulary building software

AllenSam - 05 May 2009 - 06:34

hi all looking for an English student to work on our ...


Off-topic post deleted.
MichaelRiley
.

Mindtuning for the Encode/Determinize/Decode workaround

KennethRBeesley - 06 Apr 2009 - 17:22

Cyril,

Back on 14 August 2007, you wrote:

"The library currently only supports the determinization of functional transducers (if two successful paths have the same input label, they need to also have the same output label). The reason for that is that we use the weighted automata determinization algorithm viewing the output labels as weights in the string semiring ....

A workaround is to use Encode/Decode to view the transducer as an acceptor, considering the pair (ilabel, olabel) as one symbol.

1. Encode: EncodeMapper encoder(kEncodeLabels, ENCODE); Encode(fst, &encoder);

2. Determinize.

3. Decode: Decode(fst, encoder);

*** End of Quotation ***

I need a bit of mind-tuning.

Question: Is the Encode/Determinize/Decode workaround recommended ONLY when determinizing a NON-functional Fst? If so, what's the best way to determine if an Fst candidate for determization is functional or not?

Background: In my application, Fsts are created from arbitrarily complex regular expressions, and the resulting Fsts may be functional or non-functional. Without the workaround, statements like

$v = a | a:b ;

currently cause a crash when the result Fst is determinized (without the Encode/Decode workaround), because the Fst maps "a" to "a" and to "b"--i.e. in this case the Fst is not functional.

Thanks,

Ken

MichaelRiley - 08 Apr 2009 - 00:01

You can always determinize and minimize an 'encoded' machine. The result may not be fully deterministic or minimal when 'decoded' but at least redundant transitions and states in the encoded version have been dealt with. Probably the safe thing to do in your application when a transduction is specified.
 
Log In

config.h

PaulDixon - 17 Mar 2009 - 08:14

Hi,

After following the INSTALL instructions using the defaults settings and using the library in my own program as described in the README file. The compiler gives and error looking for config.h (included compat.h). Is config.h copied as part of the installation? Everything compiles fine if I add another include path to the location where the configure script was run from.

MichaelRiley - 21 Mar 2009 - 11:45

Thanks. It's been fixed in version 1.1, now uploaded. Autotools growing pains.

 
Log In

Minor bug in assignment operator

DanEgnor - 16 Mar 2009 - 04:29

MutableFst declares an operator=(const Fst &), but it doesn't declare an operator=(const MutableFst &). You'd think that would be OK because MutableFst is a subclass of Fst, but in fact what it means is that C++ defines an implicit operator=(const MutableFst &), which does a shallow copy, which does nothing (because MutableFst is an abstract base class with no data members).

That means that if you then try to assign one MutableFst to another, nothing happens.

Recommended fix: add an operator=(const MutableFst &fst) method that simply calls the existing (abstract virtual) assignment operator, via static_cast or something. I can submit a diff if this isn't clear.

CyrilAllauzen - 16 Mar 2009 - 11:17

Hi Dan,

Indeed, we found out about this bug and fixed it a while back. The latest release of the library (version 1.0) included that fix.

Best, Cyril

DanEgnor - 17 Mar 2009 - 22:00

Zounds, I was three days out of date! Thanks.
 
Log In

When is it safe to RmEpsilon(), Determinize(), Minimize() When is it safe to automatically RmEpsilon? (), Determinize() and/or Minimize()

KennethRBeesley - 03 Mar 2009 - 16:59

Sorry to belabor this point, but I'm still struggling to understand when an Fst can be safely and automatically processed by RmEpsilon(), Determinize() and/or Minimize(). The challenge arises in my Kleene programming language, which uses OpenFst as the underlying library. In Kleene you can enter statements like

$myfst  =   a* b+ ((c|d|e|f) - (m|n|d)){2} ((dog|cat|rat) & (elephant|dog|pig))? ;

After the regular expression is parsed, the interpreter walks the parse tree and makes multiple calls to OpenFst library functions--Closure, Compose, Concat, Difference, Intersect, etc--building many intermediate Fsts and combining them using the various operations until the final Fst is created and (in this case) bound to the variable $myfst. Obviously, all this processing gets done automatically, without human intervention. The user is interested only in the final result, $myfst, which then might get used in subsequent regular expressions.

Ideally, the final result would be automatically determinized and minimized.

I understand, with help from Cyril , that if an Fst is tested and found to be 1) acyclic, or 2) both (unweighted AND an Acceptor) then it can be safely Determinized and Minimized. (Corrections would be welcome.) So after each intermediate or final Fst is created by the Kleene interpreter, it might be sent to the following clean-up function

// first-draft pseudo-code
if ( the fst is acyclic || (the fst is unweighted &&  the fst is an acceptor) ) {
      RmEpsilon()
      Determinize()
      Minimize()
} else {
      // this is questionable, see below
      RmEpsilon()
}

However, I'm aware that this could still be problematic. In a response on this forum, Cyril stated

"Indeed, in some situations you cannot remove epsilons because it will lead to a space blow up. In that case, you might still want to optimize the machine with epsilons [i.e. without first invoking RmEpsilon] using determinization and minimization."

In another response, Cyril stated

"Indeed, union introduces epsilon-transitions and epsilon-transitions will slow down composition. Non-determinism will also slow down composition. So, I would recommend you first epsilon-remove and then determinize your fst. However, there is always a risk with determinization (it might blow up or even not terminate). Hence, if what I suggest does not work better, you should investigate using determinize only, or rmepsilon only or determinize followed by rmepsilon."

This puts in question the 'else' clause in the pseudo-code above, where RmEpsilon() is automatically called on Fsts that don't satisfy the requirements for safe determinization and minimization.

Question: Assuming my Kleene scenario, where regular expressions are being interpreted and many Fsts (intermediate and final) are being created automatically, when can RmEpsilon(), Determinize() and Minimize() be safely invoked automatically? I want the interpreter to do all the optimizations that are mathematically safe, but to attempt nothing more.

We can assume that the interpreter can interrogate various features of the candidate Fst (cyclic vs. acyclic, weighted vs. unweighted, acceptor vs. transducer, etc.). The interpreter also knows exactly which operation has just been performed to produce the Fst in question (Union, Concat, Closure, etc.).

Thanks,

Ken

CyrilAllauzen - 03 Mar 2009 - 17:27

Hi Ken,

To clarify, when talking about determinization:

  • safe means a condition under which the algorithm will always terminate assuming you have enough memory and wait long enough;
  • unsafe means the algorithm might not terminate ever.

Assuming this definition of safe/unsafe, determinization is safe for acyclic FSTs and for unweighted acceptors and rmepsilon and minimization are always safe.

However, safe does not guarantee that the algorithms will terminate on your machine because you have a finite amount of memory and don't want to wait more than x minutes.

The complexity of determinization in the safe case is in the worst case exponential in the size of the input machine, and quadratic in the size of the result (see DeterminizeDoc).

The complexity of epsilon removal is in the worst case quadratic in the size of the input (see RmEpsilonDoc).

There is no way you can ensure these algorithms will never blowup of memory. However, you can ensure that for most "reasonable" input they will behave "reasonnably".

Due to potentially large cost of these algorithms, it is always recommended you try to reduce the size of the input Fst before hand. This is why I would recommend first optimizing the machine with epsilon, applying epsilon-removal and then optimizing the machine without epsilons, Hence, the "safest" optimization function would be:

// first-draft pseudo-code
if ( the fst is acyclic || (the fst is unweighted &&  the fst is an acceptor) ) {
      Determinize()
      Minimize()
      RmEpsilon()
      Determinize()
      Minimize()
} else {
      // this is questionable, see below
      RmEpsilon()
}

 
Log In

Compiling openfst-1.0 on cygwin

PaulDixon - 26 Feb 2009 - 18:17

I have successfully compiled openfst-1.0 on cygwin. It was necessary to install a newer version of gcc to get TR1 support. I also had to add this line to main.cc
#include <ctime>

I should also have Visual Studio build instructions ready soon.

Paul

PaulDixon - 17 Mar 2009 - 19:37

To compile on the latest verion of cygwin with the cygwin gcc-4 package. Add the #include to the main.cc file and Run configure with the following arguments. ./configure CXX=g++-4.exe CC=gcc-4.exe Build and install as described in the readme

HenrikKinnemann - 11 Aug 2009 - 03:24

 
Log In

fstcompose memory consumption?

DavidHugginsDaines - 19 Jan 2009 - 11:45

Hi,

I'm running into some trouble trying to build static recognition transducers using OpenFst. Specifically, for even a fairly trivial language model G:

# of states                     42980
# of arcs                       183468
# of final states               923
# of input/output epsilons      43867

and (determinized and minimized) dictionary L~:

# of states                     1180
# of arcs                       2424
# of final states               1
# of input/output epsilons      1

when I try to compose these with fstcompose, it rapidly uses all available memory. I haven't let it run long enough to see if it actually terminates.

I'm not sure this is specific to OpenFst, because I find that the FSM library has the same problem. It's possible that I'm building the language model incorrectly, though I'm not sure why this would cause composition to blow up. I am planning to do a sanity check using the GRM library but haven't figured this out yet.

Is it necessary to use on-demand composition when building a transducer of this sort?

CyrilAllauzen - 21 Jan 2009 - 18:10

Hi,

You should try not to determinize/minimize L and make sure that the output labels (i.e words) are on the first non-input epsilon on a path. Also, make sure that L is output label-sorted. (I assumed you do fstcompose L G > LG)

Let me know if this works.

Cyril

DavidHugginsDaines - 26 Jan 2009 - 13:30

Thanks for the advice. It is working quite well now. I had been putting words at the end of words, on the word boundary symbols in L~, under the assumption that epsilon normalization would push them forward eventually (which it does), and also including entirely unnecessary epsilon transitions from the start state to the first state of each word.

I assume the problem with a minimized L is that its maximum out-degree is much higher than an un-minimized one?

CyrilAllauzen - 30 Jan 2009 - 13:37

Hi David,

I'm glad to read that this works quite well now. The issue is with determinization and not minimization. Determinization pushes back the output labels creating a lot of long epsilons paths. In composition, while trying to match a label x at a state in G, all this epsilon paths in L are going to be followed in order to find an x. A lot of this work is wasteful since there is only a few of these paths at the end of which an x can be found.

Cyril

BingbingLeng - 03 Dec 2009 - 11:59

Hi Cryil,

when the output labels (i.e words) are on the first non-input epsilon on a path,I cannot determinize the Lexicon.Could you please tell me how to determinize it?

 
Log In

possibility of facing infinite loop in fstdeterminize

IlbinLee - 18 Jan 2009 - 21:47

Is ever possible for "fstdeterminize" to fall into infinite loop?

I'm trying to determinize an FST and the amount of memory it uses gradually increases and it reaches to the limit, results "memory allocation failed" error. However, after removing 2-3 certain arcs in the FST, the process ends pretty quickly. Those arcs don't have any particular characteristics, so I guess it's due to a structure of the FST which consists of more than 2-3 arcs.

Could you give me any solution or hint?

MarkusD - 20 Jan 2009 - 14:25

Yes, the determinization algorithm does not halt for certain weighted finite-state machines. This is described in Mohri 2004 (http://www.cs.nyu.edu/~mohri/postscript/fla.pdf). If the machine contains 'sibling states' then they must be 'twins'. Otherwise, the machine is not determinizable (under the commonly used semirings).

Roughly, two states are sibling states if they are reachable from the start state on different paths that have the same label sequence, and there are cycles at these two states that have similar labels. If these two cycles have the same weight then they are twins.

On this machine, for example, fstdeterminize doesn't terminate: 0 1 1 1 0 2 1 2 1 1 2 3 1 3 3 5 2 2 2 4 2 3 4 6 3

MarkusD - 20 Jan 2009 - 14:27

Here is the finite-state machine again (this time in "pre" tags):

0 1  1 1 0 2  1 2 1 1  2 3 1 3  3 5 2 2  2 4 2 3  4 6 3 
 
Log In

Minor fixes needed to compile with GCC 4.3

DavidHugginsDaines - 06 Jan 2009 - 13:46

As usual, the C++ "standard" got reinterpreted in GCC 4.3, and a few things need to be changed for OpenFst-20080422 to compile. They are mostly trivial, just adding std:: to C string functions and including to get INT_MAX. A patch is at http://www.cs.cmu.edu/~dhuggins/OpenFst-gcc-4.3.diff

 
Log In

Ignore(A, B, &C) algorithm?

KennethRBeesley - 05 Jan 2009 - 17:30

If you're taking requests: It would be convenient to have an Ignore(A, B, &C) algorithm, creating a network (C) that encodes the language/relation A, ignoring any occurrences of B.

 
Log In

Documentation of OpenFst? binary file format?

KennethRBeesley - 05 Jan 2009 - 14:46

Is there any available documentation on OpenFst's binary file format?

Thanks,

Ken

DavidHugginsDaines - 06 Jan 2009 - 14:24

I've been wondering this too, as I'm interested in reading and writing the binary file format without having to link in the OpenFst library (for no good reason aside from not wanting to deal with C++ compiler and library issues). It appears that it's slightly machine specific (but only in terms of endianness), so perhaps it's mainly meant to be used as an internal, intermediate format?

The definition of the header can be found in fst/lib/fst.h, and the symbol table format can be gleaned from fst/lib/symbol-table.cc. How the subsequent state and arc definitions are stored depends entirely on what particular Fst and Arc classes are being used.

In the case of VectorFst, I believe the whole thing looks like this, in native byte order. Strings are stored as a 32-bit length field followed by the raw character data. I may be a bit wrong about this as I haven't actually implemented it yet. Code for this should show up in the CMU Sphinx repository at some point in the near future though.

int32 magic_number = 2125659606;
string fst_type = "vector";
string arc_type = "standard";
int32 file_version = 2; /* file format version */
int32 flags; /* bitfield: HAS_ISYMBOLS = 1, HAS_OSYMBOLS = 2 */
uint64 properties; /* bitfield, defined in fst/lib/properties.h */
int64 start; /* start state ID */
int64 num_states;
int64 num_arcs;

/* input symbol table, if flags & HAVE_ISYMBOLS */
int32 magic_number = 2125658996;
string name;
int64 next_available_key; /* Next available key (number) */
int64 size;
struct {
    string symbol;
    int64 key;
} symbols[size];

/* output symbol table, if flags & HAVE_OSYMBOLS - see above... */

/* states and arcs */
struct {
    float final_weight;
    int64 num_arcs;
    struct {
        int ilabel;
        int olabel;
        float weight;
        int next_state;
    } arcs[num_arcs];
} states[num_states];

DavidHugginsDaines - 06 Jan 2009 - 14:26

That should say VectorFst<StdArc> ... forgot to format it correctly.
 
Log In

Patch for GNU autotools including libtool for OpenFst? 20080422

DavidShao - 28 Dec 2008 - 00:41

I have been developing patches to enable an open source software stack of OpenFst 20080422, leptonlib 1.58, iulib subversion revision 123, Tesseract subversion revision 205, and OCRopus subversion revision 1307 to all use libtool to create dynamic linked libraries on all of Mac OS X 10.5.6 using Macports, Ubuntu 8.10, FreeBSD 7, and Windows XP using Cygwin. The patches can be found at OCRopus files with filenames beginning with toautotools. As of today, for OpenFst, the corresponding file would be
toautotools_openfst_20080422_20081227.diff
As I have stated in a previous post, to apply and use the patch, say to a user subdirectory busr so that one does not need administrator privileges to install,

patch -p1 -E < ../toautotools_openfst_20080422_20081227.diff
autoreconf --install --force
./configure --prefix=$HOME/busr CPPFLAGS="-I/opt/local/include" LDFLAGS="-L/opt/local/lib"
make
make check
make install

I have enabled make check using tests already provided in OpenFst so that one can test whether modules can be dlopened. All tests pass for all platforms except Cygwin on Windows XP. Even for Cygwin on Windows XP, this patch enables a dynamic linked library to be built such that OCRopus is able to use it to pass its own OpenFst tests. (There is an additional failure of one OCRopus test, test-fst-search2.lua, on Mac OS X and FreeBSD.)

My preliminary investigation for Cygwin on Windows XP appears to show that something is going wrong with reinterpret_cast-ing the read() function so that it can be recovered using a map in both fst/lib/register.h and fst/bin/main.h. The pointer to this function appears to be immediately recovered if the map is accessed right after it is stored, but a later GetEntry() returns a null pointer.

I believe that using GNU autotools to provide dynamic linked libraries is an essential step to helping more wide distribution of OpenFst through standard platform package or port managers. I think the current patch just about finishes this task other than the above problem for Cygwin. I also have not tested this patch on any 64-bit platforms other than an Intel Macbook running Mac OS X.

DavidShao - 28 Dec 2008 - 00:42

 
Log In

Determinize, Sequentialize, Subsequentialize

KennethRBeesley - 11 Nov 2008 - 14:17

My colleagues and I are trying to understand the relationship between the OpenFst Determinize() algorithm and the sequentialization and subsequentialization discussed in Mohri's 1997 paper "Finite-State Transducers in Language and Speech Processing".

Any mind-tuning that you could offer would be much appreciated.

Ken

CyrilAllauzen - 13 Nov 2008 - 14:15

The Determinize() algorithm is the algorithm described by Mohri in 1997 Computational Linguistic paper.

In the weighted automaton case, the implementation follows exactly Mohri's paper. The weighted transducer is dealt by considering weighted transducer over seminiring K as weighted automaton over the semiring obtained by taking the cross product of the string semiring and the semiring K.

Cyril

 
Log In

Determinize and functional transducers

KennethRBeesley - 11 Nov 2008 - 14:10

The documentation of the Determinize() function in OpenFst states "The transducer must be functional." and deeper down, "N.B. Current version only determinizes functional transducers". Can we expect a future version of Determinize() that is not limited to functional transducers?

Thanks,

Ken

CyrilAllauzen - 13 Nov 2008 - 16:35

Yes, we indeed plan to have determinization of non-functional transducers in a future version.

Best, Cyril

ThomasDeniau - 11 Nov 2009 - 00:06

Has anybody written a version of determinization which works for non-functional transducers ? Cyril, would you have a preliminary version of that code to give to people looking to implement it a headstart ? Thanks
 
Log In

Request: Determinize in place

KennethRBeesley - 10 Sep 2008 - 23:44

Minimize(&A) minimizes a network A "in place", but Determinize(A, &B) takes A and produces as output a new determinized Fst B. Would it be possible to provide a new version of Determinize that determinizes in place, i.e. Determinize(&A)?

Thanks,

Ken

CyrilAllauzen - 24 Sep 2008 - 17:27

We could add a Determinize(&A). The reason we chose not to so was that the underlying algorithm would still be constructive: create B as the determinization of A, then delete A and assign B to A. Hence, it won't be any more (memory) efficient than using Determinize(A, &B).

KennethRBeesley - 24 Sep 2008 - 21:46

Whatever goes on underneath, I venture to suggest that it would be (to some of us at least) intuitive and useful to add a new Determinize(&A) that is parallel to RmEpsilon(&A) and Minimize(&A). From a user's point of view, it often makes sense to perform such operations "in place" on an existing Fst object.

In my own work (a programming language built on top of OpenFst) an assignment statement like

$net = a*b+[w-z]? ;

causes the right-hand side to be evaluated (by calling various functions in the OpenFst library) and in a symbol table the name $net is bound to a Java Fst object that stores (as a Java long int) the pointer to the resulting C++ Fst object. That pointer is my handle from the Java world to the Fst object in the C++ universe. If I then type

$net2 = $net ;

then $net2 becomes an alias for the same Java Fst object that contains the pointer to the original C++ Fst object. (If I really want to make a copy, I've got a copy function that relies on the Map() function in the OpenFst library.)

With RmEpsilon(&A), I can remove the epsilons from the original C++ Fst object in place, and both $net and $net2 still reference the same Java Fst object that still contains the pointer to the original C++ Fst object. If I could (at least superficially) also Determinize() that original Fst object in place, life would be a bit easier for me. With the existing Determinize(A, &B), a new C++ Fst object B is created, which usually causes my language to create a new Java Fst object to wrap it; and I've got to worry about changing the pointer in the original Java Fst object (the value, in the symbol table, of $net and any aliases like $net2) and deleting the original C++ object A to prevent a kind of memory leak.

So I now understand why Determinize(A, &B) is as it is, but I do still suggest that I and perhaps others would find a new Determinize(&A) useful and intuitively parallel to RmEpsilon(&A) and Minimize(&A), so that all could be performed "in place".

Best,

Ken

CyrilAllauzen - 14 Oct 2008 - 11:59

Hi Ken,

I understand that what you describe is a rather unsastisfying way of wrapping. However, I not sure I understand why this is the only of wrapping. Anyhow, you can easily add a Determinize(&A) as follows:

template <class Arc>
void Determinize(MutableFst<Arc> *fst) {
  *fst = DeterminizeFst<Arc>(*fst, 
                             DeterminizeFstOptions(CacheOptions(true, 0)));
}

Best, Cyril

 
Log In

StateIterator? and the Start state

KennethRBeesley - 21 Aug 2008 - 14:28

When you iterate through the states in an Fst:

for (StateIterator<StdFst> siter(fst); !siter.Done(); siter.Next()) {
     StateId state_id = siter.Value() ;
     ...
}


is the first state_id returned from the Iterator always the Start state of the Fst?

Thanks,

Ken

CyrilAllauzen - 02 Sep 2008 - 11:39

The first state returned by the iterator is not guaranteed to be the start state. You should always use the fst.Start() method to determinne the start state,

Best, Cyril

 
Log In

Failure transitions

ChrisHarris - 21 Jul 2008 - 14:56

Hey guys -

Great effort to build this library - there are so many uses - it's very nice to have such knowledgeable people building this for everyone. Thank you!

I'm interested in doing some language modeling with this library which I anticipate will require backoff weights that are commonly going to build paths which will be "better" than the proper path for certain n-grams.

I know you guys implemented failure transitions in the GRM library for LM purposes, and later an algorithm to replace them with epsilon transitions along with some compression heuristics to reduce the size of the final machines to address this exact problem.

My question is: What's the best route to take to implement this for the OpenFST project? If I take this on myself... should I create a new FST class to implement failure transitions? Or can I get away with just building a new Arc class which can trick the search functions for composition?

Thanks again for your help!

Chris

CyrilAllauzen - 21 Jul 2008 - 16:56

Hi Chris,

There is an undocumented support for failure transitions in composition. This feature is not documented and the handling of failure transitions will change significantly in the next version of the library.

For now, you can label failure transitions with kPhiLabel (you will need to use the C++ library for this, Fsts containing transitions labeled by kPhiLabel cannot be written to a file).

To perform composition with failure transitions, you need to use

ComposeFst<Arc>(fst1, fst2, ComposeFstOptions<COMPOSE_FST1_PHI>())
when fst1 contains transitions labeled by kPhiLabel or
ComposeFst<Arc>(fst1, fst2, ComposeFstOptions<COMPOSE_FST2_PHI>())
when fst2 contains transitions labeled by kPhiLabel.

Best, Cyril

ChrisHarris - 21 Jul 2008 - 20:03

Nice... thanks for the head's up... I'm glad I asked!

Out of curiosity, what's the new implementaiton going to be? Something along the lines of a custom Arc/State/Fsm class?

Chris

ChrisHarris - 23 Jul 2008 - 19:39

Cyril -

I noticed that during composition with failure transitions, the composition does not chase down the transitive closure of both epsilon and failure transitions.

You can reproduce the behavior by having a state with a failure transition lead to a state with an epsilon transition - which in turn leads to a state with a real symbol arc on it. For example an FSA with the following makeup: 0 1 -2 -2 1 2 0 0 2 3 1 1 3

When you do composition with an FST containing only that real symbol (symbol 1 in the above case) - I would expect it to yield a connected machine (failure -> epsilon -> symbol) - which it does not. Instead it looks through the state with the failure transition and sees no matching symbol (only an epsilon transition) and aborts composition.

I tried to fix the problem, but couldn't quite figure out a good way to do so. The logic to chase down the transitive closure of failure transitions is making assumptions on the structure of the FST (deterministic w.r.t. failure transitions, or at least that they're all equally good, because it takes the first one it finds which yields a match) which probably means any fix I make will be short lived anyway.

If you'd like more information, please let me know, otherwise I'll just try another workaround - but I don't think fixing it in the current implementation is worthwhile... do you?

Chris

CyrilAllauzen - 31 Jul 2008 - 18:42

Hi Chris,

Indeed, I should have mentioned this: failure and epsilon transitions do not work together. You should also have at most 1 failure transition at a given state.

Cyril

 
Log In

Interactive manual testing of an FST

KennethRBeesley - 07 Jul 2008 - 11:53

In the context of an interactive GUI, it might sometimes be useful to build an FST and then test it interactively by typing in individual strings to be generated (or analyzed), getting back the output string(s). If the FST to be tested is MainFST, then one obvious way to do this would be the following:

1. Get the input string typed by the user.
2. Build the input string into a one-path acceptor (call it InputACC)
3. StdVectorFst ResultFST ;
Compose(InputACC, MainFST, &ResultFST) ; // compose for generation
Project(&ResultFST, PROJECT_OUTPUT) ; // take the output projection
if (number of paths in ResultFST not infinite or huge) {
// print strings encoded by ResultFST
}

This testing would be put in a loop, allowing the user to manually enter as many test input strings as desired before terminating the loop. To implement analysis, one could 1) invert the MainFST before performing the steps above, or 2) do the following:

Compose(MainFST, InputACC, &ResultFST) ; // compose for analysis
Project(&ResultFST, PROJECT_INPUT) ; // take the input projection

QUESTION: Is there a better, more efficient, way to implement such manual testing? Perhaps using the lazy operations ComposeFst() and ProjectFst() ?

Thanks,

Ken

CyrilAllauzen - 09 Jul 2008 - 18:11

This approach looks good to me. Using lazy operations shouldn't improve the efficiency as is.

You could benefit from lazy operations when the number of strings in ResultFst is large and you don't want to display all these strings (or even don't want to compute all of ResultFst). Instead, you could rewrite your code to generate all paths to work for a non-expanded FST and to stop once a fixed number of strings have been found (you could even ask to the user if he wants to see more strings then).

Best, Cyril

 
Log In

Easy way to query the number of paths in an FST?

If you have an FST, is there an easy way to query the number of paths?

Thanks,
Ken
KennethRBeesley 13 Jun 2008 - 14:10
The natural way is to use the shortest-distance algorithm:
1. Use Map to turn your FST into an FST over the log semiring setting the weight of every transition and the final weight of final states to LogWeight? ::One().
2. Call ShortestDistance? with reverse == true. Let d be the value return in the distance vector for the initial state. d is then the -log of the number of successful paths in the FST.

Best,
Cyril
CyrilAllauzen 16 Jun 2008 - 12:56
Thanks, Cyril. I suspect that my terminology was wrong and/or my C++ skills are still shaky. What I really need is the number of complete paths in an FST, from the start state to a final state. Following these instructions (as best I can), I seem to get the number of paths from the start state to any state (final or not). I hesitate to post my code here, but I'd be more than happy to send it to you or anyone else.

Ken
KennethRBeesley 17 Jun 2008 - 16:48