You can use the formatting commands describes in TextFormattingRules in your comment.
If you want to post some code, surround it with
<verbatim>and</verbatim>tags.
Auto-linking of WikiWords is now disabled in comments, so you can type VectorFst and it won't result in a broken link.
You now need to use
<br>to force new lines in your comment (unless insideverbatimtags). However, a blank line will automatically create a new paragraph.
... StdVectorFst* m_transA2 = new StdVectorFst(); Map(m_transA, m_transA2, LogToStdMapper()); // .. some more processing with m_transA2 StdVectorFst* m_transA3 = new StdVectorFst(); vector<Weight> *distance; SingleShortestPath(*m_transA2, m_transA3, distance) ...However, when trying to compile this code, I get an error that this signature does not match the defined SingleShortestPath signature. What am I doing wrong? However, w
typedef StdArc::Weight Weight; StdVectorFst* m_transA2 = new StdVectorFst(); StdVectorFst* m_transA3 = new StdVectorFst(); AnyArcFilter<StdArc> filter; vector<Weight> distance; AutoQueue<StdArc::StateId> Q(*m_transA2, &distance, filter); ShortestPathOptions<StdArc, AutoQueue<StdArc::StateId>, AnyArcFilter<StdArc> > opts(&Q, filter); SingleShortestPath(*m_transA2, m_transA3, &distance, opts); //This should do the same ShortestPath(*m_transA2, m_transA3);
echo "0 1" | farcompilestrings | farinfo - ERROR: Error reading FAR: - ERROR: GenericRegister::GetEntry : -arc.so: cannot open shared object file: No such file or directory FATAL: No operation found for "FarInfo" on arc typeHas anyone come across this problem? Cheers, Dogan
RmEpsilon<StdArc>(t);
EncodeMapper<StdArc> encode_mapper(kEncodeLabels,ENCODE);
Encode(t, &encode_mapper);
StdVectorFst * det = new StdVectorFst();
Determinize<StdArc>(*t, det);
Minimize<StdArc>(det);
Decode(det, encode_mapper);
return det;
Is there something in the above code that could be optimized? I already tried (kEncodeLabels | kEncodeWeights) instead of (kEncodeLabels), but it didn't have any effect.
regards,
Erik
template <class T> const T FloatLimits<T>::kPosInfinity = numeric_limits<T>::infinity();I am now wondering if it could not be an issue with static order initialization (see e.g. http://yosefk.com/c++fqa/ctors.html#fqa-10.12), i.e. that the templated class numeric_limits
vector<pair<LogArc::Label, LogArc::Label> >* ipairs; vector<pair<LogArc::Label, LogArc::Label> >* opairs; ipairs = new vector<pair<LogArc::Label, LogArc::Label> >; opairs = new vector<pair<LogArc::Label, LogArc::Label> >; pair<LogArc::Label,LogArc::Label> mypair(0,-2);Now I declare the matchers in the ComposeFst options object:
opts.gc_limit = 0; opts.matcher1 = new PM(*lm, MATCH_OUTPUT, -2); opts.matcher2 = new PM(*noise, MATCH_NONE, kNoLabel);Now, I try to compose the two FSTs:
composition = new ComposeFst<LogArc>(*lm, *fst2, opts);This is where I get the error. I stepped into the compose code and noticed that while I declared the matchers as MATCH_OUPUT/MATCH_NONE, the match_type variable is set to MATCH_NONE for both FSTs and thus failing to set a match_type.
ArcSort(lm, OLabelCompare<LogArc>()); ArcSort(fst2, ILabelCompare<LogArc>());
.... VectorFst<StdArc> A, B; typedef ArcLookAheadMatcher< SortedMatcher<StdVectorFst> > LAM; typedef SequenceComposeFilter< LAM, LAM > SCF; typedef LookAheadComposeFilter< SCF, LAM, LAM, MATCH_BOTH> LCF; LAM* lam1 = new LAM(A, MATCH_OUTPUT); LAM* lam2 = new LAM(B, MATCH_INPUT); LCF* laf1 = new LCF( A, B, lam1, lam2, MATCH_BOTH ); //here is the compilation problem ...How do I use LookAheadComposeFilter?
StdFst* C = Convert(A, "arc_lookahead");Not sure if the arc_lookahead type needs B to be relabelled.
line1: ComposeFst<StdArc> C(A, B, opts);
line2: ProjectFst< StdArc >(C, PROJECT_INPUT);
X1 is usually a probability weight like LogWeight X2 is usually a random variable or vector see SignedLogWeight or SparsePowerWeightI tried several combination to no effect, including ExpecationArc<StdArc, vector
If X1 is distinct from X2, it is required that there is an external product between X1 and X2 and if both semriring are commutative, or left or right semirings, then result must have those properties.Addresses my problem, but can someone tell me what I must do to satisfy this requirement? Do I need to implement another Arc subclass? Thanks in advance!
ERROR: SymbolTable::Write: write failedI'm sure there's a nice recursive algorithm for computing this, but is there some way to do this via some command line operations? Thanks!
Making all in bin make[3]: Entering directory `/home/anakreon/tmp/d/openfst-1.2.7/src/bin' g++ -DHAVE_CONFIG_H -I./../include -I./../script -g -O2 -MT fstarcsort.o -MD -MP -MF .deps/fstarcsort.Tpo -c -o fstarcsort.o fstarcsort.cc mv -f .deps/fstarcsort.Tpo .deps/fstarcsort.Po /bin/bash ../../libtool --tag=CXX --mode=link g++ -g -O2 -lm -ldl -o fstarcsort fstarcsort.o ../script/libfstscript.la ../lib/libfst.la libtool: link: g++ -g -O2 -o .libs/fstarcsort fstarcsort.o -lm -ldl ../script/.libs/libfstscript.so ../lib/.libs/libfst.so ../script/.libs/libfstscript.so: undefined reference to `dlopen' ../script/.libs/libfstscript.so: undefined reference to `dlerror' collect2: ld returned 1 exit status make[3]: *** [fstarcsort] Error 1 make[3]: Leaving directory `/home/anakreon/tmp/d/openfst-1.2.7/src/bin' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/home/anakreon/tmp/d/openfst-1.2.7/src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/anakreon/tmp/d/openfst-1.2.7' make: *** [all] Error 2
for(MutableArcIterator<CustomFST> aiter(&fst, state_id);!aiter.Done(); aiter.Next())
{
GallicArc<LogArc, STRING_LEFT> arc = aiter.Value();
if(arc.ilabel==input_symbol)
{
arc.ilabel = label;
aiter.SetValue(arc);
break;
}
}
All i need to do here is to modify one single arc, and my code should work in constant time. I am sure openfst has better ways that I am missing.
fstcompose A.fst B.fst out.fstand run a fstshortestpath on the output
fstshortestpath out.fst out.shortest.fstBut the final fat (out.shortest.fst) outputs a reversed string as it should be. Any ideas? Thanks!
fstprint will print the arcs in reverse but the machine should be well formed.
0 1 <epsilon> <epsilon> 2.29587674 1 1 <epsilon> <epsilon> 1 2 that that 4.43530607 2 3 <epsilon> <epsilon> 1.30640042 2 4 <epsilon> <epsilon> 2 5 is is 5.14776754 3 3 <epsilon> <epsilon> 3 6 is is 6.12910652When I run RmEpsilon in code and generate the FST again, I get something like this:
0 1 that that -0.200286627 1 3 is is 0.504037499 1 2 is is -1.78467798What am I missing? Why are negative weights being produced in this case?
0 1 <epsilon> <epsilon> 2.29587674 0 2 <epsilon> <epsilon> 0 3 But But 3.66515589 1 4 But But 6.58336163 2 3 But But 3.66515589 3 4 <epsilon> <epsilon> 1.03407383 4 5 <epsilon> <epsilon> 1.04145384 5 6 through through 6.26490784 6 7 <epsilon> <epsilon> 1.45167708 7 8 I I 3.89830112 8 9 <epsilon> <epsilon> 1.6464082 8 10 <epsilon> <epsilon> 8 4.20359564 9 3.0302887 10 4.20359564Running RmEpsilon I came up with the following FST:
0 2 But But 8.87923813 0 1 But But 2.97200871 1 3 through through 8.34043503 2 3 through through 7.30636168 3 4 I I 5.34997845 4 3.23925138I traced this on paper to verify that the weights were being consolidated correctly (from the elimination of epsilon transitions), and noticed that state 0 -> 1 on the new FST is approximately 0.7 less than it should be. State 0 -> 3 (or 0 -> 2 -> 3) is 3.66 pre-RmEpsilon and is 2.97 in the post-RmEpsilon FST. First, have I missed something in manually verifying this? Second, is there an easy way to eliminate self-loops programmatically to prepare an FST for running RmEpsilon, or is it just a matter of walking through the arcs and somehow removing them from the FST?
-l(e(-3.66515589) + e(-3.66515589)) is 2.97200870944005469070 so the output of RmEpsilon looks fine to me.
Cyril
$ uname -a
Linux ono 3.0-ARCH #1 SMP PREEMPT Tue Aug 30 08:53:25 CEST 2011 x86_64 Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz GenuineIntel GNU/Linux
$ g++ --version
g++ (GCC) 4.6.1 20110819 (prerelease)
System 2 (Mac OS X):
$ uname -a
Darwin -------------.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
$ g++ --version
i686-apple-darwin10-g++-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Input FST (a deterministic acceptor).
Output of fstminimize on system 1.
Output of fstminimize on system 2.
Matcherequivalent as mutable arc, where I could find the mutable arc more efficiently?
for(MutableArcIterator<CustomFST> aiter_r(&fst, red);!aiter_r.Done();aiter_r.Next())
{
GallicArc<LogArc, STRING_LEFT> arc_r = aiter_r.Value();
if(arc_r.ilabel==e_red.ilabel)
{
GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw_r = GallicWeight<StdArc::Label, LogWeight, STRING_LEFT>(lcp, arc_r.weight.Value2());
arc_r.weight = gw_r;
aiter_r.SetValue(arc_r);
}
}
(0) --0:0/inf-->(1)--1:1/1.234-->((2)) \__2:2/5.67__/Expected result:
(0) --2:2/5.67-->(1)--1:1/1.234-->((2)) \_____________1:1/inf___________/Actual result:
(O) --2:2/5.67-->(1)--1:1/1.234-->((2)) \___________1:1/1.234____________/ <= PROBLEM !!The infinite weight has not been taken into account when adding the 0->2 arc. I digged into the code and it may be linked to stale data in ShortestDistanceState::distance_ that is not reset when encountering an arc with infinite weight, but if a more knowledgeable person could take a look that would be great. I am using an older version of OpenFst (1.1.something, I think) but I didn't see any major difference in the code. FYI, I'm calling directly the C++ functions. Thanks for any help, Romain
StringWeight<StdArc::Label, STRING_LEFT> w1;when ever i enter an epsilon or 0, using: w1.PushBack(0), the String is an epsilon, no matter how many times I call w1.PushBack(0);. However, when I do it on the right side of another input symbol, for example:
w1.PushBack(1); w1.PushBack(0);the output is 1_0. My question is, epsilon is the multiplicative identity of String semiring, so no matter how many times i call w1.PushBack(0); the output should be 1 and not 1_0. Can anybody explain me if I am misunderstanding anything here?
StdAcrwe simply do the following:
StdVectorFst fst; fst.AddState(); fst.SetStart(0); fst.AddArc(0, StdArc(1, 1, 0.5, 1)); fst.AddArc(0, StdArc(2, 2, 1.5, 1));How should the same thing be done in case of the following:
typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST;
void MakeGallicFST()
{
CustomFST fst;
fst.AddState();
GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw;
GallicArc<LogWeight, STRING_LEFT> ga;
}
ga.weight = ???So, the questions are: 1. What should be the format of the weight? 2. I can certainly put a GallicWeight object here, now the question is how to build a GallicWeight. For example, in the following code:
gw.Read(???);What should be the format of the input file? 3. Can we manipulate
gw.Value1();or
gw.Value2();? If yes, how?
void MakeGallicFST()
{
CustomFST fst;
fst.AddState();
fst.SetStart(0);
fst::StringWeight<StdArc::Label, STRING_LEFT> sw;
sw.PushFront(1);
sw.PushFront(2);
sw.PushFront(3);
sw.PushFront(4);
LogWeight lw = LogWeight(0.5);
GallicWeight<StdArc::Label, LogWeight, STRING_LEFT> gw = GallicWeight<StdArc::Label, LogWeight, STRING_LEFT>(sw, lw);
GallicArc<LogArc, STRING_LEFT> ga;
ga.ilabel = 1;
ga.olabel = 1;
ga.nextstate = 0;
ga.weight = gw;
fst.AddArc(0, ga);
fst.AddState();
ga.ilabel = 1;
ga.olabel = 1;
ga.nextstate = 1;
ga.weight = gw;
fst.AddArc(0, ga);
}
0 1 a 1_1_1,0.5 1 0 b 2_2_2,0.5if not, what would be the right format?
#include <fst/const-fst.h>
#include <fst/edit-fst.h>
#include <fst/vector-fst.h>
#include <fst/script/register.h>
#include <fst/script/fstscript.h>
using namespace fst;
namespace fst {
namespace script {
typedef GallicArc<LogArc> LogGallicArc;
REGISTER_FST(VectorFst, LogGallicArc);
REGISTER_FST(ConstFst, LogGallicArc);
REGISTER_FST(EditFst, LogGallicArc);
REGISTER_FST_CLASSES(LogGallicArc);
REGISTER_FST_OPERATIONS(LogGallicArc);
}
}
Compile and make sure the .so is on LD_LIBRARY_PATH
g++ -fPIC gallic-arc.cc -o gallic-arc.so -shared -O2
NaturalLess, it is defined in weight.h.
LogArc to GallicArc<LogArc, STRING_LEFT> , you need to use ToGallicMapper<LogArc, STRING_LEFT> .
typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST;
typedef VectorFst<LogArc> LogVectorFst;
CustomFST gfst;
StdVectorFst *fst = StdVectorFst::Read("fst.fst");
Map(fst, &gfst, ToGallicMapper<LogArc, STRING_LEFT>());
the error msg is the following:
Error 1 error C2040: 'fst' : 'fst::StdVectorFst *' differs in levels of indirection from 'fst::StdVectorFst'Is it also possible to push the string weights along the arc?
Error 1 error C2664: 'fst::GallicArc<A,S> fst::ToGallicMapper<A,STRING_LEFT>::operator ()(const A &) const' : cannot convert parameter 1 from 'const fst::ArcTpl<W>' to 'const fst::LogArc &'
fst is a StdArc Fst. As I said, ToGallicMapper<LogArc, STRING_LEFT> converts a LogArc Fst into a GallicArc<LogArc, STRING_LEFT>.
<eps> 0 x 1 y 2as far as undersatnd, the internal representation of a label x is 1. However, now i can have output label xx, xyx, xxx etc. How will these labels be internally represented? Is there any normalization function to bring the labels as close to the initial state as possible? Its not only the
1,1. The operation you want is pushing toward the initial state (FST.PushDoc), pushing the weights in the gallic semiring. If you are going to the gallic just for that, you could have simply used label pushing instead.
typedef VectorFst<GallicArc<LogArc, STRING_LEFT>> CustomFST;how exactly am i supposed to add the arc, e.g., how can you give xx as a parameter? How to define x as a symbol, and not xx. In the example that I am working on, all the output label are
-1when I display them using an iterator. Any hint why is that happening? 4. Another problem is, I cant draw the fst using fstdraw when i map the fst to Gallic. However, it works with StdVectorFst.
fst/push.h when pushing labels only.
3. In your case, you want to set the olabel to be the same as the ilabel. The string in \Delta^* should be part of the weight. Look at the fst/string-weight.h and fst/pair-product.h to see how to build a GallicWeight.
4. The GallicArcs are not registered by default to avoid code bloat. If you want the command line utilities to work you need to create a DSO file as described here. In you case MyArc is GallicArc<LogArc, STRING_LEFT> and my_arc is gallic_log.
0 1 1 1 200 0 100 1 1 1 1 100 1and expect to get as a result the minimal automaton
0 0 1 1 100 0 100However, calling
fstminimize produces the same non-minimal automaton. I also get a negative result from fstequivalent if I compare the non-minimal automaton and the equivalent minimal automaton. I tried to modify the non-minimal automaton with
fstpush --push_weights and --to_finalto see if it would produce the correct output for
fstminimize or fstequivalence but the results where the same as earlier.
I'm using OpenFst version 1.2.7.
Regards,
Erik
fstpush --push_weights --to_finalwithout the word "and".
fstpush --push_weights to have the required normalized property (the ⊕-sum of the outgoing transition weight and final weight at each state is 1) would require the use of initial weight, which the library does not support. This leads to the creation of an additional state to represent the initial weight. Hence in both machines, calling minimize lead to a machine with 2 states (instead of 1). We should update the documentation to make this clear.
The fact that equivalent returns that the machines are not equivalent is a bug due to the way we deal with the initial weight. Since the first machine is acyclic at the initial state, the initial weight is added to all the outgoing transition of the initial state. The second machine is cyclic at the initial state, so we cannot do the same thing. Instead a new initial state is created with an epsilon transition to the old initial state with weight the initial weight. Equivalent then treats this epsilon transition as a regular symbol and conclude that the machine are not equivalent. We will fix this bug in the next release of the library.
Best,
Cyril
ArcSort(&fst, StdOLabelCompare()); // sort locally created StdVectorFst cout << "number of states in model: " << (*model).NumStates() << endl; // runs fine ArcSort(model, StdILabelCompare()); // generates segfault cout << "check" << endl; // doesn't get printed StdVectorFst result; // create local FST to store result Compose(fst, *model, &result); // if second ArcSort is not performed, this generates segfault
Verify(*model) to make sure the model Fst is well-formed.
StdVectorFst *ifst = StdVectorFst::Read(iPath); EncodeMapper<StdArc> encoder(kEncodeLabels|kEncodeWeights, ENCODE); Encode(ifst, &encoder); Minimize(ifst); Decode(ifst, encoder); ifst->Write(oPath);
= 0 and Start() = -1
If you the ComposeFst or the Compose method with connect set to false. You Fst may have dead-end states. In this case you will need to look for paths that have final states or run the Connect algorithm and see if the WFST has NumStates() = 0 and Start() = -1.
>I'll try and generate some comparison curves for Juicer later in the week to see whether there are still issues there.fstcompile a.txt > a.fst fstcompile b.txt > b.fst (this is the fst representing the "some input word" you are referring to) fstcompose b.fst a.fst > c.fst (this is the fst representing the "output string" you are looking for) fstprint c.fstBest, Dogan
#include <fst/fstlib.h>
int main(int argc, char** argv){
string lm=argv[1];
fst::StdVectorFst *fstlm=fst::StdVectorFst::Read(lm);
delete fstlm;
cout << "Done!" << endl;
while (true) {}
return 0;
}
When reading a large (~1Gb) transducer, the delete operation produces memory leaks that I noticed using 'top'.
#include <fst/fstlib.h>
#define N 2
int main(int argc, char** argv){
string lm=argv[1];
fst::StdVectorFst *fstlm;
for (int i = 0; i < N; i++) {
fstlm=fst::StdVectorFst::Read(lm);
delete fstlm;
cout << "Done!" << endl;
}
while (true) {}
return 0;
}
On my 64-bit linux machine, top reports much more memory used during the while(true) {} loop for N=1 than for N=2.
Best, Cyril
fst/lock.h to make it thread-safe and prevent crashes:
//int Incr() const { return ++count_; }
//int Decr() const { return --count_; }
int Incr() const { return __sync_add_and_fetch(&count_, 1); }
int Decr() const { return __sync_sub_and_fetch(&count_, 1); }
These functions are available in GCC 4.1 or higher. Otherwise use mutex locks there.
Just wanted to share this info, in case other people needs threads too. Mike tells me something like that is used internally at Google but it's not included in the tarball for better code portability.
fstinfo your_fst | grep epsilons on each of your 3 Fsts and post the output below (between verbatim tags preferably)?
Cyril
fst_ed_comp2.fst ?
a:epsilon corresponding to deletions into input/output epsilon transitions.
0 1 a x 0 2 a y 1 3 b eps 2 3 c z 2 3 d w 3 4 eps a 4 5 s eps 3 5The resulting transducer after determinization is:
0 1 a eps 1 2 b x 1 3 c y 1 4 d y 2 5 eps a 2 3 6 eps z 3 7 eps z 4 6 eps w 4 7 eps w 5 8 s eps 6 8 s a 7 8Specifically in this case, I don't understand why the determinization algorithm creates an extra final state (7), while it can just make the state 6 final. Is this an expected behaviour? Thanks
--subsequential_label option to specify an input label other than epsilon for the transition form 3 to 7 leading to a machine that is indeed deterministic.
In your example, it would indeed be enough to make 6 final but in general this is difficult to determine since there might be other paths to 6 that would be affected by making 6 final.
If you do not want to use the --subsequential_label, an other solution to obtain a deterministic transducer would be to subsequently apply to the output of determinization: Encode (labels),
Determinize, Decode.
Best, Cyril
cd openfst-1.2.5/src/test g++ -DHAVE_CONFIG_H -I./../include -D_REENTRANT -g -O2 -MT fst_test.o -MD -MP -MF .deps/fst_test.Tpo -c -o fst_test.o fst_test.cc -Wall
fstcompose automatically connects the output so you should use --connect=false if you want to see the non coaccessible paths.
Cyril
#include <iostream>
#include <fstream>
#include <stdlib.h>
#include <string>
#include <cstring>
#include <fst/fstlib.h>
#include <fst/compact-fst.h>
#include <fst/extensions/far/farlib.h>
using namespace fst;
int main()
{
int fstnum = 3;
string fstnames[fstnum];
fstnames[0]="TA";
fstnames[1]="TF";
fstnames[2]="TB";
//------------------Generate [population] of FSTs--------------------//
// A vector FST is a general mutable FST
StdVectorFst TA;
// Adds state 0 to the initially empty FST and make it the start state.
TA.AddState(); // 1st state will be state 0 (returned by AddState)
TA.SetStart(0); // arg is state ID
// Adds two arcs exiting state 0.
// Arc constructor args: ilabel, olabel, weight, dest state ID.
TA.AddArc(0, StdArc(0, 0, 0, 0)); // 1st arg is src state ID
TA.SetFinal(0,0); // SetFinal (StateID, weight)
//We can save this FST to a file with:
TA.Write("onestate/TA.fst");
StdVectorFst TF;
// Adds state 0 to the initially empty FST and make it the start state.
TF.AddState(); // 1st state will be state 0 (returned by AddState)
TF.SetStart(0); // arg is state ID
// Adds two arcs exiting state 0.
// Arc constructor args: ilabel, olabel, weight, dest state ID.
TF.AddArc(0, StdArc(0, 1, 0, 0)); // 1st arg is src state ID
TF.AddArc(0, StdArc(1, 0, 0, 0)); // 1st arg is src state ID
TF.SetFinal(0,0); // SetFinal (StateID, weight)
//We can save this FST to a file with:
TF.Write("onestate/TF.fst");
//------------------randomly select 2 FSTs----------//
//--------------------Compose 2 FSTs----------------//
// The FSTs must be sorted along the dimensions they will be joined.
// In fact, only one needs to be so sorted.
// This could have instead been done for "model.fst" when it was created.
ArcSort(&TA, StdOLabelCompare());
ArcSort(&TF, StdILabelCompare());
// Container for composition result.
StdVectorFst result;
// Create the composed FST.
Compose(TA, TF, &result);
result.Write("onestate/TB.fst");
//---------compute and store interaction network complexity-------//
//---------print interaction network---------------//
int cmdtst = system(NULL);
if (cmdtst != 0 )
{
for (int i=0; i<fstnum; i++)
{
string sh1="fstdraw onestate/.fst onestate/.dot";
string sh2="dot -Tps onestate/.dot > onestate/.ps";
string sh3="evince onestate/.ps &";
string fstname = fstnames[i];
sh1.insert(17,fstname); sh1.insert(33,fstname);
sh2.insert(18,fstname); sh2.insert(36,fstname);
sh3.insert(16,fstname);
system(sh1.c_str()); system(sh2.c_str()); system(sh3.c_str());
}
} else { cout << "No command interpreter available \n"; };
return 0;
}
src/include/extensions/far/sttable.h to:
WriteType(stream_, static_cast<int64>(positions_.size()));
This will be fixed in the next release. Thanks to Richard for his help on this!
FAST_LOG_PROB_ARC_SELECTORwas accidentally included as option for fstrandgen.cc. It is not defined anywhere else as far as I see. My first compilation succeeded on Cygwin after I commented out that option. Cheers
FAST_LOG_PROB_ARC_SELECTOR should have been defined in the enum in script/randgen.h.
hashtable header file is available at FST.CompilingOnMacOSX.
./configure CXX=g++-4.exe CC=gcc-4.exe --prefix=$HOME/local make #make install # <--- I haven't made it here though.Make aborts with the error message:
g++-4.exe -g -O2 -o fstarcsort.exe fstarcsort.o -ldl ../lib/.libs/libfst.a ../script/.libs/libfstscript.a /usr/lib/gcc/i686-pc-cygwin/4.3.4/libstdc++.dll.a -L/usr/lib/gcc/i686-pc-cygwin/4.3.4 -L/usr/lib/gcc/i686-pc-cygwin/4.3.4 ../script/.libs/libfstscript.a(fst-class.o):fst-class.cc:(.text$_ZN3fst16CompatPropertiesEyy[fst::CompatProperties(unsigned long long, unsigned long long)]+0x143): undefined reference to `fst::PropertyNames' ~~~----ETCETERA----~~~ collect2: ld returned 1 exit status make[3]: *** [fstarcsort.exe] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2so it looks as if the linker doesn't find the definition of CompatProperties.
libtool: link: warning: undefined symbols not allowed in i686-pc-cygwin shared librariesOnce here:
/bin/sh ../../libtool --tag=CXX --mode=link g++-4.exe -g -O2 -version-info 0:0:0 -o libfst.la -rpath /usr/local/lib compat.lo flags.lo fst.lo properties.lo symbol-table.lo util.loand once again here:
/bin/sh ../../libtool --tag=CXX --mode=link g++-4.exe -g -O2 -version-info 0:0:0 -o libfstscript.la -rpath /usr/local/lib arcsort.lo closure.lo compile.lo compose.lo concat.lo connect.lo convert.lo decode.lo determinize.lo difference.lo draw.lo encode.lo epsnormalize.lo equal.lo equivalent.lo fst-class.lo info.lo intersect.lo invert.lo map.lo minimize.lo print.lo project.lo prune.lo push.lo randequivalent.lo randgen.lo relabel.lo replace.lo reverse.lo reweight.lo rmepsilon.lo script-impl.lo shortest-distance.lo shortest-path.lo synchronize.lo text-io.lo topsort.lo union.lo weight-class.lo
if (ReplaceFst<StdArc>(pairlabelfsts, slabel).CyclicDependencies()) { ...}
returns true, apparently detecting the cyclic dependencies correctly.
Question: Have I used the proper idioms for testing for cyclic dependencies and doing the actual replacement?
#include <fst/fstlib.h>
#include <iostream>
using namespace fst ;
int main() {
std::cout << "Hello, world!\n" ;
int a = 97 ; // code point values
int b = 98 ;
// example with cyclic dependencies
// StdVectorFst is a typedef for VectorFst<StdArc>
StdVectorFst fst1, fst2, fst3 ;
// define fst1, a subnetwork (will be arbitrarily associated with -1)
fst1.AddState() ; // return 0
fst1.SetStart(0) ;
fst1.AddState() ; // returns 1
fst1.AddState() ; // returns 2
fst1.SetFinal(2, StdArc::Weight::One()) ;
fst1.AddArc(0, StdArc(a, a, StdArc::Weight::One(), 1)) ;
// fst1 refers to fst2 (associated below with -2)
fst1.AddArc(1, StdArc(0, -2, StdArc::Weight::One(), 2)) ;
// src i o w dest
// define fst2, a subnetwork (will be arbitrarily associated with -2)
fst2.AddState() ; // return 0
fst2.SetStart(0) ;
fst2.AddState() ; // return 1
fst2.AddState() ; // return 2
fst2.SetFinal(2, StdArc::Weight::One()) ;
fst2.AddArc(0, StdArc(b, b, StdArc::Weight::One(), 1)) ;
// fst2 refers to fst1 (associated below with -1)
fst2.AddArc(1, StdArc(0, -1, StdArc::Weight::One(), 2)) ;
// define fst3, the base network (will be arbitrarily associated with -3)
fst3.AddState() ; // return 0
fst3.SetStart(0) ;
fst3.AddState() ; // return 1
fst3.AddState() ; // return 2
fst3.SetFinal(2, StdArc::Weight::One()) ;
// add arcs with "references" to subnetworks
// output-side -1 will be associated with fst1
// output-side -2 will be associated with fst2
// The label references to subnetworks are on the Output Side.
// The corresponding Input-Side labels can be anything (here 0 for epsilon).
fst3.AddArc(0, StdArc(0, -1, StdArc::Weight::One(), 1)) ;
fst3.AddArc(1, StdArc(0, -2, StdArc::Weight::One(), 2)) ;
// src i o w dest
// set up the vector associating arbitrary numbers with the three networks
// -1 refers to fst1
vector< pair< StdArc::Label, const Fst<StdArc> * > > pairlabelfsts ;
pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-1, &fst1) ) ;
// -2 refers to fst2
pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-2, &fst2) ) ;
// -3 is the base network
pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> *>(-3, &fst3) ) ;
StdArc::Label slabel = -3 ; // specifies the "base" Fst, which is fst3
fst3.Write("before.fst") ; // the network with references
// Using delayed ReplaceFst
if (ReplaceFst<StdArc>(pairlabelfsts, slabel).CyclicDependencies()) {
std::cout << "Has cyclic dependencies.\n" ;
} else {
std::cout << "No cyclic dependencies.\n" ;
StdVectorFst expandedFst ; // the result
// slabel (here -3) is used to find the base Fst supplied in pairlabelfsts
expandedFst = ReplaceFst<StdArc>(pairlabelfsts, slabel) ;
expandedFst.Write("intermediate.fst") ; // the network just after ReplaceFst()
RmEpsilon(&expandedFst) ;
expandedFst.Write("after.fst") ; // the network after RmEpsilon()
}
std::cout << "Good-bye, world!\n" ;
}
fstmap -map_type=rmweight s.fst | fstproject > s.in.fst fstmap -map_type=rmweight s.fst | fstproject -project_output > s.out.fst2. compose shortest input with F and project the result onto the output labels to get the candidate paths in F compose shortest output with H and project the result onto the input labels to get the candidate paths in H
fstcompose s.in.fst f.fst | fstproject -project_output > f.out.fst fstcompose h.fst s.out.fst | fstproject > h.in.fst3. compose the candidate paths with G (F_out o G o H_in) to obtain the paths in G that survive during the composition
fstcompose f.out.fst g.fst | fstcompose - h.in.fst > internal.fst4. finally apply shortestpath to this fst to obtain the path in G internal to the shortest path in F o G o H
fstshortestpath internal.fst > shortest.internal.fstThis final transducer should provide the internal path you need. Note that the weights of this transducer are those obtained during the composition not the original weights of G. Best, Dogan
Thanks a lot in advance,
Stefan
LogArc is not compiled into the fstshortestpath utility and the code tries to locate a shared object containg the definition of that arc type.
Best, Cyril
my-arc.cc (the source for my-arc.so) should look like:
#include "nlp/fst/lib/fst-inl.h"
#include "nlp/fst/lib/const-fst-inl.h"
#include "nlp/fst/script/register.h"
#include "nlp/fst/script/fstscript.h"
#include "my-arc.h"
namespace fst {
namespace script {
REGISTER_FST(VectorFst, MyArc);
REGISTER_FST(ConstFst, MyArc);
REGISTER_FST_CLASSES(MyArc);
REGISTER_FST_OPERATIONS(MyArc);
}
}
Hope this will work for you.
Cyril
-inl 's, the lib/ 's and the nlp/ 's. This gives:
#include <fst/fst.h> #include <fst/const-fst.h> #include <fst/script/register.h> #include <fst/script/fstscript.h>
/usr/local/lib to your LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/libBest, Cyril
potentials.txt input is the ouptut of fstshortestdistance.
fstshortestdistance a.fst > potentials.txt fstreweight --to_initial a.fst potentials.txt > b.fstThis is equivalent to:
fstpush --push_weights --to_initial a.fst > b.fstThis is actually the way pushing is implemented in the library, shortest-distance followed by reweighting.
Ld build/Debug/fstprintallpaths normal x86_64
cd /.../fstprintallpaths
setenv MACOSX_DEPLOYMENT_TARGET 10.6
/Developer/usr/bin/g++-4.2 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk -L/Users/michael/code/fstprintallpaths/build/Debug -L/usr/local/lib -L/usr/local/lib/Cabal-1.6.0.2 -L/usr/local/lib/ImageMagick-6.4.1 -L/usr/local/lib/pkgconfig -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4 -L/usr/local/lib/ImageMagick-6.4.1/config -L/usr/local/lib/ImageMagick-6.4.1/modules-Q16 -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Language -L/usr/local/lib/ImageMagick-6.4.1/modules-Q16/coders -L/usr/local/lib/ImageMagick-6.4.1/modules-Q16/filters -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/Compat -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/PackageDescription -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/Simple -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Language/Haskell -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/Simple/Build -L/usr/local/lib/Cabal-1.6.0.2/ghc-6.10.4/Distribution/Simple/GHC -L. -F/Users/michael/research/projects/lattice_parsing/code/fstprintallpaths/build/Debug -filelist /Users/michael/research/projects/lattice_parsing/code/fstprintallpaths/build/fstprintallpaths.build/Debug/fstprintallpaths.build/Objects-normal/x86_64/fstprintallpaths.LinkFileList -mmacosx-version-min=10.6 -o /Users/michael/research/projects/lattice_parsing/code/fstprintallpaths/build/Debug/fstprintallpaths
Undefined symbols:
"fst::Int64ToStr(long long, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*)", referenced from:
fst::FloatWeightTpl<float>::GetPrecisionString() in fstprintallpaths.o
"fst::FstHeader::Read(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)", referenced from:
fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::Read(std::basic_istream<char, std::char_traits<char> >&, fst::FstReadOptions const&)in fstprintallpaths.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
-----------------------------------------------------------------
fstprintallpaths.cpp:
#include "fstprintallpaths.h"
#include <iostream>
#include <vector>
#include <fst/fstlib.h>
#include <fst/fst-decl.h>
#include <fst/vector-fst.h>
#include <fst/weight.h>
using namespace fst;
int main (int argc, char * const argv[]) {
// insert code here...
std::cout << "Hello, World!\n";
StdFst *input = StdFst::Read("blah.fst");
//StdFst *model = StdFst::Read("blah.fst");
//printAllStrings(&fst);
return 0;
}
string itos(int i) { // convert int to string; from Bjarne Stroustrup's FAQ
stringstream s;
s << i;
return s.str();
}
string vectorToString(vector<int> v) {
if(v.size() == 0)
return "<>";
string result = "<" + itos(v[0]);
for(int i = 1; i < v.size(); i++) {
result + "," + itos(v[i]);
}
return result + ">";
}
void printAllStringsHelper(VectorFst<StdArc> &fst, int state, vector<int> str, TropicalWeight cost) {
if(fst.Final(state) == TropicalWeight::Zero())
cout << vectorToString(str) << " with cost " << (Times(cost,fst.Final(state))) << endl;
for(ArcIterator <VectorFst <StdArc> > aiter(fst,state); ! aiter.Done(); aiter.Next()) {
StdArc arc = aiter.Value();
str.push_back(arc.ilabel);
printAllStringsHelper(fst, arc.nextstate, str, Times(cost, arc.weight.Value()));
str.pop_back();
}
}
// a bad idea if there are loops :)
void printAllStrings(VectorFst<StdArc> &fst) {
vector<int> str(0);
TropicalWeight tw(TropicalWeight::One());
printAllStringsHelper(fst, 0, str, tw);
}
-----------------------------------------------------------------
fstprintallpaths.h
/*
* fstprintallpaths.h
* fstprintallpaths
*
*/
#include <iostream>
#include <vector>
#include <fst/fstlib.h>
#include <fst/fst-decl.h>
#include <fst/vector-fst.h>
#include <fst/weight.h>
using namespace fst;
int main(int argc, char * const argv[]);
string itos(int i);
string vectorToString(vector<int> v);
void printAllStringsHelper(VectorFst<StdArc> &fst, int state, vector<int> str, TropicalWeight cost);
void printAllStrings(VectorFst<StdArc> &fst);
$ g++ fstprintallpaths.cpp fstprintallpaths.h -I /usr/local/include
Undefined symbols:
"fst::FstHeader::Read(std::basic_istream<char, std::char_traits<char> >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)", referenced from:
fst::Fst<fst::ArcTpl<fst::TropicalWeightTpl<float> > >::Read(std::basic_istream<char, std::char_traits<char> >&, fst::FstReadOptions const&)in ccspqZtg.o
"fst::Int64ToStr(long long, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*)", referenced from:
fst::FloatWeightTpl<float>::GetPrecisionString() in ccspqZtg.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
$ fstinfo a_binary.fst fst type vector arc type standard input symbol table ia.txt output symbol table oa.txt # of states 24 # of arcs 9095 initial state 0 # of final states 4 # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 9095 # of accessible states 24 # of coaccessible states 24 # of connected states 24 # of strongly conn components 24 expanded y mutable y acceptor n input deterministic n output deterministic n input/output epsilons n input epsilons n output epsilons y input label sorted n output label sorted y weighted y cyclic n cyclic at initial state n top sorted y accessible y coaccessible y string n $ fstinfo a_rmeps.fst fst type vector arc type standard input symbol table ia.txt output symbol table oa.txt # of states 24 # of arcs 645 initial state 0 # of final states 4 # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 645 # of accessible states 24 # of coaccessible states 24 # of connected states 24 # of strongly conn components 24 expanded y mutable y acceptor n input deterministic n output deterministic n input/output epsilons n input epsilons n output epsilons y input label sorted n output label sorted y weighted y cyclic n cyclic at initial state n top sorted y accessible y coaccessible y string n $ fstinfo a_determinized.fst fst type vector arc type standard input symbol table ia.txt output symbol table oa.txt # of states 1026 # of arcs 22054 initial state 0 # of final states 906 # of input/output epsilons 0 # of input epsilons 0 # of output epsilons 22054 # of accessible states 1026 # of coaccessible states 1026 # of connected states 1026 # of strongly conn components 1026 expanded y mutable y acceptor n input deterministic y output deterministic n input/output epsilons n input epsilons n output epsilons y input label sorted y output label sorted y weighted y cyclic n cyclic at initial state n top sorted y accessible y coaccessible y string n
#include "fst/fstlib.h"
using namespace std;
using namespace fst;
int main()
{
StdVectorFst fst1;
StdVectorFst fst2;
StdVectorFst fst3;
ArcSort(fst1, StdOLabelCompare());
fst3 = ComposeFst(fst1, fst2);
return 0;
}
ArcSort(&fst1, StdOLabelCompare()); fst3 = StdComposeFst(fst1, fst2);
PhiMatcher (or maybe RhoMatcher, I'm not completely sure from your message). See the matcher documentation here.
Best,
Cyril
verbatim tags.
As for your original question, you want to use Composition.
Thanks!
Camille
ShortestPath expands the full machine by default. However, there is a way to avoid this behaviour: by using the shortest-first queue discipline and setting the option first_path to true.
ComposeFstOptions opts(); opts.gc_limit=0; // This disable the caching of the result of composition // and should lower the memory usage at no computational cost since // shortest-path should visit each state in the composed machine at most once. ComposeFst<Arc> cfst(fst1, fst2, opt); NaturalShortestFirstQueue<StateId, Weight> state_queue(distance); ShortestPathOptions<Arc, NaturalShortestFirstQueue<StateId, Weight>, AnyArcFilter<Arc> > opts(&state_queue, AnyArcFilter<Arc>()); opts.first_path = true; ShortestPath(cfst, ofst, &distance, opts);This example assume composing two machines. In you case
fst1 could be I o T (itself delayed or not, since it is a smaller machine both might be worth tryingboth) and fst2 could be O.
This should improve things a lot although it should still be slower than dedicated tools. I am very interested in this, so let me know how things turn out.
Best,
Cyril
The issue is how to convert that weighted automaton over the string into a transducer. Currently, we use the second step of the epsilon-normalization algorithm to do this. This is also what the cause the operation to be non idempotent.
One way to partially avoid the problem, would be to first reverse the transducer, apply label-pushing towards the initial state and reverse again:
fstreverse A.fst | fstpush --push_labels | fstreverse > PA.fstThis would result in:
The extra epsilon transitions are due to the use of reverse.
Best,
Cyril
ConstFst and VectorFst.
For ConstFst, the memory requirement is 20 bytes per state. and 16 bytes per arc. For VectorFst, the per-state memory requirement is about twice as much due to the overhead introduced by STL data structures.
Doing the math, you will see that you can easily represent machines with hundred of millions of states and transitions in a few GB of RAM.
If memory usage is critical, the CompactFst class can also be used to provide more efficient memory representations.
This addresses the memory requirement for representing an FST in memory. But, as Dogan pointed out, your memory usage will also depends on which algorithms you want to use.
Best,
Cyril
A Chinese lexicon L: # of states 215371 # of arcs 303286 A Chinese bigram G: # of states 87918 # of arcs 16822088 Run fstcompose to get L o G # of states 391202 # of arcs 17125372 After determinization by running fstdeterminize, det(L o G) # of states 3479987 # of arcs 20071667 more complex !I don't know why. I follows exactly the paper "Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Speech Recognition with Weighted Finite-State Transducers. In Handbook on Speech Processing and Speech Communication, 2008". The output word label in L is already placed on the initial transitions of the word. The code is as follows:
fstarcsort.exe --sort_type=olabel L.fst L.fst.sort fstarcsort.exe --sort_type=ilabel G.fst G.fst.sort fstcompose.exe L.fst.sort G.fst.sort LG.fst fstdeterminize.exe LG.fstb LG.fst.det fstminimize.exe LG.fst.det LG.fst.min Memory allocation failedI run 64-bit openFST on windows 2003 server 64bit edition. The fstminimize.exe returns "Memory allocation failed" is a surprise. 64bit memory space is not sufficient? Any suggestion that you could offer would be much appreciated. Best, Zhijian
fstencode to encode the labels, apply minimization and use fstencode to decode (see FST.EncodeDecodeDoc for more details).
Best,
Cyril
r eh d #0 read r eh d #1 read ... #0 </s>Am I correct? (btw: if I replace the < / s > in G with < eps >, the G is not deterministic. This is not a good way. So I propose the above way.) Thanks for your patience. Your help is greately appreciated. Best, Zhijian
</s> in G and adding </s> as a word with empty pronunciation (or silence) in L. The GRM library also allowed replace </s> in G by a final weight.
Best,
Cyril
VectorFst<StdArc> ifst; VectorFst<StdArc> ofst; VectorFst<LogArc> lfst; Map(ifst,&lfst,StdToLogMapper()); Minimize(&lfst); Map(lfst,&ofst,LogToStdMapper());Best, Dogan
//Building 3 trivial fsts for fst replacement
VectorFst<StdArc> fst1,fst2,fst3;
fst1.AddState();
fst1.AddState();
fst1.AddArc(0, StdArc(0, 123456789, StdArc::Weight::One(), 1));
fst1.SetFinal(1,StdArc::Weight::One());
fst1.SetStart(0);
fst2.AddState();
fst2.AddState();
fst2.AddArc(0, StdArc(0, 1234567890, StdArc::Weight::One(), 1));
fst2.SetFinal(1,StdArc::Weight::One());
fst2.SetStart(0);
fst3.AddState();
fst3.AddState();
fst3.AddState();
fst3.SetFinal(2,StdArc::Weight::One());
fst3.SetStart(0);
fst3.AddArc(0,StdArc(0, 1000, StdArc::Weight::One(), 1));
fst3.AddArc(1,StdArc(0, 2000, StdArc::Weight::One(), 2));
vector< pair< StdArc::Label, const Fst<StdArc> * > > pairlabelfsts;
pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> * >(1000,&fst1) );
pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> * >(2000,&fst2) );
pairlabelfsts.push_back(pair< StdArc::Label, const Fst<StdArc> * >(3000,&fst3) );
VectorFst<StdArc> rulefst;
StdArc::Label slabel=3000;
#ifndef USEOPENFSTBETA
rulefst=ReplaceFst<StdArc>(pairlabelfsts,ReplaceFstOptions<StdArc>(slabel,false));
#else
rulefst=ReplaceFst<StdArc>(pairlabelfsts,ReplaceFstOptions(slabel,false));
#endif
RmEpsilon(&rulefst); //valgrind complains here
rulefst.Write(some.fst); // and valgrind complains here too
return 0;
}
(I may submit a link to a full example code)
Cheers!
Union(RationalFst<A> *fst1, const Fst<A> &fst2) as follows:
UnionFst<A> *union = new UnionFst<A>(fsts[0], fsts[1]); for (int i = 2; i < fsts.size(); ++i) Union(union, fsts[i]);This takes advantage of the fact that UnionFst derives from RationalFst and that delayed rational operations such as union can be applied destructively to a RationalFst. Cyril
0 1 X a 1 2 <eps> b 2 0 <eps> <eps> 2where
X is any symbol in your input alphabet (You need to add a transition in the form "0 1 X a" for each X in your alphabet) and compose the regex with T.
0 1 X <eps> 1 0 <eps> <eps> 1 2 <eps> a 2 3 <eps> b 3where
X is any symbol in your input alphabet and compose the regex with T.
You might achieve the same behaviour using sigma matcher in composition (At least I believe it is possible).
You might even try (for "x y z" -> "a b"): The output from "fsmcompile -?" is FSM Version 3.7 Copyright (C) 1998 AT&T Corp. All rights reserved. Usage: fsmcompile [-opts] [fsm] Options: -i symbols input arc symbols -o symbols output arc symbols -s symbols state symbols -t transducer -V file input potentials file -F file output FSM file -? info/options Continue with my question 3. In openFST, we can use --arc_type to specify the semiring. But we can run fstcompile without --arc_type, such as fstcompile --isymbols=isyms.txt --osymbols=osyms.txt text.fst binary.fst The default --arc_type is "standard", which means tropical semiring. Am I right? But in fsmLib, if we run fsmcompile as follow: fsmcompile -t -i i.syms -o o.syms < T.txt > T.fst It is not clear what semiring is used. Thanks for your attention.Best, ZhijianOu
SetFlags(usage, &argc, &argv, true);In that case, the user can set his own flags as well, following the conventions in <fst/flags.h>. My problem is that when I add a new flag to a custom fst binary, I need to put its definition along with other flag definitions in <src/bin/main.cc> or in <src/lib/flags.cc> and rebuild the library, otherwise
SetFlags method does not recognize it.
I have the generic OpenFst binary setup:
fstbinary.cc -- includes binary-main.h, calls SetFlags and Arc templated BinaryMain function
binary-main.h -- includes <fst/main.h>, declares a new flag using DECLARE_type(newflag) and defines BinaryMain function
Defining the new flag using DEFINE_type(newflag,default,help_message) in fstbinary.cc or binary-main.h allows me to use the newflag with its default value, however i cannot pass it from the command line.
Is it possible to to define and use new flags in a user-defined binary without modifying the library source code? If so, could you provide an example?
Many thanks, Dogan Can
#include "test-main.h"
namespace fst
{
REGISTER_FST_MAIN(TestMain, StdArc);
REGISTER_FST_MAIN(TestMain, LogArc);
}
DEFINE_double(myflag, 1234, "My Flag");
using fst::CallFstMain;
int main(int argc, char **argv)
{
string usage = "Test flags \n\n Usage: ";
usage += argv[0];
usage += " Flags: myflag\n";
std::set_new_handler(FailedNewHandler);
SetFlags(usage.c_str(), &argc, &argv, true);
if (argc !=1)
{
ShowUsage();
return 1;
}
return CallFstMain("TestMain", argc, argv, FLAGS_arc_type);
}
#ifndef TEST_MAIN_H__
#define TEST_MAIN_H__
#include <fst/main.h>
DECLARE_double(myflag);
DECLARE_string(arc_type);
namespace fst
{
template <class Arc>
int TestMain(int argc, char **argv, istream&, const FstReadOptions&)
{
cerr << "myflag=" << FLAGS_myflag << endl;
return 0;
}
}
#endif
min( det( *C* o det( *L* o *G* ) ) )
eh+d-#1will be replaced with '-'
For further determinizations at the context-dependent phone level and distribution level, each auxiliary phone must be mapped to a distinct context-dependent phone. Thus, self-loops are added at each state of C mapping each auxiliary phone to a new auxiliary context-dependent phone. The augmented context-dependency transducer is denoted by C.
so i suppose that the mapping should be one-to-one, rather that one-to-many, which is what my current approach yields.
another clearly negative consequence of my current botched approach is that the final auxiliary-symbol removal step results in lost context wherever such auxiliary symbols appear. for example, the following transducer
0 1 SIL+r-eh red 1 2 r+eh-d - 2 3 eh+d-#2 - 3 4 d+#2-b - 4 5 #2+b-ah ball 5 6 b+ah-l - ... ...defaults to,
0 1 SIL+r-eh red 1 2 r+eh-d - 2 3 - - 3 4 - - 4 5 - ball 5 6 b+ah-l - ... ...which, although i can actually use it for recognition, is obviously wrong. so finally my question: are there any other papers which provide either diagrams of an augmented component context-dependent triphone transducer, or a more detailed description of the I(M) operation described in "Generalized Optimization...",
For any transducer M, we also denote by I(M) the transducer augmented with extra transitions such that each new symbol on its input side is mapped to some new and distinct output symbol.
try as I might still don't quite get this. sorry if this seems a foolish question but im teaching this to myself and dont really know where else i might ask.
The same example, inverted, and augmented with 2 auxiliary phones, #0, and #1
Assuming we have an example grammar transducer of the following form,
0 1 a testa 1 2 b - 2 3 #0 - 3 4 a testb 4 5 b - 5 6 #1 - 6
Then composition with the CD transducer yields,
where the auxiliary triphone symbols can now be safely replaced with epsilons, or short-pauses, or removed without any loss of information.
a b c d e f into a:eps eps:b c:eps eps:d.
This can easily be done using two two-state transducers T_1 and T_2 and computing T_1 o A o T_2. T_2 is the transducer erasing any symbol in odd position: for all a in the alphabet,
0 1 a eps 1 0 a a 1
T_1 is the inverse of the reverse of T_2
2) Apply Synchronize() followed by RmEpsilon().
Cyril
< if (!istrm) {
> if (!*istrm) {
2. <lexicographic-weight.h> defines pair weight type using "_<_" symbol between component weight types. However "<" symbol creates problems say when you want to register LexicographicArc for fst binaries since you can neither supply type1_<_type2 as an arc type nor create a dynamic shared object named type1_<_type2-arc.so.
Solution: change line 00075 in <lexicographic-weight.h>
< static const string type = W1::Type() + "_<_" + W2::Type();
> static const string type = W1::Type() + "_L_" + W2::Type();Best, Dogan Can
explicit SymbolTableImpl(const SymbolTableImpl& impl)
: name_(impl.name_),
available_key_(0),
dense_key_limit_(0),
check_sum_finalized_(false) {
for (size_t i = 0; i < impl.symbols_.size(); ++i) {
AddSymbol(impl.symbols_[i], impl.Find(impl.symbols_[i]));
}
}
* Am I right ?
* if yes, i don't see the use of having a reference counting on a deep copy
* if yes, the following comment in symbol-table.h is not true anymore :
// SymbolTables are reference counted and can therefore be shared across // multiple machines. For example a language model grammar G, with a // SymbolTable for the words in the language model can share this symbol // table with the lexical representation L o G.Thank you for your answer and for this great library !
SymbolTable(const SymbolTable& table) : impl_(table.impl_) {
impl_->IncrRefCount();
}
The copy constructor of SymbolTableImpl is not called since impl_ is a pointer to a SymbolTableImpl.
Best,
Cyril
$myfst = a* b+ ((c|d|e|f) - (m|n|d)){2} ((dog|cat|rat) & (elephant|dog|pig))? ;
After the regular expression is parsed, the interpreter walks the parse tree and makes multiple calls to OpenFst library functions--Closure, Compose, Concat, Difference, Intersect, etc--building many intermediate Fsts and combining them using the various operations until the final Fst is created and (in this case) bound to the variable $myfst. Obviously, all this processing gets done automatically, without human intervention. The user is interested only in the final result, $myfst, which then might get used in subsequent regular expressions.
Ideally, the final result would be automatically determinized and minimized.
I understand, with help from Cyril , that if an Fst is tested and found to be 1) acyclic, or 2) both (unweighted AND an Acceptor) then it can be safely Determinized and Minimized. (Corrections would be welcome.) So after each intermediate or final Fst is created by the Kleene interpreter, it might be sent to the following clean-up function
// first-draft pseudo-code
if ( the fst is acyclic || (the fst is unweighted && the fst is an acceptor) ) {
RmEpsilon()
Determinize()
Minimize()
} else {
// this is questionable, see below
RmEpsilon()
}
However, I'm aware that this could still be problematic. In a response on this forum, Cyril stated
"Indeed, in some situations you cannot remove epsilons because it will lead to a space blow up. In that case, you might still want to optimize the machine with epsilons [i.e. without first invoking RmEpsilon] using determinization and minimization."
In another response, Cyril stated
"Indeed, union introduces epsilon-transitions and epsilon-transitions will slow down composition. Non-determinism will also slow down composition. So, I would recommend you first epsilon-remove and then determinize your fst. However, there is always a risk with determinization (it might blow up or even not terminate). Hence, if what I suggest does not work better, you should investigate using determinize only, or rmepsilon only or determinize followed by rmepsilon."
This puts in question the 'else' clause in the pseudo-code above, where RmEpsilon() is automatically called on Fsts that don't satisfy the requirements for safe determinization and minimization.
Question: Assuming my Kleene scenario, where regular expressions are being interpreted and many Fsts (intermediate and final) are being created automatically, when can RmEpsilon(), Determinize() and Minimize() be safely invoked automatically? I want the interpreter to do all the optimizations that are mathematically safe, but to attempt nothing more.
We can assume that the interpreter can interrogate various features of the candidate Fst (cyclic vs. acyclic, weighted vs. unweighted, acceptor vs. transducer, etc.). The interpreter also knows exactly which operation has just been performed to produce the Fst in question (Union, Concat, Closure, etc.).
Thanks,
Ken
// first-draft pseudo-code
if ( the fst is acyclic || (the fst is unweighted && the fst is an acceptor) ) {
Determinize()
Minimize()
RmEpsilon()
Determinize()
Minimize()
} else {
// this is questionable, see below
RmEpsilon()
}
#include <ctime>
# of states 42980 # of arcs 183468 # of final states 923 # of input/output epsilons 43867and (determinized and minimized) dictionary L~:
# of states 1180 # of arcs 2424 # of final states 1 # of input/output epsilons 1when I try to compose these with fstcompose, it rapidly uses all available memory. I haven't let it run long enough to see if it actually terminates. I'm not sure this is specific to OpenFst, because I find that the FSM library has the same problem. It's possible that I'm building the language model incorrectly, though I'm not sure why this would cause composition to blow up. I am planning to do a sanity check using the GRM library but haven't figured this out yet. Is it necessary to use on-demand composition when building a transducer of this sort?
0 1 1 1 0 2 1 2 1 1 2 3 1 3 3 5 2 2 2 4 2 3 4 6 3
int32 magic_number = 2125659606;
string fst_type = "vector";
string arc_type = "standard";
int32 file_version = 2; /* file format version */
int32 flags; /* bitfield: HAS_ISYMBOLS = 1, HAS_OSYMBOLS = 2 */
uint64 properties; /* bitfield, defined in fst/lib/properties.h */
int64 start; /* start state ID */
int64 num_states;
int64 num_arcs;
/* input symbol table, if flags & HAVE_ISYMBOLS */
int32 magic_number = 2125658996;
string name;
int64 next_available_key; /* Next available key (number) */
int64 size;
struct {
string symbol;
int64 key;
} symbols[size];
/* output symbol table, if flags & HAVE_OSYMBOLS - see above... */
/* states and arcs */
struct {
float final_weight;
int64 num_arcs;
struct {
int ilabel;
int olabel;
float weight;
int next_state;
} arcs[num_arcs];
} states[num_states];
toautotools_openfst_20080422_20081227.diff patch -p1 -E < ../toautotools_openfst_20080422_20081227.diff autoreconf --install --force ./configure --prefix=$HOME/busr CPPFLAGS="-I/opt/local/include" LDFLAGS="-L/opt/local/lib" make make check make installI have enabled make check using tests already provided in OpenFst so that one can test whether modules can be dlopened. All tests pass for all platforms except Cygwin on Windows XP. Even for Cygwin on Windows XP, this patch enables a dynamic linked library to be built such that OCRopus is able to use it to pass its own OpenFst tests. (There is an additional failure of one OCRopus test, test-fst-search2.lua, on Mac OS X and FreeBSD.) My preliminary investigation for Cygwin on Windows XP appears to show that something is going wrong with reinterpret_cast-ing the read() function so that it can be recovered using a map in both fst/lib/register.h and fst/bin/main.h. The pointer to this function appears to be immediately recovered if the map is accessed right after it is stored, but a later GetEntry() returns a null pointer. I believe that using GNU autotools to provide dynamic linked libraries is an essential step to helping more wide distribution of OpenFst through standard platform package or port managers. I think the current patch just about finishes this task other than the above problem for Cygwin. I also have not tested this patch on any 64-bit platforms other than an Intel Macbook running Mac OS X.
template <class Arc>
void Determinize(MutableFst<Arc> *fst) {
*fst = DeterminizeFst<Arc>(*fst,
DeterminizeFstOptions(CacheOptions(true, 0)));
}
Best, Cyril
for (StateIterator<StdFst> siter(fst); !siter.Done(); siter.Next()) {
StateId state_id = siter.Value() ;
...
}
kPhiLabel (you will need to use the C++ library for this, Fsts containing transitions labeled by kPhiLabel cannot be written to a file).
To perform composition with failure transitions, you need to use
ComposeFst<Arc>(fst1, fst2, ComposeFstOptions<COMPOSE_FST1_PHI>())when
fst1 contains transitions labeled by kPhiLabel or ComposeFst<Arc>(fst1, fst2, ComposeFstOptions<COMPOSE_FST2_PHI>())when
fst2 contains transitions labeled by kPhiLabel.
Best, Cyril
| If you have an FST, is there an easy way to query the number of paths? Thanks, Ken |
KennethRBeesley | 13 Jun 2008 - 14:10 |
| The natural way is to use the shortest-distance algorithm: 1. Use Map to turn your FST into an FST over the log semiring setting the weight of every transition and the final weight of final states to LogWeight? ::One(). 2. Call ShortestDistance? with reverse == true. Let d be the value return in the distance vector for the initial state. d is then the -log of the number of successful paths in the FST. Best, Cyril |
CyrilAllauzen | 16 Jun 2008 - 12:56 |
| Thanks, Cyril. I suspect that my terminology was wrong and/or my C++ skills are still shaky. What I really need is the number of complete paths in an FST, from the start state to a final state. Following these instructions (as best I can), I seem to get the number of paths from the start state to any state (final or not). I hesitate to post my code here, but I'd be more than happy to send it to you or anyone else. Ken |
KennethRBeesley | 17 Jun 2008 - 16:48 |