Documentation improvements (#236)
* docs: fix unintentional link

* docs: add initial helpful list of differences from OpenFST

* docs: sync readme

* docs: document a few undocumented things

* docs: add more differences

* docs: add a quick start section

* docs: document everything on the front page and algorithms

* docs: sync readme

* docs: clean up list of differences

* docs: minor fix

* docs: note about not-generic weights

* docs: small corrections
dhdaines authored Feb 23, 2023
1 parent 4f45d2b commit fe492c0
Showing 18 changed files with 286 additions and 31 deletions.
103 changes: 101 additions & 2 deletions README.md
@@ -20,8 +20,7 @@

<!-- cargo-sync-readme start -->

This repo contains a **Rust** implementation of Weighted Finite States Transducers.
Along with a **Python** binding.
Rust implementation of Weighted Finite State Transducers.

Rustfst is a library for constructing, combining, optimizing, and searching weighted
finite-state transducers (FSTs). Weighted finite-state transducers are automata where
@@ -43,6 +42,64 @@ results can be selected by shortest-path algorithms.

![fst](https://raw.githubusercontent.com/Garvys/rustfst-images-doc/master/images/project_in.svg?sanitize=true)

## Overview

For a basic [example](#example) see the section below.

Some simple and commonly encountered types of FSTs can be easily
created with the macro [`fst`] or the functions
[`acceptor`](utils::acceptor) and
[`transducer`](utils::transducer).
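
To make the shape of these "simple" FSTs concrete, here is a standalone sketch that does not use rustfst at all. The names `ToyTr`, `ToyFst`, `toy_acceptor` and `toy_transducer` are hypothetical, and the choices to pad the shorter side with epsilon and to put the weight on the final state are assumptions of this sketch rather than a description of what the library does internally.

```rust
/// Label 0 stands for epsilon in this sketch (the usual OpenFST convention).
pub const EPS: u32 = 0;

/// One transition of the toy model.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyTr {
    pub ilabel: u32,
    pub olabel: u32,
    pub weight: f32,
    pub next: usize,
}

/// A toy FST: transitions grouped by source state, plus optional final weights.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyFst {
    pub start: usize,
    pub trs: Vec<Vec<ToyTr>>,
    pub finals: Vec<Option<f32>>,
}

/// Build a linear chain that maps `input` to `output`.
/// The shorter sequence is padded with epsilon (an assumption of this sketch).
pub fn toy_transducer(input: &[u32], output: &[u32], weight: f32) -> ToyFst {
    let n = input.len().max(output.len());
    let mut trs: Vec<Vec<ToyTr>> = vec![Vec::new(); n + 1];
    for i in 0..n {
        trs[i].push(ToyTr {
            ilabel: input.get(i).copied().unwrap_or(EPS),
            olabel: output.get(i).copied().unwrap_or(EPS),
            weight: 0.0, // tropical "one": transitions cost nothing here
            next: i + 1,
        });
    }
    // Only the last state is final, and it carries the whole weight.
    let mut finals = vec![None; n + 1];
    finals[n] = Some(weight);
    ToyFst { start: 0, trs, finals }
}

/// An acceptor is a transducer whose input and output labels coincide.
pub fn toy_acceptor(labels: &[u32], weight: f32) -> ToyFst {
    toy_transducer(labels, labels, weight)
}
```

Calling `toy_transducer(&[1, 2, 3], &[4, 5], 1.5)` in this model yields four states and three transitions, the last of which has an epsilon output label.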

For more complex cases you will likely start with the
[`VectorFst`](fst_impls::VectorFst) type, which will be imported
in the [`prelude`] along with most everything else you need.
[`VectorFst<TropicalWeight>`](fst_impls::VectorFst) corresponds
directly to the OpenFST `StdVectorFst`, and can be used to load
its files using [`read`](fst_traits::SerializableFst::read) or
[`read_text`](fst_traits::SerializableFst::read_text).

Because "iteration" over an FST can mean many different things,
there are a variety of different iterators. To iterate over state
IDs you may use
[`states_iter`](fst_traits::StateIterator::states_iter), while to
iterate over transitions out of a state, you may use
[`get_trs`](fst_traits::CoreFst::get_trs). Since it is common to
iterate over both, this can be done using
[`fst_iter`](fst_traits::FstIterator::fst_iter) or
[`fst_into_iter`](fst_traits::FstIntoIterator::fst_into_iter). It
is also very common to iterate over paths accepted by an FST,
which can be done with
[`paths_iter`](fst_traits::Fst::paths_iter), and as a convenience
for generating text,
[`string_paths_iter`](fst_traits::Fst::string_paths_iter).
Alternately, in the case of a linear FST, you may retrieve the
only possible path with
[`decode_linear_fst`](utils::decode_linear_fst).
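
The three notions of iteration described above (states, transitions out of a state, and accepted paths) can be told apart with a small standalone model. This is only a sketch: `ToyFst`, `states`, `trs_of` and `paths` are hypothetical names, weights are left out for brevity, and the path enumeration assumes an acyclic machine (it would not terminate on a cycle).

```rust
/// (input label, output label, next state)
pub type Tr = (u32, u32, usize);

pub struct ToyFst {
    pub start: usize,
    pub trs: Vec<Vec<Tr>>,
    pub is_final: Vec<bool>,
}

impl ToyFst {
    /// Iterate over state IDs.
    pub fn states(&self) -> impl Iterator<Item = usize> {
        0..self.trs.len()
    }

    /// Transitions leaving one state, exposed as a slice.
    pub fn trs_of(&self, state: usize) -> &[Tr] {
        &self.trs[state]
    }

    /// Enumerate every accepted path as (input labels, output labels).
    /// Depth-first walk with an explicit stack; assumes the FST is acyclic.
    pub fn paths(&self) -> Vec<(Vec<u32>, Vec<u32>)> {
        let mut found = Vec::new();
        let mut stack: Vec<(usize, Vec<u32>, Vec<u32>)> =
            vec![(self.start, Vec::new(), Vec::new())];
        while let Some((state, ins, outs)) = stack.pop() {
            if self.is_final[state] {
                found.push((ins.clone(), outs.clone()));
            }
            for &(ilabel, olabel, next) in &self.trs[state] {
                let mut ins2 = ins.clone();
                ins2.push(ilabel);
                let mut outs2 = outs.clone();
                outs2.push(olabel);
                stack.push((next, ins2, outs2));
            }
        }
        found
    }
}
```

The number of paths can grow exponentially with the number of states, which is one reason the state and transition iterators are kept separate from path enumeration.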

Note that iterating over paths is not the same thing as finding
the *shortest* path or paths, which is done with
[`shortest_path`](algorithms::shortest_path) (for a single path)
or
[`shortest_path_with_config`](algorithms::shortest_path_with_config)
(for N-shortest paths).
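
The distinction can be illustrated without the library. The sketch below finds the single lowest-cost path in the tropical semiring (weights add along a path, and the minimum is taken across paths) by relaxing transitions instead of enumerating paths. `toy_shortest_path` is a hypothetical name, and the code assumes that state 0 is the start state and that states are numbered in topological order; it is not how `shortest_path` is implemented.

```rust
/// (label, weight, next state)
pub type Tr = (u32, f32, usize);

/// Return the labels and total weight of the cheapest path from state 0 to
/// `final_state`, or `None` if `final_state` is unreachable.
/// Assumes every transition goes from a lower to a higher state number.
pub fn toy_shortest_path(trs: &[Vec<Tr>], final_state: usize) -> Option<(Vec<u32>, f32)> {
    let n = trs.len();
    if n == 0 || final_state >= n {
        return None;
    }
    // Best known distance from the start state, and how we got there.
    let mut dist = vec![f32::INFINITY; n];
    let mut back: Vec<Option<(usize, u32)>> = vec![None; n];
    dist[0] = 0.0;
    for state in 0..n {
        if dist[state].is_infinite() {
            continue; // not reachable from the start state
        }
        for &(label, weight, next) in &trs[state] {
            let candidate = dist[state] + weight;
            if candidate < dist[next] {
                dist[next] = candidate;
                back[next] = Some((state, label));
            }
        }
    }
    if dist[final_state].is_infinite() {
        return None;
    }
    // Follow back-pointers from the final state to recover the labels.
    let mut labels = Vec::new();
    let mut state = final_state;
    while let Some((prev, label)) = back[state] {
        labels.push(label);
        state = prev;
    }
    labels.reverse();
    Some((labels, dist[final_state]))
}
```

Each transition is visited once here, whereas enumerating all paths and picking the best one would visit every path.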

For the complete list of algorithms, see the [`algorithms`] module.

You may now be wondering, especially if you have previously used
such linguist-friendly tools as
[pyfoma](https://github.com/mhulden/pyfoma), "what if I just want
to *transduce some text*???" The unfriendly answer is that
rustfst is a somewhat lower-level library, designed for
implementing things like speech recognizers. The somewhat more
helpful answer is that you would do this by constructing an
[`acceptor`](utils::acceptor) for your input, which you will
[`compose`](algorithms::compose) with a
[`transducer`](utils::transducer), then
[`project`](algorithms::project) the result [to its output](algorithms::ProjectType::ProjectOutput), and finally
[iterate over the paths](fst_traits::Fst::string_paths_iter) in
the resulting FST.
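
As a rough intuition for that recipe, the standalone sketch below fuses the three steps (compose a linear acceptor with a transducer, project to the output side, enumerate paths) into a single loop. It does not call rustfst: `ToyTransducer` and `toy_transduce` are hypothetical names, labels are plain `char`s instead of symbol IDs, weights are ignored, and the transducer is assumed to be epsilon-free.

```rust
/// (input label, output label, next state)
pub type Tr = (char, char, usize);

pub struct ToyTransducer {
    pub start: usize,
    pub trs: Vec<Vec<Tr>>,
    pub is_final: Vec<bool>,
}

/// Return every output string that the transducer pairs with `input`.
pub fn toy_transduce(t: &ToyTransducer, input: &str) -> Vec<String> {
    // Each hypothesis is (transducer state, output so far). The position in
    // `input` plays the role of the acceptor's state during composition.
    let mut hyps: Vec<(usize, String)> = vec![(t.start, String::new())];
    for c in input.chars() {
        let mut next_hyps = Vec::new();
        for (state, out) in &hyps {
            for &(ilabel, olabel, next) in &t.trs[*state] {
                // Composition keeps only transitions whose input label
                // matches the acceptor's label at this position.
                if ilabel == c {
                    let mut out2 = out.clone();
                    out2.push(olabel); // output projection: keep the output side
                    next_hyps.push((next, out2));
                }
            }
        }
        hyps = next_hyps;
    }
    // A path counts only if it ends in a final state of the transducer.
    hyps.into_iter()
        .filter(|(state, _)| t.is_final[*state])
        .map(|(_, out)| out)
        .collect()
}
```

With a one-state transducer that maps `a` to `x` or `z` and `b` to `y`, the input `"ab"` produces both `"xy"` and `"zy"`, which is why the final step iterates over paths and does not return a single string.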

## References

Implementation heavily inspired by Mehryar Mohri's, Cyril Allauzen's and Michael Riley's work:
@@ -51,6 +108,10 @@ Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Micha
- [OpenFst: A general and efficient weighted finite-state transducer library](https://link.springer.com/chapter/10.1007%2F978-3-540-76336-9_3)
- [Weighted finite-state transducers in speech recognition](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1010&context=cis_papers)

The API closely resembles that of OpenFST, with some
simplifications and changes to make it more idiomatic in Rust, notably
the use of `Tr` instead of `Arc`. See [Differences from OpenFST](#differences-from-openfst) for more information.

## Example

```rust
@@ -108,6 +169,44 @@ fn main() -> Result<()> {
}
```

## Differences from OpenFST

Here is a non-exhaustive list of ways in which Rustfst's API
differs from OpenFST:

- The default epsilon symbol is `<eps>` and not `<epsilon>`.
- Functions and methods follow Rust naming conventions,
e.g. `add_state` rather than `AddState`, but are otherwise mostly
equivalent, except that:
- Transitions are called `Tr` and not `Arc`, because `Arc` has a
rather different and well-established meaning in Rust, and rustfst
uses it (`std::sync::Arc`, that is) to reference-count symbol
tables. All associated functions also use `tr`.
- Final states are not indicated by a final weight of `zero`. You
can test for finality using [`is_final`](fst_traits::CoreFst::is_final), and
[`final_weight`](fst_traits::CoreFst::final_weight) returns an [`Option`]. This
requires some care when converting OpenFST code.
- Transitions can be accessed directly as a slice rather than requiring
an iterator.
- Semiring operations are expressed as plain old methods rather
than strange C++ things. So write `w1.plus(w2)` rather than
`Plus(w1, w2)`, for instance.
- Weights have in-place operations for ⊕
([`plus_assign`](Semiring::plus_assign)) and ⊗
([`times_assign`](Semiring::times_assign)).
- Most of the type aliases (which would be trait aliases in Rust) such
as `StdArc`, `StdFst`, and so forth, are missing, but type inference
allows us to avoid explicit type arguments in most cases, such as
when calling [`Tr::new`], for instance.
- State IDs are unsigned, with [`NO_STATE_ID`] used for a missing value.
They are also 32 bits by default (presumably, 4 billion states
is enough for most applications). This means you must take care to
cast them to [`usize`] when using them as indices, and vice-versa,
preferably checking for overflows.
- Symbol IDs are also unsigned and 32 bits wide, with [`NO_LABEL`] used
for a missing value.
- Floating-point weights are not generic, so are always single-precision.
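
A few of the points above are easier to see in code. The following is a standalone sketch, not rustfst itself: `Tropical`, `from_openfst_final`, `state_index` and `state_id` are hypothetical, and the method signatures are deliberately simplified, so consult the `Semiring` trait for the real ones.

```rust
use std::convert::TryFrom;

/// Toy tropical weight, single-precision like the library's weights.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Tropical(pub f32);

impl Tropical {
    pub fn zero() -> Self {
        Tropical(f32::INFINITY)
    }
    pub fn one() -> Self {
        Tropical(0.0)
    }
    /// ⊕ is `min` in the tropical semiring.
    pub fn plus(self, rhs: Self) -> Self {
        Tropical(self.0.min(rhs.0))
    }
    /// ⊗ is `+` in the tropical semiring.
    pub fn times(self, rhs: Self) -> Self {
        Tropical(self.0 + rhs.0)
    }
    /// In-place ⊕.
    pub fn plus_assign(&mut self, rhs: Self) {
        *self = self.plus(rhs);
    }
    /// In-place ⊗.
    pub fn times_assign(&mut self, rhs: Self) {
        *self = self.times(rhs);
    }
}

/// OpenFST marks a non-final state with a final weight of zero; an
/// `Option`-based API uses `None` for that. This conversion is the
/// kind of care needed when porting code.
pub fn from_openfst_final(weight: Tropical) -> Option<Tropical> {
    if weight == Tropical::zero() {
        None
    } else {
        Some(weight)
    }
}

/// Convert a 32-bit state ID to an index, checking for overflow.
pub fn state_index(id: u32) -> Option<usize> {
    usize::try_from(id).ok()
}

/// Convert an index back to a 32-bit state ID, checking for overflow.
pub fn state_id(index: usize) -> Option<u32> {
    u32::try_from(index).ok()
}
```

The checked conversions return `None` on overflow instead of truncating silently, which an `as` cast would do.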

<!-- cargo-sync-readme end -->

## Benchmark with OpenFST
2 changes: 1 addition & 1 deletion rustfst/src/algorithms/all_pairs_shortest_distance.rs
@@ -5,7 +5,7 @@ use crate::fst_traits::Fst;
use crate::semirings::StarSemiring;
use crate::Trs;

/// This operation computes the shortest distance from each state to every other states.
/// Compute the shortest distance from each state to every other state.
/// The shortest distance from `p` to `q` is the ⊕-sum of the weights
/// of all the paths between `p` and `q`.
///
8 changes: 5 additions & 3 deletions rustfst/src/algorithms/condense.rs
@@ -7,9 +7,11 @@ use crate::fst_traits::{ExpandedFst, Fst, MutableFst};
use crate::semirings::Semiring;
use crate::{StateId, Trs};

// Returns an acyclic FST where each SCC in the input FST has been condensed to
// a single state with transitions between SCCs retained and within SCCs
// dropped. Also populates 'scc' with a mapping from input to output states.
/// Return an acyclic FST where each SCC in the input FST has been condensed to
/// a single state with transitions between SCCs retained and within SCCs
/// dropped.
///
/// Also populates 'scc' with a mapping from input to output states.
pub fn condense<W: Semiring, FI: Fst<W> + ExpandedFst<W>, FO: MutableFst<W>>(
ifst: &FI,
) -> Result<(Vec<i32>, FO)> {
2 changes: 1 addition & 1 deletion rustfst/src/algorithms/connect.rs
@@ -13,7 +13,7 @@ use crate::StateId;
use crate::Tr;
use crate::NO_STATE_ID;

/// This operation trims an Fst, removing states and trs that are not on successful paths.
/// Trim an Fst, removing states and trs that are not on successful paths.
///
/// # Example 1
/// ```
2 changes: 1 addition & 1 deletion rustfst/src/algorithms/inversion.rs
@@ -3,7 +3,7 @@ use crate::fst_properties::FstProperties;
use crate::fst_traits::MutableFst;
use crate::semirings::Semiring;

/// This operation inverts the transduction corresponding to an FST
/// Invert the transduction corresponding to an FST
/// by exchanging the FST's input and output labels.
///
/// # Example 1
10 changes: 6 additions & 4 deletions rustfst/src/algorithms/isomorphic.rs
@@ -158,6 +158,7 @@ impl<'a, W: Semiring, F1: ExpandedFst<W>, F2: ExpandedFst<W>> Isomorphism<'a, W,
}
}

/// Configuration for isomorphic comparison.
pub struct IsomorphicConfig {
delta: f32,
}
@@ -174,7 +175,7 @@ impl IsomorphicConfig {
}
}

/// This operation determines if two transducers with a certain required determinism
/// Determine if two transducers with a certain required determinism
/// have the same states, irrespective of numbering, and the same transitions with
/// the same labels and weights, irrespective of ordering.
///
@@ -189,9 +190,10 @@ where
isomorphic_with_config(fst_1, fst_2, IsomorphicConfig::default())
}

/// This operation determines if two transducers with a certain required determinism
/// have the same states, irrespective of numbering, and the same transitions with
/// the same labels and weights, irrespective of ordering.
/// Determine, with configurable comparison delta, if two transducers with a
/// certain required determinism have the same states, irrespective of
/// numbering, and the same transitions with the same labels and
/// weights, irrespective of ordering.
///
/// In other words, Isomorphic(A, B) is true if and only if the states of A can
/// be renumbered and the transitions leaving each state reordered so that Equal(A, B) is true.
1 change: 1 addition & 0 deletions rustfst/src/algorithms/minimize.rs
@@ -39,6 +39,7 @@ use itertools::Itertools;
use std::cell::RefCell;
use std::rc::Rc;

/// Configuration for minimization.
#[derive(Clone, Copy, PartialOrd, PartialEq)]
pub struct MinimizeConfig {
delta: f32,
12 changes: 10 additions & 2 deletions rustfst/src/algorithms/mod.rs
@@ -31,15 +31,21 @@ pub use self::{

mod add_super_final_state;
mod all_pairs_shortest_distance;
/// Functions to compute Kleene closure (star or plus) of an FST.
pub mod closure;
#[allow(clippy::type_complexity)]
/// Functions to compose FSTs.
pub mod compose;
/// Functions to concatenate FSTs.
pub mod concat;
mod condense;
mod connect;
/// Functions to determinize FSTs.
pub mod determinize;
pub(crate) mod dfs_visit;
/// Functions to encode FSTs as FSAs and vice versa.
pub mod encode;
/// Functions to factor various weight types.
pub mod factor_weight;
mod fst_convert;
mod inversion;
@@ -51,14 +57,15 @@ mod projection;
mod push;
mod queue;

/// Module providing functions to randomly generate paths through an Fst. A static and a delayed version are available.
/// Functions to randomly generate paths through an Fst. A static and a delayed version are available.
pub mod randgen;
mod relabel_pairs;
/// Functions for lazily replacing transitions in an FST.
pub mod replace;
mod reverse;
mod reweight;

/// Module providing functions to remove epsilon transitions from an Fst. A static and a delayed version are available.
/// Functions to remove epsilon transitions from an Fst. A static and a delayed version are available.
pub mod rm_epsilon;
mod rm_final_epsilon;
mod shortest_distance;
@@ -69,6 +76,7 @@ mod tr_map;
mod tr_sort;
mod tr_sum;
pub(crate) mod tr_unique;
/// Functions to compute the union of FSTs.
pub mod union;
mod weight_convert;

1 change: 1 addition & 0 deletions rustfst/src/algorithms/optimize.rs
@@ -7,6 +7,7 @@ use crate::semirings::{SemiringProperties, WeaklyDivisibleSemiring, WeightQuanti
use crate::Semiring;
use anyhow::Result;

/// General optimization (determinization and minimization) of a WFST
pub fn optimize<
W: Semiring + WeaklyDivisibleSemiring + WeightQuantize,
F: MutableFst<W> + AllocableFst<W>,
17 changes: 14 additions & 3 deletions rustfst/src/algorithms/push.rs
@@ -30,6 +30,7 @@ bitflags! {
}
}

/// Configuration for [`push_weights_with_config`].
#[derive(Clone, Debug, Copy, PartialOrd, PartialEq)]
pub struct PushWeightsConfig {
delta: f32,
@@ -65,6 +66,12 @@ impl PushWeightsConfig {
}
}

/// Push the weights in an FST.
///
/// If pushing towards the initial state, the sum of the weight of the
/// outgoing transitions and final weight at a non-initial state is
/// equal to One() in the resulting machine. If pushing towards the
/// final state, the same property holds on the reverse machine.
pub fn push_weights<W, F>(fst: &mut F, reweight_type: ReweightType) -> Result<()>
where
F: MutableFst<W>,
@@ -73,8 +80,9 @@ where
push_weights_with_config(fst, reweight_type, PushWeightsConfig::default())
}

/// Pushes the weights in FST in the direction defined by TYPE. If
/// pushing towards the initial state, the sum of the weight of the
/// Push the weights in an FST, optionally removing the total weight.
///
/// If pushing towards the initial state, the sum of the weight of the
/// outgoing transitions and final weight at a non-initial state is
/// equal to One() in the resulting machine. If pushing towards the
/// final state, the same property holds on the reverse machine.
@@ -223,6 +231,7 @@ macro_rules! m_labels_pushing {
}};
}

/// Configuration for [`push_with_config`].
#[derive(Clone, Copy, Debug, PartialOrd, PartialEq)]
pub struct PushConfig {
delta: f32,
@@ -244,6 +253,8 @@ impl PushConfig {
}
}

/// Push the weights and/or labels of the input FST into the output
/// mutable FST by pushing weights and/or labels towards the initial state or final states.
pub fn push<W, F1, F2>(ifst: &F1, reweight_type: ReweightType, push_type: PushType) -> Result<F2>
where
F1: ExpandedFst<W>,
@@ -254,7 +265,7 @@ where
push_with_config(ifst, reweight_type, push_type, PushConfig::default())
}

/// Pushes the weights and/or labels of the input FST into the output
/// Push the weights and/or labels of the input FST into the output
/// mutable FST by pushing weights and/or labels towards the initial state or final states.
pub fn push_with_config<W, F1, F2>(
ifst: &F1,
2 changes: 1 addition & 1 deletion rustfst/src/algorithms/relabel_pairs.rs
@@ -23,7 +23,7 @@ where
Ok(map_labels)
}

/// Replaces input and/or output labels using pairs of labels.
/// Replace input and/or output labels using pairs of labels.
///
/// This operation destructively relabels the input and/or output labels of the
/// FST using pairs of the form (old_ID, new_ID); omitted indices are
4 changes: 3 additions & 1 deletion rustfst/src/algorithms/reverse.rs
@@ -7,7 +7,9 @@ use crate::semirings::Semiring;
use crate::tr::Tr;
use crate::{StateId, Trs, EPS_LABEL};

/// Reverses an FST. The reversed result is written to an output mutable FST.
/// Reverse an FST.
///
/// The reversed result is written to an output mutable FST.
/// If A transduces string x to y with weight a, then the reverse of A
/// transduces the reverse of x to the reverse of y with weight a.Reverse().
///
3 changes: 2 additions & 1 deletion rustfst/src/algorithms/reweight.rs
@@ -15,7 +15,8 @@ pub enum ReweightType {
ReweightToFinal,
}

/// Reweights an FST according to a vector of potentials in a given direction.
/// Reweight an FST according to a vector of potentials in a given direction.
///
/// The weight must be left distributive when reweighting towards the initial
/// state and right distributive when reweighting towards the final states.
///
13 changes: 8 additions & 5 deletions rustfst/src/algorithms/shortest_distance.rs
@@ -252,6 +252,7 @@ pub(crate) fn shortest_distance_with_internal_config<
sd_state.shortest_distance::<F, _>(source, fst)
}

/// Configuration for shortest distance computation
#[derive(Debug, Clone, Copy, PartialOrd, PartialEq)]
pub struct ShortestDistanceConfig {
delta: f32,
@@ -271,11 +272,7 @@ impl ShortestDistanceConfig {
}
}

pub fn shortest_distance<W: Semiring, F: ExpandedFst<W>>(fst: &F, reverse: bool) -> Result<Vec<W>> {
shortest_distance_with_config(fst, reverse, ShortestDistanceConfig::default())
}

/// This operation computes the shortest distance from the initial state to every state.
/// Compute the shortest distance from the initial state to every state.
/// The shortest distance from `p` to `q` is the ⊕-sum of the weights
/// of all the paths between `p` and `q`.
///
@@ -308,6 +305,12 @@ pub fn shortest_distance<W: Semiring, F: ExpandedFst<W>>(fst: &F, reverse: bool)
/// # Ok(())
/// # }
/// ```
pub fn shortest_distance<W: Semiring, F: ExpandedFst<W>>(fst: &F, reverse: bool) -> Result<Vec<W>> {
shortest_distance_with_config(fst, reverse, ShortestDistanceConfig::default())
}

/// Compute the shortest distance from the initial state to every
/// state, with configurable delta for comparison.
pub fn shortest_distance_with_config<W: Semiring, F: ExpandedFst<W>>(
fst: &F,
reverse: bool,