Documentation improvements #236

Merged
merged 12 commits into from
Feb 23, 2023
103 changes: 101 additions & 2 deletions README.md
@@ -20,8 +20,7 @@

<!-- cargo-sync-readme start -->

This repo contains a **Rust** implementation of Weighted Finite States Transducers.
Along with a **Python** binding.
Rust implementation of Weighted Finite States Transducers.

Rustfst is a library for constructing, combining, optimizing, and searching weighted
finite-state transducers (FSTs). Weighted finite-state transducers are automata where
@@ -43,6 +42,64 @@ results can be selected by shortest-path algorithms.

![fst](https://raw.githubusercontent.com/Garvys/rustfst-images-doc/master/images/project_in.svg?sanitize=true)

## Overview

For a basic [example](#example) see the section below.

Some simple and commonly encountered types of FSTs can be easily
created with the macro [`fst`] or the functions
[`acceptor`](utils::acceptor) and
[`transducer`](utils::transducer).

For more complex cases you will likely start with the
[`VectorFst`](fst_impls::VectorFst) type, which will be imported
in the [`prelude`] along with almost everything else you need.
[`VectorFst<TropicalWeight>`](fst_impls::VectorFst) corresponds
directly to the OpenFST `StdVectorFst`, and can be used to load
its files using [`read`](fst_traits::SerializableFst::read) or
[`read_text`](fst_traits::SerializableFst::read_text).
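To make the shape of a simple FST concrete, here is a std-only sketch of a linear acceptor. `ToyVectorFst` and `toy_acceptor` are hypothetical stand-ins written for illustration, not rustfst's `VectorFst` or `acceptor`, whose exact signatures are not reproduced here; the sketch assumes tropical weights, where 0.0 plays the role of One.

```rust
/// Hypothetical vector-backed FST: one list of (ilabel, olabel, weight, nextstate)
/// transitions per state, plus an optional final weight per state.
/// Tropical weights are assumed, so 0.0 is One.
#[derive(Debug, Default)]
pub struct ToyVectorFst {
    pub start: Option<usize>,
    pub trs: Vec<Vec<(u32, u32, f32, usize)>>,
    pub finals: Vec<Option<f32>>,
}

impl ToyVectorFst {
    /// Append a new non-final state with no transitions and return its ID.
    pub fn add_state(&mut self) -> usize {
        self.trs.push(Vec::new());
        self.finals.push(None);
        self.trs.len() - 1
    }
}

/// Build a linear acceptor for `labels`: a chain of states in which each
/// transition carries the same input and output label, ending in a final state.
pub fn toy_acceptor(labels: &[u32], final_weight: f32) -> ToyVectorFst {
    let mut fst = ToyVectorFst::default();
    let mut cur = fst.add_state();
    fst.start = Some(cur);
    for &label in labels {
        let next = fst.add_state();
        fst.trs[cur].push((label, label, 0.0, next));
        cur = next;
    }
    fst.finals[cur] = Some(final_weight);
    fst
}
```

An acceptor over n labels therefore has n + 1 states and n transitions, with only the last state final.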

Because "iteration" over an FST can mean many different things,
there are a variety of different iterators. To iterate over state
IDs you may use
[`states_iter`](fst_traits::StateIterator::states_iter), while to
iterate over transitions out of a state, you may use
[`get_trs`](fst_traits::CoreFst::get_trs). Since it is common to
iterate over both, this can be done using
[`fst_iter`](fst_traits::FstIterator::fst_iter) or
[`fst_into_iter`](fst_traits::FstIntoIterator::fst_into_iter). It
is also very common to iterate over paths accepted by an FST,
which can be done with
[`paths_iter`](fst_traits::Fst::paths_iter), and as a convenience
for generating text,
[`string_paths_iter`](fst_traits::Fst::string_paths_iter).
Alternatively, in the case of a linear FST, you may retrieve the
only possible path with
[`decode_linear_fst`](utils::decode_linear_fst).
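The different kinds of iteration can be illustrated on a toy acyclic FST. The `AcyclicFst` type and its `states` and `paths` methods below are hypothetical and std-only; the path enumeration is a plain depth-first search that assumes the FST has no cycles, which is not necessarily how `paths_iter` works internally.

```rust
/// Hypothetical acyclic FST: trs[s] lists the (ilabel, olabel, weight, nextstate)
/// transitions leaving state s; finals[s] = Some(final weight) when s is final.
/// Tropical weights are assumed, so a path's weight is the sum of its weights.
pub struct AcyclicFst {
    pub start: usize,
    pub trs: Vec<Vec<(u32, u32, f32, usize)>>,
    pub finals: Vec<Option<f32>>,
}

/// One successful path: its input labels, output labels and total weight.
#[derive(Debug, Clone, PartialEq)]
pub struct Path {
    pub ilabels: Vec<u32>,
    pub olabels: Vec<u32>,
    pub weight: f32,
}

impl AcyclicFst {
    /// Iterate over state IDs.
    pub fn states(&self) -> std::ops::Range<usize> {
        0..self.trs.len()
    }

    /// Enumerate all successful paths by depth-first search.
    /// Only terminates on acyclic FSTs; epsilon labels get no special treatment.
    pub fn paths(&self) -> Vec<Path> {
        let mut out = Vec::new();
        let mut ilabels = Vec::new();
        let mut olabels = Vec::new();
        self.dfs(self.start, &mut ilabels, &mut olabels, 0.0, &mut out);
        out
    }

    fn dfs(&self, s: usize, il: &mut Vec<u32>, ol: &mut Vec<u32>, w: f32, out: &mut Vec<Path>) {
        // Reaching a final state completes a path; add its final weight.
        if let Some(fw) = self.finals[s] {
            out.push(Path { ilabels: il.clone(), olabels: ol.clone(), weight: w + fw });
        }
        for &(i, o, tw, next) in &self.trs[s] {
            il.push(i);
            ol.push(o);
            self.dfs(next, il, ol, w + tw, out);
            // Undo this transition before trying the next one.
            il.pop();
            ol.pop();
        }
    }
}
```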

Note that iterating over paths is not the same thing as finding
the *shortest* path or paths, which is done with
[`shortest_path`](algorithms::shortest_path) (for a single path)
or
[`shortest_path_with_config`](algorithms::shortest_path_with_config)
(for N-shortest paths).
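To contrast with path iteration, the sketch below finds only the single best path, using Dijkstra's algorithm in the tropical semiring. It is a standalone illustration, not rustfst's `shortest_path`: `best_path` is a hypothetical name, each transition carries a single label, and weights are assumed to be non-negative.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Heap entry ordered so that BinaryHeap (a max-heap) pops the smallest distance first.
#[derive(PartialEq)]
struct Entry {
    dist: f32,
    state: usize,
}
impl Eq for Entry {}
impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        other.dist.partial_cmp(&self.dist).unwrap_or(Ordering::Equal)
    }
}
impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Best path in the tropical semiring (non-negative weights only).
/// trs[s] = (label, weight, nextstate); finals[s] = Some(final weight).
/// Returns the labels along the best path and its total weight, or None
/// if no final state is reachable.
pub fn best_path(start: usize, trs: &[Vec<(u32, f32, usize)>], finals: &[Option<f32>]) -> Option<(Vec<u32>, f32)> {
    let n = trs.len();
    let mut dist = vec![f32::INFINITY; n];
    let mut prev: Vec<Option<(usize, u32)>> = vec![None; n];
    let mut heap = BinaryHeap::new();
    dist[start] = 0.0;
    heap.push(Entry { dist: 0.0, state: start });
    while let Some(Entry { dist: d, state: s }) = heap.pop() {
        if d > dist[s] {
            continue; // stale heap entry
        }
        for &(label, w, next) in &trs[s] {
            let nd = d + w;
            if nd < dist[next] {
                dist[next] = nd;
                prev[next] = Some((s, label));
                heap.push(Entry { dist: nd, state: next });
            }
        }
    }
    // Choose the final state minimizing distance plus final weight.
    let (best, total) = (0..n)
        .filter_map(|s| finals[s].map(|fw| (s, dist[s] + fw)))
        .filter(|&(_, t)| t.is_finite())
        .min_by(|a, b| a.1.partial_cmp(&b.1).unwrap())?;
    // Walk predecessors back to the start state, then reverse the labels.
    let mut labels = Vec::new();
    let mut s = best;
    while let Some((p, label)) = prev[s] {
        labels.push(label);
        s = p;
    }
    labels.reverse();
    Some((labels, total))
}
```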

For the complete list of algorithms, see the [`algorithms`] module.

You may now be wondering, especially if you have previously used
such linguist-friendly tools as
[pyfoma](https://github.com/mhulden/pyfoma), "what if I just want
to *transduce some text*???" The unfriendly answer is that
rustfst is a somewhat lower-level library, designed for
implementing things like speech recognizers. The somewhat more
helpful answer is that you would do this by constructing an
[`acceptor`](utils::acceptor) for your input, which you will
[`compose`](algorithms::compose) with a
[`transducer`](utils::transducer), then
[`project`](algorithms::project) the result [to its output](algorithms::ProjectType::ProjectOutput), and finally
[iterate over the paths](fst_traits::Fst::string_paths_iter) in
the resulting FST.
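The compose, project, and iterate recipe can be sketched without rustfst for the easiest case: a linear acceptor for the input string composed with a one-state transducer that maps characters. Under that assumption, each state of the composed FST is just a position in the input, so composition reduces to matching each character against the transducer's transitions. `LoopTransducer` and `transduce` are hypothetical names, and this is not how rustfst's general `compose` is implemented.

```rust
/// Hypothetical single-state transducer: (input char, output char, weight) transitions
/// that all loop on its only state, which is final with weight One (0.0).
pub type LoopTransducer = Vec<(char, char, f32)>;

/// Compose a linear acceptor for `input` with `t`, project onto output labels,
/// and enumerate the output strings with their tropical path weights.
pub fn transduce(input: &str, t: &LoopTransducer) -> Vec<(String, f32)> {
    let chars: Vec<char> = input.chars().collect();
    // Composition: composed[i] holds the (output label, weight) transitions
    // from composed state i to state i + 1 (input labels already matched).
    let composed: Vec<Vec<(char, f32)>> = chars
        .iter()
        .map(|&c| {
            t.iter()
                .filter(|&&(i, _, _)| i == c)
                .map(|&(_, o, w)| (o, w))
                .collect::<Vec<_>>()
        })
        .collect();
    // Projection keeps only output labels; now enumerate every path to the last state.
    let mut paths = vec![(String::new(), 0.0f32)];
    for outs in &composed {
        let mut next = Vec::new();
        for (prefix, w) in &paths {
            for &(o, tw) in outs {
                let mut s = prefix.clone();
                s.push(o);
                next.push((s, w + tw));
            }
        }
        paths = next;
    }
    paths
}
```

If some input character matches no transition, the composed FST has no successful path and the result is empty.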

## References

Implementation heavily inspired by the work of Mehryar Mohri, Cyril Allauzen and Michael Riley:
@@ -51,6 +108,10 @@ Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Micha
- [OpenFst: A general and efficient weighted finite-state transducer library](https://link.springer.com/chapter/10.1007%2F978-3-540-76336-9_3)
- [Weighted finite-state transducers in speech recognition](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1010&context=cis_papers)

The API closely resembles that of OpenFST, with some
simplifications and changes to make it more idiomatic in Rust, notably
the use of `Tr` instead of `Arc`. See [Differences from OpenFST](#differences-from-openfst) for more information.

## Example

```rust
@@ -108,6 +169,44 @@ fn main() -> Result<()> {
}
```

## Differences from OpenFST

Here is a non-exhaustive list of ways in which Rustfst's API
differs from OpenFST:

- The default epsilon symbol is `<eps>` and not `<epsilon>`.
- Functions and methods follow Rust naming conventions,
e.g. `add_state` rather than `AddState`, but are otherwise mostly
equivalent, except that:
- Transitions are called `Tr` and not `Arc`, because `Arc` has a
rather different and well-established meaning in Rust, and rustfst
uses it (`std::sync::Arc`, that is) to reference-count symbol
tables. All associated functions also use `tr`.
- Final states are not indicated by a final weight of `zero`. You
can test for finality using [`is_final`](fst_traits::CoreFst::is_final), and
[`final_weight`](fst_traits::CoreFst::final_weight) returns an [`Option`]. This
requires some care when converting OpenFST code.
- Transitions can be accessed directly as a slice rather than requiring
an iterator.
- Semiring operations are expressed as plain old methods rather
than C++ free functions. So write `w1.plus(w2)` rather than
`Plus(w1, w2)`, for instance.
- Weights have in-place operations for ⊕
([`plus_assign`](Semiring::plus_assign)) and ⊗
([`times_assign`](Semiring::times_assign)).
- Most of the type aliases (which would be trait aliases in Rust) such
as `StdArc`, `StdFst`, and so forth, are missing, but type inference
allows us to avoid explicit type arguments in most cases, such as
when calling [`Tr::new`].
- State IDs are unsigned, with [`NO_STATE_ID`] used for a missing value.
They are also 32 bits by default (presumably, 4 billion states
is enough for most applications). This means you must take care to
cast them to [`usize`] when using them as indices, and vice versa,
preferably checking for overflow.
- Symbol IDs are also unsigned and 32-bit, with [`NO_LABEL`] used
for a missing value.
- Floating-point weights are not generic, so are always single-precision.
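Several of these differences can be illustrated with a toy tropical weight. `Tropical`, `is_final` and `to_state_id` below are hypothetical std-only stand-ins, not rustfst's `TropicalWeight` or its `Semiring` trait, whose method signatures may differ; they show semiring operations as plain methods (including in-place `plus_assign`/`times_assign`), finality expressed as an `Option`, and a checked conversion from a `usize` index to a 32-bit state ID.

```rust
use std::convert::TryFrom;

/// Toy tropical weight: ⊕ is min, ⊗ is +, Zero is +∞, One is 0.
/// (Hypothetical stand-in, not rustfst's `TropicalWeight`.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Tropical(pub f32);

impl Tropical {
    pub fn zero() -> Self {
        Tropical(f32::INFINITY)
    }
    pub fn one() -> Self {
        Tropical(0.0)
    }
    pub fn plus(self, rhs: Self) -> Self {
        Tropical(self.0.min(rhs.0))
    }
    pub fn times(self, rhs: Self) -> Self {
        Tropical(self.0 + rhs.0)
    }
    /// In-place ⊕.
    pub fn plus_assign(&mut self, rhs: Self) {
        *self = self.plus(rhs);
    }
    /// In-place ⊗.
    pub fn times_assign(&mut self, rhs: Self) {
        *self = self.times(rhs);
    }
}

/// Finality is an `Option`, not a comparison of the final weight with Zero.
pub fn is_final(finals: &[Option<Tropical>], state: u32) -> bool {
    finals.get(state as usize).map_or(false, |f| f.is_some())
}

/// Convert a usize index to a 32-bit state ID, refusing values that do not fit.
pub fn to_state_id(index: usize) -> Option<u32> {
    u32::try_from(index).ok()
}
```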

<!-- cargo-sync-readme end -->

## Benchmark with OpenFST
2 changes: 1 addition & 1 deletion rustfst/src/algorithms/all_pairs_shortest_distance.rs
@@ -5,7 +5,7 @@ use crate::fst_traits::Fst;
use crate::semirings::StarSemiring;
use crate::Trs;

/// This operation computes the shortest distance from each state to every other states.
/// Compute the shortest distance from each state to every other state.
/// The shortest distance from `p` to `q` is the ⊕-sum of the weights
/// of all the paths between `p` and `q`.
///
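For intuition about the all-pairs computation, here is a std-only Floyd-Warshall sketch in the tropical semiring (⊕ = min, ⊗ = +). It assumes non-negative weights, so the Kleene star of any cycle weight is One and drops out; it is not a transcription of rustfst's generic `StarSemiring` implementation, and `all_pairs_tropical` is a hypothetical name.

```rust
/// All-pairs shortest distance in the tropical semiring via Floyd-Warshall.
/// `edges` are (from, to, weight) with non-negative weights.
/// Returns d[p][q]; f32::INFINITY (Zero) means q is unreachable from p.
pub fn all_pairs_tropical(n: usize, edges: &[(usize, usize, f32)]) -> Vec<Vec<f32>> {
    let mut d = vec![vec![f32::INFINITY; n]; n];
    for p in 0..n {
        d[p][p] = 0.0; // the empty path has weight One
    }
    for &(p, q, w) in edges {
        d[p][q] = d[p][q].min(w); // parallel transitions are ⊕-summed
    }
    // Allow paths through intermediate state k, one k at a time.
    for k in 0..n {
        for i in 0..n {
            for j in 0..n {
                let via_k = d[i][k] + d[k][j];
                if via_k < d[i][j] {
                    d[i][j] = via_k;
                }
            }
        }
    }
    d
}
```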
8 changes: 5 additions & 3 deletions rustfst/src/algorithms/condense.rs
@@ -7,9 +7,11 @@ use crate::fst_traits::{ExpandedFst, Fst, MutableFst};
use crate::semirings::Semiring;
use crate::{StateId, Trs};

// Returns an acyclic FST where each SCC in the input FST has been condensed to
// a single state with transitions between SCCs retained and within SCCs
// dropped. Also populates 'scc' with a mapping from input to output states.
/// Return an acyclic FST where each SCC in the input FST has been condensed to
/// a single state with transitions between SCCs retained and within SCCs
/// dropped.
///
/// Also returns `scc`, the mapping from input to output states, as the first element of the tuple.
pub fn condense<W: Semiring, FI: Fst<W> + ExpandedFst<W>, FO: MutableFst<W>>(
ifst: &FI,
) -> Result<(Vec<i32>, FO)> {
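Condensation can be sketched on the bare transition graph (labels and weights ignored) with Kosaraju's strongly-connected-components algorithm. `condense_graph` is a hypothetical std-only helper, not rustfst's `condense`, and the choice of Kosaraju rather than another SCC algorithm is an assumption made for brevity.

```rust
use std::collections::BTreeSet;

/// Record states in order of DFS completion on the forward graph.
fn post_order(s: usize, fwd: &[Vec<usize>], visited: &mut [bool], order: &mut Vec<usize>) {
    visited[s] = true;
    for &q in &fwd[s] {
        if !visited[q] {
            post_order(q, fwd, visited, order);
        }
    }
    order.push(s);
}

/// Flood-fill component `c` over the reversed graph.
fn assign(s: usize, c: usize, rev: &[Vec<usize>], scc: &mut [Option<usize>]) {
    scc[s] = Some(c);
    for &p in &rev[s] {
        if scc[p].is_none() {
            assign(p, c, rev, scc);
        }
    }
}

/// Returns (scc, edges): scc[s] is the component of state s, and `edges` are the
/// distinct transitions between different components. The result is acyclic.
pub fn condense_graph(n: usize, edges: &[(usize, usize)]) -> (Vec<usize>, BTreeSet<(usize, usize)>) {
    let mut fwd = vec![Vec::new(); n];
    let mut rev = vec![Vec::new(); n];
    for &(p, q) in edges {
        fwd[p].push(q);
        rev[q].push(p);
    }
    // Pass 1: DFS finishing order on the forward graph.
    let mut visited = vec![false; n];
    let mut order = Vec::with_capacity(n);
    for s in 0..n {
        if !visited[s] {
            post_order(s, &fwd, &mut visited, &mut order);
        }
    }
    // Pass 2: in reverse finishing order, each flood fill of the reversed graph is one SCC.
    let mut scc = vec![None; n];
    let mut count = 0;
    for &s in order.iter().rev() {
        if scc[s].is_none() {
            assign(s, count, &rev, &mut scc);
            count += 1;
        }
    }
    let scc: Vec<usize> = scc.into_iter().map(|c| c.unwrap()).collect();
    // Keep transitions between SCCs, drop those within an SCC.
    let condensed: BTreeSet<(usize, usize)> = edges
        .iter()
        .map(|&(p, q)| (scc[p], scc[q]))
        .filter(|(a, b)| a != b)
        .collect();
    (scc, condensed)
}
```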
2 changes: 1 addition & 1 deletion rustfst/src/algorithms/connect.rs
@@ -13,7 +13,7 @@ use crate::StateId;
use crate::Tr;
use crate::NO_STATE_ID;

/// This operation trims an Fst, removing states and trs that are not on successful paths.
/// Trim an Fst, removing states and trs that are not on successful paths.
///
/// # Example 1
/// ```
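Trimming keeps exactly the states that are both accessible and coaccessible. The hypothetical `useful_states` below computes both sets by graph search over bare (from, to) transitions; it only marks which states would be kept and does not delete anything, unlike a real trim such as `connect`.

```rust
/// Reachability over (from, to) transitions, following them forwards or backwards.
fn reach(n: usize, sources: &[usize], edges: &[(usize, usize)], forward: bool) -> Vec<bool> {
    let mut seen = vec![false; n];
    let mut stack = Vec::new();
    for &s in sources {
        if !seen[s] {
            seen[s] = true;
            stack.push(s);
        }
    }
    while let Some(s) = stack.pop() {
        for &(p, q) in edges {
            let (from, to) = if forward { (p, q) } else { (q, p) };
            if from == s && !seen[to] {
                seen[to] = true;
                stack.push(to);
            }
        }
    }
    seen
}

/// A state is kept if it is accessible (reachable from `start`) and coaccessible
/// (some final state is reachable from it), i.e. it lies on a successful path.
pub fn useful_states(n: usize, start: usize, finals: &[usize], edges: &[(usize, usize)]) -> Vec<bool> {
    let accessible = reach(n, &[start], edges, true);
    let coaccessible = reach(n, finals, edges, false);
    (0..n).map(|s| accessible[s] && coaccessible[s]).collect()
}
```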
2 changes: 1 addition & 1 deletion rustfst/src/algorithms/inversion.rs
@@ -3,7 +3,7 @@ use crate::fst_properties::FstProperties;
use crate::fst_traits::MutableFst;
use crate::semirings::Semiring;

/// This operation inverts the transduction corresponding to an FST
/// Inverts the transduction corresponding to an FST
/// by exchanging the FST's input and output labels.
///
/// # Example 1
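Inversion is a label swap on every transition. The sketch below uses a hypothetical `LabeledTr` struct rather than rustfst's `Tr`; applying it twice restores the original transitions.

```rust
/// Hypothetical transition with separate input and output labels.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LabeledTr {
    pub ilabel: u32,
    pub olabel: u32,
    pub weight: f32,
    pub nextstate: usize,
}

/// Invert in place: exchange the input and output label of every transition,
/// so a machine mapping x to y now maps y to x with the same weight.
pub fn invert(trs: &mut [Vec<LabeledTr>]) {
    for state_trs in trs.iter_mut() {
        for tr in state_trs.iter_mut() {
            std::mem::swap(&mut tr.ilabel, &mut tr.olabel);
        }
    }
}
```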
10 changes: 6 additions & 4 deletions rustfst/src/algorithms/isomorphic.rs
@@ -158,6 +158,7 @@ impl<'a, W: Semiring, F1: ExpandedFst<W>, F2: ExpandedFst<W>> Isomorphism<'a, W,
}
}

/// Configuration for isomorphic comparison.
pub struct IsomorphicConfig {
delta: f32,
}
@@ -174,7 +175,7 @@ impl IsomorphicConfig {
}
}

/// This operation determines if two transducers with a certain required determinism
/// Determine if two transducers with a certain required determinism
/// have the same states, irrespective of numbering, and the same transitions with
/// the same labels and weights, irrespective of ordering.
///
@@ -189,9 +190,10 @@ where
isomorphic_with_config(fst_1, fst_2, IsomorphicConfig::default())
}

/// This operation determines if two transducers with a certain required determinism
/// have the same states, irrespective of numbering, and the same transitions with
/// the same labels and weights, irrespective of ordering.
/// Determine, with configurable comparison delta, if two transducers with a
/// certain required determinism have the same states, irrespective of
/// numbering, and the same transitions with the same labels and
/// weights, irrespective of ordering.
///
/// In other words, Isomorphic(A, B) is true if and only if the states of A can
/// be renumbered and the transitions leaving each state reordered so that Equal(A, B) is true.
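A simplified isomorphism check can be sketched by pairing states breadth-first from the two start states. `SmallFst` and `isomorphic` are hypothetical; the sketch assumes that no state has two transitions with the same (input, output) label pair, so sorting by labels gives a canonical order, and that every state is reachable from the start. The `delta` parameter plays the role of the configurable comparison delta for weights.

```rust
use std::collections::VecDeque;

/// Hypothetical toy FST: per-state (ilabel, olabel, weight, nextstate) transitions
/// and optional final weights.
pub struct SmallFst {
    pub start: usize,
    pub trs: Vec<Vec<(u32, u32, f32, usize)>>,
    pub finals: Vec<Option<f32>>,
}

fn close(a: f32, b: f32, delta: f32) -> bool {
    a == b || (a - b).abs() <= delta
}

pub fn isomorphic(a: &SmallFst, b: &SmallFst, delta: f32) -> bool {
    let n = a.trs.len();
    if n != b.trs.len() {
        return false;
    }
    let mut map: Vec<Option<usize>> = vec![None; n]; // state of a -> state of b
    let mut used = vec![false; n]; // states of b already paired
    map[a.start] = Some(b.start);
    used[b.start] = true;
    let mut queue = VecDeque::from(vec![a.start]);
    while let Some(sa) = queue.pop_front() {
        let sb = map[sa].unwrap();
        // Both non-final, or both final with weights within delta.
        match (a.finals[sa], b.finals[sb]) {
            (None, None) => {}
            (Some(x), Some(y)) if close(x, y, delta) => {}
            _ => return false,
        }
        let mut ta = a.trs[sa].clone();
        let mut tb = b.trs[sb].clone();
        if ta.len() != tb.len() {
            return false;
        }
        // Canonical order, valid under the distinct-label-pair assumption.
        ta.sort_by_key(|t| (t.0, t.1));
        tb.sort_by_key(|t| (t.0, t.1));
        for (x, y) in ta.iter().zip(&tb) {
            if (x.0, x.1) != (y.0, y.1) || !close(x.2, y.2, delta) {
                return false;
            }
            match map[x.3] {
                Some(m) if m != y.3 => return false, // inconsistent renumbering
                Some(_) => {}
                None => {
                    if used[y.3] {
                        return false; // would not be a bijection
                    }
                    map[x.3] = Some(y.3);
                    used[y.3] = true;
                    queue.push_back(x.3);
                }
            }
        }
    }
    map.iter().all(|m| m.is_some())
}
```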
1 change: 1 addition & 0 deletions rustfst/src/algorithms/minimize.rs
@@ -39,6 +39,7 @@ use itertools::Itertools;
use std::cell::RefCell;
use std::rc::Rc;

/// Configuration for minimization.
#[derive(Clone, Copy, PartialOrd, PartialEq)]
pub struct MinimizeConfig {
delta: f32,
12 changes: 10 additions & 2 deletions rustfst/src/algorithms/mod.rs
@@ -31,15 +31,21 @@ pub use self::{

mod add_super_final_state;
mod all_pairs_shortest_distance;
/// Functions to compute Kleene closure (star or plus) of an FST.
pub mod closure;
#[allow(clippy::type_complexity)]
/// Functions to compose FSTs.
pub mod compose;
/// Functions to concatenate FSTs.
pub mod concat;
mod condense;
mod connect;
/// Functions to determinize FSTs.
pub mod determinize;
pub(crate) mod dfs_visit;
/// Functions to encode FSTs as FSAs and vice versa.
pub mod encode;
/// Functions to factor various weight types.
pub mod factor_weight;
mod fst_convert;
mod inversion;
@@ -51,14 +57,15 @@ mod projection;
mod push;
mod queue;

/// Module providing functions to randomly generate paths through an Fst. A static and a delayed version are available.
/// Functions to randomly generate paths through an Fst. A static and a delayed version are available.
pub mod randgen;
mod relabel_pairs;
/// Functions for lazily replacing transitions in an FST.
pub mod replace;
mod reverse;
mod reweight;

/// Module providing functions to remove epsilon transitions from an Fst. A static and a delayed version are available.
/// Functions to remove epsilon transitions from an Fst. A static and a delayed version are available.
pub mod rm_epsilon;
mod rm_final_epsilon;
mod shortest_distance;
@@ -69,6 +76,7 @@ mod tr_map;
mod tr_sort;
mod tr_sum;
pub(crate) mod tr_unique;
/// Functions to compute the union of FSTs.
pub mod union;
mod weight_convert;

1 change: 1 addition & 0 deletions rustfst/src/algorithms/optimize.rs
@@ -7,6 +7,7 @@ use crate::semirings::{SemiringProperties, WeaklyDivisibleSemiring, WeightQuanti
use crate::Semiring;
use anyhow::Result;

/// General optimization (determinization and minimization) of a WFST
pub fn optimize<
W: Semiring + WeaklyDivisibleSemiring + WeightQuantize,
F: MutableFst<W> + AllocableFst<W>,
17 changes: 14 additions & 3 deletions rustfst/src/algorithms/push.rs
@@ -30,6 +30,7 @@ bitflags! {
}
}

/// Configuration for [`push_weights_with_config`].
#[derive(Clone, Debug, Copy, PartialOrd, PartialEq)]
pub struct PushWeightsConfig {
delta: f32,
@@ -65,6 +66,12 @@ impl PushWeightsConfig {
}
}

/// Push the weights in an FST.
///
/// If pushing towards the initial state, the sum of the weights of the
/// outgoing transitions and final weight at a non-initial state is
/// equal to One() in the resulting machine. If pushing towards the
/// final state, the same property holds on the reverse machine.
pub fn push_weights<W, F>(fst: &mut F, reweight_type: ReweightType) -> Result<()>
where
F: MutableFst<W>,
@@ -73,8 +80,9 @@ where
push_weights_with_config(fst, reweight_type, PushWeightsConfig::default())
}

/// Pushes the weights in FST in the direction defined by TYPE. If
/// pushing towards the initial state, the sum of the weight of the
/// Push the weights in an FST, optionally removing the total weight.
///
/// If pushing towards the initial state, the sum of the weights of the
/// outgoing transitions and final weight at a non-initial state is
/// equal to One() in the resulting machine. If pushing towards the
/// final state, the same property holds on the reverse machine.
@@ -223,6 +231,7 @@ macro_rules! m_labels_pushing {
}};
}

/// Configuration for [`push_with_config`].
#[derive(Clone, Copy, Debug, PartialOrd, PartialEq)]
pub struct PushConfig {
delta: f32,
@@ -244,6 +253,8 @@ impl PushConfig {
}
}

/// Push the weights and/or labels of the input FST into the output
/// mutable FST by pushing weights and/or labels towards the initial state or final states.
pub fn push<W, F1, F2>(ifst: &F1, reweight_type: ReweightType, push_type: PushType) -> Result<F2>
where
F1: ExpandedFst<W>,
@@ -254,7 +265,7 @@ where
push_with_config(ifst, reweight_type, push_type, PushConfig::default())
}

/// Pushes the weights and/or labels of the input FST into the output
/// Push the weights and/or labels of the input FST into the output
/// mutable FST by pushing weights and/or labels towards the initial state or final states.
pub fn push_with_config<W, F1, F2>(
ifst: &F1,
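Weight pushing towards the initial state can be sketched in the tropical semiring: compute each state's shortest distance d to a final state, then reweight each transition p to q as d(p)⁻¹ ⊗ w ⊗ d(q), which is w + d(q) - d(p) here, and each final weight f(p) as f(p) - d(p). The hypothetical `push_to_initial` returns the total weight d(start) instead of reinserting it, which resembles the option of removing the total weight; it assumes every state can reach a final state and that there are no negative cycles.

```rust
/// trs[p] = (weight, nextstate); finals[p] = Some(final weight).
/// Pushes weights towards the initial state in place and returns the total weight.
pub fn push_to_initial(start: usize, trs: &mut [Vec<(f32, usize)>], finals: &mut [Option<f32>]) -> f32 {
    let n = trs.len();
    // d[p] = shortest distance from p to any final state (Bellman-Ford, backwards).
    let mut d: Vec<f32> = finals.iter().map(|f| f.unwrap_or(f32::INFINITY)).collect();
    for _ in 0..n {
        let mut changed = false;
        for p in 0..n {
            for &(w, q) in &trs[p] {
                if w + d[q] < d[p] {
                    d[p] = w + d[q];
                    changed = true;
                }
            }
        }
        if !changed {
            break;
        }
    }
    // Reweight with the potentials d: w' = w + d[q] - d[p], f' = f - d[p].
    for p in 0..n {
        for tr in trs[p].iter_mut() {
            tr.0 += d[tr.1] - d[p];
        }
        if let Some(f) = finals[p].as_mut() {
            *f -= d[p];
        }
    }
    d[start]
}
```

After pushing, the minimum (tropical ⊕-sum) of each state's outgoing weights and final weight is 0.0, i.e. One, which is the property stated in the doc comment above.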
2 changes: 1 addition & 1 deletion rustfst/src/algorithms/relabel_pairs.rs
@@ -23,7 +23,7 @@ where
Ok(map_labels)
}

/// Replaces input and/or output labels using pairs of labels.
/// Replace input and/or output labels using pairs of labels.
///
/// This operation destructively relabels the input and/or output labels of the
/// FST using pairs of the form (old_ID, new_ID); omitted indices are
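A destructive relabeling with (old_ID, new_ID) pairs can be sketched with two hash maps. `relabel_pairs` here is a hypothetical std-only function over bare (ilabel, olabel) pairs, not rustfst's signature; in this sketch, labels that appear in no pair are left unchanged, which is an assumption of the sketch.

```rust
use std::collections::HashMap;

/// `trs` holds the (ilabel, olabel) of each transition; `ipairs` and `opairs`
/// are (old_id, new_id) pairs for input and output labels respectively.
pub fn relabel_pairs(trs: &mut [(u32, u32)], ipairs: &[(u32, u32)], opairs: &[(u32, u32)]) {
    let imap: HashMap<u32, u32> = ipairs.iter().copied().collect();
    let omap: HashMap<u32, u32> = opairs.iter().copied().collect();
    for tr in trs.iter_mut() {
        if let Some(&new) = imap.get(&tr.0) {
            tr.0 = new;
        }
        if let Some(&new) = omap.get(&tr.1) {
            tr.1 = new;
        }
    }
}
```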
4 changes: 3 additions & 1 deletion rustfst/src/algorithms/reverse.rs
@@ -7,7 +7,9 @@ use crate::semirings::Semiring;
use crate::tr::Tr;
use crate::{StateId, Trs, EPS_LABEL};

/// Reverses an FST. The reversed result is written to an output mutable FST.
/// Reverse an FST.
///
/// The reversed result is written to an output mutable FST.
/// If A transduces string x to y with weight a, then the reverse of A
/// transduces the reverse of x to the reverse of y with weight a.Reverse().
///
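Reversal can be sketched in the tropical semiring, where reversing a weight leaves it unchanged: every transition is flipped, a new super-initial state reaches each former final state through an epsilon transition carrying its final weight, and the former start state becomes final. `Rev` and `reverse` are hypothetical names, and using label 0 for epsilon is an assumption of the sketch.

```rust
/// Reversed FST: output state 0 is the new super-initial state;
/// input state s becomes output state s + 1.
pub struct Rev {
    pub start: usize,
    pub trs: Vec<Vec<(u32, f32, usize)>>,
    pub finals: Vec<Option<f32>>,
}

/// trs[p] = (label, weight, nextstate); finals[p] = Some(final weight).
pub fn reverse(start: usize, trs: &[Vec<(u32, f32, usize)>], finals: &[Option<f32>]) -> Rev {
    let n = trs.len();
    let mut out = Rev { start: 0, trs: vec![Vec::new(); n + 1], finals: vec![None; n + 1] };
    for p in 0..n {
        // Every transition p -> q becomes q -> p with the same label and weight.
        for &(label, w, q) in &trs[p] {
            out.trs[q + 1].push((label, w, p + 1));
        }
        // Former final states are entered from the super-initial state by an
        // epsilon transition carrying their final weight.
        if let Some(fw) = finals[p] {
            out.trs[0].push((0, fw, p + 1));
        }
    }
    // The former start state becomes the only final state, with weight One.
    out.finals[start + 1] = Some(0.0);
    out
}
```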
3 changes: 2 additions & 1 deletion rustfst/src/algorithms/reweight.rs
@@ -15,7 +15,8 @@ pub enum ReweightType {
ReweightToFinal,
}

/// Reweights an FST according to a vector of potentials in a given direction.
/// Reweight an FST according to a vector of potentials in a given direction.
///
/// The weight must be left distributive when reweighting towards the initial
/// state and right distributive when reweighting towards the final states.
///
13 changes: 8 additions & 5 deletions rustfst/src/algorithms/shortest_distance.rs
@@ -252,6 +252,7 @@ pub(crate) fn shortest_distance_with_internal_config<
sd_state.shortest_distance::<F, _>(source, fst)
}

/// Configuration for shortest distance computation.
#[derive(Debug, Clone, Copy, PartialOrd, PartialEq)]
pub struct ShortestDistanceConfig {
delta: f32,
@@ -271,11 +272,7 @@ impl ShortestDistanceConfig {
}
}

pub fn shortest_distance<W: Semiring, F: ExpandedFst<W>>(fst: &F, reverse: bool) -> Result<Vec<W>> {
shortest_distance_with_config(fst, reverse, ShortestDistanceConfig::default())
}

/// This operation computes the shortest distance from the initial state to every state.
/// Compute the shortest distance from the initial state to every state.
/// The shortest distance from `p` to `q` is the ⊕-sum of the weights
/// of all the paths between `p` and `q`.
///
@@ -308,6 +305,12 @@ pub fn shortest_distance<W: Semiring, F: ExpandedFst<W>>(fst: &F, reverse: bool)
/// # Ok(())
/// # }
/// ```
pub fn shortest_distance<W: Semiring, F: ExpandedFst<W>>(fst: &F, reverse: bool) -> Result<Vec<W>> {
shortest_distance_with_config(fst, reverse, ShortestDistanceConfig::default())
}

/// Compute the shortest distance from the initial state to every
/// state, with configurable delta for comparison.
pub fn shortest_distance_with_config<W: Semiring, F: ExpandedFst<W>>(
fst: &F,
reverse: bool,
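The ⊕-sum over all paths is easiest to see in the probability semiring (⊕ = +, ⊗ = ×), where it differs from the weight of the single best path. The hypothetical `shortest_distance_prob` below assumes an acyclic FST whose states are numbered in topological order, so one forward pass suffices; it does not model the `reverse` flag or the comparison delta.

```rust
/// trs[p] = (probability, nextstate). Returns d[q] = total probability of all
/// paths from `start` to q. Requires every transition to go from a lower to a
/// higher state ID (topological numbering of an acyclic FST).
pub fn shortest_distance_prob(start: usize, trs: &[Vec<(f64, usize)>]) -> Vec<f64> {
    let mut d = vec![0.0; trs.len()]; // Zero of the probability semiring
    d[start] = 1.0; // One: the empty path
    for p in start..trs.len() {
        for &(w, q) in &trs[p] {
            assert!(q > p, "states must be topologically numbered");
            d[q] += d[p] * w; // d[q] ⊕= d[p] ⊗ w
        }
    }
    d
}
```

Here state 2 is reached by two paths of probability 0.5 each, so its distance is 1.0; in the tropical semiring the same structure would give the minimum over the two paths instead.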