diff --git a/README.md b/README.md index 59f3c2eb5..ed85a8ec7 100644 --- a/README.md +++ b/README.md @@ -20,8 +20,7 @@ -This repo contains a **Rust** implementation of Weighted Finite States Transducers. -Along with a **Python** binding. +Rust implementation of Weighted Finite-State Transducers. Rustfst is a library for constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs). Weighted finite-state transducers are automata where @@ -43,6 +42,64 @@ results can be selected by shortest-path algorithms. ![fst](https://raw.githubusercontent.com/Garvys/rustfst-images-doc/master/images/project_in.svg?sanitize=true) +## Overview + +For a basic [example](#example) see the section below. + +Some simple and commonly encountered types of FSTs can be easily +created with the macro [`fst`] or the functions +[`acceptor`](utils::acceptor) and +[`transducer`](utils::transducer). + +For more complex cases you will likely start with the +[`VectorFst`](fst_impls::VectorFst) type, which will be imported +in the [`prelude`] along with most everything else you need. +[`VectorFst`](fst_impls::VectorFst) corresponds +directly to the OpenFST `StdVectorFst`, and can be used to load +its files using [`read`](fst_traits::SerializableFst::read) or +[`read_text`](fst_traits::SerializableFst::read_text). + +Because "iteration" over an FST can mean many different things, +there are a variety of different iterators. To iterate over state +IDs you may use +[`states_iter`](fst_traits::StateIterator::states_iter), while to +iterate over transitions out of a state, you may use +[`get_trs`](fst_traits::CoreFst::get_trs). Since it is common to +iterate over both, this can be done using +[`fst_iter`](fst_traits::FstIterator::fst_iter) or +[`fst_into_iter`](fst_traits::FstIntoIterator::fst_into_iter). 
It +is also very common to iterate over paths accepted by an FST, +which can be done with +[`paths_iter`](fst_traits::Fst::paths_iter), and as a convenience +for generating text, +[`string_paths_iter`](fst_traits::Fst::string_paths_iter). +Alternately, in the case of a linear FST, you may retrieve the +only possible path with +[`decode_linear_fst`](utils::decode_linear_fst). + +Note that iterating over paths is not the same thing as finding +the *shortest* path or paths, which is done with +[`shortest_path`](algorithms::shortest_path) (for a single path) +or +[`shortest_path_with_config`](algorithms::shortest_path_with_config) +(for N-shortest paths). + +For the complete list of algorithms, see the [`algorithms`] module. + +You may now be wondering, especially if you have previously used +such linguist-friendly tools as +[pyfoma](https://github.com/mhulden/pyfoma), "what if I just want +to *transduce some text*???" The unfriendly answer is that +rustfst is a somewhat lower-level library, designed for +implementing things like speech recognizers. The somewhat more +helpful answer is that you would do this by constructing an +[`acceptor`](utils::acceptor) for your input, which you will +[`compose`](algorithms::compose) with a +[`transducer`](utils::transducer), then +[`project`](algorithms::project) the result [to its output](algorithms::ProjectType::ProjectOutput), and finally +[iterate over the paths](fst_traits::Fst::string_paths_iter) in +the resulting FST. 
+ ## References Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Michael Riley's work : @@ -51,6 +108,10 @@ Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Micha - [OpenFst: A general and efficient weighted finite-state transducer library](https://link.springer.com/chapter/10.1007%2F978-3-540-76336-9_3) - [Weighted finite-state transducers in speech recognition](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1010&context=cis_papers) +The API closely resembles that of OpenFST, with some +simplifications and changes to make it more idiomatic in Rust, notably +the use of `Tr` instead of `Arc`. See [Differences from OpenFST](#differences-from-openfst) for more information. + ## Example ```rust @@ -108,6 +169,44 @@ fn main() -> Result<()> { } ``` +## Differences from OpenFST + +Here is a non-exhaustive list of ways in which Rustfst's API +differs from OpenFST: + +- The default epsilon symbol is `<eps>` and not `<epsilon>`. +- Functions and methods follow Rust naming conventions, + e.g. `add_state` rather than `AddState`, but are otherwise mostly + equivalent, except that: +- Transitions are called `Tr` and not `Arc`, because `Arc` has a + rather different and well-established meaning in Rust, and rustfst + uses it (`std::sync::Arc`, that is) to reference-count symbol + tables. All associated functions also use `tr`. +- Final states are not indicated by a final weight of `zero`. You + can test for finality using [`is_final`](fst_traits::CoreFst::is_final), and + [`final_weight`](fst_traits::CoreFst::final_weight) returns an [`Option`]. This + requires some care when converting OpenFST code. +- Transitions can be accessed directly as a slice rather than requiring + an iterator. +- Semiring operations are expressed as plain old methods rather + than strange C++ things. So write `w1.plus(w2)` rather than + `Plus(w1, w2)`, for instance. 
+- Weights have in-place operations for ⊕ + ([`plus_assign`](Semiring::plus_assign)) and ⊗ + ([`times_assign`](Semiring::times_assign)). +- Most of the type aliases (which would be trait aliases in Rust) such + as `StdArc`, `StdFst`, and so forth, are missing, but type inference + allows us to avoid explicit type arguments in most cases, such as + when calling [`Tr::new`], for instance. +- State IDs are unsigned, with [`NO_STATE_ID`] used for a missing value. + They are also 32 bits by default (presumably, 4 billion states + is enough for most applications). This means you must take care to + cast them to [`usize`] when using them as indices, and vice-versa, + preferably checking for overflows. +- Symbol IDs are also unsigned and 32 bits, with [`NO_LABEL`] used + for a missing value. +- Floating-point weights are not generic, so are always single-precision. + ## Benchmark with OpenFST diff --git a/rustfst/src/algorithms/all_pairs_shortest_distance.rs b/rustfst/src/algorithms/all_pairs_shortest_distance.rs index 6feba1aaf..4d866c40b 100644 --- a/rustfst/src/algorithms/all_pairs_shortest_distance.rs +++ b/rustfst/src/algorithms/all_pairs_shortest_distance.rs @@ -5,7 +5,7 @@ use crate::fst_traits::Fst; use crate::semirings::StarSemiring; use crate::Trs; -/// This operation computes the shortest distance from each state to every other states. +/// Compute the shortest distance from each state to every other state. /// The shortest distance from `p` to `q` is the ⊕-sum of the weights /// of all the paths between `p` and `q`. 
/// diff --git a/rustfst/src/algorithms/condense.rs b/rustfst/src/algorithms/condense.rs index 18105f1bf..7a049859e 100644 --- a/rustfst/src/algorithms/condense.rs +++ b/rustfst/src/algorithms/condense.rs @@ -7,9 +7,11 @@ use crate::fst_traits::{ExpandedFst, Fst, MutableFst}; use crate::semirings::Semiring; use crate::{StateId, Trs}; -// Returns an acyclic FST where each SCC in the input FST has been condensed to -// a single state with transitions between SCCs retained and within SCCs -// dropped. Also populates 'scc' with a mapping from input to output states. +/// Return an acyclic FST where each SCC in the input FST has been condensed to +/// a single state with transitions between SCCs retained and within SCCs +/// dropped. +/// +/// Also populates 'scc' with a mapping from input to output states. pub fn condense + ExpandedFst, FO: MutableFst>( ifst: &FI, ) -> Result<(Vec, FO)> { diff --git a/rustfst/src/algorithms/connect.rs b/rustfst/src/algorithms/connect.rs index 76a680f8a..30c461fc1 100644 --- a/rustfst/src/algorithms/connect.rs +++ b/rustfst/src/algorithms/connect.rs @@ -13,7 +13,7 @@ use crate::StateId; use crate::Tr; use crate::NO_STATE_ID; -/// This operation trims an Fst, removing states and trs that are not on successful paths. +/// Trim an Fst, removing states and trs that are not on successful paths. /// /// # Example 1 /// ``` diff --git a/rustfst/src/algorithms/inversion.rs b/rustfst/src/algorithms/inversion.rs index 38bdc5d4f..169d25a8a 100644 --- a/rustfst/src/algorithms/inversion.rs +++ b/rustfst/src/algorithms/inversion.rs @@ -3,7 +3,7 @@ use crate::fst_properties::FstProperties; use crate::fst_traits::MutableFst; use crate::semirings::Semiring; -/// This operation inverts the transduction corresponding to an FST +/// Invert the transduction corresponding to an FST /// by exchanging the FST's input and output labels. 
/// /// # Example 1 diff --git a/rustfst/src/algorithms/isomorphic.rs b/rustfst/src/algorithms/isomorphic.rs index 367cb9482..0651e0d7a 100644 --- a/rustfst/src/algorithms/isomorphic.rs +++ b/rustfst/src/algorithms/isomorphic.rs @@ -158,6 +158,7 @@ impl<'a, W: Semiring, F1: ExpandedFst, F2: ExpandedFst> Isomorphism<'a, W, } } +/// Configuration for isomorphic comparison. pub struct IsomorphicConfig { delta: f32, } @@ -174,7 +175,7 @@ impl IsomorphicConfig { } } -/// This operation determines if two transducers with a certain required determinism +/// Determine if two transducers with a certain required determinism /// have the same states, irrespective of numbering, and the same transitions with /// the same labels and weights, irrespective of ordering. /// @@ -189,9 +190,10 @@ where isomorphic_with_config(fst_1, fst_2, IsomorphicConfig::default()) } -/// This operation determines if two transducers with a certain required determinism -/// have the same states, irrespective of numbering, and the same transitions with -/// the same labels and weights, irrespective of ordering. +/// Determine, with configurable comparison delta, if two transducers with a +/// certain required determinism have the same states, irrespective of +/// numbering, and the same transitions with the same labels and +/// weights, irrespective of ordering. /// /// In other words, Isomorphic(A, B) is true if and only if the states of A can /// be renumbered and the transitions leaving each state reordered so that Equal(A, B) is true. diff --git a/rustfst/src/algorithms/minimize.rs b/rustfst/src/algorithms/minimize.rs index 2247dc8df..baf214b6c 100644 --- a/rustfst/src/algorithms/minimize.rs +++ b/rustfst/src/algorithms/minimize.rs @@ -39,6 +39,7 @@ use itertools::Itertools; use std::cell::RefCell; use std::rc::Rc; +/// Configuration for minimization. 
#[derive(Clone, Copy, PartialOrd, PartialEq)] pub struct MinimizeConfig { delta: f32, diff --git a/rustfst/src/algorithms/mod.rs b/rustfst/src/algorithms/mod.rs index 3d4fc45b1..05bdf3c89 100644 --- a/rustfst/src/algorithms/mod.rs +++ b/rustfst/src/algorithms/mod.rs @@ -31,15 +31,21 @@ pub use self::{ mod add_super_final_state; mod all_pairs_shortest_distance; +/// Functions to compute the Kleene closure (star or plus) of an FST. pub mod closure; #[allow(clippy::type_complexity)] +/// Functions to compose FSTs. pub mod compose; +/// Functions to concatenate FSTs. pub mod concat; mod condense; mod connect; +/// Functions to determinize FSTs. pub mod determinize; pub(crate) mod dfs_visit; +/// Functions to encode FSTs as FSAs and vice versa. pub mod encode; +/// Functions to factor various weight types. pub mod factor_weight; mod fst_convert; mod inversion; @@ -51,14 +57,15 @@ mod projection; mod push; mod queue; -/// Module providing functions to randomly generate paths through an Fst. A static and a delayed version are available. +/// Functions to randomly generate paths through an Fst. A static and a delayed version are available. pub mod randgen; mod relabel_pairs; +/// Functions for lazily replacing transitions in an FST. pub mod replace; mod reverse; mod reweight; -/// Module providing functions to remove epsilon transitions from an Fst. A static and a delayed version are available. +/// Functions to remove epsilon transitions from an Fst. A static and a delayed version are available. pub mod rm_epsilon; mod rm_final_epsilon; mod shortest_distance; @@ -69,6 +76,7 @@ mod tr_map; mod tr_sort; mod tr_sum; pub(crate) mod tr_unique; +/// Functions to compute the union of FSTs. 
pub mod union; mod weight_convert; diff --git a/rustfst/src/algorithms/optimize.rs b/rustfst/src/algorithms/optimize.rs index 084a980c9..263fd0c08 100644 --- a/rustfst/src/algorithms/optimize.rs +++ b/rustfst/src/algorithms/optimize.rs @@ -7,6 +7,7 @@ use crate::semirings::{SemiringProperties, WeaklyDivisibleSemiring, WeightQuanti use crate::Semiring; use anyhow::Result; +/// General optimization (determinization and minimization) of a WFST pub fn optimize< W: Semiring + WeaklyDivisibleSemiring + WeightQuantize, F: MutableFst + AllocableFst, diff --git a/rustfst/src/algorithms/push.rs b/rustfst/src/algorithms/push.rs index aa48e3b2b..c10ae4381 100644 --- a/rustfst/src/algorithms/push.rs +++ b/rustfst/src/algorithms/push.rs @@ -30,6 +30,7 @@ bitflags! { } } +/// Configuration for [`push_weights_with_config`]. #[derive(Clone, Debug, Copy, PartialOrd, PartialEq)] pub struct PushWeightsConfig { delta: f32, @@ -65,6 +66,12 @@ impl PushWeightsConfig { } } +/// Push the weights in an FST. +/// +/// If pushing towards the initial state, the sum of the weight of the +/// outgoing transitions and final weight at a non-initial state is +/// equal to One() in the resulting machine. If pushing towards the +/// final state, the same property holds on the reverse machine. pub fn push_weights(fst: &mut F, reweight_type: ReweightType) -> Result<()> where F: MutableFst, @@ -73,8 +80,9 @@ where push_weights_with_config(fst, reweight_type, PushWeightsConfig::default()) } -/// Pushes the weights in FST in the direction defined by TYPE. If -/// pushing towards the initial state, the sum of the weight of the +/// Push the weights in an FST, optionally removing the total weight. +/// +/// If pushing towards the initial state, the sum of the weight of the /// outgoing transitions and final weight at a non-initial state is /// equal to One() in the resulting machine. If pushing towards the /// final state, the same property holds on the reverse machine. @@ -223,6 +231,7 @@ macro_rules! 
m_labels_pushing { }}; } +/// Configuration for [`push_with_config`]. #[derive(Clone, Copy, Debug, PartialOrd, PartialEq)] pub struct PushConfig { delta: f32, @@ -244,6 +253,8 @@ impl PushConfig { } } +/// Push the weights and/or labels of the input FST into the output +/// mutable FST by pushing weights and/or labels towards the initial state or final states. pub fn push(ifst: &F1, reweight_type: ReweightType, push_type: PushType) -> Result where F1: ExpandedFst, @@ -254,7 +265,7 @@ where push_with_config(ifst, reweight_type, push_type, PushConfig::default()) } -/// Pushes the weights and/or labels of the input FST into the output +/// Push the weights and/or labels of the input FST into the output /// mutable FST by pushing weights and/or labels towards the initial state or final states. pub fn push_with_config( ifst: &F1, diff --git a/rustfst/src/algorithms/relabel_pairs.rs b/rustfst/src/algorithms/relabel_pairs.rs index fd342ecef..4c0be2949 100644 --- a/rustfst/src/algorithms/relabel_pairs.rs +++ b/rustfst/src/algorithms/relabel_pairs.rs @@ -23,7 +23,7 @@ where Ok(map_labels) } -/// Replaces input and/or output labels using pairs of labels. +/// Replace input and/or output labels using pairs of labels. /// /// This operation destructively relabels the input and/or output labels of the /// FST using pairs of the form (old_ID, new_ID); omitted indices are diff --git a/rustfst/src/algorithms/reverse.rs b/rustfst/src/algorithms/reverse.rs index 84b82ee3d..3870d6bfb 100644 --- a/rustfst/src/algorithms/reverse.rs +++ b/rustfst/src/algorithms/reverse.rs @@ -7,7 +7,9 @@ use crate::semirings::Semiring; use crate::tr::Tr; use crate::{StateId, Trs, EPS_LABEL}; -/// Reverses an FST. The reversed result is written to an output mutable FST. +/// Reverse an FST. +/// +/// The reversed result is written to an output mutable FST. /// If A transduces string x to y with weight a, then the reverse of A /// transduces the reverse of x to the reverse of y with weight a.Reverse(). 
/// diff --git a/rustfst/src/algorithms/reweight.rs b/rustfst/src/algorithms/reweight.rs index a4237cb9f..fe59afe50 100644 --- a/rustfst/src/algorithms/reweight.rs +++ b/rustfst/src/algorithms/reweight.rs @@ -15,7 +15,8 @@ pub enum ReweightType { ReweightToFinal, } -/// Reweights an FST according to a vector of potentials in a given direction. +/// Reweight an FST according to a vector of potentials in a given direction. +/// /// The weight must be left distributive when reweighting towards the initial /// state and right distributive when reweighting towards the final states. /// diff --git a/rustfst/src/algorithms/shortest_distance.rs b/rustfst/src/algorithms/shortest_distance.rs index 5400ef28e..40b886560 100644 --- a/rustfst/src/algorithms/shortest_distance.rs +++ b/rustfst/src/algorithms/shortest_distance.rs @@ -252,6 +252,7 @@ pub(crate) fn shortest_distance_with_internal_config< sd_state.shortest_distance::(source, fst) } +/// Configuration for shortest distance computation #[derive(Debug, Clone, Copy, PartialOrd, PartialEq)] pub struct ShortestDistanceConfig { delta: f32, @@ -271,11 +272,7 @@ impl ShortestDistanceConfig { } } -pub fn shortest_distance>(fst: &F, reverse: bool) -> Result> { - shortest_distance_with_config(fst, reverse, ShortestDistanceConfig::default()) -} - -/// This operation computes the shortest distance from the initial state to every state. +/// Compute the shortest distance from the initial state to every state. /// The shortest distance from `p` to `q` is the ⊕-sum of the weights /// of all the paths between `p` and `q`. /// @@ -308,6 +305,12 @@ pub fn shortest_distance>(fst: &F, reverse: bool) /// # Ok(()) /// # } /// ``` +pub fn shortest_distance>(fst: &F, reverse: bool) -> Result> { + shortest_distance_with_config(fst, reverse, ShortestDistanceConfig::default()) +} + +/// Compute the shortest distance from the initial state to every +/// state, with configurable delta for comparison. 
pub fn shortest_distance_with_config>( fst: &F, reverse: bool, diff --git a/rustfst/src/algorithms/shortest_path.rs b/rustfst/src/algorithms/shortest_path.rs index 86a2a3f55..fb6a3293a 100644 --- a/rustfst/src/algorithms/shortest_path.rs +++ b/rustfst/src/algorithms/shortest_path.rs @@ -21,6 +21,7 @@ use crate::{StateId, Trs, KSHORTESTDELTA}; use bitflags::_core::fmt::Formatter; use std::fmt::Debug; +/// Configuration for N-shortest path computation #[derive(Debug, Clone, Copy, PartialOrd, PartialEq)] pub struct ShortestPathConfig { pub delta: f32, @@ -60,6 +61,20 @@ impl ShortestPathConfig { } } +/// Create an FST containing the single shortest path in the input +/// FST. The shortest path is the lowest-weight path w.r.t. the +/// natural semiring order. +/// +/// # Example +/// +/// ## Input +/// +/// ![shortestpath_in](https://raw.githubusercontent.com/Garvys/rustfst-images-doc/master/images/shortestpath_in.svg?sanitize=true) +/// +/// ## Output +/// +/// ![shortestpath_out_n_1](https://raw.githubusercontent.com/Garvys/rustfst-images-doc/master/images/shortestpath_out_n_1.svg?sanitize=true) +/// pub fn shortest_path(ifst: &FI) -> Result where FI: ExpandedFst, @@ -73,8 +88,9 @@ where shortest_path_with_config(ifst, ShortestPathConfig::default()) } -/// Creates an FST containing the n-shortest paths in the input FST. The n-shortest paths are the -/// n-lowest weight paths w.r.t. the natural semiring order. +/// Create an FST containing the n-shortest paths in the input +/// FST. The n-shortest paths are the n-lowest weight paths w.r.t. the +/// natural semiring order. /// /// # Example /// diff --git a/rustfst/src/algorithms/state_sort.rs b/rustfst/src/algorithms/state_sort.rs index 14bbafe29..2fe40453b 100644 --- a/rustfst/src/algorithms/state_sort.rs +++ b/rustfst/src/algorithms/state_sort.rs @@ -7,9 +7,11 @@ use crate::fst_traits::MutableFst; use crate::semirings::Semiring; use crate::{StateId, Trs}; -/// Sorts the input states of an FST. 
order[i] gives the the state ID after -/// sorting that corresponds to the state ID i before sorting; it must -/// therefore be a permutation of the input FST's states ID sequence. +/// Sort the input states of an FST. +/// +/// `order[i]` gives the state ID after sorting that corresponds +/// to the state ID `i` before sorting; it must therefore be a +/// permutation of the input FST's state ID sequence. pub fn state_sort(fst: &mut F, order: &[StateId]) -> Result<()> where W: Semiring, diff --git a/rustfst/src/algorithms/top_sort.rs b/rustfst/src/algorithms/top_sort.rs index a5cf21083..f3bbbca8e 100644 --- a/rustfst/src/algorithms/top_sort.rs +++ b/rustfst/src/algorithms/top_sort.rs @@ -60,7 +60,7 @@ impl<'a, W: Semiring, F: 'a + Fst> Visitor<'a, W, F> for TopOrderVisitor { } } -/// This operation topologically sorts its input. When sorted, all transitions are from +/// Topologically sort an FST. When sorted, all transitions are from /// lower to higher state IDs. /// /// # Example diff --git a/rustfst/src/lib.rs b/rustfst/src/lib.rs index f378c1d81..e93996265 100644 --- a/rustfst/src/lib.rs +++ b/rustfst/src/lib.rs @@ -20,6 +20,65 @@ //! //! ![fst](https://raw.githubusercontent.com/Garvys/rustfst-images-doc/master/images/project_in.svg?sanitize=true) //! +//! ## Overview +//! +//! For a basic [example](#example) see the section below. +//! +//! Some simple and commonly encountered types of FSTs can be easily +//! created with the macro [`fst`] or the functions +//! [`acceptor`](utils::acceptor) and +//! [`transducer`](utils::transducer). +//! +//! For more complex cases you will likely start with the +//! [`VectorFst`](fst_impls::VectorFst) type, which will be imported +//! in the [`prelude`] along with most everything else you need. +//! [`VectorFst`](fst_impls::VectorFst) corresponds +//! directly to the OpenFST `StdVectorFst`, and can be used to load +//! its files using [`read`](fst_traits::SerializableFst::read) or +//! 
[`read_text`](fst_traits::SerializableFst::read_text). +//! +//! Because "iteration" over an FST can mean many different things, +//! there are a variety of different iterators. To iterate over state +//! IDs you may use +//! [`states_iter`](fst_traits::StateIterator::states_iter), while to +//! iterate over transitions out of a state, you may use +//! [`get_trs`](fst_traits::CoreFst::get_trs). Since it is common to +//! iterate over both, this can be done using +//! [`fst_iter`](fst_traits::FstIterator::fst_iter) or +//! [`fst_into_iter`](fst_traits::FstIntoIterator::fst_into_iter). It +//! is also very common to iterate over paths accepted by an FST, +//! which can be done with +//! [`paths_iter`](fst_traits::Fst::paths_iter), and as a convenience +//! for generating text, +//! [`string_paths_iter`](fst_traits::Fst::string_paths_iter). +//! Alternately, in the case of a linear FST, you may retrieve the +//! only possible path with +//! [`decode_linear_fst`](utils::decode_linear_fst). +//! +//! Note that iterating over paths is not the same thing as finding +//! the *shortest* path or paths, which is done with +//! [`shortest_path`](algorithms::shortest_path) (for a single path) +//! or +//! [`shortest_path_with_config`](algorithms::shortest_path_with_config) +//! (for N-shortest paths). +//! +//! For the complete list of algorithms, see the [`algorithms`] module. +//! +//! You may now be wondering, especially if you have previously used +//! such linguist-friendly tools as +//! [pyfoma](https://github.com/mhulden/pyfoma), "what if I just want +//! to *transduce some text*???" The unfriendly answer is that +//! rustfst is a somewhat lower-level library, designed for +//! implementing things like speech recognizers. The somewhat more +//! helpful answer is that you would do this by constructing an +//! [`acceptor`](utils::acceptor) for your input, which you will +//! [`compose`](algorithms::compose) with a +//! [`transducer`](utils::transducer), then +//! 
[`project`](algorithms::project) the result [to its +//! output](algorithms::ProjectType::ProjectOutput), and finally +//! [iterate over the paths](fst_traits::Fst::string_paths_iter) in +//! the resulting FST. +//! //! ## References //! //! Implementation heavily inspired from Mehryar Mohri's, Cyril Allauzen's and Michael Riley's work : @@ -28,6 +87,11 @@ //! - [OpenFst: A general and efficient weighted finite-state transducer library](https://link.springer.com/chapter/10.1007%2F978-3-540-76336-9_3) //! - [Weighted finite-state transducers in speech recognition](https://repository.upenn.edu/cgi/viewcontent.cgi?article=1010&context=cis_papers) //! +//! The API closely resembles that of OpenFST, with some +//! simplifications and changes to make it more idiomatic in Rust, notably +//! the use of `Tr` instead of `Arc`. See [Differences from +//! OpenFST](#differences-from-openfst) for more information. +//! //! ## Example //! //! ```rust @@ -84,6 +148,44 @@ //! Ok(()) //! } //! ``` +//! +//! ## Differences from OpenFST +//! +//! Here is a non-exhaustive list of ways in which Rustfst's API +//! differs from OpenFST: +//! +//! - The default epsilon symbol is `<eps>` and not `<epsilon>`. +//! - Functions and methods follow Rust naming conventions, +//! e.g. `add_state` rather than `AddState`, but are otherwise mostly +//! equivalent, except that: +//! - Transitions are called `Tr` and not `Arc`, because `Arc` has a +//! rather different and well-established meaning in Rust, and rustfst +//! uses it (`std::sync::Arc`, that is) to reference-count symbol +//! tables. All associated functions also use `tr`. +//! - Final states are not indicated by a final weight of `zero`. You +//! can test for finality using [`is_final`](fst_traits::CoreFst::is_final), and +//! [`final_weight`](fst_traits::CoreFst::final_weight) returns an [`Option`]. This +//! requires some care when converting OpenFST code. +//! - Transitions can be accessed directly as a slice rather than requiring +//! an iterator. 
+//! - Semiring operations are expressed as plain old methods rather +//! than strange C++ things. So write `w1.plus(w2)` rather than +//! `Plus(w1, w2)`, for instance. +//! - Weights have in-place operations for ⊕ +//! ([`plus_assign`](Semiring::plus_assign)) and ⊗ +//! ([`times_assign`](Semiring::times_assign)). +//! - Most of the type aliases (which would be trait aliases in Rust) such +//! as `StdArc`, `StdFst`, and so forth, are missing, but type inference +//! allows us to avoid explicit type arguments in most cases, such as +//! when calling [`Tr::new`], for instance. +//! - State IDs are unsigned, with [`NO_STATE_ID`] used for a missing value. +//! They are also 32 bits by default (presumably, 4 billion states +//! is enough for most applications). This means you must take care to +//! cast them to [`usize`] when using them as indices, and vice-versa, +//! preferably checking for overflows. +//! - Symbol IDs are also unsigned and 32 bits, with [`NO_LABEL`] used +//! for a missing value. +//! - Floating-point weights are not generic, so are always single-precision. #[warn(missing_docs)] #[cfg(test)] @@ -165,6 +267,7 @@ pub use crate::parsers::nom_utils::NomCustomError; /// A representable float near .001. (Used in Quantize) pub const KDELTA: f32 = 1.0f32 / 1024.0f32; +/// Default tolerance value used in floating-point comparisons. pub const KSHORTESTDELTA: f32 = 1e-6; /// Module re-exporting most of the objects from this crate. @@ -184,11 +287,13 @@ pub mod prelude { #[cfg(test)] pub mod proptest_fst; +/// Used to indicate a transition with no label. #[cfg(feature = "state-label-u32")] pub static NO_LABEL: Label = std::u32::MAX; #[cfg(not(feature = "state-label-u32"))] pub static NO_LABEL: Label = std::usize::MAX; +/// Used to indicate a missing state ID. 
#[cfg(feature = "state-label-u32")] pub static NO_STATE_ID: StateId = std::u32::MAX; #[cfg(not(feature = "state-label-u32"))] @@ -196,5 +301,7 @@ pub static NO_STATE_ID: StateId = std::usize::MAX; pub(crate) static UNASSIGNED: usize = std::usize::MAX; +/// Provides a trait used to access transitions from a state. pub mod trs; +/// Provides a trait used to mutably access transitions from a state. pub mod trs_iter_mut;