-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow the saving and loading to a byte string #6
Conversation
I'd like to make things use a common abstraction to load from a rust Patchcommit 0c9258a
Author: Zachary Dremann <[email protected]>
Date: Thu Oct 12 14:08:21 2023 -0400
save/load via generic Read/Write
diff --git a/src/lib.rs b/src/lib.rs
index 4d7f63b..994e31a 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1,7 +1,7 @@
use bitline::BitLine;
use pyo3::exceptions::{PyTypeError, PyValueError};
use pyo3::types::PyType;
-use pyo3::{basic::CompareOp, prelude::*, types::PyTuple, types::PyBytes};
+use pyo3::{basic::CompareOp, prelude::*, types::PyBytes, types::PyTuple};
use std::fs::File;
use std::io::{Read, Write};
use std::mem;
@@ -251,58 +251,13 @@ impl Bloom {
/// Load from a file, see "Persistence" section in the README
#[classmethod]
fn load(_cls: &PyType, filepath: &str, hash_func: &PyAny) -> PyResult<Bloom> {
- // check that the hash_func is callable
- if !hash_func.is_callable() {
- return Err(PyTypeError::new_err("hash_func must be callable"));
- }
- // check that the hash_func isn't the built-in hash function
- if hash_func.is(get_builtin_hash_func(hash_func.py())?) {
- return Err(PyValueError::new_err(
- "Cannot load a bloom filter that uses the built-in hash function",
- ));
- }
- let hash_func = Some(hash_func.to_object(hash_func.py()));
-
- let mut file = File::open(filepath)?;
-
- let mut k_bytes = [0; mem::size_of::<u64>()];
- file.read_exact(&mut k_bytes)?;
- let k = u64::from_le_bytes(k_bytes);
-
- let filter = BitLine::load(&mut file)?;
-
- Ok(Bloom {
- filter,
- k,
- hash_func,
- })
+ Self::load_from_read(hash_func, || File::open(filepath))
}
/// Load from a bytes(), see "Persistence" section in the README
#[classmethod]
fn load_bytes(_cls: &PyType, bytes: &[u8], hash_func: &PyAny) -> PyResult<Bloom> {
- // check that the hash_func is callable
- if !hash_func.is_callable() {
- return Err(PyTypeError::new_err("hash_func must be callable"));
- }
- // check that the hash_func isn't the built-in hash function
- if hash_func.is(get_builtin_hash_func(hash_func.py())?) {
- return Err(PyValueError::new_err(
- "Cannot load a bloom filter that uses the built-in hash function",
- ));
- }
- let hash_func = Some(hash_func.to_object(hash_func.py()));
-
- let k_bytes: [u8; mem::size_of::<u64>()] = bytes[0 .. mem::size_of::<u64>()].try_into().expect("slice with incorrect length");
- let k = u64::from_le_bytes(k_bytes);
-
- let filter = BitLine::load_bytes(&bytes[mem::size_of::<u64>()..])?;
-
- Ok(Bloom {
- filter,
- k,
- hash_func,
- })
+ Self::load_from_read(hash_func, || Ok::<_, std::convert::Infallible>(bytes))
}
/// Save to a file, see "Persistence" section in the README
@@ -320,18 +275,10 @@ impl Bloom {
/// Save to a byte(), see "Persistence" section in the README
fn save_bytes(&self, py: Python<'_>) -> PyResult<PyObject> {
- if self.hash_func.is_none() {
- return Err(PyValueError::new_err(
- "Cannot save a bloom filter that uses the built-in hash function",
- ));
- }
-
- let serialized:Vec<u8> = [
- &self.k.to_le_bytes(),
- &self.filter.bits as &[u8]
- ].concat();
-
- Ok(PyBytes::new(py, &serialized).into())
+ let mut bytes =
+ Vec::with_capacity(mem::size_of_val(&self.k) + (self.filter.len() / 8) as usize);
+ self.save_to_write(|| Ok::<_, std::convert::Infallible>(&mut bytes))?;
+ Ok(PyBytes::new(py, &bytes).into())
}
}
@@ -341,6 +288,56 @@ impl Bloom {
self.hash_func.as_ref().map(|f| f.clone_ref(py))
}
+ fn load_from_read<F, R, E>(hash_func: &PyAny, get_data: F) -> PyResult<Self>
+ where
+ F: FnOnce() -> Result<R, E>,
+ R: Read,
+ PyErr: From<E>,
+ {
+ // check that the hash_func is callable
+ if !hash_func.is_callable() {
+ return Err(PyTypeError::new_err("hash_func must be callable"));
+ }
+ // check that the hash_func isn't the built-in hash function
+ if hash_func.is(get_builtin_hash_func(hash_func.py())?) {
+ return Err(PyValueError::new_err(
+ "Cannot load a bloom filter that uses the built-in hash function",
+ ));
+ }
+ let hash_func = Some(hash_func.to_object(hash_func.py()));
+
+ let mut data = get_data()?;
+
+ let mut k_bytes = [0; mem::size_of::<u64>()];
+ data.read_exact(&mut k_bytes)?;
+ let k = u64::from_le_bytes(k_bytes);
+
+ let filter = BitLine::load(data)?;
+
+ Ok(Bloom {
+ filter,
+ k,
+ hash_func,
+ })
+ }
+
+ fn save_to_write<F, W, E>(&self, get_writer: F) -> PyResult<()>
+ where
+ F: FnOnce() -> Result<W, E>,
+ W: Write,
+ PyErr: From<E>,
+ {
+ if self.hash_func.is_none() {
+ return Err(PyValueError::new_err(
+ "Cannot save a bloom filter that uses the built-in hash function",
+ ));
+ }
+ let mut writer = get_writer()?;
+ writer.write_all(&self.k.to_le_bytes())?;
+ self.filter.save(writer)?;
+ Ok(())
+ }
+
fn zeroed_clone(&self, py: Python<'_>) -> Bloom {
Bloom {
filter: BitLine::new(self.filter.len()).unwrap(),
@@ -382,7 +379,6 @@ impl Bloom {
mod bitline {
use pyo3::exceptions::PyValueError;
use pyo3::prelude::*;
- use std::fs::File;
use std::io::{Read, Write};
#[inline(always)]
@@ -393,7 +389,7 @@ mod bitline {
#[derive(Clone, PartialEq, Eq)]
pub struct BitLine {
- pub bits: Box<[u8]>,
+ bits: Box<[u8]>,
}
impl BitLine {
@@ -453,7 +449,7 @@ mod bitline {
/// Reads the given file from the current position to the end and
/// returns a BitLine containing the data.
- pub fn load(file: &mut File) -> PyResult<Self> {
+ pub fn load<R: Read>(mut file: R) -> PyResult<Self> {
let mut bits = Vec::new();
file.read_to_end(&mut bits)?;
Ok(Self {
@@ -461,18 +457,8 @@ mod bitline {
})
}
- /// Given the provided [u8]
- /// returns a BitLine containing the data.
- pub fn load_bytes(bytes: &[u8]) -> PyResult<Self> {
- let bits = bytes.to_vec();
- Ok(Self {
- bits: bits.into_boxed_slice(),
- })
- }
-
-
/// Writes the BitLine to the given file from the current position.
- pub fn save(&self, file: &mut File) -> PyResult<()> {
+ pub fn save<W: Write>(&self, mut file: W) -> PyResult<()> {
file.write_all(&self.bits)?;
Ok(())
} We might also want to consider doing something like allowing any of |
I rolled in your suggestions for the common abstraction, I wasn't sure how to do that so thanks! I've also implemented file-like objects using omerbenamram/pyo3-file, now all 3 of the following work as expected:
I'll look into doing the same for load() some time tomorrow, and tidying up the matches for the different argument types. |
I've now implemented load(), I'm not entirely sure using a The stub typedef is technically incorrect for save as |
Hoping we can get this change I propopsed to pyo3-file which would let us derive |
rbloom.pyi
Outdated
# save to file, see section "Persistence" | ||
def save(self, filepath: str) -> None: ... | ||
# save to file, file object, or return bytes, see section "Persistence" | ||
def save(self, source: Optional[Union[str, BinaryIO]] = None) -> Union[bytes, None]: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can type the stub more accurately with the @overload
decorator:
import typing.overload
@overload
def save(self, dest: Union[str, BinaryIO]) -> None: ...
@overload
def save(self, dest: None = None) -> bytes: ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, when we can update to pyo3 0.20, using PathBuf
will mean we also can take os.PathLike
(PyO3/pyo3#3374)
Thanks for the additional feedback |
Hey guys, just wanted to let you know that I won't be able to take a look at the suggested changes this week due to a looming deadline, but that I definitely will as soon as I can and that I am always super grateful for every contribution!! |
The API Hi, I'm not 100% comfortable with the new API yet. My main issue lies with the fact that the The abstraction I also believe that adding the abstraction on top, while technically elegant, would introduce an additional layer of complexity that makes the code less accessible. I appreciate @Dr-Emann 's obvious affinity to this kind of elegant complexity, but personally feel that the abstraction doesn't really improve code quality (by my subjective point of view) in a meaningful way. Conclusion I want to give you guys an opportunity to voice your opinions before I make my final decision, but right now I'm thinking about making some minor, inconsequential changes to @psychomario 's original two commits myself (including adding a small portion to the README) and then only committing those without the abstraction. |
I merged the original two commits (save_bytes & load_bytes implementation as well as tests), made some minor modifications to the code, and wrote documentation. I am therefore now closing this. Thank you for the contribution! |
Implement the functions:
to allow for saving and loading via Python
bytes
, for storage other than in a file.Apologies if this is terrible Rust, I've never written Rust before today.