Codecov Report
@@ Coverage Diff @@
## main #89 +/- ##
==========================================
+ Coverage 81.39% 81.45% +0.05%
==========================================
Files 63 63
Lines 10273 10266 -7
==========================================
Hits 8362 8362
+ Misses 1911 1904 -7
I actually just encountered a funky error in the wild with cargo trying to fetch the spec submodule but me not having proper creds to get it, leaving me unable to use the repo. To fix that I think we should add

Additionally, our exports really need to be refined. When opening the crate with

I'd also like to ditch our timely/abomonation dependencies before we publish because otherwise we'll set off every dependency scan in the galaxy because of how cursed abomonation is.

We're also missing a lotta docs on a lotta very critical things like

A bunch of names are also fairly unintuitive,

We also just need more examples, without concrete examples or an understanding of the underlying theory there's almost no discernible difference when looking at

We can also probably write a good, long doc piece in our readme and then use

    // lib.rs
    #![cfg_attr(doc, doc_cfg)]

    # Cargo.toml
    [package.metadata.docs.rs]
    all-features = true
It would be nice to have a tutorial with some more explanations aside from the spec PDF. Reading this it seems to be fairly complex? I don't think I could write a program with this from just the docs+spec. Having a tutorial with a simple program that's incrementally (pun intended) built and explained would be nice as you add complexity, e.g., start with a join/filter/smth else and make it more complex as you explain things...
OTOH, if this is just about releasing this on crates.io to reserve the name that seems reasonable too.
README.md
Outdated
1. The complete set of **relational operators**: select, project, join,
   aggregate, etc.

1. **Recursion**: Recursive queries allow for instance expressing graph
Explanation goes right into example (graph queries) rather than giving a general explanation like you do for {window, per-record} operators.
The PR is only for the readme. The public API is currently nonexistent. I'll work on it next, but there are some challenges there; it's not just a matter of cleaning up the docs. I don't think this should stop us from releasing on crates.io, but if people feel strongly about it, we can do it later.
Gerd does bring up a good point, I didn't think about reserving the crate name
Removed information about Rust tooling from the README: it is not specific to DBSP, is a matter of taste, and doesn't help people to get started with the project.
Seems like
@mbudiu-vmw, is there a way the paper could live in a public repo?
The PDF is already in this repo.
I think there is a way Overleaf can work with an existing git repo. But I'm also ok with removing the submodule and just keeping the PDF, which is the important part. Are you ok with that?
I just made some edits to the sources. We should have the tex sources too.
If you can send Val instructions on how to make the repo public perhaps he can do it.
I don't know how to do it. More importantly, we don't want to depend on Val's repo being available. If he changes the visibility or deletes the repo in the future (or Overleaf deletes or closes it automatically for whatever reason), it will break the crate.
Then we should just copy the sources here as well. We need a backup anyway.
Ideally this code should run fine in Linux, MacOs, and Windows.
The code is written in Rust. Here are some tools we found useful for development:
Computing over streaming data is hard. Streaming computations operate over
the word "incremental" does not appear here at all.
README.md
Outdated
## Set-up git hooks
1. **Per-record operators** that parse, validate, filter, transform data streams
the notion of "record" is not defined.
I would first introduce the notion of "transaction".
Once we merge this I will add some diagrams.
I assume that most people know what records and tables are.
README.md
Outdated
1. **Per-record operators** that parse, validate, filter, transform data streams
   one record at a time.

1. **Windowing operators** that group time series data into time windows,
frankly there isn't anything about time in these records. So I would not call them "time-series".
Time-series implies that one field of the record is a timestamp.
README.md
Outdated
1. The complete set of **relational operators**: select, project, join,
   aggregate, etc.

1. **Recursion**: Recursive queries express iterative computations, e.g.,
the word "incrementally" appears here for the first time.
Perhaps fixpoint is a more accurate description, with the statement that fixpoint computations can be used to implement recursive queries.
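To make the fixpoint/recursion relationship concrete, here is a minimal sketch of a recursive graph query (reachability) evaluated as a fixed-point iteration. It uses only the Rust standard library and is not the DBSP API; the function name `reachable` and the edge representation are assumptions made for illustration.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical helper (not part of DBSP): returns every node reachable from
/// `start` by repeating the "follow one edge" step until nothing new appears,
/// i.e. until the result reaches a fixed point.
fn reachable(edges: &[(u32, u32)], start: u32) -> BTreeSet<u32> {
    // Adjacency list: node -> direct successors.
    let mut successors: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(from, to) in edges {
        successors.entry(from).or_default().push(to);
    }

    let mut reached = BTreeSet::new();
    reached.insert(start);
    // `frontier` holds only the nodes discovered in the previous round, so
    // each round does work proportional to what changed, not to the whole set.
    let mut frontier = vec![start];

    while !frontier.is_empty() {
        let mut next = Vec::new();
        for node in &frontier {
            if let Some(targets) = successors.get(node) {
                for &target in targets {
                    // `insert` returns true only for nodes not seen before.
                    if reached.insert(target) {
                        next.push(target);
                    }
                }
            }
        }
        // An empty `next` means the fixed point has been reached.
        frontier = next;
    }
    reached
}
```

Calling `reachable(&[(1, 2), (2, 3), (3, 1), (4, 5)], 1)` yields the set containing 1, 2, and 3. The loop terminates even though the graph has a cycle, because a round that discovers no new node ends the iteration.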
"Recursion" is a more familiar term to database folks than "fixed point". The goal here is not to write a formal design document, precisely specifying each notion, but to give an idea of the capabilities we are implementing.
- **Change data** represents updates (insertions, deletions, modifications) to
  some state modeled as a table of records.

In DBSP, a time series is just a table where records are only ever added and
"table" appears here for the first time.
README.md
Outdated
time window queries are updated on the fly as new inputs become available. This
means that DBSP can work with arbitrarily large windows as long as they fit
within available storage. All other operators listed above apply to both time
series and change data. |
Perhaps I could take a stab at improving this documentation; after all, I already have tons of slides I could steal material from. But I understand that this text should be more developer-oriented.
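For developer-oriented documentation, a small example of "change data" may help. The sketch below models a table as a map from record to multiplicity and a batch of updates as signed weights (+1 for an insertion, -1 for a deletion, a modification being one of each). This is an illustrative assumption using only the standard library, not the DBSP API; `Table`, `apply`, and `filter_delta` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of a table: record -> number of copies present.
type Table = BTreeMap<String, i64>;

/// Applies a batch of changes to the table. A weight of +1 inserts a record,
/// -1 deletes one; a modification is a deletion plus an insertion.
fn apply(table: &mut Table, delta: &[(String, i64)]) {
    for (record, weight) in delta {
        let count = table.entry(record.clone()).or_insert(0);
        *count += *weight;
        // Drop records whose multiplicity has fallen to zero.
        if *count == 0 {
            table.remove(record);
        }
    }
}

/// A per-record filter evaluated on changes only: it maps an input delta to
/// an output delta without looking at the full table.
fn filter_delta(delta: &[(String, i64)], keep: impl Fn(&str) -> bool) -> Vec<(String, i64)> {
    delta
        .iter()
        .filter(|(record, _)| keep(record.as_str()))
        .cloned()
        .collect()
}
```

Applying `[("a", +1), ("b", +1)]` and then `[("a", -1), ("c", +1)]` to an empty table leaves records `b` and `c`. Feeding each batch through `filter_delta` first and applying the result to a second table keeps a filtered view up to date while touching only the changed records.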
Apparently, submodules don't work well with `cargo`:
- #89 (comment)
- #89 (comment)

We will add a copy of the LaTeX source to this repo instead.
I think this README is an improvement over what we have now. I am going to merge it and move on to other stuff. Improvements are welcome.
I want to create a crates.io release to make it easy for people to try DBSP. In preparation for that I added a project description to the README. Once the crate has been published, I will add links to crates.io and docs.rs.