- Fix usage of unusual ordering mixes, such as `[:name, [:age, :desc]]`, that were supported until 0.22.0.
- `transform` (`TupleTransformer`, actually) now supports any callable object (i.e. one for which `respond_to?(:call)` holds). This allows using Enspirit's monolens, for instance.
- Optimize `summarize.restrict`. Push whatever can be pushed down the tree.
- Add `Bmg.json` and `Bmg.yaml` factory methods, to get relations on top of usual data files.
- Add a `Database` abstraction, with `Database.data_folder`, `Database.sequel` and `Database.xlsx` factory methods and implementations, as well as `Database#to_data_folder` and `Database#to_xlsx` dump methods. See README for details.
- Add `Summarizer.bucketize` to distribute attribute values in a number of buckets. We support `:boundaries`, `:value_length` and `:distinct` options.
- `require 'bmg/sequel'` automatically requires `bmg` itself.
- Add the `minus` operation (also known as set difference, or EXCEPT in SQL).
- A native ordering, i.e. `->(t1,t2){ ... }` as usual with `Enumerable#sort`, is now supported in various places where an ordering may be used (e.g. `page`, `image`'s array option, `output_preferences`). This allows the use of complex orderings implemented in pure ruby, but most of the time it prevents optimization and/or SQL compilation and falls back to the native ruby implementation instead.
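
As an illustration of what such a native ordering looks like, here is a minimal pure-Ruby sketch. It does not call Bmg at all: tuples are plain Hashes, the sample data is invented, and the lambda is simply handed to `Enumerable#sort`, which is the contract the entry above refers to.

```ruby
# Tuples are plain Hashes here; this sketch does not use Bmg itself.
tuples = [
  { name: "Smith", age: 30 },
  { name: "Jones", age: 25 },
  { name: "Smith", age: 41 },
  { name: "Adams", age: 25 }
]

# A native ordering is a comparator returning a negative number, zero or a
# positive number, exactly like the block given to Enumerable#sort.
# Here: age descending, then name ascending.
by_age_desc_then_name = ->(t1, t2) {
  cmp = t2[:age] <=> t1[:age]
  cmp.zero? ? (t1[:name] <=> t2[:name]) : cmp
}

sorted = tuples.sort(&by_age_desc_then_name)
```

Because the comparator is opaque Ruby code, nothing about it can be translated to an SQL `ORDER BY`, which is why such orderings are evaluated in Ruby.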
- Add `ttl` (time to live) to the available options when using the bmg-redis contrib. If provided, the `ttl` option sets the provided value as the validity period of a given tuple in Redis. If not, the ttl is not set and the tuple keeps existing indefinitely in Redis.
- Add `cross_product` (aliased to `cross_join`) as a shortcut for `join` with no join attributes (`r.join(right)`). If typechecking mode is enabled, the shortcut checks that there are no shared attributes between the two relations, and raises an error if there are.
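
The semantics described above can be sketched in plain Ruby on arrays of Hashes. The `cross_product` helper below is hypothetical and written for this illustration only; it is not Bmg's implementation, and the `typecheck:` keyword and the error class are assumptions.

```ruby
# Pure-Ruby sketch on arrays of Hashes; `cross_product` is a hypothetical
# helper, not Bmg's implementation.
def cross_product(left, right, typecheck: false)
  if typecheck
    # With typechecking on, operands must not share any attribute name.
    shared = left.flat_map(&:keys).uniq & right.flat_map(&:keys).uniq
    raise ArgumentError, "shared attributes: #{shared.inspect}" unless shared.empty?
  end
  # Every left tuple is combined with every right tuple.
  left.product(right).map { |l, r| l.merge(r) }
end

colors = [{ color: "red" }, { color: "blue" }]
sizes  = [{ size: "S" }, { size: "M" }, { size: "L" }]
pairs  = cross_product(colors, sizes, typecheck: true)
```

With two colors and three sizes, `pairs` holds six tuples, each having both a `:color` and a `:size` attribute.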
- Add more output preferences to CSV and XLSX writers. It is now possible to ignore extra attributes, sort tuples before outputting, and reduce redundancy via group attributes. See the OutputPreference class.
- Ordering.new now supports a simple attribute list without asc/desc info. Ascending order is used for all attributes in such a case.
- Don't optimize left_join for now, it breaks.
- Optimize `join.restrict`, split predicate and push down the tree.
- Make sure restrict.restrict keeps applying further optimizations down the tree.
- BREAKING: minimal version for ruby is now 2.7. Ruby 2.4 and 2.6 have been removed from the test matrix.
- [bmg-redis] optimization of Relation#restrict to avoid a redis scan_each when each-ing or updating on the candidate key.
  BREAKING: The keys kept in redis are now always encoded in json, even if the serializer option is set to :marshal. This is necessary to guarantee that the same keys are always used for the same object (Marshal offers no such guarantee). Unfortunately, it may break existing relations (restrict on the candidate key may return no result while tuples exist, if they have been inserted by Bmg 0.20.x).
- [bmg-redis] insert, update and delete are now properly enclosed in a redis transaction (via `redis.multi`).
- Fix `project.allbut` optimization under ruby <= 2.6
- Optimize `project.allbut`, remove the butlist from the projection.
- Optimize `autowrap.project` and `autowrap.allbut` to push them down when possible.
- Fixed SQL compilation of `summarize.page` to use a subselect. Most DBMS (notably PostgreSQL) do not support `ORDER BY max(...)`.
- bmg-redis requires redis >= 4.0 & < 5.0, no longer >= 4.6.
- POSSIBLY BREAKING: update/delete protocol slightly changed to take a Predicate as second argument.
  The experimental feature has been slightly extended with Restrict being able to push its own predicate further down the update calls.
- Add `Bmg.mutable` that works like `in_memory` but provides a relvar instead of a relation (that is, one that supports insert/delete/update). Mostly useful for unit tests.
- Fix `r.constants(...).constants(...)` raising an error.
  Warning: This may break your code, if you used `r.constants` (private method) to inspect the Hash on the Operator class.
- Add support for `.extend(:x => :y)` shortcuts. They are equivalent to `.extend(:x => ->(t){ t[:y] })` but they compile to SQL.
- Require predicate >= 2.7.1 that has a required bug fix for matching and not_matching
- Bmg no longer generates `WHERE IN` and `WHERE NOT IN` SQL expressions when translating `matching` and `not_matching` query trees.
  The main reason is that `WHERE NOT IN` is not safe when facing NULL on the target column, which makes it error-prone to use.
  The change should mostly be backward compatible but may introduce a few bugs (or actually fix them) and break automated test suites. Hence the bump to 0.19.0.
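
To see why `NOT IN` is fragile, recall that in SQL any comparison with NULL evaluates to UNKNOWN, and `WHERE` keeps a row only when its condition is TRUE. The following pure-Ruby sketch simulates that three-valued logic, with `nil` standing for both NULL and UNKNOWN. The helpers `sql_eq` and `sql_not_in` are hypothetical names written for this illustration; the sketch says nothing about the SQL Bmg now generates instead.

```ruby
# SQL three-valued logic, with Ruby's nil standing for NULL and UNKNOWN.
def sql_eq(a, b)
  return nil if a.nil? || b.nil?   # any comparison with NULL is UNKNOWN
  a == b
end

# x NOT IN (v1, ..., vn)  is  NOT (x = v1 OR ... OR x = vn)
def sql_not_in(x, values)
  results = values.map { |v| sql_eq(x, v) }
  return false if results.include?(true)  # a match: NOT TRUE is FALSE
  return nil   if results.include?(nil)   # no match but an UNKNOWN: UNKNOWN
  true
end

# WHERE keeps a row only when the condition is TRUE.
sids = [1, 2, 3]
kept_without_null = sids.select { |sid| sql_not_in(sid, [1]) == true }
kept_with_null    = sids.select { |sid| sql_not_in(sid, [1, nil]) == true }
```

A single NULL in the target column is enough to make the `NOT IN` condition never TRUE, so the second query silently returns no row at all.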
- CSV reader no longer considers normal text quotes as candidates for quote chars.
- CSV relations now infer their type's attribute list.
- Excel relations now infer their type's attribute list.
- `autowrapped` is now stripped away if no attribute would be touched.
- `autosummarize` now supports a default summarization to be specified as an option. Only two values are supported for now: `:same` and `:first`, with `:same` being the default:

      relation.autosummarize([:id], {c: :group}, default: :same)
      relation.autosummarize([:id], {c: :group}, default: :first)

  When `:same` is used, a TypeError is raised if an unsummed attribute is not functionally dependent on the summarization key.

  When `:first` is used, no such error is raised and the first value encountered is used. The strategy is much faster, but non-deterministic unless all unsummed attributes are functionally dependent on the summarization key.
- Make `page` robust to comparisons with nil/null (nil is greater) and non comparable attributes (simply ignored).
- Calling `count` on a spied relation properly calls the spy's call method.
- If a relation spy responds to `measure`, the latter is called at spy time (instead of `call`) with a block that the measure method must yield.
- Upgraded `predicate` to `2.6.0` (which itself uses `sexpr` 1.0). This makes `bmg` compatible with ruby `3.0` and `3.1`.
- Fix SQL generation of nary join (equality conditions beyond the two first ones were thrown away).
- Add support for a `:sheet => Int` option to excel reader, to specify which sheet to use for having tabular data.
- Add `first` and `last` summarizers, that take the first and last values seen, according to an ordering.
- Add support for `:postprocessor_condition` in Autowrap, with values `:all` (all tuple attributes must be nil for the postprocessor to apply) and `:id` (only an `:id` attribute must be nil for it to apply). `:all` is the default, to preserve backward compatibility.
- Fix SQL compilation of summarize expressions having a resulting attribute name different from the attribute summarized.
- Add `distinct_count` summarizer, with SQL compilation.
- `transform` now supports a ruby Class transformer. `Integer`, `Float` and `String` are natively supported (through `Integer()` and `Float()` for the former two, `to_s` for the latter). Otherwise the class must respond to `parse`, which works with `Date`, `DateTime`, `URI`, etc. `nil` values are always returned unchanged.
- `transform` is compiled to SQL CAST expressions when used with scalar classes (String, Integer, Float, Date, DateTime). The compiler is able to split complex transformations into SQL-supported and SQL-unsupported transformations so that everything that can be pushed down the tree is pushed.
- Add a `:preserve` option to `image` that prevents the application of `allbut(on)` on the tuples of the `right` relation when creating the resulting attribute. Default behavior unchanged.
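
The coercion rules listed for the Class-based `transform` can be sketched as follows. The `coerce` helper is hypothetical and written for this illustration; it only mirrors the rules stated above (`Integer()`, `Float()`, `to_s`, `parse`, `nil` untouched) and is not Bmg's code.

```ruby
require 'date'

# Hypothetical helper mirroring the coercion rules listed above.
def coerce(value, klass)
  return value if value.nil?   # nil is always returned unchanged
  if klass == Integer
    Integer(value)
  elsif klass == Float
    Float(value)
  elsif klass == String
    value.to_s
  else
    klass.parse(value)         # Date, DateTime, URI, ...
  end
end

tuple = { id: "12", price: "3.5", at: "2021-03-04", note: nil }
types = { id: Integer, price: Float, at: Date, note: String }

typed = tuple.map { |attr, value| [attr, coerce(value, types[attr])] }.to_h
```

Using `Integer()` and `Float()` rather than `to_i` and `to_f` means malformed input such as `"12abc"` raises instead of being silently truncated.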
- Add `ungroup` operator.
- Add `unwrap` operator.
- Fix `Summarize.value_by` when using `symbolize: true` and a default value.
- Add a Summarize.value_by that allows flipping vertical series to a tuple-valued attribute.
- Fix CSV read/write usage under ruby-3.0.
- `Bmg.excel` now strips attribute names.
- `Relation#transform` now accepts a Hash whose keys are ruby classes. The corresponding transformation is applied to all values belonging to the class.
- Fix `r.rename(...).rename(...)` yielding a private method call error.
- Add `Summarizer.median(x)` as a shortcut for `Summarizer.percentile(x, 50)`.
- `Summarizer.percentile` now returns a decimal number and not an integer.
- Add `Summarizer.percentile_cont` and `Summarizer.percentile_disc` like PostgreSQL (for continuous and discrete); the default `percentile` is the continuous version.
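
The difference between the continuous and discrete variants can be sketched in plain Ruby. The helpers below follow PostgreSQL's definitions of `percentile_cont` and `percentile_disc`, take `nth` in the 0..100 range as in `percentile(x, 50)`, and assume a non-empty list of numbers. They are standalone illustrations, not Bmg's summarizer classes.

```ruby
# Continuous percentile: linear interpolation between the two closest ranks.
def percentile_cont(values, nth)
  sorted = values.sort
  rank = (nth / 100.0) * (sorted.size - 1)   # fractional, zero-based rank
  lower = sorted[rank.floor]
  upper = sorted[rank.ceil]
  lower + (upper - lower) * (rank - rank.floor)
end

# Discrete percentile: the first actual value whose position in the sorted
# list reaches the requested fraction.
def percentile_disc(values, nth)
  sorted = values.sort
  index = ((nth / 100.0) * sorted.size).ceil - 1
  sorted[[index, 0].max]
end

# The median is the 50th continuous percentile.
def median(values)
  percentile_cont(values, 50)
end

cont = percentile_cont([1, 2, 3, 4], 50)
disc = percentile_disc([1, 2, 3, 4], 50)
```

On `[1, 2, 3, 4]` the continuous version interpolates between 2 and 3, while the discrete one always returns a value present in the input, which is also why the continuous result is a decimal number.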
- Add `Relation#to_xlsx` to create Excel files from Relations. The feature requires 'bmg/writer/xlsx' and the 'write_xlsx' ruby gem, the latter not being a dependency of Bmg at this point.
- `Bmg.excel` generates tuples with a `:row_num` attribute; it's a unique index starting at 1. A `:row_num` option may be set to `false` to not generate them, or to a Symbol to choose its name.
- `Bmg.csv`'s options now support a `:smart` flag that can be set to true (resp. false) if you want (resp. don't want) Bmg to identify quotes and separators by itself. The flag is currently true by default (unless input is an IO) for backward compatibility reasons but will likely be set to false in the future. Consider using the flag explicitly to prevent surprises.
- `Bmg.csv` now correctly handles `IO` and `StringIO` input instances. Using such an input with `:smart => true` might lead to problems unless the io can be read multiple times in a row.
- `distinct` summarizer has been added. Collects distinct values as an array.
- `percentile(:attr, nth)` summarizer has been added. Collects the nth percentile via a sort method (O(n) memory requirement!)
- `Summarizer.by_proc(least){|t,memo| ... }` can now be used to factor a summarizer that works like `each_with_object`. `least` is the initial value, and defaults to nil.
- `Relation#each` now returns an Enumerator when called without block.
- `Relation#with_attr_list` ensures that an attribute list is known on the type, consuming the first tuple to discover them if needed.
- Add Relation#count that returns the exact number of tuples in the relation.
  The default implementation consumes the tuples to count them. Push-down optimizations are implemented for base operators that do not affect the number of resulting tuples.
  Sequel::Relation pushes a `SELECT COUNT(*)` to the SQL engine.
- Optimize `allbut.project`, push the projection down the tree.
- Optimize `image.project`, push the projection down the tree if possible. If the newly introduced attribute is kept, no optimization is done (yet). In particular a sub-projection is not pushed down the tree, as the semantics need careful thinking.
- Optimize `autowrap.project`, push the projection down the tree if possible.
- Image's :array option now supports an ordering relation. The tuples will then be sorted in the resulting array.
- Autosummarize now has `same`, `group`, `y_by_x` and `ys_by_x` factory methods.
- Autosummarize `y_by_x` and `ys_by_x` now support `nil` and simply ignore them.
- Add `images` shortcut, that (currently) compiles to a sequence of `image`.
- Optimize `allbut.allbut`. But lists are merged and only one `allbut` is kept.
- Optimize `image.allbut` in case where the new image attribute is thrown away. The image can be removed altogether.
- Optimize `transform.allbut`. The allbut can always be pushed down the tree.
- Optimize `transform.project`. The project can always be pushed down the tree.
- Optimize `transform.restrict`. Push whatever can be pushed down the tree.
- Optimize `allbut.page`. Push the page down the tree if allbut is key preserving.
- Optimize `allbut.matching`. Push the matching down the tree, it's always possible.
- Optimize `page.matching`. Push the matching down the tree, as long as the join clause does not use the new image attribute.
- Optimize `autowrap.matching`. Push the matching down the tree, as long as the join clause only uses untouched attributes.
- Default Relation#type is provided, that returns Bmg::Type::ANY
- Add Bmg.text_file to easily parse then query text files, with out of the box support for named regular expressions.
- Add Relation#with_type
- Add TupleAlgebra#symbolize_keys
- Relation#transform now supports Regexp transformation. When a match is found, the transformed value is the match's `to_s`, otherwise it is nil.
- Add Relation#where(p) as an alias for restrict(p)
- Add Relation#exclude(p), a shortcut for restrict(!p)
- Relation#to_csv now accepts an OutputPreference object (or hash) allowing to specify an attributes ordering.
- TupleTransformer now allows using a Hash as attribute transformation. E.g.,

      r = Bmg::Relation.new [{:foo => "x"}, {:foo => "y"}]
      r2 = r.transform(foo: { "x" => 1, "y" => 2 })
      r2.to_a
      ## [{:foo => 1}, {:foo => 2}]

- Add a Relation::Proxy module that helps constructing object collections on top of Bmg Relations.
- Add Relation#transform, for easier attribute transformations than #extend.
  TRANSFORM uses ruby semantics for now, is not compiled to SQL, and provides no optimization so far. It makes various transformations much easier than before:

      ## Will apply attr.to_s on every attribute
      relation.transform(:to_s)
      relation.transform(&:to_s)

      ## Will apply attr.to_s.upcase on every tuple attribute
      relation.transform([:to_s, :upcase])

      ## Will selectively apply on attributes
      relation.transform(:foo => :upcase, :bar => ->(bar){ bar*2 })

  EXTEND is supposed to be used for adding attributes only, not transforming existing ones. The introduction of TRANSFORM makes this clearer by providing an official alternative. The aim is to make formal logic (e.g. optimizer) slightly more powerful, through PRE strengthening (in 1.0) along those rules.
- Add Relation#to_csv, for easier .csv file generation from relational data.
- Fix path dependency.
- Fix SQL compilation when using INTERSECT predicates. INTERSECT was seen as SQL's INTERSECT, which exists too.
- Fix SQL compilation of JOIN when operands restrict on attributes having the same name (without being part of the JOIN clause itself). A bug in Predicate was losing one AND term.
- Add Relation#left_join operator, with support for SQL generation.
  Left join is NOT relational, as it introduces NULLs. For this reason Bmg's left join allows specifying a default_right_tuple to replace those generated NULLs by actual values.
  SQL support: Mixing normal `join`s and `left_join`s in an arbitrary order may yield SQL anomalies or semantically wrong generated SQL code. It is currently good practice to avoid normal joins as soon as a `left_join` has been used.
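
The role of the default right tuple can be sketched in plain Ruby on arrays of Hashes. The `left_join` helper and its signature are hypothetical and written for this illustration (only the `default_right_tuple` name comes from the entry above); the sample data is invented.

```ruby
# Hypothetical helper on arrays of Hashes (not Bmg's implementation).
def left_join(left, right, on, default_right_tuple = {})
  left.flat_map do |l|
    matches = right.select { |r| on.all? { |attr| r[attr] == l[attr] } }
    if matches.empty?
      # No partner on the right: use the defaults instead of NULLs.
      [default_right_tuple.merge(l)]
    else
      matches.map { |r| l.merge(r) }
    end
  end
end

suppliers = [{ sid: "S1", name: "Smith" }, { sid: "S2", name: "Jones" }]
cities    = [{ sid: "S1", city: "London" }]

joined = left_join(suppliers, cities, [:sid], { city: "unknown" })
```

Supplier `S2` has no matching city and therefore gets `city: "unknown"` rather than a NULL, so every resulting tuple has the same attributes with actual values.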
- Fix SQL compilation of `summarize` when the summarization `by` has more than one attribute.
- Bump predicate dependency to min 2.3.1, to get a bug fix on image optimization.
- Optimize Image operator. By default, and when possible, the right operand is restricted to only those tuples matching the left tuples before being iterated.
  This is possible when the join key (`on`) contains exactly one attribute (after having removed attributes that are known to be bound to a single literal). The matching process requires materializing the left operand for extracting its keys. Restrict then uses `Predicate.in(on.first => ...)`.
  This is now the default option, under the assumption that the `right` operand is frequently (much) bigger than left (images frequently occur along 1-N foreign keys). The option falls back to a simpler algorithm when both operands are filtered in such a way that `on` is empty, which is the second most frequent usage.
- Force Predicate >= 2.2.1 to avoid wrong optimizations when chaining restrictions with in and eq on the same variable.
- Add Relation#y_by_x to get a Hash with y (last) value mapped to each x.
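
A minimal pure-Ruby sketch of that consumption method, on an array of Hashes. The standalone `y_by_x` helper and its `(tuples, y, x)` argument order are assumptions made for this illustration, not Bmg's signature.

```ruby
# Hypothetical stand-alone helper; later tuples overwrite earlier ones,
# so the last y seen for a given x wins.
def y_by_x(tuples, y, x)
  tuples.each_with_object({}) { |t, h| h[t[x]] = t[y] }
end

suppliers = [
  { city: "London", name: "Smith" },
  { city: "Paris",  name: "Jones" },
  { city: "London", name: "Clark" }
]

names_by_city = y_by_x(suppliers, :name, :city)
```

Since two suppliers share the city `"London"`, only the last one seen survives in the resulting Hash.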
- Allow Sequel's qualified name to be used like Symbols to denote base tables.
- Fix SQL generation when joining with a summarize.
- Fix `autowrap` post-processing on multiple level cases. Guarantees that when the result of `autowrap` contains hashes with only `nil` values, the post-processing will apply.
- Fix SQL compilation of restrict expressions using Predicate.in with nil.
- Improve SQL compilation of expressions involving multiple JOINs. While the former version used a lot of subqueries and/or common table expressions (aka. WITH) in such cases, this version linearizes all joins with CROSS and INNER JOIN clauses.
- Optimize `autowrap.autowrap` when applying to the exact same options. A single autowrap is kept in such cases.
- Optimize `join.autowrap` in cases the join can be pushed down the tree and autowrapping applied afterwards. Variants of this optimization are implemented using both left and right operands, in the hope to move autowrap up the tree and remove unnecessary ones.
- Optimize `autowrap.rename`, in case the renaming can be safely pushed down the tree, that is, when it does not apply to wrapped attributes and does not yield after-the-fact autowrapping.
- Optimize `rename` when the actual renaming is empty or canonical (i.e. old and new attribute names are the same). Also simplify the renaming list by removing canonical entries.
- Add a `:but` option to `prefix` and `suffix` that allows excluding certain attributes from the resulting rename.
- Add `{x1 => y1, ..., xn => yn}` shorthands to `matching`, `not_matching` and `image` operators. Similar to the shorthand introduced in 0.14.6 for `join`, based on an inverse renaming on the right operand.
- Adds Summarize operator with avg, collect, concat, count, max, min, stddev, sum and variance summarizers. Only avg, count, min, max and sum compile to SQL for now.
- Prevents unnecessary DISTINCT when making a restrict+allbut chaining that preserves a reduced key, e.g., `supplies.restrict(sid: 'SID').allbut([:sid])` no longer generates a DISTINCT, even if `sid` is originally part of `supplies`'s primary key.
- Fix Predicate::NotSupportedError being raised when renaming a restriction using a native expression. In such case, type inference now removes the type predicate and replaces it by a tautology.
- Optimize `extend.allbut` and `extend.project` to strip unnecessary extensions, or simplify them to avoid unnecessary computed attributes.
- Optimize `extend.join` by pushing the join down the tree when the extension attributes are not part of the join at all.
- Optimize `extend.rename` by pushing the renaming down the tree for all attributes not introduced by the extension, and renaming the extension attributes themselves otherwise.
  Add TupleAlgebra.rename as a side effect.
- Optimize `extend.matching` and `extend.not_matching` by pushing the match operators down the tree when match attributes do not overlap with extension attributes.
- Slightly improve SQL compilation to avoid generating WITH expressions on join expressions having only simple terms on right. That is, previously, `x.join(y).join(z)` yielded a WITH expression for `x.join(y)`, while inner join clauses are correctly chained now. `x.join(y.join(z))` will still generate a WITH expression for `y.join(z)` though.
- Add `left.join(right, {x1 => y1, ..., xn => yn})` as a shorthand for `left.join(right.rename({y1 => x1, ..., yn => xn}), [x1,...,xn])`. This allows joining ala SQL, i.e. with attributes differing on each operand. A difference with SQL, though, is that the `ys` attributes are no longer present in the join result.
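
The equivalence stated above can be sketched in plain Ruby on arrays of Hashes. The `rename`, `join` and `join_with_mapping` helpers below are hypothetical stand-ins written for this illustration, not Bmg's operators, and the sample data is invented.

```ruby
# Renames attributes according to an {old => new} Hash.
def rename(tuples, renaming)
  tuples.map { |t| t.transform_keys { |k| renaming.fetch(k, k) } }
end

# Join on an explicit list of shared attributes.
def join(left, right, on)
  left.flat_map do |l|
    right.select { |r| on.all? { |a| r[a] == l[a] } }.map { |r| l.merge(r) }
  end
end

# {x => y} shorthand: rename each right y to its left x, then join on the xs.
def join_with_mapping(left, right, mapping)
  join(left, rename(right, mapping.invert), mapping.keys)
end

orders    = [{ oid: 1, customer_id: 7 }, { oid: 2, customer_id: 9 }]
customers = [{ id: 7, name: "Smith" }]

joined = join_with_mapping(orders, customers, { customer_id: :id })
```

Because the right operand is renamed before joining, the result carries `:customer_id` but no `:id` attribute, which is the difference with SQL mentioned in the entry.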
- Optimize `extend.page` by pushing the page down the tree when extension attributes and page ordering attributes are disjoint.
- Fix error when tracing expressions involving autosummarizations with YByX and YsByX
- Add support for optional type checking through Type#with_typecheck and Type#without_typecheck.
  Type checking is disabled by default, and only checks for attribute presence, absence and no-clash policy on the various available operators.
- Add Relation#materialize (Relation::Materialized) that ensures that the operand is consumed only once and the result kept in memory if reused later on.
- Added a schema in Sequel type inference mechanism. Otherwise, indices are loaded multiple times because Sequel itself does not cache them. (not part of the cache_schema: true) behavior.
- Fix Operator::Project mutating origin tuples.
- BREAKING CHANGE (since 0.10.0 actually): most updates fail when trying to make them on a Relation::Sequel instance. The SQL compilation mechanism lacks the update rules implemented in various operators.
- Fix the Sequel translation in presence of a WHERE clause involving IN with subqueries.
- SQL compilation now supports the `constants` operator.
- SQL compilation now supports n-ary union, intersect and except, provided they have the same modifier.
- Optimization: Any relation unioned with empty returns itself.
- Optimization: All relations return self if allbut is called with an empty attribute list.
- Optimization: calling `constants` on empty returns an empty relation.
- Add NotMatching operator, with restrict optimization and SQL compilation.
- Optimize `autowrap.page` by pushing the page down the tree when autowrap attributes are known and the page ordering does not touch them.
- Enhance key inference on Join, when joining on a candidate key of the right operand. In such case, the left keys can all be kept unchanged.
- Fix a SQL compilation bug with join expressions in subqueries. Requalification of table names was forgotten in inner join clauses.
- Add Prefix and Suffix shortcut operators for longer Rename expressions. Attrlist must be known on operand's type.
- Add Join operator, with explicit attribute list for join key.
- Attrlist, Key and Predicate inference is now correctly implemented on autowrap.
- Fix Page implementation to support full ordering, e.g. `[[:name, :desc], [:id, :asc]]`
- BREAKING CHANGE: `rxmatch` is now case sensitive by default.
- BREAKING CHANGE: you should now use `Bmg.sequel(:table_name, db)` instead of `Bmg.sequel(db[:table_name])` to avoid unnecessarily long SQL from being generated.
- `rxmatch` becomes a shortcut operator, that translates to a `restrict` with an OR using Predicate#match. This aims at reusing all existing `restrict` optimizations for `rxmatch`, with a free implementation cost.
- Add Restrict optimization: push it over `autowrap` when the list of attributes is known statically and the predicate does not use any of the wrapped new attributes.
- Add Page optimizations: push it over `constants`, `rename` and `image` when possible.
- The Sequel contribution now generates valid SQL in all cases and performs necessary optimizations.
- Fix Rxmatch that now applies matching in a case insensitive way by default when used with a String. A `case_sensitive: true` option can be specified to change that behavior.
- Add the Page operator: filters on n tuples according to an ordering and a given page size and page index. No optimization implemented yet.
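
What paging amounts to can be sketched in plain Ruby: sort the tuples, then keep one slice. The `page` helper below is hypothetical; it takes the ordering as `[attribute, :asc | :desc]` pairs and assumes that the page index starts at 1. Both the signature and that convention are assumptions of this sketch, not Bmg's documented API.

```ruby
# Hypothetical helper; `ordering` is a list of [attribute, direction] pairs
# and `page_index` starts at 1.
def page(tuples, ordering, page_index, page_size)
  sorted = tuples.sort do |t1, t2|
    # The first attribute on which the tuples differ decides the order.
    pair = ordering.find { |attr, _| t1[attr] != t2[attr] }
    if pair.nil?
      0
    else
      attr, direction = pair
      cmp = t1[attr] <=> t2[attr]
      direction == :desc ? -cmp : cmp
    end
  end
  sorted.drop((page_index - 1) * page_size).first(page_size)
end

people = [
  { id: 1, name: "b" }, { id: 2, name: "a" },
  { id: 3, name: "c" }, { id: 4, name: "a" }
]
order = [[:name, :asc], [:id, :desc]]

first_page  = page(people, order, 1, 2)
second_page = page(people, order, 2, 2)
```

A total ordering (here ending on the unique `:id`) is what makes successive pages disjoint and stable from one call to the next.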
- Add the Rxmatch operator: filters tuples whose subset of attributes match a given string or regular expression when concatenated together by a space. Restrict optimization is implemented.
- The expression `r.matching(...)` now correctly preserves the same type as `r`.
- Add the Group operator: groups some attributes of the operand as a new relation-valued attribute. Restrict optimization is implemented.
- Fixes the restrict optimization on Matching, that led to forgetting about the join key.
- Add the Matching operator: filters the left operand to tuples that have at least a matching tuple on right operand on a given shared join key. Restrict optimization is implemented.
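
The filtering described for Matching (a semi-join) can be sketched in plain Ruby on arrays of Hashes. The `matching` helper below is hypothetical and written for this illustration; it is not Bmg's operator, and the sample data is invented.

```ruby
# Hypothetical helper: keeps the left tuples having at least one right
# tuple with equal values on the shared join key `on`.
def matching(left, right, on)
  left.select do |l|
    right.any? { |r| on.all? { |attr| r[attr] == l[attr] } }
  end
end

suppliers = [{ sid: "S1", name: "Smith" }, { sid: "S2", name: "Jones" }]
supplies  = [{ sid: "S1", pid: "P1" }, { sid: "S1", pid: "P2" }]

active = matching(suppliers, supplies, [:sid])
```

Unlike a join, the result keeps only the attributes of the left operand, and a left tuple appears once however many right tuples match it.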
- The default implementation of `Relation::Type` now exposes the relation predicate, when known.
- Add `Relation#ys_by_x` consumption method, that converts a relation to a Hash mapping `tuple[x]` keys to `[tuple[y]]` values. This is similar to a given summary with autosummarize, but provided as a consumption method. The options support specifying an order and whether ys must be distinct.
- Add `Relation#empty?`
- Add `Relation#visit` that allows visiting an expression tree with a block. The block is yielded every (relation, parent) pair in a depth first search walk of the tree.
- `Relation#to_s` and `Relation#inspect` now provide a friendly representation of the expression tree. This is used to improve what `#debug` prints on its argument.
- `Relation::Sequel#insert` now inserts known attribute constants inherited from the relvar predicate. This means that `rel.restrict(x: 2).allbut([:x]).insert(y: 7)` will actually insert the tuple `{x: 2, y: 7}` in the underlying SQL table.
- Fix `rename.restrict` optimization that failed with an UnsupportedError on native predicate.
- Objects obtained through `Bmg.csv` and `Bmg.excel` are now real Relation instances, and no longer tuple enumerables.
- Add a spying mechanism that allows analyzing tree expressions just before each is called. This works by calling `spied(spy)` on relations, just like other operators. The spied operator always stays on top of the expression tree, by a delegation mechanism when algebra methods are called. When the relation is eventually consumed, it calls the `spy` argument with itself. The spy has the opportunity to inspect the expression tree, and act accordingly (e.g. raising an error if something strange is detected).
- Update mechanism (insert, delete & update) is provided for operators yielding no update ambiguity: allbut, constants, extend, project, rename.
- Optimization: push restrictions over autosummarize, rename & restrict.
- Added Relation#to_json
- Predicate required version is bumped to 1.3.0, that contains an important security fix.
- Fix Image#restrict optimization that pushed down restrictions on right attributes that do not exist for it.
- Optimize tautological restrictions by always returning the operand itself. This way, it is no longer necessary to use `unless p.tautology?` before using `restrict`.
- Optimize contradiction restrictions by always returning an empty relation. This will further optimize since many optimizations are implemented on Relation::Empty itself.
- Predicate dependency bumped to 1.2.0 to get a few bug fixes.
- Optimization: push restrictions over image & constants.
- Optimization: stack subsequent unions as only one n-adic operator.
- Introduce `Relation.empty` for empty relations taken into account by the optimization.
- Add the Constants operator: extends the operand's tuple with attributes whose values are known statically. This is a special case of extension where values are not Proc but constants.
- Add the Image operator: extends the operand's tuple with the relational image on a right operand. Unlike Alf, the join attributes are explicit for now.
- Add the Restrict operator: restrict filters the operand tuples to those for which a predicate evaluates to true.
- Add the Union operator: union returns both the tuples from left and right operands, but strips the duplicates, if any.
- Add connectivity to real SQL database, through Sequel. require 'bmg/sequel' is needed first. It contributes a `Bmg.sequel(dataset)` method that returns relation instances over Sequel dataset objects.
- Optimization: push restrict over allbut, project, union & constants.
- Optimization: convert double restrict to a predicate conjunction.
- Add the Extend operator: extends operand tuples with attributes resulting from specified computations.
- Add Relation#one (and Relation#one_or_nil), that returns the tuple of a singleton or raises an error (or returns nil).
- Options passed to Reader::Excel and Bmg.excel are passed, unchanged to Roo::Spreadsheet. Now, all options from Roo are thus compatible with Bmg.
- Birth.