
Allows a graph to limit the number of results from a triple() operation #346

Closed
wants to merge 14 commits

Conversation

@kusamau (Contributor) commented Nov 28, 2013

I realized that in some situations the triples() method may return a huge number of results, which can be a critical performance issue if you are implementing something like paging.
Apart from more complicated solutions such as caching, an easy, partial solution is to add LIMIT, OFFSET and the necessary GROUP BY clauses to SPARQLStore.triples(). Those parameters can be passed by augmenting the Graph object, which is passed as 'context' to the triples method.
I implemented this solution in a few lines and I hope others agree with it.

For example, using the LIMIT value is as easy as:

```py
graph_instance.LIMIT = 10
graph_instance.OFFSET = 2  # retrieve the results from 20 to 30
# GROUP BY is applied automatically on the first None of the triple, the subject in this case
graph_instance.triples((None, None, None))
```

This adds the corresponding SPARQL LIMIT and OFFSET clauses to the query the store issues.
@gromgull (Member):
Hmm - I don't know if this is necessary in the general case. graph.triples returns a generator - you can use itertools.islice to take only some of the results if you are not interested in all of them.

For the SPARQLStore, it should maybe fetch only N results at a time, and then stream the rest. Even then, a keyword param to .triples seems cleaner than a member variable?
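For readers following along, a minimal sketch of the islice approach described above (the data source URL is made up):

```py
from itertools import islice

from rdflib import Graph

g = Graph()
g.parse("http://example.org/data.ttl", format="turtle")  # hypothetical source

# Take only results 20 to 29 from the triples() generator;
# no paging support in the store is needed for this.
for s, p, o in islice(g.triples((None, None, None)), 20, 30):
    print s, p, o
```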

@gromgull (Member):
Oh - never mind my first comment, your code is only concerned with the SPARQLStore.

@kusamau (Contributor, author) commented Nov 28, 2013

Anyway, thanks for keeping the conversation alive :)

@@ -305,6 +326,30 @@ def triples(self, (s, p, o), context=None):
query = "SELECT %s WHERE { %s %s %s }" % \
(v, s.n3(), p.n3(), o.n3())

#The ORDER BY is necessa
A Member commented on this diff:

necessa+ry+ (i.e. the truncated comment should read "necessary")

@joernhees (Member):
Restarted the Travis build, but it immediately gets stuck :-/

@joernhees (Member):
The Travis build is getting stuck because Travis dropped support for Python 2.5; should be fixed now, see #349.

@joernhees (Member):
@kusamau could you squash your commits and rebase on master?

Also: maybe include an example in the docstring for an end user?

Adds OFFSET and GROUP BY to the triples() method in order to allow
pagination and reduce the number of results

An exception was possible because getattr was used by mistake instead
of hasattr

By mistake committed a few lines from a test class

Removes an unused import

Includes better sparqlstore.triples() documentation

Minor text update

CLOSED - task 8: Fetching citation metadata on the server side
https://github.com/cedadev/djcharme/issues/issue/8
@@ -14,3 +14,4 @@ build/
/.tox/
/docs/draft/
*~
/.settings
A Member commented on this diff:

This seems specific to your dev environment; maybe you want to have it in ~/.gitignore?

@joernhees (Member):
@kusamau see my comment above and I'll hit merge.

@kusamau (Contributor, author) commented Dec 5, 2013

Should look better now. Thanks, Joe! :)

@joernhees (Member):
@kusamau no, I meant: http://git-scm.com/book/en/Git-Tools-Rewriting-History#Squashing-Commits

(you have a lot of commits that touch a lot of files and revert themselves, so squash them)

@kusamau (Contributor, author) commented Dec 5, 2013

I did it already. If you look at 5af120e, it contains all the messages of the previous commits. Obviously I may be wrong, but I ran exactly that command, I mean

git rebase -i HEAD~9

"squashing" all but the first commit. Maybe close this pull request and open a new one?

@joernhees (Member):
@kusamau I can't see that...
5af120e only contains changes to .settings.
You probably didn't push the squashed commits but only have them locally. It should be easy to fix with a git push --force to update your GitHub master branch.
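For reference, the usual sequence would be (assuming the PR branch is master, as it is here):

git rebase -i HEAD~9            # mark all but the first commit as "squash"
git push --force origin master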

@kusamau (Contributor, author) commented Dec 9, 2013

I tried a git push --force, but the only message I receive is Everything up-to-date. However, I can see the problem; let me try something more.

Adds OFFSET and GROUP BY to the triples() method in order to allow
pagination and reduce the number of results

An exception was possible because getattr was used by mistake instead
of hasattr

By mistake committed a few lines from a test class

Removes an unused import

Includes better sparqlstore.triples() documentation

Minor text update

Simplified the LIMIT condition

Remove an item that is ignored only locally on my system
@kusamau (Contributor, author) commented Dec 9, 2013

I think it is a problem with this pull request. I still suspect that the solution is to close and reopen it.

@coveralls:

Coverage Status

Coverage increased (+2.17%) when pulling e387b1e on kusamau:master into 04fe227 on RDFLib:master.

@joernhees (Member):
@kusamau well, you're the only one with access to the https://github.com/kusamau/rdflib repository and its master branch, from which you started this pull request. Only if you update that branch do the commits in this pull request change. As I can't do that, and it didn't seem to work for you, I used a workaround: I squashed your commits myself and referenced this pull request.

@joernhees joernhees closed this Dec 9, 2013
@joernhees joernhees reopened this Dec 9, 2013
@joernhees joernhees closed this in 5175f80 Dec 9, 2013
@coveralls:

Coverage Status

Coverage increased (+0.08%) when pulling e387b1e on kusamau:master into 04fe227 on RDFLib:master.

@gromgull (Member):
I don't understand how to use this code. You pass in a magic context object that has some special attributes.
The example in the docstring is nonsensical:

```py
g.LIMIT = limit
g.OFFSET = offset
triple_generator = graph.triples(mytriple)
# do something
# Removes LIMIT and OFFSET if not required for the next triples() calls
del g.LIMIT
del g.OFFSET
```

What is g and how does it relate to the graph? And it's not passed in to the call.

How do I construct a context with these particular attributes?

I would change this and instead simply add limit, offset and order_by keyword parameters to the triples call.
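A minimal sketch of the keyword-parameter interface being proposed here; the helper name and the all-unbound pattern are illustrative, not actual SPARQLStore code:

```py
def build_select(limit=None, offset=None, order_by=None):
    # Illustrative only: the real SPARQLStore.triples() also substitutes
    # bound terms into the pattern via their .n3() serializations.
    query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o }"
    if order_by is not None:
        query += " ORDER BY %s" % order_by
    if limit is not None:
        query += " LIMIT %d" % limit
    if offset is not None:
        query += " OFFSET %d" % offset
    return query

print build_select(limit=10, offset=20, order_by="?s")
# SELECT ?s ?p ?o WHERE { ?s ?p ?o } ORDER BY ?s LIMIT 10 OFFSET 20
```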

@gromgull gromgull reopened this Dec 11, 2013
@kusamau (Contributor, author) commented Dec 11, 2013

I realize now that the example is incorrect. It should be:

```py
agraph.LIMIT = limit
agraph.OFFSET = offset
triple_generator = agraph.triples(mytriple)
# do something
# Removes LIMIT and OFFSET if not required for the next triples() calls
del agraph.LIMIT
del agraph.OFFSET
```

@gromgull gromgull closed this in 3eeaf70 Dec 31, 2013
mamash pushed a commit to TritonDataCenter/pkgsrc-wip that referenced this pull request Feb 15, 2014
	Update package
	2013/12/31 RELEASE 4.1
======================

This is a new minor version of RDFLib, which includes a handful of new features:

* A TriG parser was added (we already had a serializer) - it is
  up to date with respect to the newest spec from: http://www.w3.org/TR/trig/

* The Turtle parser was brought up to date with respect to the latest Turtle spec.

* Many more tests have been added - RDFLib now has over 2000
  (passing!) tests. This is mainly thanks to the NT, Turtle, TriG,
  NQuads and SPARQL test-suites from W3C. This also included many
  fixes to the NT and NQuads parsers.

* ```ConjunctiveGraph``` and ```Dataset``` now support directly adding/removing
  quads with ```add/addN/remove``` methods.
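
  For illustration, a small sketch of the new quad support (the graph name and terms are made up):

  ```py
  from rdflib import ConjunctiveGraph, Literal, URIRef

  cg = ConjunctiveGraph()
  bob = URIRef("http://example.org/bob")
  name = URIRef("http://xmlns.com/foaf/0.1/name")
  ctx = URIRef("http://example.org/graph1")

  cg.add((bob, name, Literal("Bob"), ctx))  # add a quad directly
  cg.remove((bob, name, None, ctx))         # remove matching quads from that context
  ```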

* ```rdfpipe``` command now supports datasets, and reading/writing context
  sensitive formats.

* Optional graph-tracking was added to the Store interface, allowing
  empty graphs to be tracked for Datasets. The Dataset class also saw
  a general clean-up, see: RDFLib/rdflib#309

* After long deprecation, ```BackwardCompatibleGraph``` was removed.

Minor enhancements/bugs fixed:
------------------------------

* Many code samples in the documentation were fixed thanks to @PuckCh

* The new ```IOMemory``` store was optimised a bit

* ```SPARQL(Update)Store``` has been made more generic.

* MD5 sums were never reinitialized in ```rdflib.compare```

* Correct default value for empty prefix in N3
  [#312](RDFLib/rdflib#312)

* Fixed tests when running in a non UTF-8 locale
  [#344](RDFLib/rdflib#344)

* Prefixes in the original Turtle have an impact on SPARQL query
  resolution
  [#313](RDFLib/rdflib#313)

* Duplicate BNode IDs from N3 Parser
  [#305](RDFLib/rdflib#305)

* Use QNames for TriG graph names
  [#330](RDFLib/rdflib#330)

* \uXXXX escapes in Turtle/N3 were fixed
  [#335](RDFLib/rdflib#335)

* A way to limit the number of triples retrieved from the
  ```SPARQLStore``` was added
  [#346](RDFLib/rdflib#346)

* Dots in localnames in Turtle
  [#345](RDFLib/rdflib#345)
  [#336](RDFLib/rdflib#336)

* ```BNode``` as Graph's public ID
  [#300](RDFLib/rdflib#300)

* Introduced ordering of ```QuotedGraphs```
  [#291](RDFLib/rdflib#291)

2013/05/22 RELEASE 4.0.1
========================

Following RDFLib tradition, some bugs snuck into the 4.0 release.
This is a bug-fixing release:

* the new URI validation caused lots of problems, but is
  necessary to avoid "RDF injection" vulnerabilities. In the
  spirit of "be liberal in what you accept, but conservative in
  what you produce", we moved validation to serialisation time.

* the ```rdflib.tools``` package was missing from the
  ```setup.py``` script, and was therefore not included in the
  PYPI tarballs.

* RDF parser choked on empty namespace URI
  [#288](RDFLib/rdflib#288)

* Parsing from ```sys.stdin``` was broken
  [#285](RDFLib/rdflib#285)

* The new IO store had problems with concurrent modifications if
  several graphs used the same store
  [#286](RDFLib/rdflib#286)

* Moved the HTML5Lib dependency to the recently released 1.0b1, which
  supports Python 3.

2013/05/16 RELEASE 4.0
======================

This release includes several major changes:

* The new SPARQL 1.1 engine (rdflib-sparql) has been included in
  the core distribution. SPARQL 1.1 queries and updates should
  work out of the box.

  * SPARQL paths are exposed as operators on ```URIRefs```; these can
    then be used with graph.triples and friends:

    ```py
    # List names of friends of Bob:
    g.triples(( bob, FOAF.knows/FOAF.name , None ))

    # All super-classes:
    g.triples(( cls, RDFS.subClassOf * '+', None ))
    ```

  * A new ```graph.update``` method will apply SPARQL update statements; see the small example below.
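
    As an illustration of ```graph.update``` (the triple here is made up):

    ```py
    from rdflib import Graph

    g = Graph()
    # Apply a SPARQL 1.1 Update directly to the graph
    g.update("""
        INSERT DATA {
            <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob"
        }
    """)
    print len(g)  # 1
    ```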

* Several RDF 1.1 features are available:
  * A new ```Dataset``` class
  * ```XMLLiteral``` and ```HTMLLiterals```
  * ```BNode``` (de)skolemization is supported through ```BNode.skolemize```,
    ```URIRef.de_skolemize```, ```Graph.skolemize``` and ```Graph.de_skolemize```
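
    A minimal round-trip sketch of the skolemization API (method names from the list above):

    ```py
    from rdflib import BNode

    b = BNode()
    skolem = b.skolemize()  # a URIRef under a .well-known/genid/ namespace
    print skolem
    print skolem.de_skolemize() == b  # round-trips back to the original BNode
    ```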

* Handling of ```Literal``` equality was split into lexical comparison
  (for the normal ```==``` operator) and value space (using the new ```Node.eq```
  methods). This introduces some slightly backwards-incompatible
  changes, but was necessary, as the old version had
  inconsistent hash and equality methods that could lead to
  literals not working correctly in dicts/sets.
  The new way is more in line with how SPARQL 1.1 works.
  For the full details, see:

  https://github.com/RDFLib/rdflib/wiki/Literal-reworking

* Iterating over ```QueryResults``` will generate ```ResultRow``` objects;
  these allow access to variable bindings as attributes or as a
  dict. I.e.

  ```py
  for row in graph.query('select ... ') :
     print row.age, row["name"]
  ```

* "Slicing" of Graphs and Resources as syntactic sugar:
  ([#271](RDFLib/rdflib#271))

  ```py
  graph[bob : FOAF.knows/FOAF.name]
            -> generator over the names of Bob's friends
  ```

* The ```SPARQLStore``` and ```SPARQLUpdateStore``` are now included
  in the RDFLib core

* The documentation has been given a major overhaul, and examples
  for most features have been added.


Minor Changes:
--------------

* String operations on URIRefs return new URIRefs: ([#258](RDFLib/rdflib#258))
  ```py
  >>> URIRef('http://example.org/') + 'test'
  rdflib.term.URIRef('http://example.org/test')
  ```

* Parser/Serializer plugins are also found by mime-type, not just
  by plugin name:  ([#277](RDFLib/rdflib#277))
* ```Namespace``` is no longer a subclass of ```URIRef```
* URIRefs and Literal language tags are validated on construction,
  avoiding some "RDF-injection" issues ([#266](RDFLib/rdflib#266))
* A new memory store needs much less memory when loading large
  graphs ([#268](RDFLib/rdflib#268))
* Turtle/N3 serializer now supports the base keyword correctly ([#248](RDFLib/rdflib#248))
* py2exe support was fixed ([#257](RDFLib/rdflib#257))
* Several bugs in the TriG serializer were fixed
* Several bugs in the NQuads parser were fixed