Allows a graph to limit the number of results from a triple() operation #346
Conversation
adding a SPARQL 'LIMIT' clause. For instance, the LIMIT value may be set by adding graph_instance.LIMIT = 10
Hmm - I don't know if this is necessary in the general case. graph.triples returns a generator - you can use … For the SPARQLStore, it should maybe fetch only N results at a time, and then stream the rest. Even then, a keyword param to .triples seems cleaner than a member variable?
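Presumably the point of the trailing remark is that a generator can already be capped on the client side by slicing it. A minimal sketch of that idea, assuming a plain in-memory Graph (the toy data and itertools.islice here are illustrative, not code from this PR):

```py
from itertools import islice
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import RDFS

# Toy in-memory graph, just to have something to iterate over.
g = Graph()
for i in range(100):
    g.add((URIRef("http://example.org/item%d" % i), RDFS.label, Literal("item %d" % i)))

# Client-side capping: only 10 triples are consumed from the generator.
# A remote store still has to produce the results it streams, which is
# why a server-side LIMIT matters for the SPARQLStore in particular.
for s, p, o in islice(g.triples((None, None, None)), 10):
    print("%s %s %s" % (s, p, o))
```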
oh - never mind my first comment, your code is only concerned with the SPARQLStore
anyway, thanks for keeping the conversation alive :)
@@ -305,6 +326,30 @@ def triples(self, (s, p, o), context=None):
        query = "SELECT %s WHERE { %s %s %s }" % \
            (v, s.n3(), p.n3(), o.n3())

        #The ORDER BY is necessa
typo: "necessa" → "necessary"
restarted the travis build, but it immediately gets stuck :-/
the travis build is getting stuck because Travis dropped support for Python 2.5; should be fixed now, see #349
@kusamau could you squash your commits and rebase on master? also: maybe include an example in the doc string for an end user?
Adds OFFSET and GROUP BY to the triples() method in order to allow pagination and reduce the number of results
Exception was possible because by mistake a getattr was used instead of hasattr
Exception was possible because by mistake a getattr was used instead of hasattr
By mistake committed a few lines from a test class
removes an unused import
Includes better sparqlstore.triple() documentation
minor text update
CLOSED - task 8: Fetching citation metadata on the server side https://github.com/cedadev/djcharme/issues/issue/8
@@ -14,3 +14,4 @@ build/
/.tox/
/docs/draft/
*~
/.settings
this seems specific to your dev environment, maybe you want to have it in ~/.gitignore?
@kusamau see my comment above and I'll hit merge
Should look better now. Thanks Joe! :)
@kusamau no, I meant: http://git-scm.com/book/en/Git-Tools-Rewriting-History#Squashing-Commits (you have a lot of commits that touch a lot of files and revert themselves, so squash them)
I did it already. If you look at #5af120e, it contains all the messages of the previous commits. Obviously I may be wrong, but I did exactly that command, I mean git rebase -i HEAD~9, "squashing" all but the first commit. Maybe close the "pull request" and open a new one?
I tried a
Adds OFFSET and GROUP BY to the triples() method in order to allow pagination and reduce the number of results
Exception was possible because by mistake a getattr was used instead of hasattr
Exception was possible because by mistake a getattr was used instead of hasattr
By mistake committed a few lines from a test class
removes an unused import
Includes better sparqlstore.triple() documentation
minor text update
Simplified the LIMIT condition
Adds OFFSET and GROUP BY to the triples() method in order to allow pagination and reduce the number of results
Exception was possible because by mistake a getattr was used instead of hasattr
Exception was possible because by mistake a getattr was used instead of hasattr
By mistake committed a few lines from a test class
removes an unused import
Includes better sparqlstore.triple() documentation
minor text update
Remove an item to ignore only locally to my system
I think it is a problem with this pull request. I continue to suspect that the solution should be to close and reopen the pull request.
@kusamau well, you're the only one with access to the https://github.com/kusamau/rdflib repository and its master branch, from where you started this pull request. Only if you update that branch do the commits in this pull request change. As I can't do that and it didn't seem to work for you, I used a workaround by stashing your commits and referring to this pull request.
I don't understand how to use this code. You pass in a magic context object that has some special attributes.

```py
g.LIMIT = limit
g.OFFSET = offset
triple_generator = graph.triples(mytriple)
# do something
# Removes LIMIT and OFFSET if not required for the next triples() calls
del g.LIMIT
del g.OFFSET
```

What is g? How do I construct a context with these particular attributes? I would change this, and instead simply add a limit, offset and order_by keyword parameter to the triples call.
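A minimal sketch of the keyword-parameter alternative suggested here, assuming parameters named limit, offset and order_by; the helper function and its defaults are illustrative, not code from this PR:

```py
# Hypothetical variant: the paging parameters are passed explicitly and
# appended to the SPARQL query that the store builds for triples().
def build_triples_query(s, p, o, limit=None, offset=None, order_by=None):
    query = "SELECT ?s ?p ?o WHERE { %s %s %s }" % (s, p, o)
    if order_by is not None:
        # a stable ORDER BY makes OFFSET/LIMIT paging deterministic
        query += " ORDER BY %s" % order_by
    if offset is not None:
        query += " OFFSET %d" % offset
    if limit is not None:
        query += " LIMIT %d" % limit
    return query

# e.g. the third page of 10 results, ordered by subject
print(build_triples_query("?s", "?p", "?o", limit=10, offset=20, order_by="?s"))
```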
I realize now that the example is incorrect. It should be:
Update package

2013/12/31 RELEASE 4.1
======================

This is a new minor version of RDFLib, which includes a handful of new features:

* A TriG parser was added (we already had a serializer) - it is up-to-date wrt. the newest spec from: http://www.w3.org/TR/trig/
* The Turtle parser was made up to date wrt. the latest Turtle spec.
* Many more tests have been added - RDFLib now has over 2000 (passing!) tests. This is mainly thanks to the NT, Turtle, TriG, NQuads and SPARQL test-suites from W3C. This also included many fixes to the nt and nquads parsers.
* ```ConjunctiveGraph``` and ```Dataset``` now support directly adding/removing quads with ```add/addN/remove``` methods.
* ```rdfpipe``` command now supports datasets, and reading/writing context sensitive formats.
* Optional graph-tracking was added to the Store interface, allowing empty graphs to be tracked for Datasets. The Dataset class also saw a general clean-up, see: RDFLib/rdflib#309
* After long deprecation, ```BackwardCompatibleGraph``` was removed.

Minor enhancements/bugs fixed:
------------------------------

* Many code samples in the documentation were fixed thanks to @PuckCh
* The new ```IOMemory``` store was optimised a bit
* ```SPARQL(Update)Store``` has been made more generic.
* MD5 sums were never reinitialized in ```rdflib.compare```
* Correct default value for empty prefix in N3 [#312](RDFLib/rdflib#312)
* Fixed tests when running in a non UTF-8 locale [#344](RDFLib/rdflib#344)
* Prefixes in the original Turtle have an impact on SPARQL query resolution [#313](RDFLib/rdflib#313)
* Duplicate BNode IDs from N3 Parser [#305](RDFLib/rdflib#305)
* Use QNames for TriG graph names [#330](RDFLib/rdflib#330)
* \uXXXX escapes in Turtle/N3 were fixed [#335](RDFLib/rdflib#335)
* A way to limit the number of triples retrieved from the ```SPARQLStore``` was added [#346](RDFLib/rdflib#346)
* Dots in localnames in Turtle [#345](RDFLib/rdflib#345) [#336](RDFLib/rdflib#336)
* ```BNode``` as Graph's public ID [#300](RDFLib/rdflib#300)
* Introduced ordering of ```QuotedGraphs``` [#291](RDFLib/rdflib#291)

2013/05/22 RELEASE 4.0.1
========================

Following RDFLib tradition, some bugs snuck into the 4.0 release. This is a bug-fixing release:

* the new URI validation caused lots of problems, but is necessary to avoid "RDF injection" vulnerabilities. In the spirit of "be liberal in what you accept, but conservative in what you produce", we moved validation to serialisation time.
* the ```rdflib.tools``` package was missing from the ```setup.py``` script, and was therefore not included in the PYPI tarballs.
* RDF parser choked on empty namespace URI [#288](RDFLib/rdflib#288)
* Parsing from ```sys.stdin``` was broken [#285](RDFLib/rdflib#285)
* The new IO store had problems with concurrent modifications if several graphs used the same store [#286](RDFLib/rdflib#286)
* Moved HTML5Lib dependency to the recently released 1.0b1, which supports python3

2013/05/16 RELEASE 4.0
======================

This release includes several major changes:

* The new SPARQL 1.1 engine (rdflib-sparql) has been included in the core distribution. SPARQL 1.1 queries and updates should work out of the box.
* SPARQL paths are exposed as operators on ```URIRefs```, these can then be used with graph.triples and friends:

  ```py
  # List names of friends of Bob:
  g.triples(( bob, FOAF.knows/FOAF.name , None ))

  # All super-classes:
  g.triples(( cls, RDFS.subClassOf * '+', None ))
  ```

* a new ```graph.update``` method will apply SPARQL update statements
* Several RDF 1.1 features are available:
  * A new ```DataSet``` class
  * ```XMLLiteral``` and ```HTMLLiterals```
  * ```BNode``` (de)skolemization is supported through ```BNode.skolemize```, ```URIRef.de_skolemize```, ```Graph.skolemize``` and ```Graph.de_skolemize```
* Handling of Literal equality was split into lexical comparison (for the normal ```==``` operator) and value space (using new ```Node.eq``` methods). This introduces some slight backwards incompatible changes, but was necessary, as the old version had inconsistent hash and equality methods that could lead to literals not working correctly in dicts/sets. The new way is more in line with how SPARQL 1.1 works. For the full details, see: https://github.com/RDFLib/rdflib/wiki/Literal-reworking
* Iterating over ```QueryResults``` will generate ```ResultRow``` objects, these allow access to variable bindings as attributes or as a dict. I.e.

  ```py
  for row in graph.query('select ... ') :
      print row.age, row["name"]
  ```

* "Slicing" of Graphs and Resources as syntactic sugar: ([#271](RDFLib/rdflib#271))

  ```py
  graph[bob : FOAF.knows/FOAF.name]
  # -> generator over the names of Bob's friends
  ```

* The ```SPARQLStore``` and ```SPARQLUpdateStore``` are now included in the RDFLib core
* The documentation has been given a major overhaul, and examples for most features have been added.

Minor Changes:
--------------

* String operations on URIRefs return new URIRefs: ([#258](RDFLib/rdflib#258))

  ```py
  >>> URIRef('http://example.org/') + 'test'
  rdflib.term.URIRef('http://example.org/test')
  ```

* Parser/Serializer plugins are also found by mime-type, not just by plugin name: ([#277](RDFLib/rdflib#277))
* ```Namespace``` is no longer a subclass of ```URIRef```
* URIRefs and Literal language tags are validated on construction, avoiding some "RDF-injection" issues ([#266](RDFLib/rdflib#266))
* A new memory store needs much less memory when loading large graphs ([#268](RDFLib/rdflib#268))
* Turtle/N3 serializer now supports the base keyword correctly ([#248](RDFLib/rdflib#248))
* py2exe support was fixed ([#257](RDFLib/rdflib#257))
* Several bugs in the TriG serializer were fixed
* Several bugs in the NQuads parser were fixed
I realized that in some situations the triples() method may return a huge number of results, and if you are implementing something like 'paging' this may be a critical performance issue.
Apart from more complicated solutions such as caching, an easy, partial solution is to add LIMIT, OFFSET and the necessary GROUP BY clauses to SPARQLStore.triples(). Those parameters can be passed by augmenting the Graph object, which is passed as 'context' to the triples method.
I implemented such a solution in a few lines and I hope others may agree with this.
For example, using the LIMIT value is as easy as:
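A sketch of the intended usage, based on the snippets quoted earlier in the thread; the endpoint URL is a placeholder and the example assumes a Graph backed by a SPARQLStore:

```py
from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore

# Placeholder endpoint; substitute a real SPARQL endpoint URL.
store = SPARQLStore("http://example.org/sparql")
graph = Graph(store)

graph.LIMIT = 10    # cap the number of results returned by triples()
graph.OFFSET = 0    # starting offset, useful for paging

# The Graph passes itself as 'context' to the store's triples() method,
# so the store can pick up LIMIT/OFFSET from these attributes.
for s, p, o in graph.triples((None, None, None)):
    print("%s %s %s" % (s, p, o))

# Remove the attributes when later triples() calls should not be limited.
del graph.LIMIT
del graph.OFFSET
```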