Add method to get URL Status (returns an URLItem) #92
Implemented only for MemoryFrontier and RocksDB. Unfortunately, the internal storage doesn't distinguish between Discovered URLs and Known URLs that have to be refetched, so the method will always return a KnownURLItem or a Status.NOT_FOUND runtime exception. Signed-off-by: Laurent Klock <[email protected]>
A few minor issues and questions. Great to have additional tests!
service/src/main/java/crawlercommons/urlfrontier/service/memory/InternalURL.java
service/src/main/java/crawlercommons/urlfrontier/service/memory/MemoryFrontierService.java
service/src/main/java/crawlercommons/urlfrontier/service/rocksdb/RocksDBService.java
(To be done in separate PR) Signed-off-by: Laurent Klock <[email protected]>
Signed-off-by: Laurent Klock <[email protected]>
Signed-off-by: Laurent Klock <[email protected]>
Thanks @klockla
See the comment in the conversation regarding the client side.
Signed-off-by: Laurent Klock <[email protected]>
Added the method in the client.
client/src/main/java/crawlercommons/urlfrontier/client/GetURLStatus.java
    private String crawl;

    @Option(
            names = {"-k", "--key"},
See the comment about the key being generated by default on the server side. Should this option be optional?
ok
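For readers following this thread, here is a minimal sketch of what an optional key could mean in practice. It assumes the server-side default is the URL's hostname, which this conversation does not confirm. The class and method names (`KeyDefaultSketch`, `resolveKey`) are hypothetical and are not part of the URLFrontier code base.

```java
import java.net.URI;

/** Hypothetical illustration only; not the project's actual key logic. */
class KeyDefaultSketch {

    /**
     * Returns the explicit key when one is given; otherwise falls back to
     * the URL's hostname (ASSUMPTION: the real server default may differ).
     */
    static String resolveKey(String url, String key) {
        if (key != null && !key.isEmpty()) {
            return key;
        }
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in URL: " + url);
        }
        return host;
    }

    public static void main(String[] args) {
        // No key supplied: the fallback is derived from the URL.
        System.out.println(resolveKey("https://example.com/page", null));
        // Key supplied: it is used unchanged.
        System.out.println(resolveKey("https://example.com/page", "custom-queue"));
    }
}
```

With a fallback like this, the `-k` / `--key` option can be left out on the command line without the lookup becoming ambiguous.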
Thanks a lot @klockla - I gave it a try and it seems to work fine
Added missing license header Signed-off-by: Laurent Klock <[email protected]>
Tested, works great! Thanks @klockla, this is a great contribution to the project
Add a new API method to retrieve information about a URL.
Implemented only for MemoryFrontier and RocksDB.
(may partially fulfill #57)
Unfortunately, the internal storage doesn't distinguish between Discovered URLs and Known URLs that have to be refetched (or I have missed the point).
So all scheduled items will be returned as a KnownURLItem (with a refetch date equal to 0 for completed items).
If the URL is not in URLFrontier, the method will return io.grpc.Status.NOT_FOUND.asRuntimeException().
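The behaviour described above can be sketched with a small self-contained model. This is not the PR's implementation: `UrlStatusSketch`, `KnownItem`, and `getStatus` are hypothetical names. A plain `NoSuchElementException` stands in for `io.grpc.Status.NOT_FOUND.asRuntimeException()` so that the example runs without the gRPC dependency.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical, simplified model of the lookup; not the project's classes. */
class UrlStatusSketch {

    /** Stand-in for a KnownURLItem: just the URL and its refetch date. */
    static final class KnownItem {
        final String url;
        final long refetchDate; // 0 means the item is completed

        KnownItem(String url, long refetchDate) {
            this.url = url;
            this.refetchDate = refetchDate;
        }

        boolean isCompleted() {
            return refetchDate == 0;
        }
    }

    // One map for every URL: storage does not separate Discovered from Known.
    private final Map<String, Long> store = new HashMap<>();

    void put(String url, long refetchDate) {
        store.put(url, refetchDate);
    }

    /**
     * Every stored URL comes back as a KnownItem. An unknown URL throws
     * NoSuchElementException (stand-in for the gRPC NOT_FOUND exception).
     */
    KnownItem getStatus(String url) {
        Long date = store.get(url);
        if (date == null) {
            throw new NoSuchElementException("NOT_FOUND: " + url);
        }
        return new KnownItem(url, date);
    }

    public static void main(String[] args) {
        UrlStatusSketch frontier = new UrlStatusSketch();
        frontier.put("https://example.com/done", 0L);
        frontier.put("https://example.com/scheduled", 1700000000L);

        System.out.println(frontier.getStatus("https://example.com/done").isCompleted());
        System.out.println(frontier.getStatus("https://example.com/scheduled").isCompleted());
        try {
            frontier.getStatus("https://example.com/missing");
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A caller therefore has to handle two outcomes: an item whose refetch date tells completed and scheduled URLs apart, or an exception when the URL is unknown to the frontier.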
Signed-off-by: Laurent Klock [email protected]