Eroshareripper now uses mirror for data #29
Looks good, other than a few comments.
Also, rather than relying on a rehost, we could set up the redirects to the Internet Archive / Wayback Machine ourselves. There are basically two things to look for: the site pages (which are backed up in the Wayback Machine) and the content, which is also hosted on the Internet Archive. We can pull the link format directly out of a rehost like eroshae and extrapolate the redirect.
This will work for now and unblock ripping of eroshare links so that's awesome!
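As a rough illustration of the site-page half of that idea, here is a minimal sketch. It assumes the usual Wayback Machine replay form `https://web.archive.org/web/<timestamp>/<original URL>`. The class name, method name, and the timestamp value are hypothetical, and the link format for the archived content itself is not covered because it is not spelled out in this thread.

```java
public class WaybackUrls {
    // Assumption: Wayback Machine replay URLs look like
    // https://web.archive.org/web/<timestamp>/<original URL>
    static final String WAYBACK_PREFIX = "https://web.archive.org/web/";

    /**
     * Builds a Wayback Machine URL for an original eroshare page URL.
     * The timestamp is a hypothetical value chosen by the caller.
     */
    static String toWaybackUrl(String originalUrl, String timestamp) {
        return WAYBACK_PREFIX + timestamp + "/" + originalUrl;
    }

    public static void main(String[] args) {
        // Hypothetical album id and timestamp, for illustration only.
        System.out.println(toWaybackUrl("https://eroshare.com/abc123", "2017"));
    }
}
```

A ripper could apply this rewrite before fetching the album page, then resolve the content links separately once their archive format is confirmed.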
@@ -51,22 +51,34 @@ public void downloadURL(URL url, int index) {
 }
 @Override
 public boolean canRip(URL url) {
-Pattern p = Pattern.compile("^https?://[w.]*eroshare.com/([a-zA-Z0-9\\-_]+)/?$");
+Pattern p = Pattern.compile("^https?://eroshae.com/([a-zA-Z0-9\\-_]+)/?$");
Should probably stick to allowing an optional www (which is what the `[w.]*` was accomplishing, although `(www[.])?` documents the intent better and would likely be more efficient under the regex engine).
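A minimal sketch of the suggestion above, assuming the mirror host from this diff and an optional `www.` prefix. The class and method names are hypothetical, and the sketch also escapes the dot before `com` and uses a non-capturing group so the album id stays in group 1; neither of those tweaks is part of the PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EroshaeUrlPattern {
    // Optional "www." prefix, written explicitly instead of "[w.]*".
    static final Pattern ALBUM = Pattern.compile(
            "^https?://(?:www[.])?eroshae[.]com/([a-zA-Z0-9\\-_]+)/?$");

    /** Returns the album id, or null when the URL does not match. */
    static String albumId(String url) {
        Matcher m = ALBUM.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(albumId("https://www.eroshae.com/abc123")); // abc123
        System.out.println(albumId("http://eroshae.com/abc-123/"));    // abc-123
        // "[w.]*" would have accepted this host; the explicit group does not.
        System.out.println(albumId("https://wwww.eroshae.com/abc123")); // null
    }
}
```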
@@ -79,12 +91,14 @@ public Document getNextPage(Document doc) throws IOException {
 // Find next page
 String nextUrl = "";
 Element elem = doc.select("li.next > a").first();
-logger.info(elem);
Why remove the `logger` line?
I assume I did it because I was trying to debug and it was clogging up the log output.
Fair enough
Feature/log4j2
Category
This change is exactly one of the following (please change `[ ]` to `[x]`) to indicate which:
Description
This allows users to rip eroshare albums using a mirror (eroshare is now down and never coming back). Some content will 404 because it wasn't archived before eroshare went down.
Testing
Required verification:
`mvn test` (there are no new failures or errors).
Optional but recommended: