Eroshareripper now uses mirror for data #29

Merged
merged 3 commits into from
Aug 10, 2017

cyian-1756
Collaborator

Category

This change is exactly one of the following (please change [ ] to [x]) to indicate which:

Description

This allows users to rip eroshare albums using a mirror (eroshare itself is now down and never coming back). Some content will 404 because it wasn't archived before eroshare went down.

Testing

Required verification:

  • I've verified that there are no regressions in mvn test (there are no new failures or errors).
  • I've verified that this change works as intended.
    • Downloads all relevant content.
    • Downloads content from multiple pages (as necessary or appropriate).
    • Saves content at reasonable file names (e.g. page titles or content IDs) to help easily browse downloaded content.
  • I've verified that this change did not break existing functionality (especially in the Ripper I modified).

Optional but recommended:

  • I've added a unit test to cover my change.

Contributor

@metaprime metaprime left a comment

Looks good, other than a few comments.

Also, rather than relying on a rehost, we could set up the redirects to the Internet Archive / Wayback Machine ourselves. There are basically two things to look for: the site pages, which are backed up in the Wayback Machine, and the content, which is also hosted on the Internet Archive. We can pull the link format directly out of a rehost like eroshae and extrapolate the redirect.

This will work for now and unblock ripping of eroshare links so that's awesome!
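
A rough sketch of the site-page half of that redirect idea, for illustration only. It assumes the Wayback Machine's public URL scheme of `https://web.archive.org/web/<timestamp>/<original URL>`. The class name, method name, timestamp, and album ID are all hypothetical and not part of this PR. The link format for archived content is not covered here, since it would have to be read out of the rehost first.

```java
public class WaybackRedirect {
    // Assumed prefix of Wayback Machine capture URLs.
    private static final String WAYBACK_PREFIX = "https://web.archive.org/web/";

    /**
     * Builds a Wayback Machine URL for an original page URL.
     * The timestamp is a YYYYMMDDhhmmss string chosen by the caller (hypothetical here).
     */
    public static String toWaybackUrl(String originalUrl, String timestamp) {
        return WAYBACK_PREFIX + timestamp + "/" + originalUrl;
    }

    public static void main(String[] args) {
        // Hypothetical album ID and timestamp, used only to show the shape of the result.
        System.out.println(toWaybackUrl("https://eroshare.com/abc123", "20170601000000"));
    }
}
```

This only builds the string. Whether a capture exists for a given album would still need to be checked at rip time, which matches the note in the description that some content will 404.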

@@ -51,22 +51,34 @@ public void downloadURL(URL url, int index) {
     }
     @Override
     public boolean canRip(URL url) {
-        Pattern p = Pattern.compile("^https?://[w.]*eroshare.com/([a-zA-Z0-9\\-_]+)/?$");
+        Pattern p = Pattern.compile("^https?://eroshae.com/([a-zA-Z0-9\\-_]+)/?$");
Contributor

Should probably stick to allowing an optional www (which is what the `[w.]*` was accomplishing), although `(www[.])?` documents the intent better and would likely be more efficient under the regex engine.
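
A minimal sketch of what the suggested pattern could look like, combining the host and album-ID character class from the diff with the optional `(www[.])?` group from this comment. The class and method names are hypothetical. Escaping the dot in `eroshae\\.com` is an extra tightening that is not in the PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EroshaeUrlPattern {
    // Group 1 is the optional "www." prefix, group 2 is the album ID.
    private static final Pattern ALBUM_URL =
            Pattern.compile("^https?://(www[.])?eroshae\\.com/([a-zA-Z0-9\\-_]+)/?$");

    /** Returns the album ID if the URL matches the pattern, or null otherwise. */
    public static String albumId(String url) {
        Matcher m = ALBUM_URL.matcher(url);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(albumId("https://eroshae.com/abc123"));       // abc123
        System.out.println(albumId("https://www.eroshae.com/abc123/"));  // abc123
        System.out.println(albumId("https://wwww.eroshae.com/abc123"));  // null
    }
}
```

Unlike `[w.]*`, which accepts any run of `w` and `.` characters (for example `wwww.` or `..`), the optional group accepts either exactly `www.` or nothing.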

@@ -79,12 +91,14 @@ public Document getNextPage(Document doc) throws IOException {
         // Find next page
         String nextUrl = "";
         Element elem = doc.select("li.next > a").first();
-        logger.info(elem);
Contributor

Why remove the logger line?

Collaborator Author

I assume I did it because I was trying to debug and it was clogging up the log/output

Contributor

Fair enough

@metaprime metaprime merged commit e07d60a into RipMeApp:master Aug 10, 2017
lbalmaceda pushed a commit to lbalmaceda/ripme that referenced this pull request Oct 9, 2022
Development

Successfully merging this pull request may close these issues.

eroshare links should download from mirror
3 participants