Eroshareripper now uses mirror for data #29

cyian-1756 · 2017-07-30T01:52:51Z

Description

This allows users to rip eroshare albums using a mirror (eroshare itself is now down and never coming back). Some content will 404 because it wasn't archived before eroshare went down.

Testing

Required verification:

- I've verified that there are no regressions in mvn test (there are no new failures or errors).
- I've verified that this change works as intended.
  - Downloads all relevant content.
  - Downloads content from multiple pages (as necessary or appropriate).
  - Saves content at reasonable file names (e.g. page titles or content IDs) to help easily browse downloaded content.
- I've verified that this change did not break existing functionality (especially in the Ripper I modified).

Optional but recommended:

- I've added a unit test to cover my change.

metaprime

Looks good, other than a few comments.

Also, rather than relying on a rehost, we could set up the redirects to the Internet Archive / Wayback Machine on our own. There are basically two things to look for: the site pages (which are backed up in the Wayback Machine) and the content, which is also hosted on the Internet Archive. We can pull the link format directly out of a rehost like eroshae and extrapolate the redirect.
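As a rough illustration of the site-page half of that idea, the sketch below rewrites an original page URL into a Wayback Machine URL of the general form `https://web.archive.org/web/<timestamp>/<original URL>`. The class name, the `toWaybackUrl` helper, and the album ID are hypothetical and not part of this PR, and the content-link format is not covered because it would first have to be read out of the rehost.

```java
public class WaybackUrlDemo {
    // General Wayback Machine URL layout: prefix + timestamp + "/" + original URL.
    static final String WAYBACK_PREFIX = "https://web.archive.org/web/";

    /**
     * Hypothetical helper: builds a Wayback Machine URL for an archived page.
     * The timestamp is digits in yyyyMMddHHmmss order; a shorter prefix such as
     * a year is assumed (not verified here) to resolve to a nearby capture.
     */
    static String toWaybackUrl(String originalUrl, String timestamp) {
        if (!timestamp.matches("\\d{4,14}")) {
            throw new IllegalArgumentException("timestamp must be 4 to 14 digits: " + timestamp);
        }
        return WAYBACK_PREFIX + timestamp + "/" + originalUrl;
    }

    public static void main(String[] args) {
        // "abc123" is a made-up album ID used only for illustration.
        System.out.println(toWaybackUrl("https://eroshare.com/abc123", "2017"));
    }
}
```

Whether a given page was actually captured still has to be checked at rip time, since anything not archived before the shutdown will 404 just as it does on the mirror.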

This will work for now and unblock ripping of eroshare links so that's awesome!

metaprime · 2017-08-10T08:10:38Z

src/main/java/com/rarchives/ripme/ripper/rippers/EroShareRipper.java

@@ -51,22 +51,34 @@ public void downloadURL(URL url, int index) {
    }
    @Override
    public boolean canRip(URL url) {
-        Pattern p = Pattern.compile("^https?://[w.]*eroshare.com/([a-zA-Z0-9\\-_]+)/?$");
+        Pattern p = Pattern.compile("^https?://eroshae.com/([a-zA-Z0-9\\-_]+)/?$");


Should probably stick to allowing optional www (which is what the `[w.]*` was accomplishing, although `(www[.])?` documents the intent better and would likely be more efficient under the regex engine).
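To make the difference concrete, here is a small sketch comparing the two forms. `LOOSE` is the pattern from the removed line; `EXPLICIT` is a hypothetical variant of the suggestion, not code from this PR. It uses a non-capturing group `(?:www[.])?` so that the album ID stays in group 1 (a plain `(www[.])?` would shift it to group 2), and it also escapes the dot in `eroshare[.]com`.

```java
import java.util.regex.Pattern;

public class OptionalWwwDemo {
    // Character-class form from the removed line: accepts any run of 'w' and '.' characters.
    static final Pattern LOOSE =
            Pattern.compile("^https?://[w.]*eroshare.com/([a-zA-Z0-9\\-_]+)/?$");

    // Hypothetical explicit form: an optional literal "www." prefix, dots escaped.
    static final Pattern EXPLICIT =
            Pattern.compile("^https?://(?:www[.])?eroshare[.]com/([a-zA-Z0-9\\-_]+)/?$");

    static boolean matches(Pattern p, String url) {
        return p.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://eroshare.com/abc123",      // no prefix
            "https://www.eroshare.com/abc123",  // intended optional prefix
            "https://w.w..eroshare.com/abc123", // nonsense host that only the loose form accepts
        };
        for (String url : urls) {
            System.out.println(url
                    + " loose=" + matches(LOOSE, url)
                    + " explicit=" + matches(EXPLICIT, url));
        }
    }
}
```

Both patterns accept the first two URLs; only the character-class form also accepts the malformed third one, which is the intent-documentation point made above.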

metaprime · 2017-08-10T08:11:32Z

src/main/java/com/rarchives/ripme/ripper/rippers/EroShareRipper.java

@@ -79,12 +91,14 @@ public Document getNextPage(Document doc) throws IOException {
        // Find next page
        String nextUrl = "";
        Element elem = doc.select("li.next > a").first();
-        logger.info(elem);


Why remove the logger line?

cyian-1756 · 2017-08-10

I assume I did it because I was trying to debug and it was clogging up the log/output.

metaprime · 2017-08-11

Fair enough

cyian-1756 added 3 commits July 29, 2017 21:07

Eroshareripper now uses mirror for data

fb6e23e

changed regex to include eroshare.com

fdf82f5

Added eroshare mirror eroshae

cdbdc99

metaprime reviewed Aug 10, 2017

This was referenced Aug 10, 2017

[Request] Support for erome.com #36

Closed

eroshare: direct links not working 4pr0n/ripme#526

Open

TODO: Clean up EroShareRipper #39

Open

metaprime merged commit e07d60a into RipMeApp:master Aug 10, 2017

lbalmaceda pushed a commit to lbalmaceda/ripme that referenced this pull request Oct 9, 2022

Merge pull request RipMeApp#29 from ripmeapp2/feature/log4j2

76e2570

Feature/log4j2
