
rewrote scrapers for new ui #3

Open
mueslimak3r wants to merge 10 commits into master

Conversation

@mueslimak3r commented Jun 2, 2024

Opening this as a draft because it hasn't been thoroughly tested and hasn't yet been made to respect the "debug" flag.
Also, because I scrape the series-part list from a specific series' page, the series parts/stories don't have stats such as rating; only one-shot stories get those stats.

This was easier than handling a click on the "View Full 128 Part Series" button on the author's works page before parsing the list from there. The list on the author's page is the one that carries the stats in each story's card.
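For anyone picking this up, here is a rough sketch of the series-page approach described above. The URL handling and CSS selector are hypothetical placeholders, not the actual markup handled in this branch:

```python
# Illustrative only: the selector below is a placeholder, not the real
# class names used by the new UI or by the scraper in this PR.
import bs4
import requests

def parse_series_parts(series_url):
    """Collect (title, url) pairs for every part listed on a series page.

    Because the list is taken from the series page rather than the author's
    works page, per-story stats such as rating are not available here.
    """
    html = requests.get(series_url).text
    soup = bs4.BeautifulSoup(html, 'html.parser')

    parts = []
    # Placeholder selector: the real class names depend on the new UI markup.
    for link in soup.select('a.series-part-link'):
        parts.append((link.text.strip(), link['href']))
    return parts
```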

In its current state this is adequate for me, so I'll leave it as-is.
If anyone wants to finish what I've started, I'll keep an eye out and update this PR as needed.

Closes #2

@mueslimak3r marked this pull request as ready for review July 15, 2024 02:29
@domaniko

Thank you for the PR.

It works for quite a few texts, but I found some issues with others.

These small changes improved it a lot for me:

@@ -390,7 +390,7 @@ def parse_series_page(page_url, author):
 
 def parse_author_works_page(html):
     soup = bs4.BeautifulSoup(html, 'html.parser')
-    author_element = soup.find('h1', class_='headline__title')
+    author_element = soup.find('title')
     if not author_element:
         error("Cannot determine author on member page.")
     if "Stories by " in author_element.text.strip():

and

@@ -478,16 +478,16 @@ def get_story_text(st):
     #[0].select("div[class^=_item_title]")[0]['href']
 
     #vals = re.findall('<option value=".*?">(\d+)</option>', sel_match.group(1))
-    if not paginator_elements: # just one page
-        error("Couldn't find paginator elements.")
     complete_text = ""
 
     end = 1
-    for pe in paginator_elements:
-        if pe.text.strip() == '' or not pe.text.strip().isnumeric():
-            continue
-        if int(pe.text.strip()) > end:
-            end = int(pe.text.strip())
+    if paginator_parent_element:
+        for pe in paginator_elements:
+            if pe.text.strip() == '' or not pe.text.strip().isnumeric():
+                continue
+            if int(pe.text.strip()) > end:
+                end = int(pe.text.strip())
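For reference, here is a standalone sketch of what these two patches do, keeping the names from the diffs. The exact `<title>` format and the rest of both functions aren't shown in this PR, so treat the details as assumptions:

```python
# Illustrative sketch mirroring the intent of the two diffs above.
import bs4

def author_from_title(soup: bs4.BeautifulSoup):
    """Read the author from the <title> tag, since the new UI no longer
    exposes an <h1 class="headline__title"> element on member pages."""
    title_element = soup.find('title')
    if not title_element:
        return None
    text = title_element.text.strip()
    # Assumption: works pages are titled roughly "Stories by <author> - ...".
    if "Stories by " in text:
        return text.split("Stories by ", 1)[1].split(" - ", 1)[0].strip()
    return None

def page_count(paginator_parent_element, paginator_elements):
    """Default to a single page when no paginator exists (instead of raising
    an error); otherwise return the highest numeric page label."""
    end = 1
    if paginator_parent_element:
        for pe in paginator_elements:
            label = pe.text.strip()
            if label.isnumeric() and int(label) > end:
                end = int(label)
    return end
```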

The generated EPUBs now do not have an additional line break between paragraphs, which makes reading on some e-book readers a little more awkward.

@mueslimak3r (Author)

@domaniko can you open a PR to merge your changes into my branch so they can be added to this PR with proper attribution?

I'm also happy to just make your changes on my end and push them.

@domaniko

Done: see mueslimak3r#1

@domaniko

@mueslimak3r Could you please also consider mueslimak3r#2?
