Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future is no longer Send #206

Closed
JanzenJohn opened this issue Sep 9, 2024 · 5 comments
Closed

Future is no longer Send #206

JanzenJohn opened this issue Sep 9, 2024 · 5 comments
Labels
C-help-wanted Category: help wanted

Comments

@JanzenJohn
Copy link

JanzenJohn commented Sep 9, 2024

the issue in #181 is no longer fixable with the feature "atomic"

error: future cannot be sent between threads safely
   --> src/bin/worker.rs:36:22
    |
36  |           handles.push(tokio::spawn(async move {
    |  ______________________^
37  | |             meta_worker(args, redis, proxy).await;
38  | |         }));
    | |__________^ future created by async block is not `Send`
    |
    = help: within `ego_tree::Node<Node>`, the trait `Sync` is not implemented for `*mut tendril::fmt::UTF8`, which is required by `{async block@src/bin/worker.rs:36:35: 38:10}: Send`

using scraper 0.19 / 0.20 has the same result

@LoZack19
Copy link
Contributor

This is not a problem, since Node implements Send. Therefore you can wrap it in a Mutex to make it Sync

@JanzenJohn
Copy link
Author

use tokio::spawn;

fn main() {
    spawn(worker());
}

async fn worker() {
    let html = reqwest::get("https://www.rust-lang.org").await.unwrap();
    let html = html.text().await.unwrap();
    let document = scraper::Html::parse_document(&html);
    let selector = scraper::Selector::parse("a").unwrap();
    for element in document.select(&selector) {
        let html = reqwest::get("https://www.rust-lang.org").await.unwrap();
        let html = html.text().await.unwrap();
        let document = scraper::Html::parse_document(&html);
    }
}

this is the code to recreate this, is your comment and advice for the user of the lib or for the fix?

@adamreichold
Copy link
Member

The problem is that you are holding on to document over await points when fetching other documents in the for loop. What you need to do this select/collect/extract your links into a data structure that is independent of document and only then fetch those other resources, e.g.

async fn worker() {
    let html = reqwest::get("https://www.rust-lang.org").await.unwrap();
    let html = html.text().await.unwrap();
    let links = {
      let document = scraper::Html::parse_document(&html);
      let selector = scraper::Selector::parse("a").unwrap();
      document.select(&selector).map(|element| element.attr("href").unwrap().to_owned()).collect::<Vec<_>>()
    };
    for href in links {
        let html = reqwest::get(href).await.unwrap();
        let html = html.text().await.unwrap();
        let document = scraper::Html::parse_document(&html);
    }
}

@JanzenJohn
Copy link
Author

yes i recognized that, but i was under the impression that the atomic feature would just allow me to use it over async calls

@adamreichold
Copy link
Member

Well, it allows you to keep it alive, thereby possibly sending it to other threads (after resuming after a suspension point). But you are essentially holding a &Html alive via the selector evaluation, and for that to work Html would need to be Sync not just Send which that feature enables.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-help-wanted Category: help wanted
Projects
None yet
Development

No branches or pull requests

4 participants