Allow to load user profile in non-headless mode #90

Open
cderv opened this issue Mar 21, 2020 · 8 comments · May be fixed by #91

cderv commented Mar 21, 2020

This comes from a question by @yonicd about whether crrri can load a session with the user's credentials and extensions.

After a bit of searching, I think this is possible in non-headless mode, and maybe in headless mode for credentials (but I am not sure).

What I tried is simply pointing Chrome at the User Profile Directory of my regular browser. With Chrome you can do that with --user-data-dir. In crrri, I think for security reasons, we create a new work dir per session and remove it on close.

We could maybe offer an option for users to opt in to a persistent user profile. The use cases I see:

  • Either access the default profile, usually the one Chrome uses in everyday browsing
  • Or create a dedicated one, using the Chrome browser to prepare the profile, and then use it in headless mode.

What we must take care of:

  • Do not delete the default user dir. Maybe do not delete the directory at all if it was set by the user
  • Warn that it may not be safe, and maybe don't allow using the default one in headless mode?
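
The two safeguards above could be sketched roughly as follows. This is only an illustration: `resolve_work_dir()` and `close_work_dir()` are hypothetical names, not part of crrri, and the rule "never delete a directory the user supplied" is just one possible reading of the first bullet.

```r
# Hypothetical helpers (not part of crrri).
# Choose the browser work dir and record whether it may be deleted on close.
resolve_work_dir <- function(user_data_dir = NULL) {
  if (is.null(user_data_dir)) {
    # No opt-in: use a fresh throwaway directory, safe to remove on close.
    work_dir <- tempfile(pattern = "chrome-profile-")
    dir.create(work_dir)
    return(list(path = work_dir, cleanup = TRUE))
  }
  # Opt-in: the directory belongs to the user, so it is never deleted.
  warning("Using a persistent user profile; stored credentials may be exposed.",
          call. = FALSE)
  list(path = path.expand(user_data_dir), cleanup = FALSE)
}

close_work_dir <- function(wd) {
  # Only remove directories that this code created itself.
  if (isTRUE(wd$cleanup)) unlink(wd$path, recursive = TRUE)
  invisible(wd$cleanup)
}
```

Keeping the cleanup decision next to the path means the closing code does not need to guess where the directory came from.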

I don't know if it will be enough for your usage @yonicd, but it seems reasonable for crrri to allow that.

Test using internal, non-exported functions:
library(crrri)

# Launch chrome with my user profile
# When browser opens I can see my extensions 
chrome <- crrri:::chr_launch(
  bin = "C:/Program Files (x86)/Google/Chrome/Application/chrome.exe",
  debug_port = 9222L,
  extra_args = NULL,
  headless = FALSE,
  work_dir = "C:/Users/chris/AppData/Local/Google/Chrome/User Data"
)

session <- CDPRemote$new(
  host = "localhost",
  debug_port = 9222L,
  secure = FALSE,
  local = FALSE,
  retry_delay = 0.2,
  max_attempts = 15L
)

client <- session$connect(callback = ~ .x$inspect())

Page <- client$Page
# I connected to the community before in this profile
# so i should be already connected in inspector mode too.
Page$navigate(url = "http://community.rstudio.com/")

# You can check that credentials are available in
#  chrome://settings/passwords
# in the open browser

yonicd commented Mar 21, 2020

That should be great for me. Thanks!

I am trying to use a service called paperpile that forces you to be logged in as a Chrome user and to have their extension installed in order to use their site...

I am trying to drive their website in order to set up a cron job that stores an updated file every night.

yonicd commented Mar 21, 2020

hmmm

this is close with RSelenium, i can get the extension to load, but my --user-data-dir is being ignored

c_opts <- list(
  args  = c("--disable-gpu",
            "--window-size=1280,800",
            "--user-data-dir=~/Library/Application Support/Google/Chrome",
            "--load-extension=~/Library/Application Support/Google/Chrome/Default/Extensions/bomfdkbfpdhijjbeoicnfhjbdhncfhig/1.5.137_0"),
  prefs = list(
    "profile.default_content_settings.popups" = 0L,
    "download.prompt_for_download" = FALSE,
    "download.directory_upgrade" = TRUE,
    "safebrowsing.enabled" = TRUE,
    "download.default_directory" = tempdir()
  )
)

rD <- RSelenium::rsDriver(
  browser = "chrome",
  verbose = TRUE,
  port = 1324L,
  check = TRUE,
  extraCapabilities = list(
    chromeOptions = c_opts
  )
)

cderv commented Mar 22, 2020

I never tried with RSelenium. However, it seems with my current test script that the user data dir is not accessible in headless mode. 🤔 You could try in non-headless mode to see if it differs.

It is also possible that loading the Default profile is not allowed in headless mode, for security reasons.
One solution could be to create a new Chrome profile dedicated to this task, in which you would install the extension and everything else you need, and use that custom profile.
Do you see what I mean?

Some references to research on this topic

cderv commented Mar 22, 2020

However, it seems with my current test script that the user data dir is not accessible in headless mode.

I can confirm that I don't get the same behavior in headless and headful mode. Not cool... 😞

cderv commented Mar 22, 2020

I can't get a cookie to persist between headful mode, where I create it, and headless mode, where I try to read it.
Using the same data dir, it works fine when creating a cookie in headful mode, closing everything, and reading it again in headful mode.

@yonicd I think we would need to be sure that what you want to do is ok with chrome headless first, to see how to implement it in crrri. There may be something I am missing here 🤔

yonicd commented Mar 22, 2020

headless mode is funky. there is also a weird bug that won't let you set the download.default_directory.

I got the RSelenium version to work with my user profile with the same setting i had in the comment above (/shrug)

this is the RSelenium solution to my problem... (stupid site)

c_opts <- list(
  args  = c(
    "--disable-gpu",
    "--window-size=1280,800",
    "--user-data-dir=~/Library/Application Support/Google/Chrome",
    "--load-extension=~/Library/Application Support/Google/Chrome/Default/Extensions/bomfdkbfpdhijjbeoicnfhjbdhncfhig/1.5.137_0"),
  prefs = list(
    "profile.default_content_settings.popups" = 0L,
    "profile.content_settings.exceptions.clipboard" = 1L,
    "download.prompt_for_download" = FALSE,
    "download.directory_upgrade" = TRUE,
    "safebrowsing.enabled" = TRUE,
    "download.default_directory" = tempdir()
  )
)

rD <- RSelenium::rsDriver(
  browser = "chrome",
  verbose = FALSE,
  port = 1324L,
  extraCapabilities = list(
    chromeOptions = c_opts
  ),
  check = FALSE
)

# navigate
  rD$client$navigate('https://paperpile.com/app/shared/YEeG5y')

#select all
  
  rD$client$executeScript('document.querySelector("#selectionButton-1020-btnIconEl").click();')
  
#copy bib

  el <- rD$client$findElement(using = 'css','body')
  el$sendKeysToElement(sendKeys = list(RSelenium::selKeys$command_meta,'b'))
  
# wait for it to copy  
  wait <- TRUE
  i <- 0
  while(wait){
    wait <- rD$client$executeScript('return document.querySelector(".pp-status-text").innerText')[[1]]!="Copying Bibtex citations"
    Sys.sleep(3 + i/2)
    i <- i + 1
  }

  expectation <- as.numeric(gsub('[^0-9]','',rD$client$executeScript('return document.querySelector(".pp-status-text").innerText')[[1]])) - 1
  
# write clipboard to local file

  tf <- tempfile(fileext = '.bib')
  cat(clipr::read_clip(),file = tf,sep = '\n')
  
# validate bib
  pp <- paperpile::parse_bib(path = tf)
  
  if(length(pp)!=expectation)
    message('number of citations mismatch')

rD$client$closeall()
rD$server$stop()

yonicd commented Mar 22, 2020

i see why it is working... chrome is copying into my pwd a dir called ~ with my chrome profile.... that is a weird action
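
That stray `~` directory is probably there because nothing expands the tilde when the argument is handed straight to Chrome (no shell is involved), so `~/Library/...` is treated as a relative path. A minimal sketch of building the arguments with base R's `path.expand()` instead; the profile path is the one from the comments above, and the extension argument is left out for brevity:

```r
# Expand "~" in R before handing the path to Chrome, since the tilde is
# normally expanded by a shell and not by Chrome itself.
profile_dir <- path.expand("~/Library/Application Support/Google/Chrome")

args <- c(
  "--disable-gpu",
  "--window-size=1280,800",
  paste0("--user-data-dir=", profile_dir)
)
```

The resulting `args` vector could then replace the `args` element of `c_opts`.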

cderv linked a pull request Mar 22, 2020 that will close this issue

cderv commented Mar 22, 2020

Just a note about extensions: they can't be used in headless mode!
See https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#working-with-chrome-extensions
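
Given that limitation, a launcher could fail early instead of silently starting a browser without the extension. A rough sketch, assuming extensions are requested through a `--load-extension=` argument as in the RSelenium examples above; `check_extension_args()` is a hypothetical helper, not an existing crrri function:

```r
# Hypothetical check (not part of crrri): extensions need a headful browser.
check_extension_args <- function(extra_args, headless) {
  # grepl() on NULL returns logical(0), so any() is FALSE when no args are set.
  uses_extension <- any(grepl("^--load-extension=", extra_args))
  if (headless && uses_extension) {
    stop("Chrome extensions cannot be loaded in headless mode; use headless = FALSE.",
         call. = FALSE)
  }
  invisible(extra_args)
}
```

Called as `check_extension_args(extra_args, headless)` just before launching, it returns the arguments unchanged when the combination is allowed.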
