Logging in for private accounts #64
Hi Chris,

This seems like a basic question, but I can't find documentation for it anywhere. I'm trying to pull post data from private accounts. Is there a way to log in with instascrape before I call `post.scrape()`?

Thanks for the great tool!
Raffi
Hey Raffi!
Thanks for reaching out!!! Have you had success scraping public accounts, or is it not working across the board?

Unfortunately, private account scraping isn't directly supported right now, but if you check back in a couple of weeks, the situation will be greatly improved. I'm working on a new major version, slotted for release sometime in the next few weeks, that will officially introduce direct support for Selenium. It will also include much better handling of JavaScript-rendered content, which the lib currently doesn't handle very well. Once this is released, doing something like scraping a private account will be much more straightforward!

For the time being, here is a super hacky approach that might work for you:

```python
import time

from selenium.webdriver import Chrome
from selenium.webdriver.chrome.service import Service

from instascrape.scrapers.json_tools import json_from_html, parse_json_from_mapping
from instascrape.core._json_flattener import FlatJSONDict
from instascrape.core._mappings import _PostMapping


def instantiate_webdriver(executable_path: str) -> Chrome:
    """Return an instance of the Chrome webdriver"""
    # Selenium 4+ takes the chromedriver path through a Service object
    # (Selenium 3 used Chrome(executable_path=...) instead)
    return Chrome(service=Service(executable_path))


def manually_login(browser: Chrome) -> None:
    """Open Instagram and sleep for 30 seconds to allow a manual login"""
    browser.get("https://www.instagram.com")
    time.sleep(30)


def _get_html_from_post(browser: Chrome, url: str) -> str:
    """Return the rendered HTML for a given URL"""
    browser.get(url)
    return browser.page_source


def _get_post_json(post_html: str) -> dict:
    """Return the JSON parsed from the HTML that contains the post's data"""
    # The first JSON blob found is a placeholder ("dummy"), so strip it out...
    dummy_json = json_from_html(post_html, as_dict=False)
    post_html = post_html.replace(dummy_json, "")
    # ...and parse the next one, which holds the post data
    data_json = json_from_html(post_html)
    return data_json


def _scrape_data(post_json: dict) -> dict:
    """Return a dict containing the scraped data"""
    flat_json_dict = FlatJSONDict(post_json)
    post_mapping = _PostMapping.return_mapping()
    return parse_json_from_mapping(flat_json_dict, post_mapping)


def scrape_url(browser: Chrome, url: str) -> dict:
    """Return data scraped from a single URL"""
    post_html = _get_html_from_post(browser, url)  # use the argument, not the global URL
    post_json = _get_post_json(post_html)
    return _scrape_data(post_json)


if __name__ == "__main__":
    EXEC_PATH = "/path/to/chromedriver.exe"
    URL = "https://www.instagram.com/p/CI0HNNzsp5l/"

    # Instantiate browser and manually log in
    browser = instantiate_webdriver(executable_path=EXEC_PATH)
    manually_login(browser)

    # Scrape a single URL
    data_dict = scrape_url(browser, URL)
    print(data_dict)
    browser.quit()
```

Like I said, it's a super hacky approach, and the support will be much cleaner in an upcoming update.

Cheers,
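The fixed `time.sleep(30)` in `manually_login` either wastes time or cuts a slow login short. Below is a minimal sketch of a polling alternative; it is not part of instascrape or of the approach above. It assumes that a successful Instagram login sets a cookie named `sessionid` in the browser, which you should verify yourself. `wait_until` and `manually_login_and_wait` are hypothetical helper names, and `browser` can be any Selenium WebDriver, since the sketch only relies on `get()` and `get_cookie()`.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 120.0,
               interval: float = 1.0) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def manually_login_and_wait(browser, timeout: float = 120.0,
                            interval: float = 1.0) -> bool:
    """Hypothetical drop-in for manually_login(): open Instagram, then wait
    until you have logged in by hand in the browser window.

    ASSUMPTION: the presence of a "sessionid" cookie means the login worked.
    """
    browser.get("https://www.instagram.com")
    # WebDriver.get_cookie(name) returns None when the cookie is absent
    return wait_until(lambda: browser.get_cookie("sessionid") is not None,
                      timeout=timeout, interval=interval)
```

In the `__main__` block you would call `manually_login_and_wait(browser)` in place of `manually_login(browser)` and stop (for example with `browser.quit()`) if it returns `False`. Measuring the timeout with `time.monotonic()` keeps it correct even if the system clock changes while you are logging in.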