Logging in for private accounts #64

rmesrobi · 2021-01-05T01:57:58Z

Hi Chris - This seems to be a basic question, but I can't find documentation anywhere. I'm trying to pull post data from private accounts. Is there a way to login with instascrape before I initiate post.scrape()?

Thanks for the great tool!

Raffi

chris-greening · 2021-01-05T21:28:33Z

Maintainer

Hey Raffi!

Thanks for reaching out!!! Have you had success scraping public accounts or is it not working across the board?

Unfortunately, private account scraping isn't directly supported right now, but if you check back in a couple of weeks the situation should be greatly improved. I'm working on a new major version, slated for release sometime in the next few weeks, that will officially introduce direct support for Selenium. It will also handle JavaScript-rendered content much better than the lib currently does. Once this is released, doing something like scraping a private account will be much more straightforward!

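For the time being, here is a super hacky approach that might work for you. It opens Chrome through Selenium, gives you 30 seconds to log in by hand, and then feeds the logged-in page source to instascrape's internal JSON tools: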

import time

from selenium.webdriver import Chrome   

from instascrape.scrapers.json_tools import json_from_html, parse_json_from_mapping
from instascrape.core._json_flattener import FlatJSONDict
from instascrape.core._mappings import _PostMapping

def instantiate_webdriver(executable_path) -> Chrome:
    """Return an instance of the Chrome webdriver"""
    return Chrome(executable_path)

def manually_login(browser: Chrome) -> None:
    """Sleep for 30 seconds to allow manual login to Instagram"""
    browser.get("https://www.instagram.com")
    time.sleep(30)

def _get_html_from_post(browser: Chrome, url: str) -> str:
    """Return HTML from a given URL"""
    browser.get(url)
    return browser.page_source

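def _get_post_json(post_html: str) -> dict:
    """Return JSON parsed from the HTML that contains the post's data"""
    # The first JSON blob found in the page is treated as a dummy: grab it as a
    # raw string, strip it from the HTML, then parse again to reach the post data
    dummy_json = json_from_html(post_html, as_dict=False)
    post_html = post_html.replace(dummy_json, "")
    data_json = json_from_html(post_html)
    return data_json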

def _scrape_data(post_json: dict) -> dict:
    """Return a dict containing the scraped data"""
    flat_json_dict = FlatJSONDict(post_json)
    post_mapping = _PostMapping.return_mapping()
    return parse_json_from_mapping(flat_json_dict, post_mapping)

def scrape_url(browser: Chrome, url: str) -> dict:
    """Return data scraped from a single URL"""
    # Use the url argument, not the module-level URL constant
    post_html = _get_html_from_post(browser, url)
    post_json = _get_post_json(post_html)
    data_dict = _scrape_data(post_json)
    return data_dict

if __name__ == "__main__":
    EXEC_PATH = "/path/to/chromedriver.exe"
    URL = "https://www.instagram.com/p/CI0HNNzsp5l/"

    # Instantiate browser and manually login
    browser = instantiate_webdriver(executable_path=EXEC_PATH)
    manually_login(browser)

    # Scrape a single URL 
    data_dict = scrape_url(browser, URL)
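
Since the browser stays logged in, the same session could in principle be reused for several posts. Here is a rough sketch of that idea, not part of instascrape: `scrape_many` is a hypothetical helper, and the 2 second default delay is an arbitrary guess rather than a known safe rate. It takes the scraping function as an argument so it does not depend on the script above.

```python
import time
from typing import Callable, Dict, Iterable


def scrape_many(
    scrape_one: Callable[[str], dict],
    urls: Iterable[str],
    delay_seconds: float = 2.0,
) -> Dict[str, dict]:
    """Scrape each URL in turn and collect the results keyed by URL.

    scrape_one is any callable that takes a URL and returns a dict,
    e.g. lambda url: scrape_url(browser, url) from the script above.
    """
    results: Dict[str, dict] = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause between page loads; the delay is a guess, not a documented limit
            time.sleep(delay_seconds)
        try:
            results[url] = scrape_one(url)
        except Exception as error:
            # Record the failure and keep going so one bad post doesn't end the run
            results[url] = {"error": repr(error)}
    return results
```

With the script above, a call might look like `scrape_many(lambda url: scrape_url(browser, url), list_of_urls)`, where `list_of_urls` is whatever list of post URLs you have collected.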

Like I said, super hacky approach and the support will be much cleaner in a coming update.
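
One part that could be made a little less fragile is the fixed 30 second sleep in `manually_login`. A possible alternative is to poll for some sign that the login finished and give up after a timeout. The sketch below is only an illustration: `wait_until` is a hypothetical helper, and checking for a `sessionid` cookie is an assumption about how Instagram marks a logged-in session, so verify it in your own browser first.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 120.0,
    poll: float = 1.0,
) -> bool:
    """Poll condition() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Hypothetical usage with the browser from the script above
# (the "sessionid" cookie name is an assumption, not something instascrape defines):
# browser.get("https://www.instagram.com")
# logged_in = wait_until(lambda: browser.get_cookie("sessionid") is not None)
```

This returns as soon as the condition holds instead of always waiting the full 30 seconds, and it returns `False` if you never finish logging in, so the script can stop early rather than scrape a login page.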

Cheers,
Chris

1 reply

rmesrobi Jan 5, 2021
Author

It works great on public accounts. I'm doing a research project on my own network and was hoping to utilize your functions. I'll try out the hacky approach and look forward to the next update. Thanks for the prompt and helpful reply!
