Undetected Chromedriver in Python

14.11.2023

Undetected chromedriver is a patched version of Selenium's ChromeDriver that makes automated Chrome sessions much harder for websites to detect. It is designed to get past the checks that sites use to block scrapers and bots. For Python programmers, it offers stealthier web automation through a Selenium-compatible API.

Overview

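Undetected chromedriver patches the ChromeDriver binary to remove markers that anti-bot scripts look for. It then launches an actual Chrome browser instance and attaches to it through Chrome's remote debugging interface. By default the browser runs with a visible window; headless mode is optional. Because the site is served to a regular Chrome browser, your script looks much more like a real user browsing the site.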

Some key advantages of using undetected chromedriver in Python:

  • Avoids many of the bot detection mechanisms used by websites
  • Supports modern Chrome capabilities like headless mode, native events, etc.
  • Works seamlessly with existing Selenium scripts with minimal changes
  • Open source and actively maintained on GitHub

Below we’ll explore how to install and use undetected chromedriver for stealth web scraping and automation in Python scripts.

Installing

To install undetected chromedriver:

pip install undetected-chromedriver

This will fetch the latest version from PyPI.
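
To confirm that the install worked, one option is to ask Python which version of the package is present. The sketch below uses only the standard library; the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="undetected-chromedriver"):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("undetected-chromedriver is not installed")
    else:
        print(f"undetected-chromedriver {found} is installed")
```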

In addition to installing from PyPI, you can clone the project's GitHub repository and work with the full codebase locally.

Cloning the repository gives you your own editable copy. You can study how the code is structured, review the logic, tweak components to your needs, extend functionality and integrate it into local systems.

To get started:

  1. Clone the GitHub repository to copy the files to your local machine
  2. Navigate your terminal into the project directory
  3. Install the required packages (from the requirements.txt file, if the repository provides one)
  4. Execute scripts and customize as needed

Whether tweaking existing scrapers or developing new ones, maintaining a local codebase helps manage changes. It also facilitates integrating scrapers within larger data infrastructure for automated harvesting and downstream processing.

Using Undetected chromedriver in Python

The usage is simple if you’re already using Selenium. Just replace webdriver.Chrome() with undetected_chromedriver.Chrome() and you’re good to go.

For example:

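from undetected_chromedriver import Chrome
from selenium.webdriver.common.by import By

# Start a patched Chrome instance instead of selenium.webdriver.Chrome()
driver = Chrome()
try:
    driver.get("https://website.com")

    # Open the login form
    login_button = driver.find_element(By.ID, "login")
    login_button.click()

    # Locate the credential fields by their name attributes
    username = driver.find_element(By.NAME, "username")
    password = driver.find_element(By.NAME, "password")

    username.send_keys("john")
    password.send_keys("password123")

    # Submit the form by clicking the first <button> element on the page
    driver.find_element(By.TAG_NAME, "button").click()
finally:
    # Always close the browser, even if one of the lookups above fails
    driver.quit()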

This opens the website in a patched Chrome instance, clicks the login button, enters the credentials and submits the form.

In many cases the website will not be able to tell the automated script from a real user, although no tool can guarantee this against every detection method.

Customization Options

You can pass additional options to customize the Chrome instance:

options = ChromeOptions()  # from undetected_chromedriver import Chrome, ChromeOptions
driver = Chrome(headless=True, options=options)
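
As a rough sketch of what a fuller configuration might look like, the helper below collects a few Chrome command-line switches and applies them to a ChromeOptions object. The function names and default values are assumptions chosen for illustration. The --window-size and --lang switches are standard Chrome switches, and undetected_chromedriver is imported lazily so the argument builder can be tried without a browser.

```python
def chrome_arguments(window_size=(1280, 800), language="en-US"):
    """Build a list of Chrome command-line switches (values are illustrative)."""
    width, height = window_size
    return [
        f"--window-size={width},{height}",
        f"--lang={language}",
    ]


def build_driver(headless=False, **settings):
    """Create a Chrome instance with the switches from chrome_arguments()."""
    # Imported here so that chrome_arguments() works without the package installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    for argument in chrome_arguments(**settings):
        options.add_argument(argument)
    return uc.Chrome(headless=headless, options=options)
```

Calling build_driver(headless=True, window_size=(1920, 1080)) would start a headless browser with a larger viewport. Headless sessions may be easier for sites to fingerprint, so a visible window is often the safer choice when stealth matters.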

Configuring a Mobile Browser Environment

When scraping responsive sites, we can configure the browser to mimic a mobile device so that the site serves its mobile version. This allows gathering data as the average smartphone visitor sees it.

We initialize a mobile browser environment by setting device dimensions, screen pixel density, and the Android Chrome user agent string. The user agent signals sites to return their mobile-optimized pages as if accessed on a Nexus 5 phone.
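
One way to sketch this setup is with Chrome DevTools Protocol commands sent through Selenium's execute_cdp_cmd method. The profile values here (a 360x640 viewport, a pixel ratio of 3.0, and the Android and Chrome version numbers in the user agent string) are illustrative assumptions for a Nexus 5, and the helper names are hypothetical.

```python
# Illustrative Nexus 5 profile; every value here is an assumption.
NEXUS_5 = {
    "width": 360,
    "height": 640,
    "pixel_ratio": 3.0,
    "user_agent": (
        "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/119.0.0.0 Mobile Safari/537.36"
    ),
}


def mobile_cdp_commands(profile):
    """Return (command, parameters) pairs that emulate the given device."""
    return [
        (
            "Emulation.setDeviceMetricsOverride",
            {
                "width": profile["width"],
                "height": profile["height"],
                "deviceScaleFactor": profile["pixel_ratio"],
                "mobile": True,
            },
        ),
        ("Network.setUserAgentOverride", {"userAgent": profile["user_agent"]}),
    ]


def apply_mobile_profile(driver, profile=NEXUS_5):
    """Send the emulation commands to a running Chrome driver."""
    for command, parameters in mobile_cdp_commands(profile):
        driver.execute_cdp_cmd(command, parameters)
```

With a driver created as in the earlier example, apply_mobile_profile(driver) would be called before driver.get(...) so that the first request already carries the mobile user agent.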

Now when browsing through Selenium, our script will retrieve the touch-friendly, stripped-down content designed for smaller screens rather than the full desktop site. Configuring a mobile browser gives access to UI/UX details, pricing nuances, differences in messaging and other variations in the mobile experience.

With mobile usage share rising globally, structuring scrapers to parse both desktop and mobile sites becomes increasingly important for a complete competitive view. Setting up a basic mobile emulator environment marks an initial step towards robust, multi-perspective data harvesting.

Dealing with Updates

Websites frequently update their anti-bot measures. To keep undetected chromedriver working, it's recommended to update the package, Selenium and Chrome itself periodically.

You can upgrade to the latest release with pip:

pip install undetected-chromedriver --upgrade

Or, equivalently, with the short flag:

pip install -U undetected-chromedriver

The GitHub repo is updated frequently with fixes and new evasion features. Watch releases to stay updated.
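
When Chrome updates faster than the driver, a version mismatch is a common failure. A possible workaround, sketched below, is to pin the driver to the major version of the installed browser through the version_main argument of undetected_chromedriver.Chrome. The parsing helper and the example version strings are hypothetical, and how you obtain the version text (for instance from the browser's --version output) depends on your system.

```python
import re


def chrome_major_version(version_text):
    """Extract the major version from text like 'Google Chrome 119.0.6045.159'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no Chrome version found in {version_text!r}")
    return int(match.group(1))


def build_pinned_driver(version_text):
    """Start Chrome with a driver matching the given browser version."""
    # Imported lazily so the parser above can be used without the package installed.
    import undetected_chromedriver as uc

    return uc.Chrome(version_main=chrome_major_version(version_text))
```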

Conclusion

Undetected chromedriver is a powerful tool for Python developers who want to build stealthy web scrapers and automate browsers while evading detection. With minimal code changes, it drives Chrome in a way that is much harder to detect.

Compared to proxies, VPNs and other browser automation frameworks, it provides robust support for modern Chrome features. It is also open source, so you can contribute fixes for new anti-bot measures deployed by websites.
