Undetected Chromedriver in Python
Undetected chromedriver is a Python package that lets you run Selenium web automation scripts in Chrome with a much lower chance of being detected. It is designed to get past websites that try to block scrapers and bots. For Python programmers, it offers stealthy web automation with few changes to existing code.
Overview
Undetected chromedriver patches the ChromeDriver binary to strip the telltale markers that sites use to spot automation. It then starts a regular Chrome browser process (not headless by default) and attaches to it through Chrome's remote debugging interface. Because the session runs in an ordinary Chrome instance without those markers, your script looks much more like a real user browsing the site.
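To make the "attach through the debugger" idea more concrete, here is a simplified sketch using only the standard library: pick a free local port, start Chrome with remote debugging enabled on that port, and hand the resulting host:port to ChromeDriver as its debugger address. This is an illustration of the general technique, not undetected chromedriver's actual code; the helper names (find_free_port, build_chrome_command, launch_chrome) and the Chrome binary path you pass in are assumptions.

```python
import socket
import subprocess
import tempfile


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def build_chrome_command(chrome_path: str, port: int, user_data_dir: str) -> list[str]:
    """Command line for a regular Chrome process with remote debugging enabled."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",  # isolated profile for this session
        "--no-first-run",
        "--no-default-browser-check",
    ]


def launch_chrome(chrome_path: str) -> tuple[subprocess.Popen, str]:
    """Start Chrome and return the process plus a 'host:port' debugger address.

    Note: another process could grab the port between find_free_port() and
    Chrome starting; this small race is acceptable for a sketch.
    """
    port = find_free_port()
    profile_dir = tempfile.mkdtemp(prefix="chrome-profile-")
    process = subprocess.Popen(build_chrome_command(chrome_path, port, profile_dir))
    return process, f"127.0.0.1:{port}"
```

With plain Selenium, the returned address would be assigned to options.debugger_address so that ChromeDriver connects to the already running browser instead of launching its own.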
Some key advantages of using undetected chromedriver in Python:
- Avoids bot detection mechanisms used by websites
- Supports regular Chrome features through the standard Selenium API, including an optional headless mode
- Works seamlessly with existing Selenium scripts with minimal changes
- Open source and actively maintained on GitHub
Below we’ll explore how to install and use undetected chromedriver for stealth web scraping and automation in Python scripts.
Installing
To install undetected chromedriver:
pip install undetected-chromedriver
This will fetch the latest version from PyPI.
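To confirm which release pip actually installed, you can read the package metadata from Python. This is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is our own.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "undetected-chromedriver") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report both the stealth driver and the Selenium release it builds on
    for name in ("undetected-chromedriver", "selenium"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```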
In addition to installing the package from PyPI, you can clone the full codebase from its GitHub repository (ultrafunkamsterdam/undetected-chromedriver).
Cloning the repository gives you your own editable copy. This makes it easier to understand how the library works, review its logic, tweak components to your needs, extend its functionality and integrate it into your local tooling.
To get started:
- Clone the GitHub repository to copy the files to your local machine
- Navigate your terminal into the project directory
- Install the required packages (from requirements.txt if the project provides one, or with an editable install: pip install -e .)
- Execute scripts and customize as needed
Whether you are tweaking existing scrapers or developing new ones, a local codebase makes your changes easier to manage. It also makes it easier to integrate scrapers into larger data pipelines for automated collection and downstream processing.
Using Undetected chromedriver in Python
Usage is simple if you're already using Selenium. Just replace webdriver.Chrome() with undetected_chromedriver.Chrome() and you're good to go.
For example:
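from undetected_chromedriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
driver = Chrome()
try:
    driver.get("https://website.com")
    wait = WebDriverWait(driver, 10)  # wait up to 10 seconds for elements
    # Open the login form once the button is clickable
    wait.until(EC.element_to_be_clickable((By.ID, "login"))).click()
    # The form fields may render after the click, so wait for them
    username = wait.until(EC.presence_of_element_located((By.NAME, "username")))
    password = driver.find_element(By.NAME, "password")
    username.send_keys("john")
    password.send_keys("password123")
    # Submit the form
    driver.find_element(By.TAG_NAME, "button").click()
finally:
    driver.quit()  # close Chrome and the driver process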
This opens the website in an undetected Chrome instance, clicks login, enters the credentials and submits the form.
The website will have a much harder time telling the script apart from a real user, although no tool can guarantee it will never be detected.
Customization Options
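You can pass additional options to customize the Chrome instance, for example a ChromeOptions object (also importable from undetected_chromedriver) and keyword arguments such as headless:
options = ChromeOptions()
driver = Chrome(headless=True, options=options)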
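Regular Chrome command-line switches can be added to the options object with add_argument(). The sketch below collects a few common switches in one place and applies them to whatever options object you pass in. The helper names and default values are illustrative assumptions; --window-size, --lang and --proxy-server are standard Chrome switches.

```python
from typing import Optional, Protocol


class SupportsAddArgument(Protocol):
    """Anything with an add_argument() method, such as a ChromeOptions instance."""

    def add_argument(self, argument: str) -> None: ...


def build_chrome_args(
    width: int = 1280,
    height: int = 800,
    lang: str = "en-US",
    proxy: Optional[str] = None,
) -> list[str]:
    """Collect command-line switches for the Chrome instance."""
    args = [f"--window-size={width},{height}", f"--lang={lang}"]
    if proxy:
        # e.g. "http://127.0.0.1:8080" (placeholder address)
        args.append(f"--proxy-server={proxy}")
    return args


def apply_args(options: SupportsAddArgument, args: list[str]) -> SupportsAddArgument:
    """Add each switch to the options object and return it for convenience."""
    for arg in args:
        options.add_argument(arg)
    return options


# Usage with undetected chromedriver (requires the package and Chrome):
# from undetected_chromedriver import Chrome, ChromeOptions
# options = apply_args(ChromeOptions(), build_chrome_args(width=1366, height=768))
# driver = Chrome(options=options)
```

Keeping the switches in a plain list makes them easy to log or reuse across runs, independent of which driver class consumes them.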
Configuring a Mobile Browser Environment
When scraping responsive sites, we can make the browser mimic a mobile device to get the associated user experience. This lets us gather data as a typical smartphone visitor sees it.
We initialize a mobile browser environment by setting device dimensions, screen pixel density, and the Android Chrome user agent string. The user agent signals sites to return their mobile-optimized pages as if accessed on a Nexus 5 phone.
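The setup described above can be sketched with Chrome DevTools Protocol commands sent through execute_cdp_cmd(), which undetected_chromedriver.Chrome inherits from Selenium's Chrome driver. This is one possible way to emulate a device, not necessarily the exact setup the paragraph refers to. The 360x640 viewport at a pixel ratio of 3 corresponds to a Nexus 5 screen; the user-agent string is an illustrative example you should replace with a current one, and the MobileProfile, cdp_commands and apply_mobile_profile names are our own.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class SupportsCdp(Protocol):
    """Any driver exposing execute_cdp_cmd(), e.g. undetected_chromedriver.Chrome."""

    def execute_cdp_cmd(self, cmd: str, cmd_args: dict) -> Any: ...


@dataclass(frozen=True)
class MobileProfile:
    width: int          # viewport width in CSS pixels
    height: int         # viewport height in CSS pixels
    pixel_ratio: float  # device pixel ratio (screen density)
    user_agent: str


# Nexus 5-like metrics; the UA string below is an example only.
NEXUS_5 = MobileProfile(
    width=360,
    height=640,
    pixel_ratio=3.0,
    user_agent=(
        "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Mobile Safari/537.36"
    ),
)


def cdp_commands(profile: MobileProfile) -> list[tuple[str, dict]]:
    """Build the DevTools commands that switch the current tab into mobile emulation."""
    return [
        ("Emulation.setDeviceMetricsOverride", {
            "width": profile.width,
            "height": profile.height,
            "deviceScaleFactor": profile.pixel_ratio,
            "mobile": True,
        }),
        ("Emulation.setTouchEmulationEnabled", {"enabled": True, "maxTouchPoints": 5}),
        ("Network.setUserAgentOverride", {"userAgent": profile.user_agent}),
    ]


def apply_mobile_profile(driver: SupportsCdp, profile: MobileProfile = NEXUS_5) -> None:
    """Send each command; call before driver.get() so the first request uses the mobile UA."""
    for cmd, args in cdp_commands(profile):
        driver.execute_cdp_cmd(cmd, args)


# Usage (requires undetected chromedriver and Chrome):
# from undetected_chromedriver import Chrome
# driver = Chrome()
# apply_mobile_profile(driver)
# driver.get("https://website.com")
```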
Now when browsing through Selenium, our script will retrieve the touch-friendly, stripped-down content designed for smaller screens rather than the full desktop site. Configuring a mobile browser exposes UI/UX details, pricing nuances, messaging differences and other variations in the mobile experience.
With mobile usage share rising globally, structuring scrapers to parse both desktop and mobile sites becomes increasingly important for a complete competitive view. Setting up a basic mobile emulator environment marks an initial step towards robust, multi-perspective data harvesting.
Dealing with Updates
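Websites frequently update their anti-bot measures, and Chrome updates itself regularly. To keep undetected chromedriver working, update the undetected-chromedriver package, Selenium and Chrome periodically. Undetected chromedriver downloads and patches its own ChromeDriver binary, so you normally don't need to manage ChromeDriver by hand.
To upgrade the package to the latest release, run: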
pip install undetected-chromedriver --upgrade
Or, using the short form of pip's --upgrade flag:
pip install -U undetected-chromedriver
The GitHub repo is updated frequently with fixes and new evasion features. Watch releases to stay updated.
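A common problem after updates is a mismatch between the installed Chrome and the downloaded ChromeDriver. Undetected chromedriver's Chrome class accepts a version_main argument to request a driver for a specific Chrome major version. The sketch below assumes the browser version can be read from the output of "<binary> --version"; the binary name differs by platform (for example google-chrome on Linux), and on Windows this command may not print a version at all. parse_chrome_major and installed_chrome_major are hypothetical helpers.

```python
import re
import subprocess
from typing import Optional


def parse_chrome_major(version_output: str) -> Optional[int]:
    """Extract the major version from text like 'Google Chrome 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def installed_chrome_major(binary: str = "google-chrome") -> Optional[int]:
    """Run '<binary> --version' and parse it; return None if Chrome can't be queried."""
    try:
        result = subprocess.run(
            [binary, "--version"], capture_output=True, text=True, check=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        # Binary missing, non-zero exit status or timeout
        return None
    return parse_chrome_major(result.stdout)


# Usage (requires undetected chromedriver and Chrome):
# from undetected_chromedriver import Chrome
# driver = Chrome(version_main=installed_chrome_major())
```

Passing None for version_main leaves the library's default driver selection in place, so the helper degrades gracefully when the version can't be detected.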
Conclusion
Undetected chromedriver is a powerful tool for Python developers who want to build stealthy web scrapers and automate browsers while evading detection. With minimal code changes, it lets you drive Chrome in a nearly undetectable fashion.
Proxies and VPNs only change where traffic appears to come from; undetected chromedriver works at the browser level and, compared to other browser automation frameworks, keeps robust support for modern Chrome features. It is also open source, so you can contribute fixes when websites deploy new anti-bot measures.