How to Bypass CAPTCHA With Selenium and Node.js

In this article, I will explain how to bypass CAPTCHA using Selenium and Node.js. We’ll look at different approaches and modern tools that make it possible, while keeping things simple and practical for any developer looking to overcome these challenges.

What is CAPTCHA?

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.” It presents challenges such as distorted text, image puzzles, and checkbox or behavior-based tests like Google’s reCAPTCHA. These tasks, like picking out objects in images or entering scrambled text, are simple for people but challenging for bots. By filtering out automated clients, CAPTCHA protects websites and services from bot traffic.

Selenium and Node.js

Selenium is a popular browser automation tool that allows users to automate browser activities. With Node.js, a fast and scalable JavaScript runtime, you can create powerful web automation scripts to interact with websites.

  • Selenium WebDriver: A tool for automating browsers.
  • Node.js: Used for handling asynchronous tasks and integrating Selenium into web scraping workflows.

Step-by-Step Guide to Bypassing CAPTCHA

1. Installing Necessary Tools

To begin, you will need Node.js installed, along with Selenium and a browser driver like ChromeDriver.

npm install selenium-webdriver
npm install chromedriver

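Also, install undetected-chromedriver, a package that helps avoid triggering CAPTCHAs by making ChromeDriver harder to detect. The project is best known as a Python library, so check the npm package’s README before relying on it from Node.js.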

npm install undetected-chromedriver

2. Setting up Selenium with Node.js

Once installed, create a simple Selenium script to open a website.

const { Builder, By } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome'); // used for Chrome options in later examples

(async function example() {
  // Start a Chrome session
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://example.com');
  } finally {
    // Always close the browser, even if navigation throws
    await driver.quit();
  }
})();

3. Detecting and Handling CAPTCHA

When a CAPTCHA challenge is detected, the simplest solutions often involve human interaction. But what if that’s not an option? In cases of reCAPTCHA v2, you can:

Use Third-Party CAPTCHA Solvers: Services like Bright Data, 2Captcha, or Anti-Captcha allow you to outsource CAPTCHA solving to a human. Read my list of the best CAPTCHA solvers to find the perfect service for your needs.

npm install 2captcha

Example of using 2Captcha:

const Captcha = require('2captcha');

const apiKey = 'YOUR_2CAPTCHA_API_KEY';
// The 2captcha npm package exposes a Solver class (per its README; verify for your installed version)
const captchaSolver = new Captcha.Solver(apiKey);

// Example reCAPTCHA v2 handling: resolves with the solved token
async function solveCaptcha(siteKey, pageUrl) {
  // recaptcha() returns a Promise, so no callback wrapper is needed
  const result = await captchaSolver.recaptcha(siteKey, pageUrl);
  return result.data; // the g-recaptcha-response token
}
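
The solver only hands back a token, so the page still has to receive it. Here is a minimal sketch of that step, assuming the target page uses the standard reCAPTCHA v2 widget markup: a `.g-recaptcha` element carrying a `data-sitekey` attribute, and a hidden `g-recaptcha-response` field. The helper names `findRecaptchaSiteKey` and `injectRecaptchaToken` are hypothetical, and some sites also expect a JavaScript callback to be invoked, which this sketch does not cover.

```javascript
// Hypothetical helpers; `driver` is a selenium-webdriver WebDriver instance.

// Returns the reCAPTCHA v2 site key if the widget is on the page, else null.
async function findRecaptchaSiteKey(driver) {
  // findElements resolves to an empty array (no exception) when nothing matches.
  const widgets = await driver.findElements({ css: '.g-recaptcha[data-sitekey]' });
  if (widgets.length === 0) return null;
  return widgets[0].getAttribute('data-sitekey');
}

// Writes a solved token into the hidden response field that the form submits.
// Resolves to false when the field is not present on the page.
async function injectRecaptchaToken(driver, token) {
  return driver.executeScript(
    "const field = document.getElementById('g-recaptcha-response');" +
    "if (!field) return false;" +
    "field.value = arguments[0];" +
    "return true;",
    token
  );
}
```

A typical flow would call `findRecaptchaSiteKey`, pass the key and the result of `driver.getCurrentUrl()` to `solveCaptcha`, and then hand the token from its result to `injectRecaptchaToken` before submitting the form.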

4. Using Undetected ChromeDriver

Some CAPTCHAs, like Google’s reCAPTCHA v3, analyze browser behavior, detecting bots based on browser fingerprinting. By using undetected-chromedriver, you can reduce the chance of being detected as a bot.

Example:

// Assumes the installed package exposes a Selenium-style Builder; check its README, as the API may differ
const undetectedChromedriver = require('undetected-chromedriver');

(async function example() {
  const driver = await new undetectedChromedriver.Builder()
    .forBrowser('chrome')
    .build();
  try {
    await driver.get('https://example.com');
  } finally {
    await driver.quit();
  }
})();

This tool lowers the chance that websites using fingerprinting techniques will flag your automation as a bot, but it does not guarantee it.

5. Handling Advanced CAPTCHA Challenges

Image CAPTCHAs

For image-based CAPTCHAs, you might need to extract the image and use image recognition or CAPTCHA solving services like Anti-Captcha. A more advanced option would be integrating machine learning techniques like convolutional neural networks (CNNs) to solve these image-based CAPTCHAs.
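
Extracting the image is the part Selenium can do directly. The sketch below screenshots only the CAPTCHA element and saves it as a PNG that you could then pass to a solving service or a recognition model. The helper name `saveCaptchaImage` is hypothetical, and the CSS selector is an assumption you would replace with the one used by your target page.

```javascript
const fs = require('fs');

// Hypothetical helper: screenshots only the CAPTCHA image element and saves it as a PNG.
// `driver` is a selenium-webdriver WebDriver; `selector` is an assumed CSS selector.
async function saveCaptchaImage(driver, selector, outPath) {
  const image = await driver.findElement({ css: selector });
  // WebElement.takeScreenshot() resolves to a base64-encoded PNG string.
  const base64Png = await image.takeScreenshot();
  fs.writeFileSync(outPath, Buffer.from(base64Png, 'base64'));
  return outPath;
}
```

For example, `await saveCaptchaImage(driver, 'img.captcha', 'captcha.png')` would write the element's screenshot to `captcha.png` in the working directory.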

Bypassing reCAPTCHA v3

reCAPTCHA v3 operates in the background, monitoring user activity and assigning a score based on interaction patterns. Because there is no explicit challenge to solve, bypassing v3 is harder, but you can try to improve the score by mimicking human-like browser interaction patterns. Techniques include:

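  • Randomly moving the mouse pointer
  • Introducing small delays between actions
  • Using real browser profiles

const { Origin } = require('selenium-webdriver');

async function humanLikeBehavior(driver) {
  // Move the mouse to a random point near the top-left of the viewport
  // (offsets must be integers, and origin must be Origin.VIEWPORT, Origin.POINTER, or a WebElement)
  const x = Math.floor(Math.random() * 100);
  const y = Math.floor(Math.random() * 100);
  await driver.actions().move({ origin: Origin.VIEWPORT, x, y }).perform();
  // Introduce a random pause of 1-3 seconds
  await driver.sleep(Math.floor(Math.random() * 2000) + 1000);
}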
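
The "small delays between actions" idea also applies to typing. Below is a rough sketch that sends text one character at a time with a random pause between keystrokes. The helper names `randomDelay` and `typeSlowly` are hypothetical, the 50-200 ms range is an arbitrary assumption, and there is no guarantee this changes how a given scoring system rates the session.

```javascript
// Returns a random integer delay in milliseconds within [minMs, maxMs].
function randomDelay(minMs, maxMs) {
  return Math.floor(Math.random() * (maxMs - minMs + 1)) + minMs;
}

// Promise-based pause that does not depend on the driver.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: types text one character at a time with a short pause after each key.
// `element` is a selenium-webdriver WebElement, such as a text input.
async function typeSlowly(element, text, minMs = 50, maxMs = 200) {
  for (const char of text) {
    await element.sendKeys(char);
    await sleep(randomDelay(minMs, maxMs));
  }
}
```

You could call it as `await typeSlowly(await driver.findElement({ css: 'input[name="q"]' }), 'hello')`, where the selector is only a placeholder.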

6. IP Blocking and Browser Fingerprinting

Many CAPTCHA systems track the IP addresses of users attempting to solve them. You can bypass this by rotating IP addresses via proxies:

npm install proxy-agent

Example:

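const { Builder } = require('selenium-webdriver');
const chrome = require('selenium-webdriver/chrome');
// Note: Chrome takes its proxy from a command-line flag, so proxy-agent is not needed here;
// it only applies to HTTP requests made directly from Node.js.

(async function example() {
  // Replace PROXY_IP:PORT with your proxy's address
  const options = new chrome.Options().addArguments('--proxy-server=PROXY_IP:PORT');
  const driver = await new Builder().forBrowser('chrome').setChromeOptions(options).build();
  try {
    await driver.get('https://example.com');
  } finally {
    await driver.quit();
  }
})();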
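
The example above uses a single proxy, while rotating means choosing a different one for each browser session. Here is one way that could look, as a simple round-robin sketch. The helper names `createProxyRotator` and `chromeArgsFor` are hypothetical, and the addresses in the usage note are placeholders.

```javascript
// Hypothetical helper: returns a function that cycles through a proxy list round-robin.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error('At least one proxy is required');
  }
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    // Wrap back to the start after the last proxy.
    index = (index + 1) % proxies.length;
    return proxy;
  };
}

// Builds the Chrome command-line arguments for one session using the given proxy.
function chromeArgsFor(proxy) {
  return [`--proxy-server=${proxy}`];
}
```

With a rotator created as `const nextProxy = createProxyRotator(['10.0.0.1:8080', '10.0.0.2:8080'])`, each new session could build its options with `new chrome.Options().addArguments(...chromeArgsFor(nextProxy()))`.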

Ethics and Legal Considerations

While it’s possible to bypass CAPTCHAs for testing or automation, it’s crucial to understand the legal and ethical implications of doing so. Many websites explicitly prohibit automated access in their terms of service. Use automation responsibly, especially if working in sensitive domains.

Conclusion

Bypassing CAPTCHA is a complex and evolving challenge, but you can overcome many obstacles by combining tools like Selenium, Node.js, undetected ChromeDriver, and CAPTCHA-solving services. Always stay within legal and ethical boundaries when applying these techniques, ensuring your automation projects remain compliant.
