How to Use Proxies With SeleniumBase

In this guide, I’ll explain how to set up proxies with SeleniumBase in Python. With this method, your web scraping becomes more efficient and reliable, allowing you to bypass common obstacles and gather the data you need smoothly.

What is SeleniumBase?

SeleniumBase is a framework built on top of Selenium WebDriver, offering enhanced web automation features. It simplifies browser automation, testing, and web scraping by providing tools for managing browser interactions. It’s beneficial for creating end-to-end test suites and performing web scraping tasks that involve complex browser interactions, such as handling pop-ups, form submissions, and dynamic content loading.

Why Use Proxies in Web Scraping?

Websites often limit how many requests a single IP address can make in a short period, and may block an IP whose script sends too many. Using proxies can help you avoid this through:

  1. IP Rotation: Changing the IP address periodically to make requests appear from different sources.
  2. Geo-targeting: Accessing content restricted to certain regions by using proxies from specific locations.
  3. Anonymity: Masking your IP address to avoid being flagged as a bot.

Best Proxy Services to Use With SeleniumBase

Knowing how to set up proxies in SeleniumBase isn’t enough. I want to introduce some of the most reliable proxy services here, with a focus on residential proxies. Residential proxies let you manage rotation easily and are the safest IP type for web scraping.

  • Bright Data — Largest provider, precise targeting, Proxy Manager tool, starting at $5.04/GB
  • Oxylabs — Extensive network, precise targeting, dedicated support, starting at $4/GB
  • Smartproxy — Large pool, broad locations, self-service, starting at $2.2/GB
  • Webshare — Customization options, self-service, affordable, starting at $4.5/GB
  • SOAX — Flexible rotation, precise targeting, 24/7 support, starting at $2.2/GB

I am not affiliated with any of these providers 🙂

Setting Up SeleniumBase

Before using SeleniumBase, ensure you have Python installed on your machine. Follow these steps to set up SeleniumBase:

Install SeleniumBase via pip:

pip install seleniumbase

Install a web browser driver.

For Chrome, you can install the ChromeDriver:

seleniumbase install chromedriver

Verify the installation:

Run the console script with the help flag to confirm SeleniumBase is installed correctly and to list its available commands.

sbase --help

Using Proxies in SeleniumBase

To configure SeleniumBase proxies, you pass the proxy details to the browser session. The quickest route is the command line: running a test with pytest my_test.py --proxy=ip:port sends all browser traffic through that proxy. You can also set the proxy in code when creating a driver. Below are steps to configure proxies in different ways:

Setting Up a Proxy for a Single Request

You can configure a proxy for a one-time request by specifying the proxy server when setting up the browser. Here’s a basic example:

from seleniumbase import BaseCase

class ProxyTest(BaseCase):
    def test_with_proxy(self):
        # Specify the proxy server (host:port)
        proxy = "your_proxy_ip:port"
        # Open a new browser session that routes traffic through the proxy
        self.get_new_driver(proxy=proxy)
        self.open("http://example.com")
        # Verify the page title or other actions
        self.assert_title("Example Domain")

Replace “your_proxy_ip:port” with your proxy server’s IP address and port.
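Before handing a proxy string to the browser, it can be worth failing fast on malformed input. The small helper below is my own convenience function (not part of SeleniumBase) that checks the host:port format:

```python
def parse_proxy(proxy):
    """Split a "host:port" proxy string and sanity-check the port.

    Raises ValueError for malformed input so a bad proxy fails fast
    instead of silently hanging the browser session.
    """
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"Expected host:port, got {proxy!r}")
    return host, int(port)
```

For example, parse_proxy("203.0.113.7:8080") returns ("203.0.113.7", 8080), while a string with no port raises ValueError.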

Rotating Proxies

You may need to rotate proxies for larger scraping projects to avoid detection. Here’s how you can use a list of proxies and switch between them for each request:

import random

from seleniumbase import BaseCase

class RotatingProxyTest(BaseCase):
    def test_with_rotating_proxies(self):
        # List of proxies (host:port)
        proxies = [
            "proxy1_ip:port",
            "proxy2_ip:port",
            "proxy3_ip:port",
        ]
        # Choose a random proxy from the list
        proxy = random.choice(proxies)
        # Open a new browser session through the chosen proxy
        self.get_new_driver(proxy=proxy)
        self.open("http://example.com")
        # Perform actions as needed
        self.assert_title("Example Domain")

With this approach, each run of the test picks a random proxy from the list. To rotate within a single run, open a fresh browser session with a new proxy for each batch of requests; spreading traffic across the pool helps minimize the detection risk.
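If random selection repeats the same proxy too often for your taste, round-robin rotation guarantees an even spread. A minimal sketch, assuming the same placeholder proxy list as above:

```python
from itertools import cycle

# Placeholder proxy pool; replace with real host:port entries
proxies = ["proxy1_ip:port", "proxy2_ip:port", "proxy3_ip:port"]
proxy_pool = cycle(proxies)

def next_proxy():
    """Return the next proxy in round-robin order, wrapping around."""
    return next(proxy_pool)
```

Call next_proxy() each time you open a new browser session, so consecutive sessions never reuse a proxy until the whole pool has been cycled through.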

Using Authentication-Protected Proxies

If your proxy requires authentication, you must set up the credentials before using the proxy. Here’s how you can do this:

from seleniumbase import BaseCase

class AuthProxyTest(BaseCase):
    def test_with_auth_proxy(self):
        # Proxy credentials and address
        username = "your_username"
        password = "your_password"
        proxy = f"{username}:{password}@proxy_ip:port"
        # SeleniumBase accepts the username:password@host:port format directly
        self.get_new_driver(proxy=proxy)
        self.open("http://example.com")
        # Check some condition
        self.assert_title("Example Domain")

Chrome has no command-line flag for proxy credentials, so authenticated proxies are normally handled with a small helper extension. When the proxy string includes credentials as above, SeleniumBase generates that extension for you behind the scenes, so you don't need to build a proxy_auth_plugin.zip by hand.
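A minimal proxy_auth_plugin.zip can also be generated in Python. The sketch below (the function name and file layout are my own) packages a Manifest V2 extension that fixes the proxy server and auto-answers its authentication challenge; Chrome is phasing out Manifest V2, so treat this as illustrative rather than future-proof:

```python
import json
import zipfile

def build_proxy_auth_extension(host, port, username, password,
                               path="proxy_auth_plugin.zip"):
    """Write a minimal Manifest V2 Chrome extension that sets a fixed
    proxy server and answers its authentication challenge."""
    manifest = {
        "manifest_version": 2,
        "name": "Proxy Auth Helper",
        "version": "1.0",
        "permissions": [
            "proxy", "webRequest", "webRequestBlocking", "<all_urls>",
        ],
        "background": {"scripts": ["background.js"]},
    }
    background = f"""
chrome.proxy.settings.set({{
    value: {{
        mode: "fixed_servers",
        rules: {{singleProxy: {{scheme: "http", host: "{host}", port: {port}}}}}
    }},
    scope: "regular"
}}, function() {{}});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {{
        return {{authCredentials: {{username: "{username}", password: "{password}"}}}};
    }},
    {{urls: ["<all_urls>"]}},
    ["blocking"]
);
"""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("manifest.json", json.dumps(manifest))
        zf.writestr("background.js", background)
    return path
```

Pass the returned path to chrome_options.add_extension() if you manage ChromeOptions yourself.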

Challenges and Best Practices for Using Proxies with SeleniumBase

Using proxies is not without its challenges. Below are some tips and strategies for getting the most out of proxies in web scraping:

  1. Avoid Free Proxies: Free proxies are often unreliable, slow, and may already be flagged by websites. Invest in reputable paid proxy services for better results.
  2. Monitor Proxy Health: Regularly check the status of your proxies to ensure they are working as expected.
  3. Randomize Request Timing: Add random delays between requests instead of sending them all at once to reduce the risk of being blocked.
  4. Use Headless Mode Wisely: While using headless mode may speed up scraping, it can also increase the likelihood of being flagged as a bot. Use it carefully and combine it with other techniques like user-agent spoofing.
  5. Handle CAPTCHA Challenges: If a website presents CAPTCHA challenges, you may need to integrate a CAPTCHA-solving service.

Additional Techniques for Scraping with Proxies

Combining proxy use with other techniques can make your scraping efforts more efficient. Here are a few advanced strategies:

  • User-Agent Spoofing: Randomize the user-agent string to simulate requests from different browsers or devices.
  • Session Management: Manage cookies and sessions effectively to avoid triggering anti-bot mechanisms.
  • Headless Browser Settings: Use headless browsers to scrape without displaying a graphical interface, but modify browser fingerprint settings to avoid detection.
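For the user-agent point above, a simple approach is to pick from a small pool of real browser strings (the ones below are illustrative examples, not a curated list) and hand the choice to SeleniumBase via get_new_driver(agent=...) or the --agent command-line option:

```python
import random

# Illustrative desktop user-agent strings; keep these current in practice
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]

def random_user_agent():
    """Pick a user-agent string at random for the next browser session."""
    return random.choice(USER_AGENTS)
```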

Conclusion

Proxies are powerful tools for overcoming anti-scraping measures and gathering data efficiently. With SeleniumBase, configuring proxies is straightforward, whether you need a single proxy for testing or a rotation system for large-scale scraping. Remember to follow best practices and continuously monitor the performance of your proxies. Combining proxies with techniques like user-agent spoofing and session management can minimize roadblocks and collect the data you need.
