Web Scraping With Watir Ruby in 2025

In this article, I’ll walk you through setting up Watir in Ruby and show you how to extract data, handle pagination, scroll through pages, and capture screenshots. Let’s get started!

What Is Watir and Why Use It?

Watir (Web Application Testing in Ruby) is a Ruby library that helps automate web browsers for testing purposes. It interacts with browsers like Chrome, Firefox, and Edge to mimic human actions like clicking buttons, typing into forms, and navigating between pages.

Although Watir was initially designed for web application testing, it can also be used for web scraping. Watir can render JavaScript and interact with dynamic content, which makes it suitable for scraping websites that require human-like interactions, such as clicking or scrolling.

How to Set Up Watir in Ruby

Before we start scraping, you need to set up Watir in Ruby. Below are the steps to install the required tools and libraries.

Step 1: Install Ruby

First, ensure that Ruby is installed on your system. Watir is a Ruby library, so you need Ruby to run your scraper. You can download Ruby from the official website, ruby-lang.org.

Once Ruby is installed, you will also have access to gem, the RubyGems package manager that ships with Ruby.
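As a quick sanity check, you can print the Ruby and RubyGems versions from Ruby itself (this is just a verification step, not part of the scraper):

```ruby
# Print the Ruby and RubyGems versions to confirm the toolchain is ready
puts RUBY_VERSION   # e.g. "3.3.0"
require 'rubygems'
puts Gem::VERSION   # version of the bundled RubyGems
```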

Step 2: Install Watir

Now that Ruby is installed, you can install the Watir gem. Open your terminal and run the following command to install Watir:

gem install watir

Step 3: Create a New Ruby Project

Create a new folder for your project and initialize a new Ruby project. This will allow you to manage your project’s dependencies. In your terminal, navigate to the folder where you want to store your project and run the following command:

bundle init

This will create a Gemfile in your project folder. Open this file and add the following line to include the Watir gem:

gem 'watir'

Next, install the gem by running the command:

bundle install

Building a Simple Scraper

Now that Watir is set up, let’s write a simple scraper to extract data from a webpage.

Step 1: Open the Website and Get the HTML

First, you need to require the Watir library and create a new instance of a browser. For this example, we’ll use Chrome in headless mode, which means the browser will run in the background without opening a GUI.

Here’s a basic Ruby script to get the HTML content of a webpage:

require 'watir'
# Launch Chrome in headless mode (Watir 7+ takes browser options via options:)
browser = Watir::Browser.new :chrome, options: { args: ['--headless'] }
# Open the website
browser.goto('https://www.example.com')
# Get the HTML of the page
html_content = browser.html
# Print the HTML content
puts html_content
# Close the browser
browser.close

Step 2: Extract Specific Data

Next, let’s extract specific data from the webpage. Suppose you want to scrape the titles of products from a product page. You can use the browser’s methods to locate elements and extract their text.

Here’s an example that extracts the titles of products from the page:

require 'watir'
# Launch Chrome in headless mode (Watir 7+ takes browser options via options:)
browser = Watir::Browser.new :chrome, options: { args: ['--headless'] }
# Open the website
browser.goto('https://www.example.com/products')
# Extract the product titles
product_titles = browser.divs(class: 'product-title').map(&:text)
# Print the product titles
puts product_titles
# Close the browser
browser.close

In this example, we use the divs method to find all elements with the class product-title and then extract their text.
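The same map-over-elements pattern extends to building structured records, for instance pairing each title with its price. The snippet below shows the idea with sample arrays standing in for live browser calls (the class names and fields are illustrative):

```ruby
# Titles and prices as they would come back from calls like
# browser.divs(class: 'product-title').map(&:text) and
# browser.spans(class: 'price').map(&:text)
titles = ['Widget A', 'Widget B']
prices = ['$10.00', '$12.50']

# Zip the two lists together and build one hash per product
products = titles.zip(prices).map do |title, price|
  { title: title, price: price }
end

puts products.inspect
```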

Handling Pagination

Many websites paginate their content, meaning the product listings are spread across multiple pages. If you want to scrape all the products, you need to navigate through all the pages.

Let’s see how to handle pagination with Watir. We will click the “Next” button repeatedly until it is no longer available.

require 'watir'
# Launch Chrome in headless mode (Watir 7+ takes browser options via options:)
browser = Watir::Browser.new :chrome, options: { args: ['--headless'] }
# Open the website
browser.goto('https://www.example.com/products')
# List to store product titles
product_titles = []
# Loop through the pages
loop do
  # Extract product titles from the current page
  titles = browser.divs(class: 'product-title').map(&:text)
  product_titles.concat(titles)
  # Stop once there is no "Next" button left
  next_button = browser.button(class: 'next')
  break unless next_button.exists?
  # Click "Next" and wait for the next page to load
  next_button.click
  browser.wait
end
# Print the product titles
puts product_titles
# Close the browser
browser.close

This script extracts the product titles from the current page, then clicks the “Next” button and repeats until the button no longer exists. Checking for the button after extracting ensures the final page is scraped as well.
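Once the titles are collected, you will usually want to persist them rather than just print them. Ruby’s standard csv library handles this in a few lines (the filename here is just an example):

```ruby
require 'csv'

# Sample data standing in for the scraped product_titles array
product_titles = ['Widget A', 'Widget B', 'Widget C']

# Write one title per row, with a header row
CSV.open('products.csv', 'w') do |csv|
  csv << ['title']
  product_titles.each { |title| csv << [title] }
end
```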

Scrolling for Infinite Scrolling Pages

Some websites use infinite scrolling, where new content is loaded as the user scrolls down the page. To scrape such sites, you need to simulate scrolling.

Here’s how to simulate scrolling with Watir:

require 'watir'
# Launch Chrome in headless mode (Watir 7+ takes browser options via options:)
browser = Watir::Browser.new :chrome, options: { args: ['--headless'] }
# Open the website
browser.goto('https://www.example.com/infinite-scroll')
# Track the page height to detect when loading stops
previous_height = browser.execute_script('return document.body.scrollHeight')
# Keep scrolling until new content stops loading
loop do
  # Scroll to the bottom of the page to trigger loading
  browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
  sleep 2
  # Get the new scroll height
  new_height = browser.execute_script('return document.body.scrollHeight')
  # Break if the page height hasn't changed
  break if new_height == previous_height
  previous_height = new_height
end
# Extract the product titles once all content has loaded
product_titles = browser.divs(class: 'product-title').map(&:text)
# Print the product titles
puts product_titles
# Close the browser
browser.close

This script keeps scrolling to the bottom of the page until the page height stops increasing, meaning there is no more content to load, and then extracts all the product titles in a single pass. Extracting once at the end avoids collecting duplicate titles on every scroll.
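The height comparison is the loop’s only exit, so a page that keeps loading content indefinitely would never terminate. A simple safeguard is to cap the number of scroll rounds. The helper below sketches that logic with a stubbed height reader standing in for the real execute_script call:

```ruby
# Scroll-until-stable logic with a safety cap on iterations.
# read_height stands in for
# browser.execute_script('return document.body.scrollHeight').
def scroll_until_stable(max_rounds: 10, &read_height)
  previous_height = read_height.call
  max_rounds.times do
    # In a real scraper you would scroll and sleep here
    new_height = read_height.call
    return previous_height if new_height == previous_height
    previous_height = new_height
  end
  previous_height
end

# Stubbed heights: the page grows twice, then stabilizes
heights = [1000, 2000, 3000, 3000, 3000]
puts scroll_until_stable { heights.shift || 3000 }   # → 3000
```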

Taking Screenshots with Watir

Sometimes, you may want to capture screenshots while scraping. Watir makes it easy to take screenshots of the entire page, a specific element, or the visible area.

Full-Page Screenshot

To capture a screenshot, you can use Watir’s built-in screenshot method. Note that it captures only the visible viewport by default, so a true full-page capture may require a workaround, such as resizing the browser window to the full page height or using a third-party tool.

require 'watir'
# Launch Chrome in headless mode (Watir 7+ takes browser options via options:)
browser = Watir::Browser.new :chrome, options: { args: ['--headless'] }
# Open the website
browser.goto('https://www.example.com/product/1')
# Take a screenshot (captures the visible viewport by default)
browser.screenshot.save 'fullpage_screenshot.png'
# Close the browser
browser.close

Specific Element Screenshot

To capture a screenshot of a specific element, you can drop down to the underlying Selenium element (exposed through Watir’s wd method) and call its screenshot_as method.

require 'watir'
# Launch Chrome in headless mode (Watir 7+ takes browser options via options:)
browser = Watir::Browser.new :chrome, options: { args: ['--headless'] }
# Open the website
browser.goto('https://www.example.com/product/1')
# Get the target element
element = browser.div(class: 'product-summary')
# Take a screenshot of the element
screenshot_data = element.wd.screenshot_as(:png)
# Save the screenshot
File.open('element_screenshot.png', 'wb') { |file| file.write(screenshot_data) }
# Close the browser
browser.close

Conclusion

So, we’ve explored web scraping with Watir in Ruby. Watir is a fantastic tool for scraping dynamic websites. It offers excellent flexibility, allowing you to handle tasks like pagination, scrolling, and taking screenshots. Although it might be slower compared to other scraping tools, its ability to interact with JavaScript-heavy websites makes it a must-have in your toolkit. Now that you’ve learned the basics, you can start building your web scrapers for all sorts of projects, from tracking product prices to extracting data from infinite scroll pages.