Axios Pagination: How to Scrape Multiple Pages
In this guide, I’ll show you how to use Axios, a popular HTTP client for Node.js, to scrape paginated websites. Axios can fetch individual pages, but it can’t handle pagination on its own: we’ll need to write some extra code to move from one page to the next until we’ve collected all the data.
Along the way, we’ll also cover how to handle JavaScript-based pagination and how to avoid getting blocked while scraping. Let’s dive in!
What is Pagination?
Pagination is a technique websites use to divide large amounts of data into multiple pages. Instead of loading all the data at once, the website displays a limited number of items per page, and users can navigate through pages using buttons or links.
There are three common types of pagination:
- Numbered Pagination — Pages are labeled numerically, like 1, 2, 3, 4, …
- “Next” and “Previous” Buttons — Users click Next to go forward or Previous to go back.
- Infinite Scrolling — New data loads automatically as the user scrolls down.
Each type requires a different approach for scraping. Let’s start with the simplest: numbered pagination.
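With numbered pagination, the page number usually appears right in the URL, which is what makes it the easiest type to script. A quick illustration (the ?page= parameter is a common convention, not a universal rule, so check your target site’s actual URL structure):
// Typical numbered-pagination URLs: only the page number changes
const pageUrl = (n) => `https://example.com/products?page=${n}`;
console.log(pageUrl(1)); // https://example.com/products?page=1
console.log(pageUrl(2)); // https://example.com/products?page=2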
Setting Up Axios for Web Scraping
To scrape paginated data, we need two npm packages:
- Axios — A popular HTTP client to send requests to websites
- Cheerio — A lightweight library for parsing HTML and extracting data
Install Axios and Cheerio
First, create a new Node.js project and install the required packages:
mkdir axios-pagination-scraper
cd axios-pagination-scraper
npm init -y
npm install axios cheerio
Now, we can start writing our scraper.
Scraping Websites with Numbered Pagination
Step 1: Basic Scraper for a Single Page
We will scrape product details (name, price, and image URL) from a demo e-commerce website that uses numbered pagination.
const axios = require('axios');
const cheerio = require('cheerio');

const url = 'https://www.scrapingcourse.com/ecommerce/?page=1';

async function scrapePage(pageUrl) {
  try {
    const response = await axios.get(pageUrl);
    // axios throws for non-2xx statuses by default, so this check is mostly defensive
    if (response.status !== 200) {
      console.error(`Error: ${response.status}`);
      return [];
    }
    // Load the HTML into Cheerio for jQuery-style querying
    const $ = cheerio.load(response.data);
    const products = [];
    $('.product').each((index, product) => {
      const name = $(product).find('.product-name').text().trim();
      const price = $(product).find('.product-price').text().trim();
      const imageUrl = $(product).find('.product-image').attr('src');
      products.push({ Name: name, Price: price, Image: imageUrl });
    });
    return products;
  } catch (error) {
    console.error(`Error: ${error.message}`);
    return [];
  }
}

(async () => {
  const products = await scrapePage(url);
  console.log(products);
})();
This script extracts products from only the first page. Now, let’s modify it to scrape multiple pages.
Step 2: Scraping Multiple Pages
To scrape all pages, we loop through each page number and collect the data from every page. For this example, we assume the website has 12 pages.
async function scrapeAllPages(baseURL, totalPages) {
  let allProducts = [];
  for (let i = 1; i <= totalPages; i++) {
    const pageUrl = `${baseURL}?page=${i}`;
    console.log(`Scraping: ${pageUrl}`);
    // Reuse the scrapePage() function from Step 1
    const products = await scrapePage(pageUrl);
    allProducts = [...allProducts, ...products];
  }
  return allProducts;
}

(async () => {
  const allProducts = await scrapeAllPages('https://www.scrapingcourse.com/ecommerce/', 12);
  console.log(allProducts);
})();
This version scrapes all 12 pages and combines the data into a single array.
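If you want to keep the results, you can write the combined array to disk with Node’s built-in fs module. A quick sketch (the products.json filename is just an example):
const fs = require('fs');

(async () => {
  const allProducts = await scrapeAllPages('https://www.scrapingcourse.com/ecommerce/', 12);
  // Pretty-print the scraped data to a JSON file
  fs.writeFileSync('products.json', JSON.stringify(allProducts, null, 2));
  console.log(`Saved ${allProducts.length} products to products.json`);
})();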
Scraping Websites with “Next” and “Previous” Buttons
Some websites don’t use numbered pages but instead provide Next and Previous buttons.
To handle this, we need to:
- Extract the “Next Page” URL.
- Visit the next page recursively until there is no Next button.
Step 1: Modify the Scraper to Detect the Next Page Link
async function scrapePages(url) {
  let products = [];
  try {
    const response = await axios.get(url);
    if (response.status !== 200) {
      console.error(`Error: ${response.status}`);
      return products;
    }
    const $ = cheerio.load(response.data);
    $('.product').each((index, product) => {
      const name = $(product).find('.product-name').text().trim();
      const price = $(product).find('.product-price').text().trim();
      const imageUrl = $(product).find('.product-image').attr('src');
      products.push({ Name: name, Price: price, Image: imageUrl });
    });
    // Grab the href of the "Next" link; stop when it's missing
    const nextPage = $('.next').attr('href');
    if (nextPage) {
      // The href may be relative, so resolve it against the current page URL
      const nextUrl = new URL(nextPage, url).href;
      console.log(`Scraping next page: ${nextUrl}`);
      const nextProducts = await scrapePages(nextUrl);
      products = [...products, ...nextProducts];
    }
  } catch (error) {
    console.error(`Error: ${error.message}`);
  }
  return products;
}

(async () => {
  const allProducts = await scrapePages('https://www.scrapingcourse.com/ecommerce/');
  console.log(allProducts);
})();
Now, the scraper automatically follows the “Next” button until no more pages are left.
Handling JavaScript-Based Pagination
Websites that use JavaScript to load data dynamically usually can’t be scraped with Axios alone, because Axios only fetches the initial HTML and never executes scripts. For these, you need a headless browser like Puppeteer or Playwright.
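That said, before reaching for a headless browser, it’s worth opening your browser’s network tab: many “dynamic” pages actually fetch their data from a JSON endpoint that you can call directly with Axios. A hypothetical sketch (the /api/products endpoint and its page parameter are invented for illustration):
const axios = require('axios');

// Hypothetical JSON endpoint discovered in the browser's network tab
async function scrapeApiPage(page) {
  const response = await axios.get('https://example.com/api/products', {
    params: { page }, // axios serializes this into ?page=N
  });
  return response.data; // already structured JSON, no HTML parsing needed
}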
Using Puppeteer for Infinite Scrolling
Install Puppeteer:
npm install puppeteer
Modify the scraper:
const puppeteer = require('puppeteer');

async function scrapeInfiniteScroll(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Keep clicking "Load More" until the button disappears
  while (true) {
    const loadMore = await page.$('.load-more');
    if (!loadMore) break;
    await loadMore.click();
    // page.waitForTimeout was removed in recent Puppeteer versions,
    // so use a plain timeout to give the new content time to render
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }

  // Every item is now in the DOM, so extract them all in one pass
  const products = await page.evaluate(() =>
    Array.from(document.querySelectorAll('.product')).map((product) => ({
      Name: product.querySelector('.product-name').innerText,
      Price: product.querySelector('.product-price').innerText,
      Image: product.querySelector('.product-image').src,
    }))
  );

  await browser.close();
  return products;
}

(async () => {
  const allProducts = await scrapeInfiniteScroll('https://www.scrapingcourse.com/infinite-scroll');
  console.log(allProducts);
})();
This scraper keeps clicking the “Load More” button until it disappears, then extracts every product in a single pass (extracting inside the loop would collect duplicates, since earlier items stay in the DOM).
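Some infinite-scroll pages have no button at all and simply load more content when you reach the bottom. Here’s a sketch of that variant, which scrolls until the page height stops growing (the 2-second wait is an arbitrary allowance for new content to load); you’d call it before the extraction step above:
async function autoScroll(page) {
  let previousHeight = 0;
  while (true) {
    const currentHeight = await page.evaluate(() => document.body.scrollHeight);
    if (currentHeight === previousHeight) break; // height stopped growing: no more content
    previousHeight = currentHeight;
    // Jump to the bottom to trigger the next batch of items
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
    await new Promise((resolve) => setTimeout(resolve, 2000));
  }
}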
Avoiding Anti-Bot Detection
Websites often block scrapers using CAPTCHAs, IP bans, and JavaScript challenges. To avoid this:
- Use rotating proxies
- Randomize request headers and user agents
- Add delays between requests (a sketch covering this and the previous tip follows the example below)
- Use a managed scraping service such as Bright Data or Oxylabs, which handles anti-bot bypassing for you. Simple code example (the endpoint and parameters below are illustrative; check your provider’s docs for the exact API):
const axios = require('axios');

const url = 'https://www.scrapingcourse.com/antibot-challenge';
const apiKey = 'YOUR_BRIGHTDATA_API_KEY';

// Illustrative request: the provider fetches and renders the target URL on your behalf
axios.get(`https://api.brightdata.com/v1/?url=${encodeURIComponent(url)}&apikey=${apiKey}&js_render=true`)
  .then(response => console.log(response.data))
  .catch(error => console.log(error.message));
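And here’s the sketch mentioned in the list above: a minimal example of randomizing the User-Agent header and adding a jittered delay between requests. You could use politeGet() wherever the earlier examples call axios.get() (the user-agent strings are just samples):
const axios = require('axios');

// Sample user-agent strings to rotate through; extend this list as needed
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// Wait a random 1-3 seconds so requests don't arrive at a robotic pace
const randomDelay = () =>
  new Promise((resolve) => setTimeout(resolve, 1000 + Math.random() * 2000));

async function politeGet(url) {
  // Pick a random user agent for each request
  const headers = { 'User-Agent': userAgents[Math.floor(Math.random() * userAgents.length)] };
  const response = await axios.get(url, { headers });
  await randomDelay();
  return response.data;
}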
Conclusion
When scraping multiple pages, it’s important to understand the anti-bot measures that websites often use. While tools like Axios and Cheerio are great for handling basic pagination, anti-bot systems can still get in the way. That’s why I recommend a managed solution like Bright Data or Oxylabs: full-fledged web scraping platforms that help you scrape websites without worrying about getting blocked.
With Bright Data or Oxylabs, you get rotating proxies, CAPTCHA solving, and other features that make scraping easier and more reliable. If you want to keep your scraping smooth and uninterrupted, they’re definitely worth considering. They’ll save you time and headaches, so you can focus on gathering the data you need!