Web Scraping With jQuery: A Complete Tutorial
Here, I’ll show you how to build your own web scraper using jQuery. Whether you’re scraping on the client side or the server side, I’ll walk you through the steps and key concepts. By the end, you’ll have the skills to start extracting data from websites in no time. Let’s dive in!
What is jQuery?
jQuery is one of the most popular JavaScript libraries used for DOM (Document Object Model) manipulation. It simplifies tasks like traversing and modifying the DOM, handling events, and making AJAX requests. jQuery is used to add interactivity to websites and is incredibly useful for web scraping as it allows you to select HTML elements and extract the content from them easily.
You can use jQuery for both client-side and server-side web scraping. In client-side scraping, your web browser executes the scraping code, while in server-side scraping, the scraping process is handled on the server, typically using Node.js. We’ll explore both approaches in this tutorial.
What is Client-Side Scraping?
Client-side scraping means running scraping code directly in the user’s browser. This approach is done using JavaScript, which allows the browser to access the HTML content of the webpage and extract the data you need. However, client-side scraping has some limitations, mainly due to security features like CORS (Cross-Origin Resource Sharing).
For example, if you try to fetch data from another domain (a website that’s not the same as the one you’re currently browsing), your browser will block the request due to CORS policies. These restrictions make client-side scraping impractical for large-scale web scraping tasks, but it’s still useful for small projects or when you’re scraping pages on the same domain.
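For instance, if a page you’re viewing already loads jQuery, you can open the browser console and scrape the current page’s DOM directly; this stays same-origin, so CORS doesn’t get in the way. Here’s a minimal sketch (the h2 selector is just an illustration):
// Run in the browser console of a page that already loads jQuery.
// Collects the text of every <h2> heading on the current page.
const headings = [];
$("h2").each(function() {
  headings.push($(this).text().trim());
});
console.log(headings);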
How Do You Scrape a Web Page Using jQuery?
The first thing you need to do is download the HTML content of the webpage you want to scrape. This is done with an HTTP request. jQuery provides a simple way to make such requests using the $.get() function.
Here’s an example of how to scrape a webpage:
$.get("https://example.com", function(html) {
console.log(html);
});
This code sends an HTTP GET request to the specified URL and logs the HTML content of the webpage to the console. However, if you try this on a website that doesn’t allow cross-origin requests, you’ll run into a problem known as the CORS error.
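In Chrome, for example, the console error looks something like this (exact wording varies by browser, and the origin shown here is a placeholder):
Access to XMLHttpRequest at 'https://example.com/' from origin 'https://your-site.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.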
Why Can’t We Scrape Websites Client-Side?
The CORS issue happens because modern browsers implement security restrictions to prevent malicious websites from stealing data from other sites. When your browser sends a request to a different website, it includes an “Origin” header that identifies where the request came from. The server checks this header to decide whether the request is from an allowed domain; if the response doesn’t include a matching Access-Control-Allow-Origin header, the browser blocks your code from reading it.
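To illustrate, the exchange looks roughly like this (the domains are placeholders):
Request from the browser (running on https://your-site.com):
  GET / HTTP/1.1
  Host: example.com
  Origin: https://your-site.com
Response the server must send for the browser to accept it:
  HTTP/1.1 200 OK
  Access-Control-Allow-Origin: https://your-site.com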
This security feature makes client-side scraping challenging. To bypass this, you could use a proxy server or a headless browser, but these solutions are typically part of server-side scraping.
Bypassing CORS Restrictions with Bright Data’s Web Unlocker
Since client-side scraping is restricted by CORS policies, a better approach is to use a server-side proxy solution like Bright Data’s Web Unlocker.
Why Use Web Unlocker?
✅ Bypasses CORS Restrictions — Fetch data from any website without browser security blocks.
✅ Handles Anti-Bot Measures — Automatically manages headers, cookies, and CAPTCHAs.
✅ No Need for Manual Proxy Rotation — Dynamically assigns IPs and rotates them as needed.
✅ Works Seamlessly with jQuery & Node.js — Fetch data server-side before processing it with jQuery.
How to Use Web Unlocker with jQuery in Node.js
To integrate Web Unlocker into your server-side scraper, follow these steps:
1️⃣ Sign up for Bright Data and get your Web Unlocker credentials.
2️⃣ Install Axios (a popular HTTP client for Node.js):
npm install axios
3️⃣ Modify your scraper to use Web Unlocker:
const axios = require("axios");

async function scrapeWithWebUnlocker(url) {
  try {
    // Route the request through Bright Data's Web Unlocker proxy.
    const response = await axios.get(url, {
      proxy: {
        host: "brd.superproxy.io",
        port: 22225,
        auth: {
          username: "your-username",
          password: "your-password"
        }
      }
    });
    console.log(response.data); // Process the HTML with jQuery
  } catch (error) {
    console.error("Error fetching data:", error);
  }
}

scrapeWithWebUnlocker("https://example.com");
How This Works:
- The request is routed through Bright Data’s Web Unlocker, bypassing CORS and anti-bot protections.
- The response contains the HTML of the target page, which can then be processed with jQuery in Node.js.
- This method ensures reliable and scalable scraping without worrying about IP bans or CAPTCHAs.
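Once the HTML arrives, you can hand it to jQuery via jsdom (covered in the next section) instead of just logging it. Here’s a minimal sketch, assuming the jquery and jsdom packages are installed:
const axios = require("axios");
const { JSDOM } = require("jsdom");

async function scrapeTitle(url) {
  // Fetch the raw HTML (add the Web Unlocker proxy config from above as needed).
  const response = await axios.get(url);

  // Parse the HTML with jsdom and attach jQuery to the resulting window.
  const { window } = new JSDOM(response.data);
  const $ = require("jquery")(window);

  // Now query the document just like in a browser.
  console.log($("title").text());
}

scrapeTitle("https://example.com");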
Server-Side Scraping With jQuery and Node.js
Since client-side scraping is limited by CORS, a more robust approach is to use server-side scraping. This method involves running your scraper on a server that doesn’t have the same restrictions as browsers. Node.js, a JavaScript runtime environment, is perfect for server-side scraping. You can use jQuery with Node.js to scrape web pages without worrying about CORS errors.
How Do You Use jQuery With Node.js?
To use jQuery in Node.js, you need to install both the jQuery library and a tool called jsdom. jsdom is a JavaScript implementation of the web standards (like the DOM) that runs in Node.js. It essentially simulates a web browser inside your server, allowing jQuery to work as if it were in a real browser environment.
First, you need to set up your Node.js environment. Make sure you have Node.js installed on your system. Then, create a new directory for your project and install the required dependencies.
mkdir my-web-scraper
cd my-web-scraper
npm init -y
npm install jquery jsdom
Once the packages are installed, you can start using jQuery in your Node.js scraper.
Example: Scraping a Web Page With Node.js and jQuery
Here’s a basic example of how to scrape a webpage using jQuery in Node.js:
const { JSDOM } = require("jsdom");

// Passing a url makes jsdom treat requests to that origin as same-origin.
const { window } = new JSDOM("", { url: "https://example.com" });
const $ = require("jquery")(window);

$.get("https://example.com", function(html) {
  console.log(html);
});
This code initializes a jsdom instance with the URL of the page you want to scrape. Because the jsdom window’s origin matches that URL, the $.get() request counts as same-origin and isn’t blocked. After that, jQuery can interact with the HTML document the same way it would in a browser.
Scraping Specific Data
Once you have the HTML content of the page, the next step is to extract the data you need. This is where jQuery shines. You can use jQuery’s .find() method to select specific elements and .text() or .attr() to extract their content.
Let’s say you want to scrape product names, prices, and URLs from an e-commerce site. Here’s how you can do it:
$.get("https://example.com/products", function(html) {
const productElements = $(html).find("li.product");
const products = [];
productElements.each((i, productElement) => {
const product = {
name: $(productElement).find("h2").text(),
price: $(productElement).find(".price").text(),
url: $(productElement).find("a").attr("href")
};
products.push(product);
});
console.log(products);
});
In this example:
- We use $.get() to retrieve the HTML of the product page.
- The .find() method searches for all li elements with the class product.
- We loop through each product element and extract the product name, price, and URL using jQuery’s .text() and .attr() methods.
This will print an array of product objects containing the scraped data.
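For a page that matches these selectors, the output might look something like this (illustrative values):
[
  {
    name: "Example Product",
    price: "$69.00",
    url: "https://example.com/products/example-product"
  }
]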
Storing Scraped Data
Once you’ve extracted the data you need, the next step is to store it. You can save the data to a local file, a database, or any other storage medium. For simplicity, let’s store the data in a JSON file.
To do this, you can use Node.js’s built-in fs (file system) module to write the data to a file.
const fs = require("fs");
fs.writeFileSync("products.json", JSON.stringify(products, null, 2));
This code will save the scraped product data to a file called products.json in the current directory.
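Putting it together, here’s a sketch of the earlier scraper writing its results to disk. Note that the write happens inside the $.get() callback, because that’s where the products array is in scope (this assumes $ is set up with jsdom as shown earlier):
const fs = require("fs");

$.get("https://example.com/products", function(html) {
  const products = [];

  $(html).find("li.product").each((i, productElement) => {
    products.push({
      name: $(productElement).find("h2").text(),
      price: $(productElement).find(".price").text(),
      url: $(productElement).find("a").attr("href")
    });
  });

  // Write inside the callback, where the scraped data is available.
  fs.writeFileSync("products.json", JSON.stringify(products, null, 2));
});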
Advanced Scraping Techniques
Pagination
Many websites have paginated content, meaning that not all the data is displayed on a single page. To scrape all the data from a paginated website, you’ll need to navigate through multiple pages.
In this case, you can modify your scraper to follow the “Next” link or construct the URLs for each page. Here’s an example of how you might scrape multiple pages:
function scrapePage(pageUrl) {
  $.get(pageUrl, function(html) {
    const productElements = $(html).find("li.product");
    const products = [];

    productElements.each((i, productElement) => {
      const product = {
        name: $(productElement).find("h2").text(),
        price: $(productElement).find(".price").text(),
        url: $(productElement).find("a").attr("href")
      };
      products.push(product);
    });

    console.log(products);

    const nextPage = $(html).find(".next-page").attr("href");
    if (nextPage) {
      scrapePage(nextPage); // Recursively scrape the next page
    }
  });
}

scrapePage("https://example.com/products?page=1");
This recursive function will scrape the current page and then check for a link to the next page. If a “Next” link is found, it will call scrapePage() again with the URL of the next page.
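One caveat: the href of a “Next” link is often relative (for example, /products?page=2), so you may need to resolve it against the current page’s URL before requesting it. Node’s built-in URL class handles this:
const nextPage = $(html).find(".next-page").attr("href");
if (nextPage) {
  // Resolve a possibly relative href against the page we just scraped.
  scrapePage(new URL(nextPage, pageUrl).href);
}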
Using Regular Expressions
Sometimes, you may need to extract specific data patterns from a webpage. For example, if product prices are listed in a consistent format, you can use regular expressions (regex) to match and extract the prices.
Here’s an example of how to use regex with jQuery:
$.get("https://example.com/products", function(html) {
const prices = new Set();
$(html).find("span.price").each((i, priceElement) => {
const priceText = $(priceElement).text();
if (/^$d .d{2}$/.test(priceText)) {
prices.add(priceText);
}
});
console.log(Array.from(prices));
});
This code uses a regular expression (/^\$\d+\.\d{2}$/) to match strings that represent prices, like “$69.00”: the ^ and $ anchors require the whole string to match, \$ matches the literal dollar sign, \d+ matches the dollar amount, and \.\d{2} matches the two-digit cents. The matched prices are stored in a Set to ensure uniqueness.
Conclusion
In this tutorial, I showed you how to scrape data from websites using jQuery. We looked at both client-side and server-side scraping methods. While client-side scraping has its limits because of security measures like CORS, server-side scraping with jQuery in Node.js lets you avoid these restrictions and scrape data from any site. I also explained how to use jQuery’s DOM traversal methods to extract specific information.
We covered how to handle pagination, use regular expressions, and store data. Now you have the tools to build effective web scrapers and gather useful data from the web. I hope you feel confident and excited to start scraping!