How to Scrape Real Estate Data: A Complete Guide

Scraping real estate data from websites presents unique challenges. Property sites use JavaScript rendering, dynamic content loading, and anti-bot measures that block traditional scrapers. This guide covers practical solutions using Python and Bright Data’s tools.

Note: I am not affiliated with Bright Data. It’s the platform I am most familiar with, so I chose to use it here.

Why Real Estate Scraping Is Challenging

Real estate websites present three main obstacles:

1. Summary vs. Detail Pages
Search results show limited data (price, address, thumbnail). Full property details — square footage, property history, agent information — require visiting individual listing pages, multiplying the number of requests needed.

2. JavaScript Rendering
Most real estate platforms render content client-side. A basic HTTP request returns incomplete HTML because data loads via JavaScript after the initial page load.

3. Anti-Bot Protection
Sites implement IP filtering, CAPTCHAs, rate limiting, and browser fingerprinting to detect and block scrapers.
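
To see how the summary/detail split multiplies request volume, here is a rough back-of-the-envelope sketch. The function name and the example figures (5 result pages, 40 listings per page) are illustrative assumptions, not measurements from any particular site.

```python
def estimate_requests(result_pages: int, listings_per_page: int) -> int:
    """Estimate total requests: one per results page plus one per listing detail page."""
    detail_requests = result_pages * listings_per_page
    return result_pages + detail_requests


# Hypothetical crawl: 5 result pages with 40 listings each
print(estimate_requests(5, 40))  # 5 + 200 = 205
```

Even a small search quickly turns into hundreds of requests, which is why rate limiting and anti-bot handling matter later in this guide.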

Basic Scraper with Requests and BeautifulSoup

This approach works for simple sites without heavy anti-bot measures.

Inspecting the Target Page

Before writing code, inspect the page structure:

Open the search results in your browser
Right-click on a property listing and select “Inspect”
Identify the HTML elements containing the data you need (price, address, links)

Installation

pip install requests beautifulsoup4

Basic Scraper Code

import requests
from bs4 import BeautifulSoup
import csv
url = "https://www.example.com/homes/for_sale/San-Francisco/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
}
response = requests.get(url, headers=headers)
if response.status_code == 200:
    soup = BeautifulSoup(response.content, 'html.parser')
    results = []
    property_items = soup.find_all("li", {"data-testid": "property-card"})
    for item in property_items:
        try:
            address = item.find("address").get_text(strip=True)
            price = item.find("span", {"data-test": "property-card-price"}).get_text(strip=True)
            url_link = item.find("a").get("href")
            full_url = f"https://www.example.com{url_link}" if url_link and not url_link.startswith("http") else url_link
            if address or price:
                results.append({
                    "address": address,
                    "price": price,
                    "link": full_url
                })
        except Exception as e:
            print(f"Error parsing card: {e}")
    if results:
        with open("real_estate_data.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["address", "price", "link"])
            writer.writeheader()
            writer.writerows(results)
        print(f"Scraped {len(results)} listings")
else:
    print(f"Failed to fetch page. Status code: {response.status_code}")
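
The scraper above stores the price as display text and builds links by string concatenation. A minimal sketch of two cleanup helpers follows. The names parse_price and absolute_link are hypothetical, and parse_price assumes prices are written as plain digits with optional comma separators, so values like "$1.25M" or price ranges would need extra handling.

```python
import re
from urllib.parse import urljoin

BASE_URL = "https://www.example.com"  # placeholder domain used throughout this guide


def parse_price(text):
    """Convert a price string such as "$1,250,000" to an int, or None if no number is found."""
    match = re.search(r"\d[\d,]*", text or "")
    if not match:
        return None
    return int(match.group().replace(",", ""))


def absolute_link(href):
    """Resolve a relative href against BASE_URL; absolute URLs pass through unchanged."""
    return urljoin(BASE_URL, href) if href else None
```

Storing the price as a number makes the CSV easier to sort and filter later.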

Limitation: This approach fails on sites with JavaScript rendering or anti-bot protection.
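
When the basic scraper returns no listings, it helps to know which of these failure modes you hit. The following is a heuristic sketch, not a reliable detector: the function name, the labels, and the choice of 403 and 429 as "blocked" signals are assumptions made for illustration.

```python
BLOCK_STATUS_CODES = {403, 429}  # common "forbidden" and "too many requests" responses


def diagnose_response(status_code, html, expected_marker):
    """Return a short label describing why a fetch may have yielded no listings.

    expected_marker is a string you expect in a fully rendered page,
    for example the data-testid value of a listing card.
    """
    if status_code in BLOCK_STATUS_CODES:
        return "blocked"
    if status_code != 200:
        return "http_error"
    if expected_marker not in html:
        # A 200 without the marker may be a JavaScript shell or a challenge page
        return "missing_content"
    return "ok"
```

With the scraper above, this could be called as diagnose_response(response.status_code, response.text, "property-card").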

Handling Anti-Bot Measures with Bright Data

When basic scraping fails, Bright Data offers several solutions depending on your needs.

Option 1: Unlocker APIs (Recommended for Static Pages)

Unlocker APIs handle proxy rotation, CAPTCHA solving, and anti-bot bypassing automatically. You send one request; it returns clean HTML or JSON.

Best for:

Pages that don’t require browser interaction
High-volume scraping with predictable costs
Teams without proxy infrastructure

Direct API Access (Recommended Method)

import requests
from bs4 import BeautifulSoup
API_KEY = "YOUR_API_KEY"
ZONE_NAME = "YOUR_ZONE_NAME"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
payload = {
    "zone": ZONE_NAME,
    "url": "https://www.example.com/homes/for_sale/San-Francisco/",
    "format": "raw"
}
response = requests.post(
    "https://api.brightdata.com/request",
    headers=headers,
    json=payload
)
if response.status_code == 200:
    html_content = response.text
    # Parse with BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')
    # Extract data as shown above
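
The snippet above does nothing when the status code is not 200. One common way to handle transient failures is to retry with exponential backoff. This is a generic sketch rather than anything specific to Bright Data: fetch_with_retries is a hypothetical helper, and the attempt count and delays are arbitrary defaults.

```python
import time


def fetch_with_retries(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a response with status 200 or attempts run out.

    send is any zero-argument callable returning an object with a
    status_code attribute, for example:
        lambda: requests.post(API_URL, headers=headers, json=payload)
    Returns the last response received.
    """
    response = None
    for attempt in range(attempts):
        response = send()
        if response.status_code == 200:
            return response
        if attempt < attempts - 1:
            # Double the wait after each failure: 1s, 2s, 4s, ...
            sleep(base_delay * (2 ** attempt))
    return response
```

Passing the request as a callable keeps the retry logic independent of how the request is built, so the same helper works for the proxy-based approach as well.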

Native Proxy-Based Access

For workflows already using proxy routing:

import requests
host = 'brd.superproxy.io'
port = 33335
username = 'brd-customer-YOUR_CUSTOMER_ID-zone-YOUR_ZONE_NAME'
password = 'YOUR_ZONE_PASSWORD'
proxy_url = f'http://{username}:{password}@{host}:{port}'
proxies = {
    'http': proxy_url,
    'https': proxy_url
}
url = "https://www.example.com/homes/for_sale/San-Francisco/"
response = requests.get(url, proxies=proxies)

Note: For native proxy access, install the Bright Data SSL certificate to avoid SSL errors, or set verify=False in your requests (not recommended for production).
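
If a proxy password contains characters such as "@" or ":", interpolating it directly into the proxy URL produces an ambiguous URL. A small sketch of a safer builder follows. build_proxy_url is a hypothetical helper, the host and port defaults are copied from the example above, and the credentials in the usage line are made up.

```python
from urllib.parse import quote


def build_proxy_url(username, password, host="brd.superproxy.io", port=33335):
    """Build an HTTP proxy URL, percent-encoding the credentials."""
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


# Made-up credentials, for illustration only
proxy_url = build_proxy_url("example-user", "p@ss:word")
proxies = {"http": proxy_url, "https": proxy_url}
```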

Handling JavaScript-Rendered Content

If pages return incomplete data, use the x-unblock-expect header to wait for specific elements:

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
payload = {
    "zone": ZONE_NAME,
    "url": "https://www.example.com/property/12345",
    "format": "raw",
    "headers": {
        "x-unblock-expect": '{"element": ".property-details"}'
    }
}
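
The header value above is a JSON string written by hand, which is easy to break when the selector itself contains quotes. As a sketch, the value can be generated with json.dumps instead. The helper name is hypothetical, and the header name and {"element": ...} shape are taken from the example above rather than verified independently.

```python
import json


def expect_element_header(css_selector):
    """Return a headers dict whose x-unblock-expect value is valid JSON for the selector."""
    return {"x-unblock-expect": json.dumps({"element": css_selector})}


# Selectors with embedded quotes are escaped correctly
extra_headers = expect_element_header('[data-testid="property-card"]')
```

The returned dict can be used as the "headers" entry of the payload.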

Option 2: Browser API (For Interactive Pages)

When you need full browser interaction — clicking buttons, scrolling, handling login flows — use the Browser API. For other tools, read my article about the best scraping browsers.

Best for:

JavaScript-heavy sites requiring interaction
Multi-step navigation flows
Sites with complex anti-bot detection

Puppeteer Example

const puppeteer = require('puppeteer-core');
const AUTH = 'YOUR_USERNAME:YOUR_PASSWORD';
const TARGET_URL = 'https://www.example.com/homes/for_sale/San-Francisco/';
async function scrapeRealEstate() {
    const browserWSEndpoint = `wss://${AUTH}@brd.superproxy.io:9222`;
    const browser = await puppeteer.connect({ browserWSEndpoint });
    
    try {
        const page = await browser.newPage();
        await page.goto(TARGET_URL, { timeout: 120000 });
        
        // Wait for listings to load
        await page.waitForSelector('[data-testid="property-card"]');
        
        // Extract data
        const listings = await page.evaluate(() => {
            const cards = document.querySelectorAll('[data-testid="property-card"]');
            return Array.from(cards).map(card => ({
                address: card.querySelector('address')?.textContent?.trim(),
                price: card.querySelector('[data-test="property-card-price"]')?.textContent?.trim(),
                link: card.querySelector('a')?.href
            }));
        });
        
        console.log(listings);
    } finally {
        await browser.close();
    }
}
scrapeRealEstate();

Playwright Example

from playwright.sync_api import sync_playwright
AUTH = 'YOUR_USERNAME:YOUR_PASSWORD'
def scrape_real_estate():
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(f'wss://{AUTH}@brd.superproxy.io:9222')
        page = browser.new_page()
        
        page.goto('https://www.example.com/homes/for_sale/San-Francisco/', timeout=120000)
        page.wait_for_selector('[data-testid="property-card"]')
        
        listings = page.evaluate('''() => {
            const cards = document.querySelectorAll('[data-testid="property-card"]');
            return Array.from(cards).map(card => ({
                address: card.querySelector('address')?.textContent?.trim(),
                price: card.querySelector('[data-test="property-card-price"]')?.textContent?.trim(),
                link: card.querySelector('a')?.href
            }));
        }''')
        
        browser.close()
        return listings

Option 3: Web Scraper API (Pre-Built Scrapers)

For popular real estate sites like Zillow, Realtor.com, or Redfin, Bright Data and other providers offer pre-built scrapers that return structured data directly.

import requests
API_KEY = "YOUR_API_KEY"
DATASET_ID = "gd_xxxxx"  # Real estate scraper ID
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}
# Synchronous scraping (up to 20 URLs)
response = requests.post(
    f"https://api.brightdata.com/datasets/v3/scrape?dataset_id={DATASET_ID}&format=json",
    headers=headers,
    json=[{"url": "https://www.zillow.com/homedetails/123-main-st/12345_zpid/"}]
)
response.raise_for_status()  # surface HTTP errors before parsing the body
data = response.json()

Pagination and Multiple Pages

Real estate listings span multiple pages. Handle pagination by iterating through page numbers:

import requests
from bs4 import BeautifulSoup
import time
API_KEY = "YOUR_API_KEY"
ZONE_NAME = "YOUR_ZONE_NAME"
BASE_URL = "https://www.example.com/homes/for_sale/San-Francisco/"
all_results = []
for page_num in range(1, 6):
    url = f"{BASE_URL}?page={page_num}"
    
    payload = {
        "zone": ZONE_NAME,
        "url": url,
        "format": "raw"
    }
    
    response = requests.post(
        "https://api.brightdata.com/request",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
        json=payload
    )
    
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # Parse and append results
        # ...
    
    time.sleep(1)  # Basic rate limiting
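
The loop above always requests five pages, even if the search has fewer. A possible refinement is to stop at the first page that yields nothing new and to skip listings already seen. In this sketch, collect_pages and page_url are hypothetical helpers, fetch_listings stands in for whatever fetch-and-parse function you write, and the "page" query parameter name is an assumption carried over from the example.

```python
import time
from urllib.parse import urlencode


def page_url(base_url, page_num):
    """Append a page query parameter; the parameter name "page" is an assumption."""
    return f"{base_url}?{urlencode({'page': page_num})}"


def collect_pages(fetch_listings, base_url, max_pages=5, delay=1.0, sleep=time.sleep):
    """Fetch pages 1..max_pages, stopping early at the first page with no new listings.

    fetch_listings(url) is a caller-supplied function returning a list of
    dicts that each contain a "link" key.
    """
    seen_links = set()
    all_results = []
    for page_num in range(1, max_pages + 1):
        new_items = []
        for item in fetch_listings(page_url(base_url, page_num)):
            if item["link"] not in seen_links:
                seen_links.add(item["link"])
                new_items.append(item)
        if not new_items:
            break  # empty or fully repeated page: assume we ran past the last page
        all_results.extend(new_items)
        sleep(delay)  # basic rate limiting between pages
    return all_results
```

Deduplicating by link also protects against sites that repeat the last page when the page number is out of range.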

Extracting Full Property Details

Listing pages contain limited data. Scrape individual property pages for complete information:

def scrape_property_details(property_url, api_key, zone_name):
    payload = {
        "zone": zone_name,
        "url": property_url,
        "format": "raw"
    }
    
    response = requests.post(
        "https://api.brightdata.com/request",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        json=payload
    )
    
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')

        def text_of(tag_name, test_id):
            # Python has no "?." operator, so guard against missing elements
            element = soup.find(tag_name, {"data-testid": test_id})
            return element.get_text(strip=True) if element else None

        return {
            "description": text_of("div", "description"),
            "bedrooms": text_of("span", "beds"),
            "bathrooms": text_of("span", "baths"),
            "sqft": text_of("span", "sqft"),
        }
    return None
# Enrich listing data
for listing in results:
    details = scrape_property_details(listing["link"], API_KEY, ZONE_NAME)
    if details:
        listing.update(details)
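
The enrichment loop above fetches detail pages one at a time, which gets slow as the listing count grows. One option is to fetch them concurrently with a small thread pool. This is a sketch under assumptions: enrich_listings is a hypothetical helper, fetch_details stands in for scrape_property_details with the key and zone already bound, and the worker count of 4 is arbitrary and should respect your plan's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich_listings(listings, fetch_details, max_workers=4):
    """Fetch details for every listing in parallel and merge them in place.

    fetch_details(url) is a caller-supplied function returning a dict or None.
    """
    links = [item["link"] for item in listings]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with listings
        all_details = list(pool.map(fetch_details, links))
    for listing, details in zip(listings, all_details):
        if details:
            listing.update(details)
    return listings
```

It could be called as enrich_listings(results, lambda url: scrape_property_details(url, API_KEY, ZONE_NAME)). Note that an exception raised inside fetch_details propagates when the results are collected, so wrap the call if partial results are acceptable.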

Choosing the Right Product

Summary

Start simple with requests and BeautifulSoup for basic sites
Use Unlocker API when you encounter anti-bot protection on static pages
Switch to Browser API when pages require JavaScript interaction
Consider Web Scraper API for popular sites with pre-built scrapers
Handle pagination to collect data across multiple pages
Scrape detail pages for complete property information
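
The first four points above can be read as a simple decision rule. The sketch below encodes them as a function. The name choose_tool, the flag names, and the order in which the checks are applied are my own assumptions, not an official recommendation.

```python
def choose_tool(has_antibot=False, needs_interaction=False, has_prebuilt_scraper=False):
    """Map site characteristics to the approach suggested in the summary."""
    if has_prebuilt_scraper:
        return "Web Scraper API"
    if needs_interaction:
        return "Browser API"
    if has_antibot:
        return "Unlocker API"
    return "requests + BeautifulSoup"


print(choose_tool(has_antibot=True))  # Unlocker API
```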
