Google Maps Data Scraping Using Puppeteer

Google Maps is a treasure trove of data that can be valuable for various purposes, including market research, lead generation, and location-based insights. However, accessing this data in bulk often requires web scraping tools. One of the best tools for scraping Google Maps is Puppeteer, a Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. In this blog, we will explore how to scrape data from Google Maps using Puppeteer.

What You Will Learn

  • Setting up Puppeteer
  • Navigating Google Maps
  • Extracting Data (Business Names, Ratings, Addresses, etc.)
  • Dealing with Pagination
  • Tips for Avoiding Blocks

Prerequisites

Before we dive into the code, ensure you have the following:

  • Node.js installed on your system.
  • Basic understanding of JavaScript and web scraping.
  • Familiarity with CSS selectors, as they’ll help in targeting specific elements on the page.

Step 1: Install Puppeteer

Start by installing Puppeteer. Open your terminal and run the following command:

npm install puppeteer

Puppeteer automatically downloads a compatible browser during installation (Chromium in older releases, Chrome for Testing in recent ones), so you’re ready to go without any additional configuration.

Step 2: Launching a Browser Instance

First, let’s set up Puppeteer to launch a browser and navigate to Google Maps:

const puppeteer = require('puppeteer');

(async () => {
  // Launch a browser instance
  const browser = await puppeteer.launch({
    headless: false,  // Set to 'true' if you don't need to see the browser
  });

  // Open a new page
  const page = await browser.newPage();

  // Navigate to Google Maps
  await page.goto('https://www.google.com/maps');

  // Wait for the page to load completely
  await page.waitForSelector('#searchboxinput');

  // Interact with the search box (e.g., searching for "Hotels in San Francisco")
  await page.type('#searchboxinput', 'Hotels in San Francisco');
  await page.click('#searchbox-searchbutton');

  // Wait for search results to load
  // (Google changes its class names frequently; verify this selector in DevTools)
  await page.waitForSelector('.section-result');
  
  // Further code goes here...
})();

In this code:

  • We launch Puppeteer in non-headless mode, allowing you to observe the browser.
  • The goto function navigates to Google Maps.
  • We then wait for the search box to appear, type a query into it with .type(), and submit it with .click().

Step 3: Extracting Data

Once the search results load, we can extract the required information. Google Maps often displays results in cards with business names, addresses, ratings, etc. You can scrape this data by targeting specific CSS selectors. Note that Google changes its (often obfuscated) class names regularly, so inspect the live page in DevTools and update the selectors below before running the script.

const data = await page.evaluate(() => {
  let results = [];
  let items = document.querySelectorAll('.section-result');
  
  items.forEach((item) => {
    const name = item.querySelector('.section-result-title span')?.innerText || 'N/A';
    const rating = item.querySelector('.cards-rating-score')?.innerText || 'N/A';
    const address = item.querySelector('.section-result-location')?.innerText || 'N/A';

    results.push({ name, rating, address });
  });

  return results;
});

console.log(data);

In this script:

  • We use page.evaluate() to run code inside the browser’s context and gather information.
  • The document.querySelectorAll() function finds all the result cards.
  • For each result, we extract the business name, rating, and address using their respective CSS selectors.

Step 4: Handling Pagination

Google Maps paginates results, so we need to loop through multiple pages to scrape all the data. We can detect and click the “Next” button to step through the results until no more pages are available. (Newer Maps layouts load further results via infinite scroll instead of a “Next” button, so check which behavior your results panel actually uses.)

let hasNextPage = true;

while (hasNextPage) {
  // Extract data from the current page
  const currentPageData = await page.evaluate(() => {
    let results = [];
    let items = document.querySelectorAll('.section-result');
    
    items.forEach((item) => {
      const name = item.querySelector('.section-result-title span')?.innerText || 'N/A';
      const rating = item.querySelector('.cards-rating-score')?.innerText || 'N/A';
      const address = item.querySelector('.section-result-location')?.innerText || 'N/A';

      results.push({ name, rating, address });
    });

    return results;
  });

  // Store the current page data or process it as needed
  console.log(currentPageData);

  // Check if there's a "Next" button and click it
  // (this obfuscated class name changes between Maps versions; re-check it in DevTools)
  const nextButton = await page.$('.n7lv7yjyC35__button-next-icon');

  if (nextButton) {
    await nextButton.click();
    // Wait for the next page to load (page.waitForTimeout was removed in
    // newer Puppeteer versions, so use a plain timeout instead)
    await new Promise((resolve) => setTimeout(resolve, 2000));
  } else {
    hasNextPage = false;  // Exit loop if no next button is found
  }
}

This script iterates through the available pages until it can no longer find the “Next” button. After each page, it extracts the data and proceeds to the next set of results.
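Paginated (or infinitely scrolled) result sets often repeat entries across loads, so it helps to de-duplicate the accumulated pages before storing them. A small sketch; the mergeResults helper and its name-plus-address key are our own convention, not something Google Maps or Puppeteer provides:

```javascript
// Merge page-by-page results, dropping duplicates keyed on name + address
function mergeResults(pages) {
  const seen = new Set();
  const merged = [];
  for (const pageData of pages) {
    for (const item of pageData) {
      const key = `${item.name}|${item.address}`;
      if (!seen.has(key)) {
        seen.add(key);
        merged.push(item);
      }
    }
  }
  return merged;
}

// Example: the same hotel appears on two consecutive pages
const merged = mergeResults([
  [{ name: 'Hotel A', rating: '4.5', address: '1 Main St' }],
  [
    { name: 'Hotel A', rating: '4.5', address: '1 Main St' },
    { name: 'Hotel B', rating: '4.0', address: '2 Oak Ave' },
  ],
]);
console.log(merged.length); // 2
```

Inside the pagination loop, you would push each currentPageData array into a list and call mergeResults once at the end (or incrementally after each page).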

Step 5: Tips for Avoiding Blocks

Google Maps may block or throttle your scraper if you send too many requests in a short period. Here are some tips to reduce the chances of being blocked:

  • Use Headless Mode Sparingly: Running the browser in headless mode can sometimes trigger blocks more quickly.
  • Set Random Delays: Avoid scraping at a constant rate. Randomize delays between page loads and actions to mimic human behavior.
await new Promise((resolve) => setTimeout(resolve, Math.floor(Math.random() * 3000) + 2000)); // Wait 2-5 seconds
  • Rotate User-Agents: Use a different (and reasonably current) user-agent string for each session.
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');
  • Proxy Rotation: Consider using proxies to distribute your requests across different IP addresses.
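The delay and user-agent tips above can be wrapped into small reusable helpers. A sketch under our own naming; the user-agent strings listed are illustrative examples, not a curated or guaranteed-unblocked set:

```javascript
// Illustrative user-agent strings; in practice, keep these current
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// Pick one user-agent at random for the session
function randomUserAgent() {
  return USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
}

// Resolve after a random delay between minMs and maxMs milliseconds,
// mimicking human pacing between actions
function randomDelay(minMs, maxMs) {
  const ms = minMs + Math.floor(Math.random() * (maxMs - minMs));
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage inside a scraping loop (page is a Puppeteer Page):
//   await page.setUserAgent(randomUserAgent());
//   await randomDelay(2000, 5000);
```

For proxy rotation, Puppeteer accepts a proxy at launch via args: ['--proxy-server=host:port'], so switching IPs generally means launching a fresh browser instance per proxy.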

Conclusion

Scraping Google Maps using Puppeteer is a powerful way to automate data collection for businesses, market research, or lead generation. By following the steps outlined in this blog, you can gather business names, addresses, ratings, and more with ease. Remember to respect Google’s terms of service and legal guidelines when scraping their data.

With Puppeteer, the possibilities are vast—whether it’s handling pagination, extracting detailed information, or using random delays to avoid detection, you’re well on your way to mastering Google Maps scraping!
