Google Maps Data Scraping Using Puppeteer
Google Maps is a treasure trove of data that can be valuable for various purposes, including market research, lead generation, and location-based insights. However, accessing this data in bulk often requires web scraping tools. One of the best tools for scraping Google Maps is Puppeteer, a Node.js library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. In this blog, we will explore how to scrape data from Google Maps using Puppeteer.
What You Will Learn
- Setting up Puppeteer
- Navigating Google Maps
- Extracting Data (Business Names, Ratings, Addresses, etc.)
- Dealing with Pagination
- Tips for Avoiding Blocks
Prerequisites
Before we dive into the code, ensure you have the following:
- Node.js installed on your system.
- Basic understanding of JavaScript and web scraping.
- Familiarity with CSS selectors, as they’ll help in targeting specific elements on the page.
Step 1: Install Puppeteer
Start by installing Puppeteer. Open your terminal and run the following command:
npm install puppeteer
Puppeteer automatically downloads a compatible browser build during installation (Chromium, or Chrome for Testing in newer releases), so you’re ready to go without any additional configuration.
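If you want to confirm the install worked, a minimal sanity check (using nothing beyond Puppeteer’s standard launch, version, and close calls) should print the browser version and exit cleanly:

const puppeteer = require('puppeteer');

(async () => {
  // Launch a browser, print its version string, and shut it down again.
  const browser = await puppeteer.launch();
  console.log(await browser.version());
  await browser.close();
})();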
Step 2: Launching a Browser Instance
First, let’s set up Puppeteer to launch a browser and navigate to Google Maps:
const puppeteer = require('puppeteer');

(async () => {
  // Launch a browser instance
  const browser = await puppeteer.launch({
    headless: false, // Set to 'true' if you don't need to see the browser
  });

  // Open a new page
  const page = await browser.newPage();

  // Navigate to Google Maps
  await page.goto('https://www.google.com/maps');

  // Wait for the page to load completely
  await page.waitForSelector('#searchboxinput');

  // Interact with the search box (e.g., searching for "Hotels in San Francisco")
  await page.type('#searchboxinput', 'Hotels in San Francisco');
  await page.click('#searchbox-searchbutton');

  // Wait for search results to load
  // NOTE: Google Maps class names change frequently; '.section-result' reflects
  // an older results layout and may need updating.
  await page.waitForSelector('.section-result');

  // Further code goes here...
})();
In this code:
- We launch Puppeteer in non-headless mode, allowing you to observe the browser.
- The goto() function navigates to Google Maps.
- We then wait for the search box to appear and enter a query using Puppeteer’s .type() and .click() functions (an alternative that submits the search with the Enter key is sketched just below).
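If the search button’s ID ever changes, one fallback is to submit the query with the Enter key instead of clicking, using Puppeteer’s standard keyboard API. This is a sketch under the same selector assumptions as the code above:

// Type the query, then submit it with Enter instead of clicking the search button.
await page.type('#searchboxinput', 'Hotels in San Francisco');
await page.keyboard.press('Enter');

// Wait for the results to render (same '.section-result' assumption as above).
await page.waitForSelector('.section-result');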
Step 3: Extracting Data
Once the search results load, we can extract the required information. Google Maps often displays results in cards with business names, addresses, ratings, etc. You can scrape this data by targeting specific CSS selectors.
const data = await page.evaluate(() => {
  let results = [];

  // NOTE: these class names reflect an older Google Maps layout; inspect the
  // live page and update the selectors if no elements are found.
  let items = document.querySelectorAll('.section-result');

  items.forEach((item) => {
    const name = item.querySelector('.section-result-title span')?.innerText || 'N/A';
    const rating = item.querySelector('.cards-rating-score')?.innerText || 'N/A';
    const address = item.querySelector('.section-result-location')?.innerText || 'N/A';
    results.push({ name, rating, address });
  });

  return results;
});

console.log(data);
In this script:
- We use page.evaluate() to run code inside the browser’s context and gather information.
- The document.querySelectorAll() function finds all the result cards.
- For each result, we extract the business name, rating, and address using their respective CSS selectors (a short sketch for saving these results to disk follows the list).
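Once the data is back in Node.js, you will usually want to persist it rather than just log it. A minimal sketch using Node’s built-in fs module (the results.json filename is only an example) could follow the console.log(data) line:

const fs = require('fs');

// Write the scraped records to disk as pretty-printed JSON.
fs.writeFileSync('results.json', JSON.stringify(data, null, 2));
console.log(`Saved ${data.length} records to results.json`);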
Step 4: Handling Pagination
Google Maps paginates results, so we need to loop through multiple pages to scrape all data. We can detect and click the “Next” button to go through the results until no more pages are available.
let hasNextPage = true;

while (hasNextPage) {
  // Extract data from the current page
  const currentPageData = await page.evaluate(() => {
    let results = [];
    let items = document.querySelectorAll('.section-result');

    items.forEach((item) => {
      const name = item.querySelector('.section-result-title span')?.innerText || 'N/A';
      const rating = item.querySelector('.cards-rating-score')?.innerText || 'N/A';
      const address = item.querySelector('.section-result-location')?.innerText || 'N/A';
      results.push({ name, rating, address });
    });

    return results;
  });

  // Store the current page data or process it as needed
  console.log(currentPageData);

  // Check if there's a "Next" button and click it
  // NOTE: this obfuscated class name changes as Google updates Maps; verify it
  // in DevTools before relying on it.
  const nextButton = await page.$('.n7lv7yjyC35__button-next-icon');

  if (nextButton) {
    await nextButton.click();
    // page.waitForTimeout() was removed in newer Puppeteer releases, so use a
    // plain setTimeout-based delay to give the next page time to load.
    await new Promise((resolve) => setTimeout(resolve, 2000));
  } else {
    hasNextPage = false; // Exit loop if no next button is found
  }
}
This script iterates through the available pages until it can no longer find the “Next” button. After each page, it extracts the data and proceeds to the next set of results.
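Be aware that Google changes this part of the interface often: newer Google Maps layouts frequently load additional results by scrolling the results panel rather than exposing a “Next” button. The sketch below assumes a '[role="feed"]' selector for the scrollable panel, which you should verify in DevTools before relying on it:

// Scroll the results panel a few times to trigger lazy loading.
// NOTE: '[role="feed"]' is an assumed selector for the scrollable results list
// and may need to be replaced after inspecting the live page.
for (let i = 0; i < 5; i++) {
  await page.evaluate(() => {
    const panel = document.querySelector('[role="feed"]');
    if (panel) panel.scrollBy(0, panel.scrollHeight);
  });
  await new Promise((resolve) => setTimeout(resolve, 2000)); // Give new results time to load
}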
Step 5: Tips for Avoiding Blocks
Google Maps may block or throttle your scraper if you send too many requests in a short period. Here are some tips to reduce the chances of being blocked:
- Use Headless Mode Sparingly: Running the browser in headless mode can sometimes trigger blocks more quickly.
- Set Random Delays: Avoid scraping at a constant rate. Randomize delays between page loads and actions to mimic human behavior.
await new Promise((resolve) => setTimeout(resolve, Math.floor(Math.random() * 3000) + 2000)); // Wait roughly 2-5 seconds (page.waitForTimeout was removed in newer Puppeteer releases)
- Rotate User-Agents: Use a different user-agent string for each session.
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36');
- Proxy Rotation: Consider using proxies to distribute your requests across different IP addresses; a combined proxy-and-user-agent launch sketch follows below.
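Putting the last two tips together, a hedged sketch might look like the following. The proxy address and user-agent strings are placeholders you would replace with your own; --proxy-server is a standard Chromium launch argument:

const puppeteer = require('puppeteer');

// Placeholder values: substitute your own proxy endpoint and user-agent pool.
const PROXY = 'http://my-proxy.example.com:8080';
const USER_AGENTS = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
];

(async () => {
  // Route all traffic for this session through the proxy.
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${PROXY}`],
  });
  const page = await browser.newPage();

  // Pick a random user-agent for this session.
  await page.setUserAgent(USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)]);

  await page.goto('https://www.google.com/maps');
  // ...scraping logic from the earlier steps goes here...
  await browser.close();
})();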
Conclusion
Scraping Google Maps using Puppeteer is a powerful way to automate data collection for businesses, market research, or lead generation. By following the steps outlined in this blog, you can gather business names, addresses, ratings, and more with ease. Remember to respect Google’s terms of service and legal guidelines when scraping their data.
With Puppeteer, the possibilities are vast—whether it’s handling pagination, extracting detailed information, or using random delays to avoid detection, you’re well on your way to mastering Google Maps scraping!