How to Create a Local Email Extractor with Node.js

Email extraction is useful for data collection, marketing, and similar tasks. In this post, we'll walk through creating a local email extractor using Node.js. Node.js is a JavaScript runtime built for fast, scalable I/O, which makes it a good fit for reading files and pulling addresses out of them.

Why Use Node.js for Email Extraction?

Node.js is known for its non-blocking, event-driven architecture, which makes it a good fit for I/O-heavy work such as web scraping and data processing. File reading and regular expression matching are built in, and the npm ecosystem covers extras such as HTML parsing.

Prerequisites

Before you begin, ensure you have the following installed:

  • Node.js (version 12 or higher)
  • npm (Node Package Manager, which comes with Node.js)

You can check your Node.js version using the following command:

node -v

If you need to install Node.js, you can download it from the official Node.js website (nodejs.org).

Step 1: Setting Up Your Project

First, create a new directory for your email extractor project and navigate into it:

mkdir email-extractor
cd email-extractor

Now, initialize a new Node.js project:

npm init -y

This command will create a package.json file with default settings.

Step 2: Installing Required Packages

We'll need the fs module for file system operations and the readline module for reading input files line by line. Both are built into Node.js, and regular expressions are part of JavaScript itself, so the basic extractor needs no additional packages.

However, if you want to handle HTML files and extract emails from them, you can install the cheerio library, which provides jQuery-like functionality for HTML parsing:

npm install cheerio

Step 3: Creating the Email Extractor

Now, let’s create a file named emailExtractor.js:

touch emailExtractor.js

Open emailExtractor.js in your favorite text editor, and let’s start coding!

Step 4: Reading a Text File and Extracting Emails

Here’s the basic structure for reading a text file and extracting emails using regular expressions:

const fs = require('fs');
const readline = require('readline');

// Function to extract emails
function extractEmails(text) {
    const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
    return text.match(emailRegex) || [];
}

// Read the input file
const inputFile = 'input.txt'; // Change this to your input file

const rl = readline.createInterface({
    input: fs.createReadStream(inputFile),
    crlfDelay: Infinity
});

const allEmails = new Set();

rl.on('line', (line) => {
    const emails = extractEmails(line);
    emails.forEach(email => allEmails.add(email));
});

rl.on('close', () => {
    console.log('Extracted Emails:');
    console.log(Array.from(allEmails));
});
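
The Set above treats addresses that differ only in letter case as separate entries. If you would rather merge them, one option is to normalize each address before adding it. The sketch below is a minimal illustration: normalizeEmail and collectUnique are hypothetical helper names that are not part of the script above, and lowercasing the whole address is an assumption that suits most practical cases (the local part is formally case-sensitive, although few providers treat it that way).

```javascript
// Hypothetical helper: trim whitespace and lowercase an address.
function normalizeEmail(email) {
    return email.trim().toLowerCase();
}

// Hypothetical helper: collect addresses into a Set after normalizing,
// so 'Alice@Example.com' and 'alice@example.com' are counted once.
function collectUnique(emails) {
    const unique = new Set();
    for (const email of emails) {
        unique.add(normalizeEmail(email));
    }
    // A Set keeps insertion order, so the first occurrence decides the position.
    return Array.from(unique);
}

console.log(collectUnique(['Alice@Example.com', 'alice@example.com', 'bob@example.org']));
```

In the main script, the same idea would amount to calling the normalizing function inside the 'line' handler before allEmails.add.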

Step 5: Testing Your Email Extractor

To test the email extractor, create a sample text file named input.txt in the same directory:

Hello, you can reach me at alice@example.com or bob@example.org.
This line contains an invalid email: invalid-email@.com

Run your email extractor script using Node.js:

node emailExtractor.js

You should see the following output:

Extracted Emails:
[ 'alice@example.com', 'bob@example.org' ]
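
To see why a malformed address is skipped, it can help to run the pattern against a few strings directly. The snippet below is a small, self-contained check. It uses the same kind of pattern as Step 4 (with the top-level domain written as [A-Za-z]{2,}). The findEmails name and the sample strings are made up for illustration.

```javascript
const emailRegex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;

// Hypothetical stand-in for the extractEmails function from Step 4.
function findEmails(text) {
    return text.match(emailRegex) || [];
}

const samples = [
    'Write to alice@example.com.',     // the sentence-ending period is not part of the match
    'No domain name here: user@.com',  // nothing between "@" and ".com", so no match
    'Two on one line: a@example.org, b@example.net',
];

for (const sample of samples) {
    console.log(JSON.stringify(sample), '->', findEmails(sample));
}
```

A pattern like this is a pragmatic filter rather than a full validator: it accepts the common address shapes and rejects obviously broken ones, which is usually enough for extraction.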

Step 6: Enhancing the Email Extractor with HTML Support

If you want to extract emails from HTML files, you can enhance your script by using the cheerio library. Append the following to emailExtractor.js; it reuses the fs import and the extractEmails function from Step 4:

const cheerio = require('cheerio');

// Function to extract emails from HTML
function extractEmailsFromHTML(html) {
    const $ = cheerio.load(html);
    const textContent = $('body').text();
    return extractEmails(textContent);
}

// Read the HTML file and extract emails from its text content
const inputHTMLFile = 'input.html'; // Change this to your HTML file

fs.readFile(inputHTMLFile, 'utf8', (err, html) => {
    if (err) {
        console.error(err);
        return;
    }
    const emailsFromHTML = extractEmailsFromHTML(html);
    console.log('Extracted Emails from HTML:');
    console.log(emailsFromHTML);
});
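
The snippet above always reads input.html. If the script should accept either kind of file, one way to choose the extraction path is to look at the file extension. The following is a sketch under that assumption; isHtmlFile is a hypothetical helper, and the list of extensions is illustrative.

```javascript
const path = require('path');

// Hypothetical helper: decide by extension whether a file should go
// through the HTML path (cheerio) or the plain-text path (readline).
function isHtmlFile(filename) {
    const ext = path.extname(filename).toLowerCase();
    return ext === '.html' || ext === '.htm';
}

for (const name of ['input.txt', 'input.html', 'PAGE.HTM']) {
    console.log(name, '->', isHtmlFile(name) ? 'HTML extractor' : 'text extractor');
}
```

Checking the extension is a cheap heuristic. It does not inspect the file contents, so a mislabeled file would still take the wrong path.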

Now, create an HTML file named input.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Sample Email Page</title>
</head>
<body>
    <p>Contact us at support@example.com or sales@example.com.</p>
    <p>Invalid email format: invalid-email@.com</p>
</body>
</html>

Run the updated script:

node emailExtractor.js

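You should now see the addresses from input.html printed in addition to those from input.txt. Because both files are read asynchronously, the two blocks of output may appear in either order.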
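
Note that $('body').text() returns the text content of elements, not their attribute values, so an address that appears only inside a link target such as href="mailto:..." is not picked up. One possible way to cover that case is sketched below with a plain regular expression over the raw HTML. The extractMailtoAddresses helper is hypothetical, and the pattern is an assumption that handles simple quoted href attributes rather than every form of valid HTML.

```javascript
// Hypothetical helper: pull addresses out of href="mailto:..." attributes.
// Assumes quoted attribute values; a query string such as ?subject=... is dropped.
function extractMailtoAddresses(html) {
    const mailtoRegex = /href\s*=\s*["']mailto:([^"'?]+)/gi;
    const found = new Set();
    for (const match of html.matchAll(mailtoRegex)) {
        found.add(match[1].trim());
    }
    return Array.from(found);
}

const sampleHtml = `
<p><a href="mailto:support@example.com?subject=Hi">Email support</a></p>
<p><a href='mailto:sales@example.com'>Sales</a></p>
`;

console.log(extractMailtoAddresses(sampleHtml));
```

The results could be merged with the output of extractEmailsFromHTML through a Set so that an address appearing both as link text and as a link target is reported once.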

Conclusion

In this post, we covered how to create a local email extractor using Node.js. We started with a basic text file extractor built on fs, readline, and a regular expression, then extended it with cheerio to handle HTML content. From here, you can adapt the script to your own input files and projects.