Building an Email Extractor with PHP and MySQL: A Step-by-Step Guide
Introduction
In our previous blogs, we covered the fundamentals of email extraction using PHP and MySQL, including how to scrape websites for emails and the tools and techniques involved. In this blog, we take a practical approach and build a simple email extractor from scratch. By the end of this guide, you’ll have a working script that pulls email addresses from a webpage and stores them in a database.
1. Overview of the Project
We will create a PHP script that:
- Takes a URL input from the user.
- Scrapes the webpage for email addresses.
- Validates the extracted emails.
- Stores the emails in a MySQL database.
This project provides hands-on experience with PHP for web scraping and data storage.
2. Setting Up the Environment
Before we start coding, ensure you have the following set up:
- Web Server: Use WAMP or XAMPP to create a local server environment.
- Database: Create a MySQL database named email_extractor with a table for storing emails:
CREATE TABLE emails (
    id INT AUTO_INCREMENT PRIMARY KEY,
    email_address VARCHAR(255) NOT NULL,
    source VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
3. Creating the PHP Script
Now, let’s create a PHP script named email_extractor.php that performs the email extraction. It uses cURL to fetch the webpage, DOMDocument to load the HTML, and a regular expression to find the addresses. The script reads the target URL from a POST field named url.
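<?php
// Database connection (default local WAMP/XAMPP credentials; change them on any shared server)
$servername = "localhost";
$username = "root";
$password = "";
$dbname = "email_extractor";
$conn = new mysqli($servername, $username, $password, $dbname);
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

// Function to extract emails from the given URL
function extractEmails($url) {
    // Initialize cURL session
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up after 15 seconds
    $html = curl_exec($ch);
    curl_close($ch);

    // curl_exec() returns false on failure; there is nothing to parse in that case
    if ($html === false || $html === '') {
        return [];
    }

    // Load HTML into DOMDocument, collecting parser warnings instead of hiding them with @
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    // Search the decoded page text as well as the raw markup (which contains mailto: links)
    $text = $dom->documentElement ? $dom->documentElement->textContent : '';
    $haystack = $text . "\n" . $html;

    // Extract emails using regex; {2,} also allows long TLDs such as .museum
    $pattern = '/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b/i';
    preg_match_all($pattern, $haystack, $matches);

    // Remove duplicates, ignoring letter case
    $emails = array_values(array_unique(array_map('strtolower', $matches[0])));
    return $emails;
}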
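// Check if URL is provided
if (isset($_POST['url']) && is_string($_POST['url'])) {
    $url = trim($_POST['url']);
    // Accept only well-formed http(s) URLs
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (filter_var($url, FILTER_VALIDATE_URL) === false || !in_array($scheme, ['http', 'https'], true)) {
        echo "Please provide a valid http or https URL.";
    } else {
        $emails = extractEmails($url);
        // Insert with a prepared statement so scraped text is never interpolated into the SQL
        $stmt = $conn->prepare("INSERT INTO emails (email_address, source) VALUES (?, ?)");
        foreach ($emails as $email) {
            $stmt->bind_param("ss", $email, $url);
            $stmt->execute();
        }
        $stmt->close();
        echo count($emails) . " email(s) extracted and stored.";
    }
}
// Close database connection
$conn->close();
?>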
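To see what the pattern picks up before pointing the script at a live site, the matching step can be tried on a literal string. The following is a minimal sketch and not part of the script above: the helper name findEmailsInText and the sample markup are made up for illustration, and the pattern assumes top-level domains of two or more letters.

```php
<?php
// Hypothetical helper (not in the original script): runs the email regex
// over any string and returns unique, lowercased matches.
function findEmailsInText(string $text): array {
    $pattern = '/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}\b/i';
    preg_match_all($pattern, $text, $matches);
    // Lowercase first so "Sales@..." and "sales@..." count as one address
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

// Made-up sample markup for illustration
$sample = '<p>Contact <a href="mailto:Sales@example.com">sales@example.com</a> '
        . 'or support@example.org. Logo: <img src="logo@2x.png"></p>';

print_r(findEmailsInText($sample));
```

On this sample the function returns sales@example.com, support@example.org, and also logo@2x.png. The last one is an image filename, which shows that regex hits are candidates to be checked further, not confirmed addresses.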
4. Validating Extracted Emails
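It’s essential to validate the emails you extract before storing them. PHP’s filter_var with FILTER_VALIDATE_EMAIL checks that a string is a syntactically valid address; it does not confirm that the mailbox exists. You can add a basic validation function to your PHP script: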
function isValidEmail($email) {
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}
// Modify the insertion loop to validate emails
// (a prepared statement keeps scraped text out of the SQL string)
$stmt = $conn->prepare("INSERT INTO emails (email_address, source) VALUES (?, ?)");
foreach ($emails as $email) {
    if (isValidEmail($email)) {
        $stmt->bind_param("ss", $email, $url);
        $stmt->execute();
    }
}
$stmt->close();
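Validation can also be combined with a little normalization so that the same address in different letter case or with stray whitespace is stored only once. This is a rough sketch under stated assumptions: cleanEmailList is a hypothetical helper name, the sample input is invented, and lowercasing the whole address assumes the receiving mail servers treat the local part case-insensitively, which is common but not guaranteed.

```php
<?php
// Hypothetical helper: trims, lowercases, validates, and de-duplicates
// a list of candidate addresses before they are stored.
function cleanEmailList(array $candidates): array {
    $clean = [];
    foreach ($candidates as $candidate) {
        $email = strtolower(trim($candidate));
        // filter_var() returns the address on success and false on failure
        if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false) {
            $clean[$email] = true; // array keys de-duplicate the list
        }
    }
    return array_keys($clean);
}

// Invented sample input for illustration
$candidates = ['Info@Example.com', 'info@example.com ', 'not-an-email', 'user@@example.com'];
print_r(cleanEmailList($candidates));
```

The returned list could then be passed to the insertion loop in place of the raw $emails array.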
5. Enhancements and Features to Consider
You can enhance your email extractor by adding the following features:
- Error Handling: Implement error handling to manage exceptions and invalid URLs gracefully.
- Rate Limiting: Introduce delays between requests to avoid overwhelming target servers.
- User Interface Improvements: Make the form more user-friendly with validation messages and loading indicators.
- Email Verification: Integrate third-party APIs for verifying the existence of extracted email addresses.
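The first two items in the list above, error handling and rate limiting, might be sketched as follows. The helper names isFetchableUrl, fetchPage, and fetchAll, the 15-second timeout, and the 2-second delay are all assumptions chosen for illustration, not values from the script in this guide.

```php
<?php
// Hypothetical helper: accept only well-formed http(s) URLs.
function isFetchableUrl(string $url): bool {
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

// Hypothetical helper: fetch a page, returning null on any failure.
function fetchPage(string $url, int $timeoutSeconds = 15): ?string {
    if (!isFetchableUrl($url)) {
        return null;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false) {
        // Log the transport error instead of showing it to the user
        error_log("Fetch failed for $url: " . curl_error($ch));
    }
    curl_close($ch);
    // Treat transport errors and non-2xx responses as "no page"
    return ($body !== false && $status >= 200 && $status < 300) ? $body : null;
}

// Fetch several pages with a fixed pause between requests (rate limiting).
function fetchAll(array $urls, int $delaySeconds = 2): array {
    $pages = [];
    foreach (array_values($urls) as $i => $url) {
        if ($i > 0) {
            sleep($delaySeconds); // pause before every request except the first
        }
        $pages[$url] = fetchPage($url);
    }
    return $pages;
}
```

A caller can skip any URL whose entry in the returned array is null and run the extraction only on the pages that were fetched successfully.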
Conclusion
In this blog, we built a simple email extractor using PHP and MySQL. We learned how to scrape emails from a webpage, validate them, and store them in a MySQL database. This project serves as a practical introduction to web scraping and data handling with PHP.