Creating a Python Package for Email Extraction

In the world of data collection and web scraping, email extraction is a common task that can be made more efficient by creating a reusable Python package. In this blog post, we’ll walk through the steps to create a Python package that simplifies the process of extracting email addresses from various text sources.

Why Create a Python Package?

Creating a Python package allows you to:

  • Encapsulate functionality: Keep your email extraction logic organized and easy to reuse.
  • Share with others: Distribute your package via PyPI (Python Package Index) so others can benefit from your work.
  • Version control: Maintain different versions of your package for compatibility with various projects.

Prerequisites

Make sure you have the following installed:

  • Python (version 3.6 or higher)
  • pip (Python package manager)

You can check your Python version using:

python --version

If you need to install Python, you can download it from Python’s official site.
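
The same version check can also be done from inside Python, which is handy in a setup or bootstrap script. The snippet below is only a sketch; `MIN_VERSION` and `meets_minimum` are illustrative names, not part of any library.

```python
import sys

# Minimum version named in the prerequisites (hypothetical constant name).
MIN_VERSION = (3, 6)

def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print("Python", sys.version.split()[0], status)
```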

Step 1: Setting Up the Package Structure

Create a new directory for your package:

mkdir email_extractor
cd email_extractor

Inside this directory, create the following structure:

email_extractor/
├── email_extractor/
│   ├── __init__.py
│   └── extractor.py
├── tests/
│   └── test_extractor.py
├── setup.py
└── README.md

  • The email_extractor folder will contain your package code.
  • The tests folder will contain unit tests.
  • setup.py is the configuration file for your package.
  • README.md provides information about your package.
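
If you would rather not create these files by hand, the layout can be scaffolded with a few lines of Python. This is a minimal sketch; `SKELETON_FILES` and `create_skeleton` are hypothetical helper names used for illustration.

```python
from pathlib import Path

# Relative paths of the files in the layout described above.
SKELETON_FILES = [
    "email_extractor/__init__.py",
    "email_extractor/extractor.py",
    "tests/test_extractor.py",
    "setup.py",
    "README.md",
]

def create_skeleton(root):
    """Create empty placeholder files for the package layout under `root`."""
    root = Path(root)
    for relative in SKELETON_FILES:
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch(exist_ok=True)  # does not overwrite existing content
    # Report every file now present under root, as POSIX-style relative paths.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Calling `create_skeleton(".")` from inside the new project directory produces the tree shown above with empty files, ready to be filled in during the next steps.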

Step 2: Writing the Email Extraction Logic

Open extractor.py and implement the email extraction logic:

import re

class EmailExtractor:
    def __init__(self):
        # Regex for matching email addresses (a simple pattern, not a full RFC 5322 validator)
        self.email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

    def extract(self, text):
        """
        Extracts email addresses from the given text.
        
        :param text: The input text from which to extract emails
        :return: A list of extracted email addresses
        """
        return re.findall(self.email_regex, text)
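
Note that re.findall returns every match, so an address that appears twice in the text appears twice in the result. If that is not what you want, one possible variation is sketched below. It uses a pattern similar to the one in EmailExtractor, compiles it once, and drops repeats while keeping the original order. `EMAIL_PATTERN` and `extract_unique` are illustrative names, and the addresses use the reserved example.com family of domains.

```python
import re

# Pattern similar to the one used by EmailExtractor, compiled once at import time.
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

def extract_unique(text):
    """Return the addresses found in `text`, keeping only first occurrences."""
    matches = EMAIL_PATTERN.findall(text)
    # Dicts keep insertion order (guaranteed from Python 3.7), so this
    # removes repeats without reordering the remaining addresses.
    return list(dict.fromkeys(matches))

sample = "Write to alice@example.com or bob@example.org; alice@example.com again."
print(extract_unique(sample))  # ['alice@example.com', 'bob@example.org']
```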

Step 3: Writing Unit Tests

Next, let’s write some unit tests to ensure our package works correctly. Open test_extractor.py and add the following code:

import unittest
from email_extractor.extractor import EmailExtractor

class TestEmailExtractor(unittest.TestCase):
    def setUp(self):
        self.extractor = EmailExtractor()

    def test_extract_emails(self):
        test_text = "You can reach me at alice@example.com and bob@example.org."
        expected_emails = ['alice@example.com', 'bob@example.org']
        self.assertEqual(self.extractor.extract(test_text), expected_emails)

    def test_no_emails(self):
        test_text = "This text has no email addresses."
        expected_emails = []
        self.assertEqual(self.extractor.extract(test_text), expected_emails)

if __name__ == '__main__':
    unittest.main()
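
As the test suite grows, a table of inputs and expected outputs combined with subTest keeps the edge cases easy to scan. The sketch below uses a small stand-in `extract` function so that it runs without the package installed; in the real test file you would call `EmailExtractor().extract` instead. The sample addresses are made up.

```python
import re
import unittest

# Stand-in for EmailExtractor.extract so this snippet runs on its own.
EMAIL_REGEX = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

def extract(text):
    return re.findall(EMAIL_REGEX, text)

class TestEdgeCases(unittest.TestCase):
    # (input text, expected list of addresses)
    CASES = [
        # A sentence-ending period must not be captured.
        ("Reach me at alice@example.com.", ["alice@example.com"]),
        # Plus-addressing and subdomains should both match.
        ("bob+news@mail.example.org, carol@example.net",
         ["bob+news@mail.example.org", "carol@example.net"]),
        # A missing domain or missing local part is not an address.
        ("not-an-email@ or @example.com", []),
    ]

    def test_cases(self):
        for text, expected in self.CASES:
            # subTest reports each failing row separately.
            with self.subTest(text=text):
                self.assertEqual(extract(text), expected)
```

Placed in the tests folder, a class like this is picked up by unittest discovery alongside the tests above.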

Step 4: Creating the setup.py File

The setup.py file is essential for packaging and distributing your Python package. Open setup.py and add the following content:

from setuptools import setup, find_packages

setup(
    name='email-extractor',
    version='0.1.0',
    description='A simple email extraction package',
    author='Your Name',
    author_email='you@example.com',
    packages=find_packages(),
    install_requires=[],  # Add any dependencies your package needs
    classifiers=[
        'Programming Language :: Python :: 3',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
    ],
    python_requires='>=3.6',
)
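
A common addition to setup.py is to reuse the README as the long description shown on the project's PyPI page. The helper below is a sketch of one way to do that; `read_long_description` is a hypothetical name, while `long_description` and `long_description_content_type` are standard setuptools keywords.

```python
from pathlib import Path

def read_long_description(readme_path="README.md"):
    """Return the README text, or an empty string if the file is missing."""
    path = Path(readme_path)
    return path.read_text(encoding="utf-8") if path.is_file() else ""

# In setup.py the result could be passed along like this:
#   setup(
#       ...,
#       long_description=read_long_description(),
#       long_description_content_type="text/markdown",
#   )
```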

Step 5: Writing the README File

Open README.md and write a brief description of your package and how to use it:

# Email Extractor

A simple Python package for extracting email addresses from text.

## Installation

You can install the package using pip:

```bash
pip install email-extractor
```

## Usage

```python
from email_extractor.extractor import EmailExtractor

extractor = EmailExtractor()
emails = extractor.extract("Contact us at info@example.com.")
print(emails)  # Output: ['info@example.com']
```

Step 6: Running the Tests

Before packaging your code, it’s a good idea to run the tests to ensure everything is working as expected. Run the following command from the project root, so that the email_extractor package can be imported:

python -m unittest discover -s tests

If all tests pass, you’re ready to package your code!

Step 7: Building the Package

To build your package, run:

python setup.py sdist bdist_wheel

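This will create a dist directory containing the .tar.gz (source distribution) and .whl (wheel) files for your package. If bdist_wheel is reported as an invalid command, install the wheel package first with pip install wheel. Note that setuptools has deprecated invoking setup.py directly; the currently recommended equivalent is python -m build, provided by the build package (pip install build), which writes the same two files to dist.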
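
Before uploading, it can be worth confirming that both kinds of file were actually produced. The sketch below simply lists them; `built_artifacts` is a hypothetical helper, and the exact file names depend on your package name, version, and setuptools release.

```python
from pathlib import Path

def built_artifacts(dist_dir="dist"):
    """Return the names of source archives and wheels found in `dist_dir`."""
    dist = Path(dist_dir)
    files = list(dist.glob("*.tar.gz")) + list(dist.glob("*.whl"))
    return sorted(f.name for f in files)

if __name__ == "__main__":
    for name in built_artifacts():
        print(name)
```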

Step 8: Publishing Your Package

To publish your package to PyPI, you’ll need an account on PyPI. Once you have an account, install twine if you haven’t already:

pip install twine

Then, use Twine to upload your package:

twine upload dist/*

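Twine will prompt for credentials. PyPI no longer accepts account passwords for uploads, so create an API token in your PyPI account settings, then enter __token__ as the username and the token itself as the password.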

Conclusion

In this post, we walked through the process of creating a Python package for email extraction. You learned how to set up the package structure, implement email extraction logic, write unit tests, build the distribution files, and publish your package to PyPI.

By packaging your code, you can easily reuse it across different projects and share it with the broader Python community. Happy coding!
