Creating a Python Package for Email Extraction
In the world of data collection and web scraping, email extraction is a common task that can be made more efficient by creating a reusable Python package. In this blog post, we’ll walk through the steps to create a Python package that simplifies the process of extracting email addresses from various text sources.
Why Create a Python Package?
Creating a Python package allows you to:
- Encapsulate functionality: Keep your email extraction logic organized and easy to reuse.
- Share with others: Distribute your package via PyPI (Python Package Index) so others can benefit from your work.
- Version control: Maintain different versions of your package for compatibility with various projects.
Prerequisites
Make sure you have the following installed:
- Python (version 3.6 or higher)
- pip (Python package manager)
You can check your Python version using:
python --version
If you need to install Python, you can download it from Python’s official site.
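If you would rather check the requirement from inside Python (say, at the top of a script), a minimal sketch could look like the following. The helper name meets_minimum is made up for this example and is not part of any library.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 6)):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report whether the running interpreter satisfies the prerequisite.
print("Python is new enough:", meets_minimum())
```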
Step 1: Setting Up the Package Structure
Create a new directory for your package:
mkdir email_extractor
cd email_extractor
Inside this directory, create the following structure:
email_extractor/
│
├── email_extractor/
│ ├── __init__.py
│ └── extractor.py
│
├── tests/
│ └── test_extractor.py
│
├── setup.py
└── README.md
- The email_extractor folder will contain your package code.
- The tests folder will contain unit tests.
- setup.py is the configuration file for your package.
- README.md provides information about your package.
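If you prefer not to create the files by hand, the layout above can be sketched with pathlib. This is only one way to do it; the function name scaffold is hypothetical, and the files it creates are empty placeholders that you fill in during the later steps.

```python
from pathlib import Path

# Files from the layout above, relative to the project directory.
PROJECT_FILES = [
    "email_extractor/__init__.py",
    "email_extractor/extractor.py",
    "tests/test_extractor.py",
    "setup.py",
    "README.md",
]


def scaffold(root):
    """Create the empty package skeleton under `root` and list what exists."""
    root = Path(root)
    for relative in PROJECT_FILES:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)  # create folders as needed
        path.touch(exist_ok=True)                       # leave existing files alone
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
```

Calling scaffold(".") from inside the project directory would create the five files without overwriting anything that is already there.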
Step 2: Writing the Email Extraction Logic
Open extractor.py and implement the email extraction logic:
import re


class EmailExtractor:
    def __init__(self):
        # Regex for matching email addresses. The TLD class is [A-Za-z];
        # a "|" inside a character class would match a literal pipe.
        self.email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

    def extract(self, text):
        """
        Extracts email addresses from the given text.

        :param text: The input text from which to extract emails
        :return: A list of extracted email addresses
        """
        return re.findall(self.email_regex, text)
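Because re.findall returns every match, the same address can appear several times in the result. If that matters for your use case, one possible extension is a helper that keeps only the first occurrence of each address. The sketch below is an illustration, not part of the package described in this post: extract_unique is a hypothetical name, and the pattern is the Step 2 regex with the TLD class written as [A-Za-z].

```python
import re

# Step 2 pattern, compiled once so repeated calls do not recompile it.
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def extract_unique(text):
    """Return addresses in first-seen order, skipping case-only duplicates."""
    seen = set()
    unique = []
    for match in EMAIL_PATTERN.findall(text):
        key = match.lower()  # treat Ann@Example.com and ann@example.com as one
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


sample = "Write to Ann@Example.com, ann@example.com, or bob@example.org."
print(extract_unique(sample))  # ['Ann@Example.com', 'bob@example.org']
```

Comparing lowercase keys is a simplification: the local part of an address is technically case-sensitive, although most mail providers ignore case.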
Step 3: Writing Unit Tests
Next, let’s write some unit tests to ensure our package works correctly. Open test_extractor.py and add the following code:
import unittest

from email_extractor.extractor import EmailExtractor


class TestEmailExtractor(unittest.TestCase):
    def setUp(self):
        self.extractor = EmailExtractor()

    def test_extract_emails(self):
        # Placeholder addresses on reserved example domains
        test_text = "You can reach me at alice@example.com and bob@example.org."
        expected_emails = ['alice@example.com', 'bob@example.org']
        self.assertEqual(self.extractor.extract(test_text), expected_emails)

    def test_no_emails(self):
        test_text = "This text has no email addresses."
        expected_emails = []
        self.assertEqual(self.extractor.extract(test_text), expected_emails)


if __name__ == '__main__':
    unittest.main()
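The two tests above cover the happy path and the empty case. A few edge cases are worth pinning down as well, such as plus tags, subdomains, trailing punctuation, and hosts without a top-level domain. The sketch below inlines a copy of the Step 2 class (with the TLD class written as [A-Za-z]) so that it runs on its own; in the real test file you would import EmailExtractor from the package instead. The test names and sample addresses are illustrative.

```python
import re
import unittest


class EmailExtractor:
    """Inline copy of the Step 2 class so this sketch is self-contained."""

    def __init__(self):
        self.email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

    def extract(self, text):
        return re.findall(self.email_regex, text)


class TestEdgeCases(unittest.TestCase):
    def setUp(self):
        self.extractor = EmailExtractor()

    def test_plus_tag_and_subdomain(self):
        text = "Send logs to dev+alerts@mail.example.co.uk today."
        self.assertEqual(self.extractor.extract(text),
                         ['dev+alerts@mail.example.co.uk'])

    def test_trailing_punctuation_is_dropped(self):
        text = "Is it carol@example.net? Or dave@example.net!"
        self.assertEqual(self.extractor.extract(text),
                         ['carol@example.net', 'dave@example.net'])

    def test_host_without_tld_is_ignored(self):
        self.assertEqual(self.extractor.extract("user@localhost"), [])


# Run the cases directly instead of calling unittest.main(), which would exit.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestEdgeCases)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```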
Step 4: Creating the setup.py File
The setup.py file is essential for packaging and distributing your Python package. Open setup.py and add the following content:
from setuptools import setup, find_packages

setup(
    name='email-extractor',
    version='0.1.0',
    description='A simple email extraction package',
    author='Your Name',
    author_email='your.email@example.com',  # Placeholder; use your own address
    packages=find_packages(),
    install_requires=[],  # Add any dependencies your package needs
    classifiers=[
        'Programming Language :: Python :: 3',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
    ],
    python_requires='>=3.6',
)
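The version string above is hard-coded in setup.py. A common alternative is to define it once in the package and read it at build time. The sketch below assumes you add a line such as __version__ = '0.1.0' to email_extractor/__init__.py (this post leaves that file empty), and read_version is a hypothetical helper name.

```python
import re
from pathlib import Path


def read_version(init_path):
    """Return the __version__ string assigned in the given __init__.py."""
    text = Path(init_path).read_text(encoding="utf-8")
    # Match a line like: __version__ = '0.1.0'  (single or double quotes)
    match = re.search(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"No __version__ found in {init_path}")
    return match.group(1)
```

In setup.py you could then pass version=read_version('email_extractor/__init__.py') instead of the literal string, so the number only has to be updated in one place.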
Step 5: Writing the README File
Open README.md and write a brief description of your package and how to use it:
# Email Extractor
A simple Python package for extracting email addresses from text.
## Installation
You can install the package using pip:
```bash
pip install email-extractor
```
## Usage
```python
from email_extractor.extractor import EmailExtractor
extractor = EmailExtractor()
emails = extractor.extract("Contact us at info@example.com.")
print(emails)  # Output: ['info@example.com']
```
Step 6: Running the Tests
Before packaging your code, it's a good idea to run the tests to ensure everything is working as expected. Run the following command from the project root:
python -m unittest discover -s tests
If all tests pass, you’re ready to package your code!
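The discovery command above also has a programmatic equivalent, which can be handy if you want to trigger the tests from another script. The sketch below is a minimal illustration: run_discovered_tests is a hypothetical name, and the demonstration uses a throwaway directory with one trivial test module rather than this project's tests folder.

```python
import tempfile
import textwrap
import unittest
from pathlib import Path


def run_discovered_tests(start_dir):
    """Discover test_*.py files under start_dir and run them."""
    loader = unittest.TestLoader()
    suite = loader.discover(str(start_dir), pattern="test_*.py")
    return unittest.TextTestRunner(verbosity=0).run(suite)


# Demonstration: write one small test module to a temp folder and run it.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "test_demo.py").write_text(textwrap.dedent('''
        import unittest

        class Demo(unittest.TestCase):
            def test_truth(self):
                self.assertTrue(True)
    '''), encoding="utf-8")
    demo_result = run_discovered_tests(tmp)

print(demo_result.testsRun, demo_result.wasSuccessful())
```

For this project you would call run_discovered_tests("tests") from the project root, so that the email_extractor package is importable.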
Step 7: Building the Package
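To build your package, run:
python setup.py sdist bdist_wheel
This will create a dist directory containing the .tar.gz and .whl files for your package. The bdist_wheel command needs the wheel package (pip install wheel). Note that invoking setup.py directly is deprecated in recent setuptools releases; running python -m build (after pip install build) produces the same two files.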
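A wheel is a zip archive, so one way to confirm that your modules made it into the build is to list the archive's contents before uploading. The helper below is a small sketch using only the standard library; wheel_contents is a hypothetical name.

```python
import zipfile


def wheel_contents(wheel_path):
    """Return the sorted file names stored inside a .whl file."""
    with zipfile.ZipFile(wheel_path) as archive:
        return sorted(archive.namelist())
```

For this project the file would typically be named something like dist/email_extractor-0.1.0-py3-none-any.whl (an assumption based on the name and version in setup.py), and the listing should include email_extractor/extractor.py.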
Step 8: Publishing Your Package
To publish your package to PyPI, you’ll need an account on PyPI. Once you have an account, install twine if you haven’t already:
pip install twine
Then, use Twine to upload your package:
twine upload dist/*
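Follow the prompts to enter your PyPI credentials. PyPI no longer accepts account passwords for uploads, so enter __token__ as the username and an API token (created in your PyPI account settings) as the password.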
Conclusion
In this post, we walked through the process of creating a Python package for email extraction. You learned how to set up the package structure, implement email extraction logic, write unit tests, and publish your package to PyPI.
By packaging your code, you can easily reuse it across different projects and share it with the broader Python community. Happy coding!