Creating a Command-Line Email Extractor in Ruby
Email extraction is a common task in domains like marketing, data collection, and web scraping. In this post, we walk through building a command-line email extractor in Ruby. Its simplicity and flexibility make Ruby a good choice for small tools like this one.
Why Use Ruby for Email Extraction?
Ruby is a dynamic, object-oriented programming language known for its readability and ease of use. It’s great for scripting and automating tasks, making it a perfect fit for building a command-line email extractor. The goal is to build a tool that reads a text file, scans its content for email addresses, and outputs the results.
Prerequisites
Before you start, ensure you have the following:
- Ruby installed on your machine (version 2.5 or higher)
- Basic understanding of Ruby and regular expressions
You can check your Ruby version using:
ruby -v
If Ruby isn’t installed, you can download it from Ruby’s official site.
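If you would like the script itself to warn about an older interpreter, a small guard along these lines could sit at the top of the file. This is only a sketch: `MINIMUM_RUBY` and `ruby_supported?` are illustrative names, not part of the original script.

```ruby
# Hypothetical guard: MINIMUM_RUBY and ruby_supported? are illustrative names.
MINIMUM_RUBY = "2.5"

# Gem::Version compares version strings segment by segment,
# so "2.10" correctly sorts after "2.5".
def ruby_supported?(current = RUBY_VERSION, minimum = MINIMUM_RUBY)
  Gem::Version.new(current) >= Gem::Version.new(minimum)
end

warn "This script expects Ruby #{MINIMUM_RUBY} or higher." unless ruby_supported?
```

Comparing the raw strings instead would sort "2.10" before "2.5", which is why the sketch goes through `Gem::Version`.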
Step 1: Setting Up the Project
Let’s begin by creating a new Ruby file for our email extractor:
touch email_extractor.rb
Open this file in your favorite text editor, and let’s start coding.
Step 2: Reading the Input File
First, we need to read a text file provided by the user. You can use Ruby’s File class to read the content:
# email_extractor.rb

# The first command-line argument is the input file
filename = ARGV[0]

if filename.nil?
  puts "Please provide a file name as an argument."
  exit 1 # non-zero status signals failure to the shell
end

begin
  file_content = File.read(filename)
rescue Errno::ENOENT
  puts "File not found: #{filename}"
  exit 1
end
This code reads the filename from the command-line arguments and exits with a message if the argument is missing or the file does not exist.
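A missing file is not the only way a read can fail. As a sketch, the read could be wrapped in a helper that also covers directories and unreadable files. The name `read_input` is hypothetical, and returning `nil` on failure is just one possible convention.

```ruby
# Hypothetical helper: read_input is an illustrative name, not part of the original script.
# Returns the file's content, or nil after printing a message to standard error.
def read_input(path)
  File.read(path)
rescue Errno::ENOENT
  warn "File not found: #{path}"
  nil
rescue Errno::EISDIR
  warn "Expected a file but got a directory: #{path}"
  nil
rescue Errno::EACCES
  warn "Permission denied: #{path}"
  nil
end
```

The caller can then stop early when the result is `nil` instead of handling each exception inline.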
Step 3: Using Regular Expressions to Find Emails
Email addresses follow a recognizable local-part@domain pattern, which makes regular expressions (regex) a good fit for finding them in text. We’ll use a basic regex that covers common addresses, though not every form the email standards allow:
# Basic email regex
# The final class is [A-Za-z]; writing [A-Z|a-z] would also accept a literal "|"
email_regex = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/

# Extract emails from the content
emails = file_content.scan(email_regex)

# Display the extracted emails
if emails.empty?
  puts "No emails found in the file."
else
  puts "Extracted Emails:"
  puts emails.uniq
end
Here, we use the scan method to search the content for all matches of the email_regex. We also ensure that only unique email addresses are displayed.
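To see what `scan` and `uniq` do on their own, here is a small standalone sketch. The pattern is the one from Step 3, with the top-level-domain class written as `[A-Za-z]` so that a literal `|` cannot match. The sample text and its example.com addresses are placeholders.

```ruby
# Pattern from Step 3, with the TLD class written as [A-Za-z]
EMAIL_REGEX = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/

# Hypothetical sample text; the addresses are placeholders.
sample = "Contact alice@example.com or bob.smith@mail.example.org. " \
         "Again: alice@example.com."

# The regex has no capture groups, so scan returns an array of matched strings
matches = sample.scan(EMAIL_REGEX)

# uniq keeps the first occurrence of each address and preserves order
unique = matches.uniq

puts unique
```

Here `scan` finds three matches and `uniq` reduces them to two, because alice@example.com appears twice. The trailing periods in the text are not part of the matches.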
Step 4: Enhancing the Email Extractor
While the basic extractor works, it can be improved to handle different edge cases. For example, we can allow input from a URL, sanitize the extracted emails, or even write the output to a new file.
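As an illustration of the first idea, input from a URL could be handled with Ruby’s standard net/http library. This is a rough sketch under a few assumptions: `read_source` is a hypothetical name, anything starting with http:// or https:// is treated as a URL, and redirects and HTTP error statuses are not handled.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: read_source is an illustrative name.
# Treats arguments starting with http:// or https:// as URLs and
# everything else as a local file path.
def read_source(source)
  if source.match?(%r{\Ahttps?://}i)
    # Net::HTTP.get returns the response body as a string.
    # It does not follow redirects or check the status code.
    Net::HTTP.get(URI(source))
  else
    File.read(source)
  end
end
```

The returned text can be passed to `scan` exactly like the content of a local file.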
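Let’s add an option to save the extracted emails to a file. To avoid printing the list twice, use this block in place of the puts emails.uniq line from Step 3: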
# Save emails to a file if the user provides an output filename
output_file = ARGV[1]

if output_file
  File.open(output_file, "w") do |file|
    emails.uniq.each { |email| file.puts email }
  end
  puts "Emails saved to #{output_file}"
else
  puts emails.uniq
end
Now the user can specify an output file, like so:
ruby email_extractor.rb input.txt output_emails.txt
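Sanitizing the extracted emails, another improvement mentioned above, could be as simple as normalizing case before removing duplicates. The sketch below uses a hypothetical `normalize_emails` helper. Lowercasing the whole address is an assumption: domain names are case-insensitive, but the part before the @ is only treated that way by convention.

```ruby
# Hypothetical helper: normalize_emails is an illustrative name.
# Trims whitespace, lowercases each address, and removes duplicates
# that differ only by letter case.
def normalize_emails(emails)
  emails.map { |email| email.strip.downcase }.uniq
end

puts normalize_emails(["Alice@Example.com", "alice@example.com", "bob@example.org"])
```

Calling this on the array returned by `scan`, before printing or saving, keeps the same address from appearing twice in different capitalizations.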
Step 5: Testing the Command-Line Email Extractor
To test your script, create a sample text file, input.txt, containing a few email addresses.
Run your script from the command line:
ruby email_extractor.rb input.txt
You should see the valid email addresses extracted from the file. If an output file is provided, the emails will also be saved there.
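For a quick check that does not depend on a hand-written input file, the extraction logic can be exercised against a temporary file. Everything in this sketch is illustrative: `extract_emails` is a hypothetical wrapper around the Step 3 logic, and the sample addresses are made up.

```ruby
require "tempfile"

EMAIL_PATTERN = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/

# Hypothetical helper wrapping the Step 3 logic so it can be checked in isolation.
def extract_emails(text)
  text.scan(EMAIL_PATTERN).uniq
end

# Placeholder content standing in for input.txt; the addresses are made up.
SAMPLE_TEXT = <<~TEXT
  Sales: sales@example.com
  Support: support@example.org, sales@example.com
  Not an address: user@localhost
TEXT

# Tempfile.create removes the file when the block finishes
Tempfile.create("input") do |file|
  file.write(SAMPLE_TEXT)
  file.flush
  puts extract_emails(File.read(file.path))
end
```

With this sample, the duplicate sales address is reported once and user@localhost is skipped, because the pattern requires a dot followed by at least two letters after the @.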
Conclusion
In this post, we built a simple command-line email extractor using Ruby. This tool can be extended in various ways, such as integrating web scraping functionality or applying more complex regex for different email formats. With Ruby’s flexibility, the possibilities are endless!