Python Extract Emails from Text

By Minh Vu

Updated Nov 18, 2023

Extracting emails from text is a common task in Python, especially when you are cleaning data or building a list of emails from a text document.

In this tutorial, I will show you how to extract emails from text in Python using regular expressions (RegEx).

Extracting Emails from Text using RegEx

To extract emails from text in Python using RegEx, we'll use the re module, which provides support for regular expressions. Here's a simple example:

extract_emails.py
    import re


    def extract_emails(text):
        # Match word-bounded strings of the form local-part@domain.tld
        email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
        # re.findall returns every non-overlapping match as a list of strings
        return re.findall(email_regex, text)


    sample_text = "Please contact me at dminhvu.work@gmail.com or wisecode@gmail.com."
    emails = extract_emails(sample_text)
    print(emails)

This code snippet defines a function extract_emails that:

  • takes a string text as input,
  • returns a list of the email addresses found in that string, based on the RegEx pattern \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b.

You can run this with the python extract_emails.py command in the Terminal (or Command Prompt on Windows). For the sample text above, it prints ['dminhvu.work@gmail.com', 'wisecode@gmail.com'].

To understand how this email_regex pattern is constructed, you can learn more here; the page has an explanation section on the right-hand side.
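As a rough guide to how the pattern is put together, the sketch below writes it out with re.VERBOSE so that each part carries its own comment. The name EMAIL_PATTERN is chosen for this illustration only. Note that inside a character class a | is a literal character, not alternation, so the top-level-domain class is written as [A-Za-z].

```python
import re

# The email pattern written with re.VERBOSE so each part can be annotated.
# EMAIL_PATTERN is a name chosen for this sketch.
EMAIL_PATTERN = re.compile(
    r"""
    \b                   # word boundary: start at the edge of a word
    [A-Za-z0-9._%+-]+    # local part: letters, digits, and a few symbols
    @                    # the literal at-sign separator
    [A-Za-z0-9.-]+       # domain labels: letters, digits, dots, hyphens
    \.                   # the dot before the top-level domain
    [A-Za-z]{2,}         # top-level domain: at least two letters
    \b                   # word boundary: stop at the edge of a word
    """,
    re.VERBOSE,
)

sample_text = "Please contact me at dminhvu.work@gmail.com or wisecode@gmail.com."
print(EMAIL_PATTERN.findall(sample_text))
```

This pattern is a practical approximation rather than a full validator: it will miss some unusual but valid addresses and accept some strings that are not deliverable.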

Extracting Emails from a Text File

To extract emails from a text file, we'll read the file's content into a string and then use the same extract_emails function defined earlier.

extract_emails_from_file.py
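    # ... (import re and extract_emails as defined above)

    def extract_emails_from_file(file_path):
        # Read the whole file into one string, then reuse extract_emails
        with open(file_path, 'r') as file:
            content = file.read()
        return extract_emails(content)


    file_path = 'example.txt'
    emails = extract_emails_from_file(file_path)
    print(emails)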

In this example, extract_emails_from_file reads the entire content of the file located at file_path and then uses the extract_emails function to find all email addresses.

The full code is:

extract_emails_from_file.py
    import re


    def extract_emails(text):
        # Match word-bounded strings of the form local-part@domain.tld
        email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
        return re.findall(email_regex, text)


    def extract_emails_from_file(file_path):
        # Read the whole file into one string, then reuse extract_emails
        with open(file_path, 'r') as file:
            content = file.read()
        return extract_emails(content)


    file_path = 'example.txt'
    emails = extract_emails_from_file(file_path)
    print(emails)
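If you want to try the file-based function without preparing example.txt first, a small sketch like the following writes a temporary file and runs the extraction on it. The use of tempfile, the explicit utf-8 encoding, and the sample addresses are assumptions made for this illustration, not part of the original script.

```python
import os
import re
import tempfile


def extract_emails(text):
    email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    return re.findall(email_regex, text)


def extract_emails_from_file(file_path):
    # Explicit encoding is an assumption; adjust it to match your file.
    with open(file_path, 'r', encoding='utf-8') as file:
        return extract_emails(file.read())


# Write a throwaway file with made-up sample content.
with tempfile.NamedTemporaryFile(
    'w', suffix='.txt', delete=False, encoding='utf-8'
) as tmp:
    tmp.write("Sales: sales@example.com\nSupport: support@example.org\n")
    tmp_path = tmp.name

try:
    emails = extract_emails_from_file(tmp_path)
    print(emails)
finally:
    # Clean up the temporary file whether or not extraction succeeded.
    os.remove(tmp_path)
```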

Handling Large Text Files

When dealing with large text files, reading the entire file into memory might not be feasible. In such cases, we can process the file line by line:

extract_emails_from_large_file.py
    # ... (import re and extract_emails as defined above)

    def extract_emails_from_large_file(file_path):
        emails = []
        with open(file_path, 'r') as file:
            # Iterating over the file object reads one line at a time
            for line in file:
                emails.extend(extract_emails(line))
        return emails


    file_path = 'large_example.txt'
    emails = extract_emails_from_large_file(file_path)
    print(emails)

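This function iterates over each line in the file, extracts emails from that line, and adds them to the emails list. This approach is more memory-efficient for large files because only one line of the file is held in memory at a time, although the emails list itself still grows with the number of matches.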
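If even the list of results is too large to keep around, one possible variation is a generator that yields addresses one at a time so the caller can process each as it is found. This is a sketch under stated assumptions: iter_emails and EMAIL_REGEX are hypothetical names introduced here, and io.StringIO stands in for an open file so the example is self-contained.

```python
import io
import re
from typing import Iterable, Iterator

# Compile once, since the pattern is applied to every line.
EMAIL_REGEX = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def iter_emails(lines: Iterable[str]) -> Iterator[str]:
    """Yield email addresses one at a time from an iterable of lines."""
    for line in lines:
        yield from EMAIL_REGEX.findall(line)


# io.StringIO stands in for an open file here; the sample content is made up.
fake_file = io.StringIO(
    "alice@example.com wrote to bob@example.org\n"
    "no address on this line\n"
    "carol@example.net\n"
)
for email in iter_emails(fake_file):
    print(email)
```

With a real file, the same function can be called as iter_emails(file) inside a with open(...) block, since a file object is itself an iterable of lines.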

Conclusion

In this tutorial, we've learned how to extract email addresses from strings and text files using Python.

In general, to extract email addresses from strings in Python:

  • Use the re module to extract emails,
  • with the \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b RegEx pattern to match the email pattern.

If you find that the pattern does not work, please comment below so I can fix it. Thank you!

Minh Vu

Software Engineer

Hi guys, I'm the author of WiseCode Blog. I mainly work with the Elastic Stack and build AI & Python projects. I also love writing technical articles, and I hope you have a good experience reading my blog!