Python Forum
Email extraction from websites
#13
(Aug-18-2017, 11:34 AM)DeaD_EyE Wrote: You can use regex: http://emailregex.com/

But if you want to open a regular Excel file, you have formatting and maybe binary data inside. The newer format (.xlsx) is based on XML packed in a ZIP archive; the older .xls format is a binary format. You can use the hammer method and scan the whole file for e-mail addresses.

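import re

# Loose e-mail pattern taken from http://emailregex.com/
email_regex = r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)"

# errors='ignore' skips bytes that are not valid UTF-8 instead of raising
with open('yourfile.xls', encoding='utf-8', errors='ignore') as fd:
    emails = set(re.findall(email_regex, fd.read()))

print(emails)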
With luck, the text is encoded as UTF-8.
But it is better to use a library to open Excel files, or to export the Excel sheet as CSV and use Python's stdlib for this task.
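The CSV route could look roughly like the sketch below. It assumes the sheet has already been exported as CSV; the function name `emails_from_csv` and the sample data are made up for illustration, and the regex is the same loose pattern as above, so it is not a full e-mail validator.

```python
import csv
import io
import re

# Same loose pattern as above, with the hyphen moved to the end of the last class
EMAIL_REGEX = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")


def emails_from_csv(csv_file):
    """Collect every e-mail address found in any cell of an open CSV file."""
    emails = set()
    for row in csv.reader(csv_file):
        for cell in row:
            emails.update(EMAIL_REGEX.findall(cell))
    return emails


# In-memory sample standing in for a hypothetical exported sheet.
# With a real file: open("yourfile.csv", newline="", encoding="utf-8")
sample = io.StringIO("name,contact\nAnn,ann@example.com\nBob,bob@example.org\n")
found = emails_from_csv(sample)
print(sorted(found))
```

Because `csv.reader` hands back plain strings per cell, there is no binary data or formatting to trip over, which is the main advantage over scanning the raw Excel file.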

Thank you for the reply.
I do not want to extract emails from a file. The Excel file contains URLs in a column, and I was hoping it would be possible to extract all the emails from each site and put them beside the corresponding cell.
I mentioned Excel only because I use it frequently... what I'm interested in is all the emails extracted from the links.
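One possible way to do what is described here is to download each page and run a loose e-mail regex over its text. The following is only a minimal sketch under several assumptions: the URLs are already available as a Python list, the pages are plain HTML that can be decoded as UTF-8, and the names `fetch_html`, `emails_per_url` and `fake_fetch` are hypothetical helpers, not part of any library. It only looks at the single page behind each URL and does not follow links.

```python
import re
import urllib.request

# Loose e-mail pattern; it will miss obfuscated addresses and is not a validator
EMAIL_REGEX = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")


def fetch_html(url, timeout=10):
    """Download a page and decode it, replacing undecodable bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def emails_per_url(urls, fetch=fetch_html):
    """Map each URL to the sorted, de-duplicated e-mail addresses on its page."""
    results = {}
    for url in urls:
        try:
            html = fetch(url)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses
            results[url] = []
            continue
        results[url] = sorted(set(EMAIL_REGEX.findall(html)))
    return results


def fake_fetch(url):
    """Stand-in fetcher so the demo runs without network access."""
    pages = {
        "http://example.com": '<a href="mailto:info@example.com">info@example.com</a>',
    }
    return pages[url]


demo = emails_per_url(["http://example.com"], fetch=fake_fetch)
print(demo)
```

The resulting dictionary keeps the addresses next to the URL they came from, so it could be written back row by row (URL in one column, addresses in the next) with `csv.writer`.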


Messages In This Thread
Email extraction from websites - by stefanoste78 - Aug-13-2017, 12:54 PM
RE: Email extraction from websites - by nilamo - Aug-13-2017, 05:07 PM
RE: Email extraction from websites - by nilamo - Aug-17-2017, 01:41 PM
RE: Email extraction from websites - by nilamo - Aug-17-2017, 06:32 PM
RE: Email extraction from websites - by nilamo - Aug-17-2017, 07:00 PM
RE: Email extraction from websites - by wavic - Aug-18-2017, 09:32 AM
RE: Email extraction from websites - by DeaD_EyE - Aug-18-2017, 11:34 AM
RE: Email extraction from websites - by stefanoste78 - Aug-18-2017, 07:46 PM
RE: Email extraction from websites - by snippsat - Aug-18-2017, 08:55 PM
