Read input file and print hyperlinks
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: /thread-602.html

Read input file and print hyperlinks - Emmanouil - Oct-22-2016

Hello everybody, sorry that my last post did not show the picture.

Edit admin: No problem, just use the "Insert Python tag" button.

I am new to Python and I am trying to write a program that prompts for an input file, reads it, and prints all the lines that contain hyperlinks, together with the text that follows each hyperlink. For example, if the file contains the link:

    <a href="http://python-forum.io/search.php?action=unreads">Unread Posts</a>

The output should be:

RE: Read input file and print hyperlinks - snippsat - Oct-22-2016

You can take a look at my tutorial here: Web-Scraping part-1.

RE: Read input file and print hyperlinks - Emmanouil - Oct-23-2016

Thank you for the reply. I just have difficulty making it work for files that are stored on my computer.

RE: Read input file and print hyperlinks - sparkz_alot - Oct-23-2016

What have you tried so far? Please post the code you've written. I would suggest starting with a small file, perhaps 3 or 4 lines. To make it easy, make sure the file is in the same location as your script. Your script should start off simple as well: open the file, read a line, write it to the screen, go back and read the next line, write it to the screen, and so on. Once you have done that and it runs without errors, start refining your script.

RE: Read input file and print hyperlinks - snippsat - Oct-23-2016

Here is an example with the line you posted.

    from bs4 import BeautifulSoup

    with open('html_from_disk.txt') as f:
        html = f.read()

    soup = BeautifulSoup(html, 'html.parser')
    text = soup.find('a').text
    link = soup.find('a')
    print(text) #--> Unread Posts
    print(link.get('href')) #--> http://python-forum.io/search.php?action=unreads

RE: Read input file and print hyperlinks - Emmanouil - Oct-23-2016

Hello and thank you for the precious help. With this code I managed to print all hyperlinks on separate lines, but I still can't find how to also print the text that follows every hyperlink. Could I add to the above code a prompt for the user to give me the input file? I tried to add this:

    test = raw_input('Enter a filename: ')
    with open('test') as f:

but it does not work.

RE: Read input file and print hyperlinks - snippsat - Oct-23-2016

You cannot have quotes around 'test'; with quotes it is just the string "test", not the variable. Here it is with a better variable name:

    file_name = raw_input('Enter a filename: ')
    with open(file_name) as f:

RE: Read input file and print hyperlinks - Emmanouil - Oct-23-2016

I managed to make it work with this code:

    from bs4 import BeautifulSoup

    file = raw_input('Type file path: ')
    with open(file) as f:
        html = f.read()

    soup = BeautifulSoup(html, 'html.parser')
    for link in soup.find_all('a'):
        print(link.get('href'))
        print(link.get_text())

but I still get links that I do not want, like the links from img tags. Is there any way to exclude them from the output?

    <img src="http://www.ekdd.gr/ekdda/custom/seminars/bullet_green.png"><a>test1</a></div>

From the above I get:

    None
    test1
    None
    test2

RE: Read input file and print hyperlinks - snippsat - Oct-23-2016

You must learn not to use the quote tag for code; I have fixed all your posts. In the editor there is an "Insert python tag" button to the right of the "Insert quote" button.

Instead of print(link.get_text()) you can write print(link.text); the two are equivalent.

Quote: but I still get links that I do not want, like the links from img tags. Is there any way to exclude them from the output?

    for link in soup.find_all('a'):
        # The None lines come from <a> tags that have no href attribute, so skip those
        if link.get('href'):
            print(link.get('href'))
            print(link.text)
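
For reference, here is a minimal sketch of the whole task discussed in this thread: read an HTML file and collect each hyperlink together with its text, skipping `<a>` tags that have no `href` (the ones that printed `None` above). It is only an illustration under some assumptions. It targets Python 3, and it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without extra packages. The names `LinkCollector`, `extract_links` and `links_from_file` are hypothetical and do not come from the thread. With BeautifulSoup, the same filtering can be done with `soup.find_all('a', href=True)`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._chunks = []    # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without href are ignored
                self._href = href
                self._chunks = []

    def handle_data(self, data):
        # Only keep text that appears inside a tracked <a> element
        if self._href is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._chunks).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def links_from_file(path):
    """Read an HTML file from disk and return its (href, text) pairs."""
    with open(path, encoding="utf-8") as f:
        return extract_links(f.read())


# Demo on the two snippets quoted in the thread
sample = (
    '<a href="http://python-forum.io/search.php?action=unreads">Unread Posts</a>'
    '<img src="http://www.ekdd.gr/ekdda/custom/seminars/bullet_green.png">'
    '<a>test1</a>'
)
for href, text in extract_links(sample):
    print(href)
    print(text)
```

To use it on a file chosen by the user, something like `links_from_file(input('Enter a filename: '))` should work in Python 3; in Python 2, as used in the posts above, the prompt function is `raw_input` instead.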