My first Python scraping script not working...
My first Python scraping script not working... - MattH - Feb-17-2018

I'm learning Python and have a huge interest in bots and scraping. I made the code below to extract the h1 text from a web page, but an error comes up when running it in the shell saying "No module named urllib2". Most of this code is from the internet... what do I do to make urllib2 found?

Here is my Python code:

    import urllib2
    from bs4 import BeautifulSoup
    quote_page = 'https://amazon.com'
    page = urllib2.urlopen(quote_page)
    soup = BeautifulSoup(page, 'html.parser')
    name_box = soup.find('h1', attrs{class': 'name'})
    name = name_box.text.strip()
    print(name)

Does anybody know the issue? (Sorry if I'm being stupid here.) bs4 also cannot be found... Any clue?

RE: My first Python scraping script not working... - metulburr - Feb-17-2018

urllib2 is Python 2.x only. Use requests instead:

    pip install requests beautifulsoup4

And check our tutorials section for web scraping: https://python-forum.io/Thread-Web-Scraping-part-1

RE: My first Python scraping script not working... - MattH - Feb-17-2018

I ran "pip install requests beautifulsoup4" in the terminal and changed the code in my file to:

    import urllib.request
    from bs4 import BeautifulSoup
    quote_page = 'https://amazon.com'
    page = urllib.request.urlopen(quote_page)
    soup = BeautifulSoup(page, 'html.parser')
    name_box = soup.find('h1', attrs{class': 'name'})
    name = name_box.text.strip()
    print(name)

It still says the bs4 module cannot be found. Don't get what I'm getting wrong here...

RE: My first Python scraping script not working... - wavic - Feb-17-2018

I don't know where you got the code from, but it's not going to work. Instead of "writing" programs by copy/paste, learn so you can do it by yourself.
    import urllib.request
    from bs4 import BeautifulSoup
    quote_page = 'https://amazon.com'
    # urllib.request.urlopen returns a response object;
    # call its read() method to get the content of the web page
    page = urllib.request.urlopen(quote_page).read()
    soup = BeautifulSoup(page, 'html.parser')
    # you can replace this with: name_box = soup.find('h1', class_='name')
    name_box = soup.find('h1', attrs={'class': 'name'})  # your version had a missing quote ('class'); could be a typo
    name = name_box.text.strip()
    print(name)

I can't tell anything about the missing bs4 module. Windows?

RE: My first Python scraping script not working... - buran - Feb-17-2018

What Python version do you use? Did you install python3 alongside the pre-installed python2, so having two Python installations? https://docs.python.org/3/using/mac.html#getting-and-installing-macpython

I guess you try to run this code with python3 but installed requests and bs4 for the python2 installation.

RE: My first Python scraping script not working... - MattH - Feb-17-2018

Thanks guys - with your help, I managed to figure out the "modules don't exist" issues.

Edit: The Python I downloaded was straight from Python.org, the version was 3.6.4 - I also changed my code to @wavic's sample code (I got my code from an article on how to make a simple scraper).

I have been learning Python for a solid two days now - I'm really enjoying it, I just wanted to make something which gave me satisfaction to keep me going for the main prize, which is to be able to make beautiful software in the future. Anyway, enough ranting from me...

Python's shell finally let me run the script... but now it errors with this:

Any idea? Thank you for your help guys.

RE: My first Python scraping script not working... - buran - Feb-17-2018

Please, don't post images. Copy/paste the full traceback in error tags. Also, post the latest version of the code that produces the error, in code tags.

RE: My first Python scraping script not working... - snippsat - Feb-17-2018

Do not use urllib; always use Requests. And amazon.com is a difficult site to start with. To check that everything works:

    from bs4 import BeautifulSoup
    import requests
    url = 'https://www.python.org/'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    print(soup.select('head > title')[0].text)

If you change to url = 'https://amazon.com':

So, as mentioned, Amazon is a difficult site to start with; switching to Selenium may well be needed (Amazon uses a lot of JavaScript). That may get past the Robot Check or not.

RE: My first Python scraping script not working... - metulburr - Feb-17-2018

Try:

    import requests
    requests.packages.urllib3.disable_warnings()
    import ssl
    # Note: this turns off HTTPS certificate verification for the standard library
    try:
        _create_unverified_https_context = ssl._create_unverified_context
    except AttributeError:
        # Legacy Python that doesn't verify HTTPS certificates by default
        pass
    else:
        # Handle target environment that doesn't support HTTPS verification
        ssl._create_default_https_context = _create_unverified_https_context

But I would use the requests module instead of urllib.request, otherwise you are just asking to complicate your code drastically. Not many people use the standard libraries to make bots, so you're going to get errors that we haven't seen in years by going against the grain of the majority. There is a reason why someone made the requests library: it simplifies and automates the boilerplate code.

RE: My first Python scraping script not working... - MattH - Feb-17-2018

Thanks again for your replies guys. My current code (took the latest sample):

    from bs4 import BeautifulSoup
    import requests
    url = 'https://www.python.org/'
    url_get = requests.get(url)
    soup = BeautifulSoup(url_get.content, 'lxml')
    print(soup.select('head > title')[0].text)

The current error:

I tried to be independent and find the reason for the error, but I've had no luck. I WILL GET THERE EVENTUALLY, lol... Thanks again and sorry for all the questions.
Cannot wait until I can troubleshoot effectively myself.
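
The error from the last post is not shown in the thread, so its cause can't be confirmed here. One thing that may be worth ruling out, purely as an assumption: 'lxml' is a separate package that has to be installed, while 'html.parser' ships with Python. Below is a minimal sketch of the same title check, plus the h1 lookup from earlier in the thread, using 'html.parser' and guarding against a missing element. The names extract_title, extract_heading and sample_html are made up for illustration, and the sample page is inline so nothing is downloaded.

```python
from bs4 import BeautifulSoup

# Inline sample page (hypothetical content) so the example runs offline.
sample_html = """
<html>
  <head><title>Sample Page</title></head>
  <body><h1 class="name">Hello, scraper</h1></body>
</html>
"""


def extract_title(html):
    """Return the text of the <title> tag, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")  # parser bundled with Python
    title = soup.select_one("head > title")
    return title.get_text(strip=True) if title is not None else None


def extract_heading(html, css_class):
    """Return the text of the first <h1> with the given class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1", class_=css_class)
    # find() returns None when nothing matches, so check before reading text
    return heading.get_text(strip=True) if heading is not None else None


print(extract_title(sample_html))
print(extract_heading(sample_html, "name"))
print(extract_heading(sample_html, "missing"))
```

To try it on a live page, the html argument could be requests.get(url).content, as in snippsat's sample. Checking for None first matters on sites like Amazon, where the expected h1 may simply not be in the returned page.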
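
On the "module cannot be found" errors earlier in the thread: buran's guess was that the packages went into a different Python installation than the one running the script. The thread doesn't say how it was finally resolved, so treat the following as a rough sketch for checking that guess from inside the interpreter that actually runs the code. describe_environment is a hypothetical helper name; the module list is just the two packages from this thread plus json as a standard-library control.

```python
import importlib.util
import sys


def describe_environment(module_names):
    """Report which interpreter is running and which modules it can find."""
    found = {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }
    return {
        "executable": sys.executable,  # path of the running interpreter
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "modules": found,
    }


report = describe_environment(["bs4", "requests", "json"])
print("Interpreter:", report["executable"])
print("Version:", report["version"])
for name, available in report["modules"].items():
    print(name, "found" if available else "NOT found")
```

If bs4 shows up as not found, installing with "python3 -m pip install beautifulsoup4" puts the package into whichever installation the python3 command starts (the command name python3 is an assumption about the system), which avoids mixing up two installations.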