Problem with searching over Beautiful Soap object - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Problem with searching over Beautiful Soap object (/thread-37230.html)
RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

When using the code as it is, the execution fails:

Then I removed the install stuff from the browser instantiation, i.e. browser = webdriver.Chrome(). This way it worked ... but the Chrome browser opens. Can that be avoided?

Returning to the blocking issue ... if I understood you correctly, the Selenium approach has a kind of blocking immunity?

Another question ... blocking problem aside, does the BeautifulSoup approach let us find the title as easily, by searching for "productTitle"?

RE: Problem with searching over Beautiful Soap object - snippsat - May-28-2022

(May-28-2022, 10:20 AM)Pavel_47 Wrote: Then I removed the install stuff from the browser instantiation, i.e. browser = webdriver.Chrome().

You cannot do that: --headless is set through options, and without it the browser loads. The code I posted does not load a browser, it runs headless.

(May-28-2022, 10:20 AM)Pavel_47 Wrote: Returning to the blocking issue ... if I understood you correctly, the Selenium approach has a kind of blocking immunity?

Selenium automates a real web browser, so it acts like one (and is one), which means it does not get detected the way other scraping tools do.
Some sites also try to block Selenium, therefore there is stuff like undetected_chromedriver.

Here is another setup, not using Webdriver Manager:

    # amazon_chrome.py
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    import time

    #--| Setup
    options = Options()
    # Run Chrome without opening a browser window
    options.add_argument("--headless")
    options.add_argument("--window-size=1920,1080")
    # Keep ChromeDriver's logging output off the console
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    # Path to a locally installed chromedriver binary
    ser = Service(r"C:\cmder\bin\chromedriver.exe")
    browser = webdriver.Chrome(service=ser, options=options)

    #--| Parse or automation
    url = "https://www.amazon.com/Advanced-Artificial-Intelligence-Robo-Justice-Georgios-ebook/dp/B0B1H2MZKX/ref=sr_1_1?keywords=9783030982058&qid=1653563461&sr=8-1"
    browser.get(url)
    # The product title is the element with id="productTitle"
    title = browser.find_element(By.CSS_SELECTOR, '#productTitle')
    print(title.text)

Running this only gets the title back; it does not load a browser.
(May-28-2022, 10:20 AM)Pavel_47 Wrote: Another question ... blocking problem aside, does the BeautifulSoup approach let us find the title as easily, by searching for "productTitle"?

Not as long as it gets detected and blocked by Amazon. You should also check what rules Amazon has for web scraping.

Quote: Pretty much any e-commerce website tries

RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

Thanks. I still get an error: the keyword "service" isn't recognized. Here is where it happens:

    ser = Service('/usr/bin/chromedriver')
    browser = webdriver.Chrome(service=ser, options=options)

RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

Well ... it works this way:

    browser = webdriver.Chrome('/usr/bin/chromedriver', options=options)

RE: Problem with searching over Beautiful Soap object - snippsat - May-28-2022

You must upgrade your Selenium install.

    pip install selenium --upgrade

Test with show to see that it is Version: 4.2.0.

    λ pip show selenium
    Name: selenium
    Version: 4.2.0
    Summary:
    Home-page: https://www.selenium.dev
    Author:
    Author-email:
    License: Apache 2.0
    Location: c:\python310\lib\site-packages
    Requires: trio, trio-websocket, urllib3
    Required-by:

RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

(May-28-2022, 12:39 PM)snippsat Wrote: You must upgrade your Selenium install.

Indeed, I have 3.141.0. BTW, I threw out 2 options: window-size and excludeSwitches. The first, I think, is useless because I don't use the browser visually; the second, what is it for?

RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

Cannot upgrade selenium:
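On the "service" keyword error earlier in the thread: the version numbers reported here (3.141.0 rejects service=, 4.2.0 accepts it) suggest a quick check before choosing how to call webdriver.Chrome. The following is only a sketch. The helper names uses_service_keyword and installed_selenium_version are made up for illustration, the "major version 4 or later" rule is an assumption drawn from those two version numbers, and importlib.metadata needs Python 3.8 or newer.

```python
from importlib import metadata


def uses_service_keyword(version: str) -> bool:
    """Return True if this Selenium version is expected to accept
    webdriver.Chrome(service=...).

    Assumption: the keyword is available from major version 4 onward,
    which matches the 3.141.0 (fails) and 4.2.0 (works) cases in the thread.
    """
    major = int(version.split(".")[0])
    return major >= 4


def installed_selenium_version():
    """Return the installed Selenium version string, or None if not installed."""
    try:
        return metadata.version("selenium")
    except metadata.PackageNotFoundError:
        return None
```

Calling uses_service_keyword(installed_selenium_version()) when a version is found tells you whether to pass a Service object or fall back to the older positional driver path shown above.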
RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

I've also tried to find the publisher (i.e. Springer) using the By.NAME method. Not only did find_element fail to find the publisher, it also threw an exception.

RE: Problem with searching over Beautiful Soap object - snippsat - May-28-2022

Use:

    pip install --user selenium --upgrade

You could use this, but it is not recommended:

    sudo pip install selenium --upgrade

Or use a virtual environment (it's built into Python).

Python 3.6 is starting to get old now, and many packages will drop support for it soon or have already done so. NumPy is used under the hood in a lot of packages.

NumPy Doc Wrote: The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped.

RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

I've just installed Python 3.10. Trying to upgrade selenium gets this:
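Regarding the virtual environment suggestion above: a minimal sketch of that route using the standard-library venv module, so the upgrade does not touch the system packages. The function names create_env and upgrade_selenium and the directory name in the usage note are hypothetical, and the install step needs network access.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its Python interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # Windows puts the interpreter in Scripts\, other platforms in bin/
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def upgrade_selenium(env_python: Path) -> None:
    """Run 'pip install --upgrade selenium' with the environment's interpreter."""
    subprocess.run(
        [str(env_python), "-m", "pip", "install", "--upgrade", "selenium"],
        check=True,  # raise CalledProcessError if pip fails
    )
```

Usage would look like env_python = create_env(Path("selenium_env")) followed by upgrade_selenium(env_python); scripts then have to be run with that interpreter to see the upgraded Selenium.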