Problem with searching over Beautiful Soap object

Python Forum (https://python-forum.io), Forum: Python Coding (https://python-forum.io/forum-7.html), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: Problem with searching over Beautiful Soap object (/thread-37230.html)
RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

Well ... if I stay at the current version of Selenium, how do I find the Publisher? I've just tried

    publisher = browser.find_element_by_name('Publisher')

The search failed and threw an exception.

RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

Well, this instruction does the job:

    publisher = browser.find_elements_by_xpath("//*[contains(text(), 'Publisher')]")

But the real value of Publisher (i.e. Springer) is in the next field. How do I advance to the next field?

RE: Problem with searching over Beautiful Soap object - snippsat - May-28-2022

browser.find_elements_by_xpath is the old way (deprecated). With Selenium 4 it looks like this.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    import time

    #--| Setup
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--window-size=1920,1080")
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    ser = Service(r"C:\cmder\bin\chromedriver.exe")
    browser = webdriver.Chrome(service=ser, options=options)

    #--| Parse or automation
    url = "https://www.amazon.com/Advanced-Artificial-Intelligence-Robo-Justice-Georgios-ebook/dp/B0B1H2MZKX/ref=sr_1_1?keywords=9783030982058&qid=1653563461&sr=8-1"
    browser.get(url)
    title = browser.find_element(By.CSS_SELECTOR, '#productTitle')
    print(title.text)

    # With CSS selector
    publisher = browser.find_element(By.CSS_SELECTOR, '#detailBullets_feature_div > ul > li:nth-child(2) > span > span:nth-child(2)')
    print(publisher.text)

    # With XPath
    publisher1 = browser.find_element(By.XPATH, '//*[@id="detailBullets_feature_div"]/ul/li[2]/span/span[2]')
    print(publisher1.text)
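On the question above about advancing from the "Publisher" label to the next field: one option is the XPath following-sibling axis, which extends the contains(text(), 'Publisher') expression already tried in the thread. This is a minimal sketch, assuming the label and its value are sibling span elements (as the span:nth-child(2) selectors above suggest). sibling_xpath and find_value are hypothetical helper names, not part of Selenium.

```python
def sibling_xpath(label: str) -> str:
    """Build an XPath for the first <span> following the <span> that contains `label`.

    Assumption: label and value are sibling <span> elements.
    The label is inserted verbatim, so it must not contain a double quote.
    """
    return f'//span[contains(text(), "{label}")]/following-sibling::span[1]'


def find_value(browser, label: str) -> str:
    """Return the text of the field right after `label` (hypothetical helper)."""
    # Imported here so sibling_xpath() can be used without Selenium installed.
    from selenium.webdriver.common.by import By

    return browser.find_element(By.XPATH, sibling_xpath(label)).text


print(sibling_xpath("Publisher"))
```

With a browser set up as in the post above, find_value(browser, "Publisher") would then look up the value by its label rather than by its position in the list.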
RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-28-2022

(May-28-2022, 02:15 PM) snippsat wrote: This is the old way

Ok, it works. But this method relies on the layout of this book's page. With another book, the layout may be slightly different. I think a safer method is to find the tag containing "Publisher", then move to the next tag at the same level of the hierarchy, and finally extract the text from that tag. Can Selenium provide such methods? If I remember correctly, BeautifulSoup provides navigation functions over neighboring tags.

RE: Problem with searching over Beautiful Soap object - snippsat - May-28-2022

(May-28-2022, 02:30 PM) Pavel_47 wrote: I think a safer method is to find the tag containing "Publisher", then move to the next tag at the same level of hierarchy, and finally extract the text from that tag.

Find the tag that holds the whole Product details list.

    publisher = browser.find_element(By.CSS_SELECTOR, '#detailBulletsWrapper_feature_div')
    print(publisher.text)

Getting a single element would be:

    >>> p = publisher.find_elements(By.CSS_SELECTOR, 'li:nth-child(1) > span > span:nth-child(2)')
    >>> p
    [<selenium.webdriver.remote.webelement.WebElement (session="26ba57aa713155834023884ce6f18ab7", element="43c80e5e-eee0-49d8-94d4-fd69305b17ec")>]
    >>> p[0].text
    'No Starch Press; 2nd edition (May 3, 2019)'
    >>> p = publisher.find_elements(By.CSS_SELECTOR, 'li:nth-child(5) > span > span:nth-child(2)')
    >>> p[0].text
    '978-1593279288'

Quote: If I remember correctly, BeautifulSoup provides a navigation functions over neighboring tags.

You can use BS with Selenium, e.g. as in post.

RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-29-2022

Product details: ok, works fine. Exploring this fragment, we can extract the Publisher and date. I also tried CSS_SELECTOR for finding the Author (please see the screenshot below).

[screenshot]

This method doesn't work for Author. I tried using find_element_by_class_name. That doesn't work either.
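On the earlier idea of finding the tag containing "Publisher" and stepping to its neighbour with BeautifulSoup: a minimal sketch of how that could look. SAMPLE_HTML is invented markup loosely modelled on the Product details list, and value_after_label is a hypothetical helper name; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Invented sample markup (assumption): label and value are sibling <span> tags.
SAMPLE_HTML = """
<div id="detailBullets_feature_div">
  <ul>
    <li><span><span class="a-text-bold">Publisher :</span> <span>Springer</span></span></li>
    <li><span><span class="a-text-bold">Language :</span> <span>English</span></span></li>
  </ul>
</div>
"""


def value_after_label(html, label):
    """Return the text of the <span> that follows the <span> containing `label`."""
    soup = BeautifulSoup(html, "html.parser")
    # Match only <span> tags whose single text child contains the label.
    label_tag = soup.find("span", string=lambda s: s is not None and label in s)
    if label_tag is None:
        return None
    # Step to the next <span> at the same level of the hierarchy.
    value_tag = label_tag.find_next_sibling("span")
    return value_tag.get_text(strip=True) if value_tag else None


print(value_after_label(SAMPLE_HTML, "Publisher"))
```

With Selenium in the loop, the HTML would come from browser.page_source after browser.get(url), for example value_after_label(browser.page_source, "Publisher").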
RE: Problem with searching over Beautiful Soap object - snippsat - May-29-2022

Did you know that you can copy the CSS selector or XPath of a tag when you are over it in inspect? This is a copy of the CSS selector: '#bylineInfo > span'

    title = browser.find_element(By.CSS_SELECTOR, '#productTitle')
    print(title.text)
    # '#bylineInfo > span' selects the author line, so name the variable accordingly
    author = browser.find_element(By.CSS_SELECTOR, '#bylineInfo > span')
    print(author.text)
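The '>' in '#bylineInfo > span' is the CSS child combinator: it matches only span elements that are direct children of #bylineInfo, whereas a plain space would match span elements at any depth. The sketch below illustrates the difference with BeautifulSoup's select(), which accepts the same selector syntax. SAMPLE_HTML is invented for illustration and count_matches is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Invented sample markup (assumption): one <span> nested inside another.
SAMPLE_HTML = """
<div id="bylineInfo">
  <span class="author">Jane Doe <span class="role">(Author)</span></span>
</div>
"""


def count_matches(html, selector):
    """Return how many elements in `html` match the CSS `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select(selector))


# Child combinator: only the outer <span>, a direct child of #bylineInfo.
print(count_matches(SAMPLE_HTML, "#bylineInfo > span"))
# Descendant combinator: the outer <span> and the nested one.
print(count_matches(SAMPLE_HTML, "#bylineInfo span"))
```

The same distinction applies when the selector is passed to browser.find_element(By.CSS_SELECTOR, ...).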
RE: Problem with searching over Beautiful Soap object - Pavel_47 - May-29-2022

Not sure that I understood how it works ... I mean using the '>' symbol. I am searching for the Reviews section of this book: https://www.amazon.com/Discovering-Modern-Depth-Peter-Gottschling/dp/0136677649/

I tried a more classic approach: first find the section concerned by its unique ID, then search within that section for the information to extract using the class name (the class name gives what I want to extract: the string "3.6 by 5"). Here is the snippet I used for that:

    reviews_section = browser.find_element_by_id('acrPopover')
    score = reviews_section.find_elements_by_class_name('a-icon-alt')
    print(score[0].text)

Unfortunately the print output is empty. Here is a screenshot of the concerned fragment of the book page with the "centers of interest" outlined:

[screenshot]

RE: Problem with searching over Beautiful Soap object - snippsat - May-30-2022

I mean like this: click on ..., or in some cases right click works. Then it is easier, as you get the correct selector or XPath for the chosen tag.

[screenshot]

RE: Problem with searching over Beautiful Soap object - Pavel_47 - Jun-30-2022

Tried with css_selector and class name: nothing in the print output.

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    browser = webdriver.Chrome('/usr/bin/chromedriver', options=options)
    url = 'https://www.amazon.com/Discovering-Modern-Depth-Peter-Gottschling/dp/0136677649/'
    browser.get(url)
    reviews1 = browser.find_element_by_css_selector('span.a-icon-alt')
    reviews2 = browser.find_element_by_class_name('a-icon-alt')
    print(reviews1.text)
    print(reviews2.text)
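One possible explanation for the empty output, not confirmed anywhere in the thread: Selenium's .text returns only text that is visibly rendered, so an element that the page hides with CSS yields an empty string even though it was found. If that is what happens here, reading the raw markup avoids the problem. The sketch below parses HTML with BeautifulSoup, which ignores CSS visibility. SAMPLE_HTML and its rating string are invented for illustration, and rating_text is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Invented sample markup (assumption): the rating text sits in a hidden <span>.
SAMPLE_HTML = """
<span id="acrPopover">
  <i class="a-icon a-icon-star">
    <span class="a-icon-alt" style="display:none">3.6 out of 5 stars</span>
  </i>
</span>
"""


def rating_text(html):
    """Return the text of span.a-icon-alt inside #acrPopover, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("#acrPopover span.a-icon-alt")
    return tag.get_text(strip=True) if tag else None


print(rating_text(SAMPLE_HTML))

# Staying within Selenium, the DOM text can be read regardless of visibility:
#     score[0].get_attribute("textContent")
```

With a live session, rating_text(browser.page_source) would apply the same lookup to the loaded page.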