How to use BeautifulSoup4 with pandas series type of html data?
Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html), Thread: /thread-9701.html
How to use BeautifulSoup4 with pandas series type of html data? - PrateekG - Apr-24-2018

Hi All,
I have some HTML data in the form of a pandas Series. For example, I am storing this data in a variable, html_series. When I try to apply BeautifulSoup to it:

    soup = BeautifulSoup(html_series, "html.parser")
    print(soup.prettify())

I get the error below:

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Can you please tell me what I am missing here? Thanks!

RE: How to use BeautifulSoup4 with pandas series type of html data? - buran - Apr-25-2018

What exactly is "HTML data in the form of a pandas Series"? This sounds like nonsense to me.
RE: How to use BeautifulSoup4 with pandas series type of html data? - PrateekG - Apr-25-2018

Let me explain:
1. Using the Shopify API, I fetched the JSON from an e-commerce site.
2. I normalized this JSON data into a DataFrame with:
    df = json_normalize(result)
3. From this DataFrame I take out the HTML content with:
    html_data = df['body_html']
4. When I then run the code below, I get the error:
    soup = BeautifulSoup(html_data, "html.parser")
    print(soup.prettify())
Hope I have mentioned everything here.

RE: How to use BeautifulSoup4 with pandas series type of html data? - snippsat - Apr-25-2018

It looks like you are trying to put JSON data into BeautifulSoup. What are the contents of html_data? For it to work, it has to be HTML.

    from bs4 import BeautifulSoup

    html_data = '''\
    <!DOCTYPE html>
    <html>
    <head>
      <title>Title of document</title>
    </head>
    <body>
      <p>Content of the document</p>
    </body>
    </html>'''

    soup = BeautifulSoup(html_data, 'lxml')
    print(soup.select('head > title')[0].text)
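For reference, here is a minimal sketch of one way to handle the case described above. It assumes df['body_html'] is a pandas Series in which each element is an HTML string. BeautifulSoup expects a single string (or file-like object), so the sketch parses each element separately instead of passing the whole Series. The sample data and the helper name html_to_text are made up for illustration, and this is not necessarily how the original poster resolved the error.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample data standing in for df['body_html'].
html_series = pd.Series([
    "<p>First <b>product</b> description</p>",
    "<p>Second product description</p>",
    None,  # real API data may contain missing values
])


def html_to_text(markup):
    """Parse one HTML string and return its visible text (hypothetical helper)."""
    if not isinstance(markup, str):
        # Skip None/NaN instead of handing them to BeautifulSoup.
        return ""
    soup = BeautifulSoup(markup, "html.parser")
    return soup.get_text(separator=" ", strip=True)


# Apply the parser element by element; the result is a new Series of strings.
text_series = html_series.apply(html_to_text)
print(text_series.tolist())
```

If only one document is needed, passing a single element such as html_series.iloc[0] to BeautifulSoup should work the same way.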
RE: How to use BeautifulSoup4 with pandas series type of html data? - PrateekG - Apr-26-2018

I have resolved the above error; it was due to the DataFrame normalization. I have now raised another question at the URL below:
https://python-forum.io/Thread-How-to-clean-html-content-using-BeautifulSoup-in-Python-3-6
Please take a look and let me know if you can help. Thanks!