Python Forum
web scraping extract particular Div section
My HTML contains several div sections that share the same class name:

<div class="_2GiuhO">Specifications</div>
<div class="_3Rrcbo V39ti-">
<div class="_2RngUh">
<div class="_2lzn0o">General</div>
<table class="_3ENrHu">
<div class="_2RngUh">
<div class="_2lzn0o">Processor And Memory Features</div>
<table class="_3ENrHu">
In the HTML above, <div class="_2RngUh"> is repeated.

I used Beautiful Soup's soup.find(class_="_2RngUh"), but it always gives the first occurrence.
I want to select an occurrence based on its child heading (General, Processor And Memory Features). How can I do this?
you need:
results = soup.find('div', {'class': '_2RngUh'})
also, place your html in python tags. Even though it's not python, it will maintain indentation.
Thanks for your reply,

results = soup.find('div', {'class': '_2RngUh'})
This also returns only the first occurrence of the class.

I want to fetch the 2nd or 3rd occurrence based on its child heading (General, Processor And Memory Features).
Change find to find_all, and select the wanted item.
Suppose it's the third item:
results = soup.find_all('div', {'class': '_2RngUh'})
desired_result = results[2]
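If the position is not known in advance, one option is to list the heading of every matching section and look the index up from there. This is only a sketch: the HTML is a cleaned-up copy of the snippet from the question with closing tags and placeholder table cells added, and it uses the built-in html.parser so no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Cleaned-up copy of the question's HTML; closing tags and the
# table cells are assumptions added so the snippet parses cleanly.
html = """\
<div class="_2GiuhO">Specifications</div>
<div class="_3Rrcbo V39ti-">
  <div class="_2RngUh">
    <div class="_2lzn0o">General</div>
    <table class="_3ENrHu"><tr><td>Model</td></tr></table>
  </div>
  <div class="_2RngUh">
    <div class="_2lzn0o">Processor And Memory Features</div>
    <table class="_3ENrHu"><tr><td>RAM</td></tr></table>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list-like ResultSet, so it can be indexed.
results = soup.find_all("div", {"class": "_2RngUh"})

# Collect the heading text of each section, in document order.
headings = [
    section.find("div", {"class": "_2lzn0o"}).get_text(strip=True)
    for section in results
]
for position, heading in enumerate(headings):
    print(position, heading)

# Look up the index by heading instead of hard-coding it.
index = headings.index("Processor And Memory Features")
desired_result = results[index]
```

Note that list.index raises ValueError when the heading is not present, so real pages may need a try/except around that line.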
Thank you so much, I can get it based on the index.
But can we get the index based on the text of its child tag (General, Processor And Memory Features, ...)?

<div class="_2GiuhO">Specifications</div>
<div class="_3Rrcbo V39ti-">
<div class="_2RngUh">
<div class="_2lzn0o">General</div>
<table class="_3ENrHu">
<div class="_2RngUh">
<div class="_2lzn0o">Processor And Memory Features</div>
<table class="_3ENrHu">
(May-12-2020, 09:02 AM)AjayBachu Wrote: But can we get index based on its child tag General, Processor And Memory Features...?
from bs4 import BeautifulSoup

html = '''\
<div class="_2GiuhO">Specifications</div>
<div class="_3Rrcbo V39ti-">
<div class="_2RngUh">
<div class="_2lzn0o">General</div>
<table class="_3ENrHu">
<div class="_2RngUh">
<div class="_2lzn0o">Processor And Memory Features</div>
<table class="_3ENrHu">'''

soup = BeautifulSoup(html, 'lxml')
tags = soup.find_all(class_="_2RngUh")
>>> t = tags[1]
>>> t
<div class="_2RngUh">
<div class="_2lzn0o">Processor And Memory Features</div>
<table class="_3ENrHu"></table></div>

>>> t.findChild()
<div class="_2lzn0o">Processor And Memory Features</div>
>>> t.findChild().text
'Processor And Memory Features'
So this is an example of how you can test things out.
There are many functions/methods; you can use dir() to show them all.
A good editor or REPL will show you these options through autocomplete.
>>> dir(t)
So would e.g. find_next() work?
>>> t.find_next()
<div class="_2lzn0o">Processor And Memory Features</div>
>>> t.find_next().text
'Processor And Memory Features'
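To go straight from the heading text to its section without knowing the index, one possible approach is to search for the heading div by its string and then step up to the enclosing section with find_parent(). This is a sketch under assumptions: section_by_heading is a hypothetical helper name, the HTML is a cleaned-up copy of the question's snippet with closing tags and placeholder cells added, and html.parser is used instead of lxml to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Cleaned-up copy of the question's HTML; closing tags and the
# table cells are assumptions added so the snippet parses cleanly.
html = """\
<div class="_2GiuhO">Specifications</div>
<div class="_3Rrcbo V39ti-">
  <div class="_2RngUh">
    <div class="_2lzn0o">General</div>
    <table class="_3ENrHu"><tr><td>Model</td></tr></table>
  </div>
  <div class="_2RngUh">
    <div class="_2lzn0o">Processor And Memory Features</div>
    <table class="_3ENrHu"><tr><td>RAM</td></tr></table>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def section_by_heading(soup, heading):
    """Return the _2RngUh section whose heading div has exactly this text.

    Hypothetical helper, not part of Beautiful Soup. Returns None when
    no heading matches.
    """
    # string= matches tags whose only content is exactly this text.
    title = soup.find("div", class_="_2lzn0o", string=heading)
    if title is None:
        return None
    # Walk up to the enclosing section div.
    return title.find_parent("div", class_="_2RngUh")


section = section_by_heading(soup, "Processor And Memory Features")
print(section.find("table").get_text(strip=True))
```

The string match is exact, so headings with surrounding whitespace or nested tags would need a regular expression or a custom function instead.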
Larz60+ Wrote:you need:
results = soup.find('div', {'class': '_2RngUh'})
You don't need that, @Larz60+; I don't use the dictionary form anymore.
You can copy the class name directly from the source code and just add class_ to make it work.
from bs4 import BeautifulSoup

html = '<div class="cities">London</div>'
soup = BeautifulSoup(html, 'lxml')
# Only add the trailing underscore to class
>>> tag = soup.find(class_="cities")
>>> tag.text
'London'

>>> # The dictionary form changes more of the original markup and also needs the div tag name
>>> tag = soup.find('div', {'class': 'cities'})
>>> tag.text
'London'
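As a quick check that the two call styles are interchangeable, the sketch below runs both against the same markup and compares the results. It also tries a CSS selector through select_one(), which is not mentioned in the thread and is added here only as a third option; html.parser is used instead of lxml so nothing extra needs installing.

```python
from bs4 import BeautifulSoup

html = '<div class="cities">London</div>'
soup = BeautifulSoup(html, "html.parser")

# Keyword form: class_ avoids the clash with Python's "class" keyword.
by_keyword = soup.find(class_="cities")

# Dictionary form: attribute names are used exactly as in the markup.
by_dict = soup.find("div", {"class": "cities"})

# CSS selector form (an addition, not from the thread).
by_css = soup.select_one("div.cities")

for tag in (by_keyword, by_dict, by_css):
    print(tag.text)
```

Which form to use is mostly a matter of taste; the dictionary form is still handy for attribute names that are not valid Python identifiers, such as data-* attributes.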
Thank you so much.. I will use this.
