May-06-2020, 01:03 PM
(This post was last modified: May-06-2020, 01:04 PM by warriordazza.)
Hello All,
I am an absolute beginner. Below is a script that I half wrote and half assembled from bits of multiple online sources.
Essentially, it gets the name, price and sold date of sold items on eBay.
I think I've solved the issue of converting prices with thousands separators (e.g. 4,000) to numbers, but I can't find an answer online about fixing the leap-year issue in datetime.strptime. The error is 'ValueError: day is out of range for month'.
Another issue is that I don't think I'm using numpy correctly in 'pages = np.arange()'. Sometimes it scrapes info and then repeats it; sometimes it scrapes about X minus 2 prices/products where the webpage clearly shows X (e.g. I see 69 but get only 67 or 66). Any help with this section would also be appreciated.
All help appreciated, thanks.
    import requests
    from requests import get
    from bs4 import BeautifulSoup
    from urllib.request import urlretrieve
    from urllib.parse import quote
    import pandas as pd
    import numpy as np
    from datetime import datetime
    from datetime import date
    from time import sleep
    from random import randint

    #Initialize empty lists where you'll store your data
    Product_name = []
    Price = []
    Date_sold = []

    Search_name = input("Search for: ")
    qstr = quote(Search_name)
    Exclude_terms = input("Exclude these terms (- infront of all): ")
    qstrr = quote(Exclude_terms)

    pages = np.arange(1, 1000, 50)

    for page in pages:
        page = requests.get("https://www.ebay.com.au/sch/i.html?_from=R40&_nkw=" + qstr + qstrr + "&_sacat=0&LH_TitleDesc=0&_fsrp=1&LH_Complete=1&rt=nc&LH_Sold=1&_pgn=" + str(page))
        soup = BeautifulSoup(page.text, 'html.parser')
        search = soup.find_all('div', class_='s-item__wrapper')
        sleep(randint(2,10))

        for container in search:
            #Name
            name = container.h3.text.strip()
            Product_name.append(name)

            #Price
            price = container.find('span', class_='s-item__price').text.strip() if container.find('span', class_='POSITIVE') else ''
            Price.append(price)

            #Date Sold
            sold = container.find('span', class_='s-item__ended-date').text
            soldd = datetime.strptime(sold, '%b-%d %H:%M')
            solddd = datetime.strftime(soldd, '%d-%b')
            Date_sold.append(solddd)

    #building our Pandas dataframe
    EBay_Products = pd.DataFrame({
        'Product Name': Product_name,
        'Price': Price,
        'Sold Day' : Date_sold
    })

    EBay_Products['Price'] = EBay_Products['Price'].map(lambda x: x.lstrip('AU $'))
    EBay_Products['Price'] = pd.to_numeric(EBay_Products['Price'].str.replace(',',''), errors='coerce')

    EBay_Products.to_csv(Search_name + " scraped on " + str(date.today()) + '.csv')
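
For the two questions, here is a minimal sketch, not a definitive fix. On the date error: when the format string has no year, datetime.strptime fills in 1900, and 1900 is not a leap year, so 'Feb-29' raises 'ValueError: day is out of range for month'. Supplying a real year avoids that. On the pages: np.arange(1, 1000, 50) produces 1, 51, 101, ..., and those values are sent as `_pgn`. Assuming `_pgn` is a 1-based page index (that is how it appears to behave, but it is an assumption), the loop asks for page 1, page 51, page 101 and so on, which may be why results repeat. The helper names `parse_sold_date` and `page_numbers` are hypothetical, and choosing the right year for items sold around New Year is left to the caller.

```python
from datetime import datetime


def parse_sold_date(text, year):
    """Parse a 'Feb-29 13:05' style string using an explicit year.

    Without a year in the format, strptime assumes 1900 (not a leap
    year), so Feb-29 is rejected. Prepending the year fixes that.
    """
    return datetime.strptime(f"{year} {text.strip()}", "%Y %b-%d %H:%M")


def page_numbers(max_pages):
    """Return consecutive page indices 1, 2, ..., max_pages.

    Assumption: the site's page parameter counts pages, not items.
    """
    return list(range(1, max_pages + 1))


if __name__ == "__main__":
    sold = parse_sold_date("Feb-29 13:05", 2020)
    print(sold.strftime("%d-%b"))  # 29-Feb
    print(page_numbers(3))         # [1, 2, 3]
```

In the script, this would mean calling something like parse_sold_date(sold, date.today().year) in place of the bare strptime call, and looping over page_numbers(n) for a small n instead of np.arange(1, 1000, 50). numpy is not needed for this; the built-in range is enough.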