Fast way of inspecting web pages for paywalls - Python Forum (https://python-forum.io), Web Scraping & Web Development
Fast way of inspecting web pages for paywalls - aadriyu - Oct-27-2021

It might seem a silly question, but I am new to programming. I have a long list of URLs and I need to check whether the articles on those web pages are free or protected by a paywall. When I open the page source of a paywalled article I can find the string "paywall__content", so I wrote the script below. The only problem is that it takes too long to go through all the URLs I have. Is there a faster way to perform the same check? Thank you!

```python
import csv

import requests

with open('___.csv') as csvfile:
    # The file is tab-separated; skip the header row
    reader = csv.reader(csvfile, delimiter='\t')
    next(reader, None)
    for row in reader:
        url = row[14]  # the URL is in the 15th column
        # Download the page once; the timeout stops a slow server from hanging the loop
        response = requests.get(url, timeout=10)
        # A paywalled article contains this marker in its HTML source
        if "paywall__content" in response.text:
            print("yes")
        else:
            print("no")
```
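A possible approach, sketched under assumptions: checking the pages one after another is slow mainly because each request spends most of its time waiting on the network, so checking several URLs at the same time with a thread pool is likely to help most. The sketch below uses only the standard library (`urllib.request` and `concurrent.futures.ThreadPoolExecutor`) instead of `requests`, so it has no dependencies. The helper names (`fetch`, `check_url`, `check_all`, `read_urls`), the 16 workers, the 10-second timeout and the User-Agent header are hypothetical choices, not part of the original post. Like the question, it assumes that the string "paywall__content" in the HTML marks a paywalled article.

```python
import csv
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

# Assumption (from the question): this marker appears only on paywalled pages
PAYWALL_MARKER = "paywall__content"


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body as text (stdlib only)."""
    # Hypothetical User-Agent; some sites reject requests without one
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def check_url(url: str, fetcher: Callable[[str], str] = fetch) -> Tuple[str, str]:
    """Return (url, result) where result is "yes", "no" or "error: ..."."""
    try:
        body = fetcher(url)
    except Exception as exc:  # network errors, timeouts, bad URLs, bad charsets
        return url, f"error: {exc}"
    return url, "yes" if PAYWALL_MARKER in body else "no"


def check_all(urls: Iterable[str], fetcher: Callable[[str], str] = fetch,
              max_workers: int = 16) -> List[Tuple[str, str]]:
    """Check many URLs concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: check_url(u, fetcher), urls))


def read_urls(path: str, column: int = 14) -> List[str]:
    """Read URLs from a tab-separated file, skipping the header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)
        return [row[column] for row in reader if len(row) > column]
```

For example, `check_all(read_urls("___.csv"))` returns one `(url, result)` pair per row, in the same order as the file. The `fetcher` parameter lets you pass your own download function instead, for instance one that returns `requests.get(url, timeout=10).text`. Keep `max_workers` modest so the site is not flooded with simultaneous requests.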