Python Forum
Fast way of inspecting web pages for paywalls - Printable Version


Fast way of inspecting web pages for paywalls - aadriyu - Oct-27-2021

This might seem like a silly question, but I am new to programming.
I have a long list of URLs, and I need to check whether the articles on those pages are free or protected by a paywall.
If I look at the page source of a paywalled article, I can find the string "paywall__content", so I wrote the script below. The only problem is that it takes too long to go through all the URLs I have. Is there a faster way to perform the same operation? Thank you!


import csv

import requests

# A single Session reuses connections between requests.
session = requests.Session()

with open('___.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    next(reader, None)  # skip the header row
    for row in reader:
        url = row[14]
        # Fetch each page only once; the timeout keeps a dead host from hanging the loop.
        response = session.get(url, timeout=10)
        if "paywall__content" in response.text:
            print("yes")
        else:
            print("no")
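
Since most of the time in a loop like this is probably spent waiting on the network, one common way to speed it up is to fetch several pages at once with a thread pool. The following is only a rough sketch under that assumption, not a tested solution for this particular site. It uses the standard library's urllib instead of requests so that it runs without extra packages; the names fetch_text and check_urls, the 10-second timeout and the 16 worker threads are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

MARKER = "paywall__content"  # marker string taken from the original post


def fetch_text(url, timeout=10):
    """Download a page and return its text, or None if the request fails.

    Hypothetical helper: the timeout value is an assumption.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return None


def check_urls(urls, fetch=fetch_text, max_workers=16):
    """Return (url, result) pairs in input order.

    result is "yes", "no", or "error" when the page could not be fetched.
    """
    def check(url):
        text = fetch(url)
        if text is None:
            return url, "error"
        return url, "yes" if MARKER in text else "no"

    # Threads overlap the time spent waiting on network responses.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check, urls))
```

To use it with the CSV from the post, collect row[14] from every row into a list first and pass that list to check_urls. A large worker count may trigger rate limiting on the target site, so it is worth starting small.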