Python Forum
Python requests.get() returns broken source code instead of expected source code?
I made a request to a Wikipedia page. Specifically, I need to scrape the "results matrix" from https://en.wikipedia.org/wiki/2017%E2%80...ga#Results

selectedSeasonPage = requests.get('https://en.wikipedia.org/wiki/2017–18_La_Liga')
Printing the response with pprint.pprint(selectedSeasonPage.text) and jumping to the source code of the matrix shows that it is incomplete.

Snippet of HTML returned by requests.get():

<table class="wikitable plainrowheaders" style="text-align:center;font-size:100%;">
.
.
<th scope="row" style="text-align:right;"><a href="/wiki/Deportivo_Alav%C3%A9s" title="Deportivo Alavés">Alavés</a></th>
<td style="font-weight: normal;background-color:transparent;">— </td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:transparent;"></td>
<td style="white-space:nowrap;font-weight: normal;background-color:#BBF3FF;">2–1</td>
When the HTML returned by requests.get() is viewed in a browser, the matrix is, as expected, not complete. See this image for reference.
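To put a number on how incomplete the returned table is, one option is to count the `<td>` cells that contain no text. The following is only a minimal sketch using the standard library's `html.parser`; the names `CellCounter` and `count_cells` are hypothetical, the sample string is a shortened stand-in for the real response, and nested tables are not handled.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Count <td> cells and how many of them contain no visible text."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.empty = 0
        self._in_td = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._text = []

    def handle_data(self, data):
        # Only collect text that appears inside a <td> cell.
        if self._in_td:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            self._in_td = False
            self.total += 1
            if not "".join(self._text).strip():
                self.empty += 1


def count_cells(html):
    """Return (total_cells, empty_cells) for the given HTML string."""
    counter = CellCounter()
    counter.feed(html)
    counter.close()
    return counter.total, counter.empty


# Shortened stand-in for the HTML returned by requests.get().
sample = (
    '<td style="white-space:nowrap;"></td>'
    '<td style="white-space:nowrap;"></td>'
    '<td style="white-space:nowrap;">2–1</td>'
)
print(count_cells(sample))  # (3, 2)
```

Running the same function on `selectedSeasonPage.text` and on the text saved from view-source would show whether the two really differ in the number of empty cells.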

Snippet from the browser's view-source, which is the output I need:

<table class="wikitable plainrowheaders" style="text-align:center;font-size:100%;">
.
.
<a href="/wiki/Deportivo_Alav%C3%A9s" title="Deportivo Alavés">Alavés</a></th>
<td style="font-weight: normal;background-color:transparent;">—</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#BBF3FF;">3–1</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#FFBBBB;">0–1</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#FFBBBB;">0–2</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#BBF3FF;">2–1</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#BBF3FF;">1–0</td>
<td style="white-space:nowrap;font-weight: normal;background-color:#FFBBBB;">1–2</td>
I am posting only a sample of the HTML for reference, since posting the entire output is not possible. I can post more specific parts if required.

My question is: how can I get the entire source of the matrix without losing any values?

From what I understand from previous questions, requests fails to return the expected output if some part of the page is rendered by JavaScript. But this page seems to be plain HTML and CSS (at least the part that I need). I cannot use Selenium since I need to scrape multiple pages. I would be grateful for a solution using requests or something equivalent.
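For a requests-only approach, a rough sketch of the fetch-and-extract step might look like the following. It is not a fix for the missing values, only a way to pull the table rows out of whatever HTML comes back so they can be inspected. The names `TableRows`, `extract_rows` and `fetch_html`, the User-Agent string, and the choice of the `wikitable` class as the selector are all assumptions for illustration. `requests` is imported inside `fetch_html` so the parsing part runs without it.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect cell text, row by row, from tables carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.rows = []
        self._depth = 0    # greater than 0 while inside a matching table
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth or self.css_class in classes:
                self._depth += 1
        elif self._depth and tag == "tr":
            self._row = []
        elif self._depth and tag in ("td", "th"):
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html, css_class="wikitable"):
    """Return a list of rows (lists of cell strings) from matching tables."""
    parser = TableRows(css_class)
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_html(url):
    """Download a page with requests (third-party) and return its text."""
    import requests  # imported lazily; only needed for the network call

    # Hypothetical User-Agent value; replace with one describing your script.
    headers = {"User-Agent": "results-matrix-check/0.1 (example)"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    return response.text


# Offline stand-in for a page, so the sketch runs without network access.
sample = (
    '<table class="wikitable plainrowheaders">'
    '<tr><th scope="row"><a href="/wiki/Deportivo_Alav%C3%A9s">Alavés</a></th>'
    '<td>3–1</td><td></td></tr>'
    '</table>'
    '<table class="other"><tr><td>ignored</td></tr></table>'
)
print(extract_rows(sample))
```

With network access, `extract_rows(fetch_html(url))` would apply the same extraction to a live page. Since this uses only requests and the standard library, it can be looped over multiple season pages without a browser driver.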

The requests version is 2.19.1 and the Python version is 3.7.0.

Is anything missing? I am new to this, so any help is appreciated.

Cross-posted from:
https://stackoverflow.com/questions/5242...ource-code
Messages In This Thread
Python requests.get() returns broken source code instead of expected source code? - by FatalPythonError - Sep-20-2018, 06:05 PM
