Async for making requests - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Async for making requests (/thread-40435.html)
Async for making requests - DangDuong - Jul-27-2023

I'm currently making an application to get the definitions of data points. There are three files involved. UniverseService.py gets all the data points related to a data point's name. get_definition.py gets the definition of a single data point. DataPointService will be the endpoint that gets the definitions of all of those data points at once. Some data points might have 40-50 related data points, which means that many API calls, so I would like to make the calls asynchronously, but I'm not sure how. I've tried several approaches with asyncio, but it still doesn't work. The end goal is to import DataPointService as ds in a Jupyter notebook and just call a function.

This is DataPointService:

async def get_related_datapoints_definition(universe, name, quantity=None):
    universeService = UniverseService(universe)

    # Asynchronously find similar data points using run_in_executor
    loop = asyncio.get_event_loop()
    data_point_list = await loop.run_in_executor(None, universeService.find_similar_datapoint, name, quantity)

    # Asynchronously get universe datapoints name-ID pairs using run_in_executor
    pairs = await loop.run_in_executor(None, universeService.get_universe_datapoints_nameID_pairs)

    # Use asyncio.gather to fetch the definitions concurrently
    tasks = [get_datapoint_definition_async(universeService, pairs[data_point][0]) for data_point in data_point_list]
    result = await asyncio.gather(*tasks)
    return result

async def get_datapoint_definition_async(universeService, datapoint_id):
    # Asynchronously get the definition for the given data point ID
    definition = await universeService.get_datapoint_definition_by_id(datapoint_id)
    return definition

This is get_definition.py:

def get_definition_url(datapoint_id):
    datapoint_definition = requests.get(
        "abc",
        headers={
            'Authorization': f'Bearer {abc}',
            'X-API-ProductId': "abc"
        },
        verify=False
    )
    url = datapoint_definition.json()
    if type(url) is list:
        # if there is a definition, the data will be returned in a list
        return url[0]["abc"]
    else:
        # otherwise a dictionary is returned, like {"message": "no data point definition"}
        return -1

def get_definition(url):
    if url == -1:
        return "There is no definition for this Data Point"
    definition = requests.get(url)
    soup = BeautifulSoup(definition.content, 'html.parser')  # parse the HTML data
    try:
        # Attempt to find a <p> element
        p_element = soup.find('p')
        if p_element:
            # Case 1: a <p> tag is found
            # Check if the <p> element contains a <span> element
            target_element = p_element.find('span')
            if not target_element:
                target_element = p_element
        else:
            # Case 3: only the <body> tag is found
            target_element = soup.find('body')
    except AttributeError:
        # Handle any other exceptions or errors here
        target_element = None  # or any other default value you want to assign

    # Split the text, since the first word and the second are always separated by a newline
    definition_text = target_element.get_text(strip=True)
    print(definition_text)
    # definition_split = definition_text.split(" ", 1)
    # first_word = " ".join(definition_split[0].split("\r\n"))
    # parsed_definition = first_word + " " + definition_split[1]
    return definition_text

This is UniverseService:

    async def get_datapoint_definition_by_id(self, id):
        url = await get_definition_url(id)
        definition = await get_definition(url)
        return definition

RE: Async for making requests - deborahlockwood - Aug-07-2023

To implement asynchronous calls in your code using asyncio, you'll need to make a few modifications.
Here's an updated version of your code with explanations.

In DataPointService.py:

import asyncio

async def get_related_datapoints_definition(universe, name, quantity=None):
    universeService = UniverseService(universe)

    # Asynchronously find similar data points using run_in_executor
    loop = asyncio.get_event_loop()
    data_point_list = await loop.run_in_executor(None, universeService.find_similar_datapoint, name, quantity)

    # Asynchronously get universe datapoints name-ID pairs using run_in_executor
    pairs = await loop.run_in_executor(None, universeService.get_universe_datapoints_nameID_pairs)

    # Use asyncio.gather to fetch the definitions concurrently
    tasks = [get_datapoint_definition_async(universeService, pairs[data_point][0]) for data_point in data_point_list]
    result = await asyncio.gather(*tasks)
    return result

async def get_datapoint_definition_async(universeService, datapoint_id):
    # Asynchronously get the definition for the given data point ID
    definition = await universeService.get_datapoint_definition_by_id(datapoint_id)
    return definition

Explanation: get_related_datapoints_definition is now an async function that uses the asyncio library to run tasks concurrently. get_datapoint_definition_async is also async and retrieves the definition for a given data point ID.

In get_definition.py:

import asyncio
import requests
from bs4 import BeautifulSoup

async def get_definition_url(datapoint_id):
    # Make your requests asynchronously using the aiohttp library instead of requests, if available
    # ...
    # Sample synchronous implementation for the sake of example
    datapoint_definition = requests.get(
        "abc",
        headers={
            'Authorization': f'Bearer {abc}',
            'X-API-ProductId': "abc"
        },
        verify=False
    )
    url = datapoint_definition.json()
    if type(url) is list:
        return url[0]["abc"]
    else:
        return -1

async def get_definition(url):
    if url == -1:
        return "There is no definition for this Data Point"
    definition = requests.get(url)
    soup = BeautifulSoup(definition.content, 'html.parser')
    try:
        p_element = soup.find('p')
        if p_element:
            target_element = p_element.find('span') or p_element
        else:
            target_element = soup.find('body')
    except AttributeError:
        target_element = None
    definition_text = target_element.get_text(strip=True)
    return definition_text

Explanation: get_definition_url and get_definition are now async functions. You can switch to an asynchronous HTTP library like aiohttp to make asynchronous HTTP requests, which is better suited for this use case; however, I have kept the requests library in the example for simplicity.

To call these functions from a Jupyter notebook, you can import get_related_datapoints_definition and await it. Here's an example:

In your Jupyter notebook:

from DataPointService import get_related_datapoints_definition

async def main():
    universe = "your_universe"
    name = "your_datapoint_name"
    quantity = 50  # specify the desired quantity
    result = await get_related_datapoints_definition(universe, name, quantity)
    print(result)  # or do something with the result

# Jupyter's event loop is already running, so you can await the coroutine directly
await main()

Explanation: main is an async function that calls get_related_datapoints_definition (which is also async) and awaits the result, then prints it or performs any desired operations. Make sure to import the necessary libraries (UniverseService, requests, BeautifulSoup, etc.) in the appropriate files for your code to work correctly.
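For reference, the aiohttp switch mentioned above might look roughly like this. This is a minimal sketch, not a drop-in replacement: the URL, token, and the "abc" JSON key are placeholders standing in for the redacted values in the original code, and each function takes a shared ClientSession so that asyncio.gather can actually overlap the requests.

```python
import asyncio
import aiohttp
from bs4 import BeautifulSoup

async def get_definition_url(session, datapoint_id):
    async with session.get(
        "https://example.invalid/datapoints/definition",  # placeholder for the real endpoint
        headers={
            'Authorization': 'Bearer <token>',            # placeholder token
            'X-API-ProductId': "abc"
        },
        ssl=False,  # counterpart of requests' verify=False
    ) as response:
        payload = await response.json()
    if isinstance(payload, list):
        return payload[0]["abc"]  # placeholder key from the original code
    return -1

async def get_definition(session, url):
    if url == -1:
        return "There is no definition for this Data Point"
    async with session.get(url) as response:
        html = await response.text()
    soup = BeautifulSoup(html, 'html.parser')
    p_element = soup.find('p')
    if p_element:
        target_element = p_element.find('span') or p_element
    else:
        target_element = soup.find('body')
    return target_element.get_text(strip=True) if target_element else None

async def get_all_definitions(datapoint_ids):
    # A single shared session reuses connections across all requests
    async with aiohttp.ClientSession() as session:
        urls = await asyncio.gather(*(get_definition_url(session, i) for i in datapoint_ids))
        return await asyncio.gather(*(get_definition(session, u) for u in urls))
```

ssl=False mirrors the verify=False in the requests version; drop it once a proper certificate is in place.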
Please note that the code provided is an example and may require further adjustments based on your specific implementation and dependencies.
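One caveat about the answer above: requests.get is a blocking call, so putting it inside an async def does not by itself make the fetches overlap; the event loop still waits for each one in turn. If you'd rather keep the requests library, one option is to hand each blocking call to a worker thread with asyncio.to_thread (Python 3.9+) and gather the results. A self-contained sketch, with a time.sleep stand-in for the HTTP call:

```python
import asyncio
import time

def blocking_fetch(datapoint_id):
    # Stand-in for a blocking requests.get call
    time.sleep(0.2)
    return f"definition of {datapoint_id}"

async def get_all_definitions(datapoint_ids):
    # Each blocking call runs in a worker thread; gather waits for all of them concurrently
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_fetch, i) for i in datapoint_ids)
    )

start = time.perf_counter()
results = asyncio.run(get_all_definitions([1, 2, 3, 4, 5]))
elapsed = time.perf_counter() - start

print(results)                 # five definitions, in input order
print(f"took {elapsed:.2f}s")  # roughly 0.2 s rather than 1.0 s, because the calls overlap
```

This is essentially the same pattern as the run_in_executor calls already in DataPointService, just with the newer, shorter to_thread spelling.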