Python Forum


Search the entire web - DT909 - Dec-15-2017

Hi guys,

I'm new to Python. I want to start a project that consists of building a search engine that looks across the entire web and prints the websites that contain certain keywords or match certain criteria chosen by the user.
For example, let's assume that all company websites have similar pages ("About Us", "Products", "Clients", "Technologies", "Contact", etc.).
If the "Products" page contains "Table", then the web address is printed, and so on.

How do you think I should start? Which libraries do I need? Do you think it's too complex?

Thanks in advance for your advice.


I can already hear some of you saying that Google already does this. But in my case, not really. My searches are so specific that Google can't really help.


RE: Search the entire web - Larz60+ - Dec-15-2017

Which model Crays are in your cluster?


RE: Search the entire web - DT909 - Dec-15-2017

(Dec-15-2017, 05:25 PM)Larz60+ Wrote: Which model Crays are in your cluster?

Sorry I don't understand your question.


RE: Search the entire web - nilamo - Dec-15-2017

What is considered a part of the "entire web"? Not everything can be indexed or crawled, and not everything is accessible over HTTP.

You really only need the requests module to get a page. Finding all links in that page would be easier with BeautifulSoup (the package name is bs4). And unless you have infinite time, you probably want to store an indexed version of each page in some sort of database, which would be another package.
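
Something like this is roughly what I have in mind. It's only a sketch, and a few things in it are my own assumptions: the function names, the one-table layout, and SQLite as the database are all just picked for illustration. To keep it runnable without installing anything, the link extraction uses the standard library's html.parser instead of BeautifulSoup, and requests is only imported inside fetch(), so everything else works offline.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects the href of every <a> tag and all text nodes of a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def fetch(url):
    """Download a page. requests is imported here so the rest runs offline."""
    import requests
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def open_index(path=":memory:"):
    """Open (or create) the SQLite file holding the stored pages."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
    )
    return conn


def index_page(conn, url, html):
    """Store the page text and return the links found in the page."""
    parser = LinkAndTextParser(url)
    parser.feed(html)
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)",
        (url, " ".join(parser.text_parts)),
    )
    return parser.links


def search(conn, keyword):
    """Return the URLs of stored pages whose text contains the keyword."""
    rows = conn.execute(
        "SELECT url FROM pages WHERE body LIKE ? ORDER BY url",
        ("%" + keyword + "%",),
    )
    return [row[0] for row in rows]
```

You would call index_page(conn, url, fetch(url)) for a starting page, push the returned links onto a queue, and repeat for a set of sites you choose. search(conn, "Table") then answers the "does this page contain Table" question from the stored copy instead of downloading everything again.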


RE: Search the entire web - buran - Dec-15-2017

(Dec-15-2017, 05:58 PM)DT909 Wrote:
(Dec-15-2017, 05:25 PM)Larz60+ Wrote: Which model Crays are in your cluster?

Sorry I don't understand your question. 

Yeah, we can guess that from your original question :-)
https://en.wikipedia.org/wiki/Cray