Scraping with Python
Build your scraper from zero with ChatGPT
There are many ways to build a scraper with ChatGPT. In this example we won't ask for the complete code up front; we'll build it step by step. Let's start!
First, we'll ask for just a small piece of the scraper, and in the next steps we'll improve it, again using ChatGPT. The first function we'll create extracts the URLs present on the first website we visit.
I want to create a scraper using Python. The first thing I need to do is to create
a function that let me visit a website and extract its URLs
ChatGPT giving base code for scraper made in Python
As we can see, we will have to install BeautifulSoup and requests:
$ pip install beautifulsoup4 requests
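For readers following along in text, here is a minimal sketch of what a link-extraction function built on these two libraries might look like. It is not ChatGPT's exact answer; the function names `extract_urls_from_html` and `extract_urls` and the 10-second timeout are assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup


def extract_urls_from_html(html):
    """Return the href value of every <a> tag in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute.
    return [a["href"] for a in soup.find_all("a", href=True)]


def extract_urls(url):
    """Download a page and return the links found in it (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return extract_urls_from_html(response.text)
```

Calling `extract_urls("https://docs.gpt4devs.com")` would return the links exactly as they are written in the page, so relative links come back unchanged. Keeping the parsing step separate from the download makes it possible to try the function on a plain HTML string.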
When I ran this code with the URL https://docs.gpt4devs.com, I found many URLs starting with "/". These are relative paths without the domain, so the scraper can't visit them as they are. Let's tell ChatGPT that it needs to fix any URL that starts with "/":
It works perfect, but I have a problem. Many URLs that it detects are starting
with "/" and not with the domain, so I can't visit them easy. I need you
fix it please
ChatGPT fixing core function of our scraper
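One common way to handle links that start with "/" is to resolve them against the URL of the page they were found on, using `urljoin` from Python's standard library. The sketch below shows that approach; the helper name `absolutize` is hypothetical, and ChatGPT's own fix may differ.

```python
from urllib.parse import urljoin


def absolutize(base_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    # urljoin leaves already-absolute URLs unchanged and prefixes
    # relative ones with the scheme and domain of base_url.
    return [urljoin(base_url, href) for href in hrefs]
```

With this helper, `absolutize("https://docs.gpt4devs.com", ["/scraping"])` yields `["https://docs.gpt4devs.com/scraping"]`, while a link that already includes its domain is returned as is.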
Next, we'll extract some information from each URL. For this example I only extract the title, but you can ask ChatGPT to extract other tags or information:
Perfect! Now I need to add another feature: I need to visit the URLs it finds
recursively and extract the title of each one
Adding feature to our scraper made with ChatGPT and Python
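A recursive crawl needs a few safeguards that are easy to overlook in the prompt: remembering visited URLs, limiting depth, and staying on the starting domain. The following is a rough sketch under those assumptions, not the code ChatGPT produced. The names `fetch_html` and `crawl_titles`, the `max_depth` default of 2, and the same-domain rule are all illustrative choices.

```python
from urllib.parse import urldefrag, urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page; return None if the request fails."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        return None


def crawl_titles(url, fetch=fetch_html, max_depth=2, visited=None, titles=None):
    """Recursively visit same-domain links and collect {url: title}."""
    visited = set() if visited is None else visited
    titles = {} if titles is None else titles
    # Stop when the depth budget is used up or the page was already seen.
    if max_depth < 0 or url in visited:
        return titles
    visited.add(url)

    html = fetch(url)
    if html is None:
        return titles

    soup = BeautifulSoup(html, "html.parser")
    # Pages without a <title> tag are recorded with None.
    titles[url] = soup.title.get_text(strip=True) if soup.title else None

    for a in soup.find_all("a", href=True):
        # Resolve relative links and drop any "#fragment" part.
        link, _ = urldefrag(urljoin(url, a["href"]))
        # Only follow links on the same domain as the current page.
        if urlparse(link).netloc == urlparse(url).netloc:
            crawl_titles(link, fetch, max_depth - 1, visited, titles)
    return titles
```

Calling `crawl_titles("https://docs.gpt4devs.com")` would return a dictionary that maps each visited URL to its page title. The `fetch` parameter exists so the crawl logic can be tried against canned HTML without touching the network.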