looking for a house... or at least for a way to collect real estate market data. It would be interesting to see how prices are trending during this pandemic period.
First of all, let's build a spider to collect data from real estate websites. In this article I focus on just one site: casa.it
As I have already done with the job websites, I will collect all the data in a database through a pipeline, but that is a topic for my next article.
I use the well-known Scrapy framework to build the spider. I am interested in getting the information marked in the red squares:
The crawler looks like this:
The start_urls attribute is built with a list comprehension that generates the URLs from a Python list containing only a few relevant cities.
In this case it is easy to generate a URL for each city:
https://www.casa.it/srp/?tr=vendita&ft=<city_name>
for example
https://www.casa.it/srp/?tr=vendita&ft=roma
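As a minimal sketch of that list comprehension, the snippet below builds one search URL per city from the pattern shown above. The names BASE_URL, CITIES and search_url, as well as the cities themselves, are my own placeholders for illustration, not necessarily what the spider uses.

```python
from urllib.parse import urlencode

# Search endpoint taken from the URL pattern above.
BASE_URL = "https://www.casa.it/srp/"

# Hypothetical selection of cities; replace with the ones you care about.
CITIES = ["roma", "milano", "torino"]


def search_url(city: str) -> str:
    """Build the for-sale ("vendita") search URL for one city."""
    query = urlencode({"tr": "vendita", "ft": city})
    return f"{BASE_URL}?{query}"


# One start URL per city, generated with a list comprehension.
start_urls = [search_url(city) for city in CITIES]

for url in start_urls:
    print(url)
```

Using urlencode instead of plain string concatenation also takes care of escaping city names that contain spaces or accented characters.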
The crawler rules follow the "seguente" (next) button at the bottom of each results page:
All together:

