Friday, 14 May 2021

Looking for a house (Part 1)

Looking for a house... or rather, for a way to collect real estate market data. It could be interesting to see how prices have been trending during this pandemic period.

First of all, let's build a spider to collect data from some real estate websites. In this article I focus on just one site: casa.it.

As I have already done with the job websites, I will collect all the data in a database through a pipeline, but this is a topic for my next article.

I use the well-known Scrapy framework to build the spider. I am interested in getting the information marked in the red squares:




The crawler looks like this:

The start_urls attribute is built with a list comprehension that generates one URL for each city in a Python list of relevant cities.

In this case it is easy to generate a URL for each city:

 https://www.casa.it/srp/?tr=vendita&ft=<city_name>

for example

https://www.casa.it/srp/?tr=vendita&ft=roma
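The original gist is not reproduced here, so the following is only a minimal sketch of how such a list comprehension could look. The city list, the names CITIES and BASE_URL, and the assumption that city names are passed lowercase as in the example above are all hypothetical.

```python
# Hypothetical subset of cities; the original list is not shown in the post.
CITIES = ["roma", "milano", "torino", "bologna"]

# Search URL for properties for sale ("vendita") on casa.it;
# the city name goes into the ft query parameter.
BASE_URL = "https://www.casa.it/srp/?tr=vendita&ft={city}"

# One start URL per city, built with a list comprehension.
start_urls = [BASE_URL.format(city=city) for city in CITIES]

for url in start_urls:
    print(url)
```

Inside a spider class, the same expression would simply be assigned to the start_urls class attribute.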

The crawler rules follow the "seguente" (next) button at the bottom of the page:



The second gist contains the XPath queries.

All together:


From this point, the data is ready to be stored through a dedicated pipeline (an example here).
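The real pipeline is the topic of the next article; as a placeholder, this is a minimal sketch of a Scrapy item pipeline that appends every item to a SQLite table together with a timestamp, so that the price of the same listing can be compared across runs. The class name, database file, table layout and item fields are assumptions, not the author's implementation.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline: appends each scraped listing to SQLite."""

    def __init__(self, db_path="casa.db"):
        self.db_path = db_path

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        # No primary key on url: each run adds a new row, keeping price history.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS listings ("
            "url TEXT, title TEXT, price INTEGER, location TEXT, "
            "scraped_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )

    def process_item(self, item, spider):
        self.conn.execute(
            "INSERT INTO listings (url, title, price, location) VALUES (?, ?, ?, ?)",
            (item.get("url"), item.get("title"), item.get("price"), item.get("location")),
        )
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        # Commit once at the end of the crawl, then release the file.
        self.conn.commit()
        self.conn.close()
```

To activate it, the class is listed in the project's settings.py, for example ITEM_PIPELINES = {"myproject.pipelines.SQLitePipeline": 300}, where myproject is a hypothetical project name.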

