Mirkos3ck3: Looking for a house (Part 1)

I am looking for a house... or at least for a way to collect real estate market data. It would be interesting to see how prices have trended during this pandemic period.

First of all, let's build a spider to collect data from real estate websites. In this article I focus on just one site: casa.it

As I already did with the job websites, I will store all the data in a database through a pipeline, but that is a topic for my next article.

I use the well-known Scrapy framework to build the spider. I am interested in the information marked by the red squares:

The crawler looks like this:

The start_urls attribute is built with a list comprehension that generates the URLs from a Python list containing only a few relevant cities.

In this case it is easy to generate a url for each city:

https://www.casa.it/srp/?tr=vendita&ft=<city_name>

For example:

https://www.casa.it/srp/?tr=vendita&ft=roma
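As a minimal sketch of that URL generation: the city list below is an assumption (the post does not show the cities actually used), and build_search_url is a hypothetical helper name. Only the URL pattern itself comes from the article.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.casa.it/srp/"

# Hypothetical selection of cities; the original list is not shown in the post.
CITIES = ["roma", "milano", "torino"]


def build_search_url(city: str) -> str:
    """Return the sale-listings search URL for one city."""
    # tr=vendita selects properties for sale, ft carries the city name.
    return f"{BASE_URL}?{urlencode({'tr': 'vendita', 'ft': city})}"


# The list comprehension that would feed the spider's start_urls.
start_urls = [build_search_url(city) for city in CITIES]
```

Going through urlencode instead of plain string concatenation also takes care of city names that contain spaces or accented letters.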

The crawler rules follow the "seguente" (next) button at the bottom of each results page:

The second gist contains the XPath queries.

All together:

From this point, the scraped items are ready to be stored through a dedicated pipeline (an example here).
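For readers who want something to try before the next article, a very small storage pipeline might look like the sketch below. It assumes SQLite as the database and a listings table with link, title, and price columns; the class name, the file name, and the item fields are hypothetical and not taken from the original project.

```python
import sqlite3


class SQLitePipeline:
    """Store each scraped item in a local SQLite database."""

    def __init__(self, db_path="listings.db"):
        self.db_path = db_path  # hypothetical default file name
        self.connection = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the DB and ensure the table.
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS listings "
            "(link TEXT PRIMARY KEY, title TEXT, price TEXT)"
        )

    def process_item(self, item, spider):
        # The link is the primary key, so a listing seen twice is overwritten
        # instead of duplicated.
        self.connection.execute(
            "INSERT OR REPLACE INTO listings (link, title, price) VALUES (?, ?, ?)",
            (item.get("link"), item.get("title"), item.get("price")),
        )
        self.connection.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.connection.close()
```

Such a class is enabled through the ITEM_PIPELINES setting of the Scrapy project, for example with an entry like "myproject.pipelines.SQLitePipeline": 300, where the module path is again only a placeholder.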

Mirkos3ck3

Friday, 14 May 2021
