Séance 2 - correction¶
In [1]:
import matplotlib.pyplot
import scipy.stats
import numpy
import pandas
import seaborn
Récupérer les données dans un DataFrame¶
In [2]:
url = "listings.csv"
data = pandas.read_csv(url, header = 0, sep = ",")
data.head()
Out[2]:
| id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | license | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 371299 | Marais Rue des Archives refait à neuf février ... | 1870265 | Thomas | NaN | Hôtel-de-Ville | 48.85751 | 2.35511 | Entire home/apt | 185.0 | 3 | 601 | 2024-09-06 | 3.97 | 3 | 307 | 54 | 7510300547558 |
| 1 | 371862 | loft in Paris (Belleville) | 1872631 | Veronique | NaN | Entrepôt | 48.87151 | 2.37219 | Entire home/apt | 250.0 | 4 | 50 | 2023-08-14 | 0.34 | 1 | 9 | 0 | 7511000320406 |
| 2 | 372879 | Appartement complet au centre de Paris. | 1876396 | Samuel | NaN | Gobelins | 48.83593 | 2.35108 | Entire home/apt | 85.0 | 30 | 171 | 2024-08-15 | 2.01 | 3 | 151 | 1 | Available with a mobility lease only ("bail mo... |
| 3 | 375434 | Luxurious Art & Design Flat, 180m2, Champs Ely... | 683140 | Oscar | NaN | Élysée | 48.86680 | 2.30972 | Entire home/apt | NaN | 3 | 22 | 2023-07-15 | 0.15 | 1 | 0 | 0 | 7510806561916 |
| 4 | 378897 | Little flat near Montmartre | 1902818 | Lorraine | NaN | Opéra | 48.88285 | 2.33462 | Entire home/apt | 110.0 | 3 | 28 | 2024-07-30 | 0.19 | 1 | 6 | 2 | 7511805895013 |
Donner le nombre de logements total¶
In [3]:
data.shape[0]
Out[3]:
95461
Lister les différentes valeurs que prend la variable neighbourhood¶
In [4]:
data["neighbourhood"].unique()
Out[4]:
array(['Hôtel-de-Ville', 'Entrepôt', 'Gobelins', 'Élysée', 'Opéra',
'Batignolles-Monceau', 'Buttes-Montmartre', 'Reuilly', 'Temple',
'Vaugirard', 'Ménilmontant', 'Popincourt', 'Buttes-Chaumont',
'Passy', 'Luxembourg', 'Palais-Bourbon', 'Observatoire', 'Louvre',
'Bourse', 'Panthéon'], dtype=object)
Lister les différentes valeurs que prend la variable room_type¶
In [5]:
data["room_type"].unique()
Out[5]:
array(['Entire home/apt', 'Private room', 'Shared room', 'Hotel room'],
dtype=object)
Donner les 3 arrondissements avec le plus de logements¶
In [6]:
pandas.crosstab(data["neighbourhood"], "NbLogements").sort_values("NbLogements", ascending = False).head(3)
Out[6]:
| col_0 | NbLogements |
|---|---|
| neighbourhood | |
| Buttes-Montmartre | 10532 |
| Popincourt | 8392 |
| Vaugirard | 7727 |
Donner les 3 arrondissements pour lesquels le prix est le plus élevé¶
In [7]:
data.groupby("neighbourhood")["price"].agg(["mean"]).sort_values("mean", ascending = False).head(3).round(2)
Out[7]:
| mean | |
|---|---|
| neighbourhood | |
| Élysée | 442.04 |
| Palais-Bourbon | 409.84 |
| Passy | 407.62 |
Décrire les variables price, number_of_reviews et reviews_per_month¶
In [8]:
data.filter(["price", "number_of_reviews", "reviews_per_month"]).describe().round(2)
Out[8]:
| price | number_of_reviews | reviews_per_month | |
|---|---|---|---|
| count | 64230.00 | 95461.00 | 68319.00 |
| mean | 256.02 | 20.95 | 1.09 |
| std | 522.27 | 53.28 | 1.32 |
| min | 8.00 | 0.00 | 0.01 |
| 25% | 103.00 | 0.00 | 0.23 |
| 50% | 155.00 | 4.00 | 0.67 |
| 75% | 256.00 | 19.00 | 1.48 |
| max | 30400.00 | 3295.00 | 41.88 |
price¶
In [9]:
g = seaborn.catplot(data = data, x = "price", kind = "box")
g.fig.axes[0].set_xscale('log')
number_of_reviews¶
In [10]:
g = seaborn.catplot(data = data, x = "number_of_reviews", kind = "box")
g.fig.axes[0].set_xscale('log')
reviews_per_month¶
In [11]:
g = seaborn.catplot(data = data, x = "reviews_per_month", kind = "box")
g.fig.axes[0].set_xscale('log')
In [12]:
tab = pandas.concat(
[
pandas.crosstab(data["room_type"], "#"),
(pandas.crosstab(data["room_type"], "%", normalize = True) * 100).round(2)
],
axis = 1).sort_values("#", ascending = False)
tab
Out[12]:
| col_0 | # | % |
|---|---|---|
| room_type | ||
| Entire home/apt | 85268 | 89.32 |
| Private room | 9055 | 9.49 |
| Hotel room | 752 | 0.79 |
| Shared room | 386 | 0.40 |
In [13]:
matplotlib.pyplot.figure(figsize = (16, 6))
seaborn.countplot(x = "room_type", data = data, order = tab.index);
neighbourhood¶
In [14]:
tab = pandas.concat(
[
pandas.crosstab(data["neighbourhood"], "#"),
(pandas.crosstab(data["neighbourhood"], "%", normalize = True) * 100).round(2)
],
axis = 1).sort_values("#", ascending = False)
tab
Out[14]:
| col_0 | # | % |
|---|---|---|
| neighbourhood | ||
| Buttes-Montmartre | 10532 | 11.03 |
| Popincourt | 8392 | 8.79 |
| Vaugirard | 7727 | 8.09 |
| Batignolles-Monceau | 6673 | 6.99 |
| Entrepôt | 6464 | 6.77 |
| Passy | 6225 | 6.52 |
| Buttes-Chaumont | 5387 | 5.64 |
| Ménilmontant | 5183 | 5.43 |
| Opéra | 4652 | 4.87 |
| Reuilly | 3988 | 4.18 |
| Temple | 3908 | 4.09 |
| Observatoire | 3603 | 3.77 |
| Gobelins | 3274 | 3.43 |
| Bourse | 3137 | 3.29 |
| Panthéon | 3026 | 3.17 |
| Élysée | 2965 | 3.11 |
| Hôtel-de-Ville | 2857 | 2.99 |
| Luxembourg | 2724 | 2.85 |
| Palais-Bourbon | 2702 | 2.83 |
| Louvre | 2042 | 2.14 |
In [15]:
matplotlib.pyplot.figure(figsize = (16, 6))
seaborn.barplot(y = "neighbourhood", x = "#", hue = "#", data = tab,
order = tab.index, legend = False, palette = "Blues");
Décrire le lien entre price et room_type¶
In [16]:
tab = data.groupby("room_type")["price"] \
.agg(PrixMoyen = "mean") \
.sort_values("PrixMoyen", ascending = False) \
.round(2)
tab
Out[16]:
| PrixMoyen | |
|---|---|
| room_type | |
| Hotel room | 322.04 |
| Entire home/apt | 262.37 |
| Private room | 189.62 |
| Shared room | 115.54 |
In [17]:
g = seaborn.catplot(data = data, x = "room_type", y = "price", kind = "box",
order = tab.index,
height = 6, aspect = 2)
g.fig.axes[0].set_yscale('log')
Décrire le lien entre price et neighboorhood¶
In [18]:
tab = data.groupby("neighbourhood")["price"] \
.agg(PrixMoyen = "mean") \
.sort_values("PrixMoyen", ascending = False) \
.round(2)
tab
Out[18]:
| PrixMoyen | |
|---|---|
| neighbourhood | |
| Élysée | 442.04 |
| Palais-Bourbon | 409.84 |
| Passy | 407.62 |
| Louvre | 338.56 |
| Luxembourg | 336.12 |
| Hôtel-de-Ville | 291.04 |
| Batignolles-Monceau | 265.82 |
| Opéra | 257.74 |
| Bourse | 257.03 |
| Panthéon | 252.12 |
| Temple | 249.78 |
| Vaugirard | 245.56 |
| Entrepôt | 217.84 |
| Reuilly | 216.95 |
| Popincourt | 210.76 |
| Observatoire | 202.41 |
| Buttes-Chaumont | 189.86 |
| Gobelins | 181.35 |
| Buttes-Montmartre | 179.00 |
| Ménilmontant | 160.13 |
In [19]:
g = seaborn.catplot(data = data, y = "neighbourhood", x = "price",
kind = "box", order = tab.index,
height = 6, aspect = 2)
g.fig.axes[0].set_xscale('log')
Décrire le lien entre room_type et neighboorhood¶
In [20]:
pandas.crosstab(data["neighbourhood"], data["room_type"], normalize = "index").round(2)
Out[20]:
| room_type | Entire home/apt | Hotel room | Private room | Shared room |
|---|---|---|---|---|
| neighbourhood | ||||
| Batignolles-Monceau | 0.90 | 0.01 | 0.09 | 0.00 |
| Bourse | 0.93 | 0.00 | 0.07 | 0.00 |
| Buttes-Chaumont | 0.88 | 0.00 | 0.11 | 0.01 |
| Buttes-Montmartre | 0.91 | 0.01 | 0.08 | 0.00 |
| Entrepôt | 0.89 | 0.00 | 0.10 | 0.01 |
| Gobelins | 0.85 | 0.00 | 0.14 | 0.01 |
| Hôtel-de-Ville | 0.92 | 0.00 | 0.07 | 0.00 |
| Louvre | 0.89 | 0.02 | 0.09 | 0.00 |
| Luxembourg | 0.88 | 0.03 | 0.09 | 0.00 |
| Ménilmontant | 0.89 | 0.00 | 0.10 | 0.01 |
| Observatoire | 0.85 | 0.01 | 0.14 | 0.00 |
| Opéra | 0.84 | 0.02 | 0.14 | 0.00 |
| Palais-Bourbon | 0.90 | 0.02 | 0.08 | 0.00 |
| Panthéon | 0.88 | 0.01 | 0.10 | 0.01 |
| Passy | 0.92 | 0.01 | 0.07 | 0.00 |
| Popincourt | 0.91 | 0.00 | 0.08 | 0.00 |
| Reuilly | 0.87 | 0.01 | 0.12 | 0.01 |
| Temple | 0.94 | 0.00 | 0.05 | 0.01 |
| Vaugirard | 0.89 | 0.01 | 0.10 | 0.00 |
| Élysée | 0.87 | 0.03 | 0.10 | 0.00 |
In [21]:
seaborn.catplot(x = "room_type", data = data, kind = "count", col = "neighbourhood", col_wrap = 4);
Représenter les logements dans un nuage de points, en mettant une couleur par neighboorhood¶
In [22]:
matplotlib.pyplot.figure(figsize = (16, 12))
seaborn.scatterplot(data = data, x = "longitude", y = "latitude", hue = "neighbourhood",
palette = "Set1");
Ajouter l'information sur price pour chaque point dans ce graphique¶
Noter qu'il est difficile de réellement voir l'effet prix sur cette visualisation.
In [23]:
matplotlib.pyplot.figure(figsize = (16, 12))
seaborn.scatterplot(data = data, x = "longitude", y = "latitude", hue = "neighbourhood", size = "price",
palette = "Set1");