import pandas
We reuse the wine data available on this page of the UCI Machine Learning Repository (UCI MLR). The code below retrieves the data:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
wine = pandas.read_csv(url, header = None, sep = ",")
wine.columns = ["class", "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium",
"Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins",
"Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline"]
wine
| | class | Alcohol | Malic acid | Ash | Alcalinity of ash | Magnesium | Total phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color intensity | Hue | OD280/OD315 of diluted wines | Proline |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
1 | 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 |
2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
3 | 1 | 14.37 | 1.95 | 2.50 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 |
4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
173 | 3 | 13.71 | 5.65 | 2.45 | 20.5 | 95 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740 |
174 | 3 | 13.40 | 3.91 | 2.48 | 23.0 | 102 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750 |
175 | 3 | 13.27 | 4.28 | 2.26 | 20.0 | 120 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835 |
176 | 3 | 13.17 | 2.59 | 2.37 | 20.0 | 120 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840 |
177 | 3 | 14.13 | 4.10 | 2.74 | 24.5 | 96 | 2.05 | 0.76 | 0.56 | 1.35 | 9.20 | 0.61 | 1.60 | 560 |
178 rows × 14 columns
The World Bank provides a large amount of data, including worldwide governance indicators (see here). The code below imports the 2019 data from the file WGI_Data.csv (which you therefore need to download). The indicators are defined as follows:
CC: Control of Corruption
GE: Government Effectiveness
PV: Political Stability and Absence of Violence/Terrorism
RQ: Regulatory Quality
RL: Rule of Law
VA: Voice and Accountability

wgi = pandas.read_csv("https://fxjollois.github.io/donnees/WGI/wgi2019.csv")
wgi
| | Country | Code | Voice and Accountability | Political Stability and Absence of Violence/Terrorism | Government Effectiveness | Regulatory Quality | Rule of Law | Control of Corruption |
---|---|---|---|---|---|---|---|---|
0 | Aruba | ABW | 1.294189 | 1.357372 | 1.029933 | 0.857360 | 1.263128 | 1.217238 |
1 | Andorra | ADO | 1.139154 | 1.615139 | 1.908749 | 1.228176 | 1.579939 | 1.234392 |
2 | Afghanistan | AFG | -0.988032 | -2.649407 | -1.463875 | -1.120555 | -1.713527 | -1.401076 |
3 | Angola | AGO | -0.777283 | -0.311101 | -1.117144 | -0.893871 | -1.054343 | -1.054683 |
4 | Anguilla | AIA | NaN | 1.367357 | 0.815824 | 0.846231 | 0.355737 | 1.234392 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
209 | Serbia | SRB | 0.026626 | -0.091665 | 0.019079 | 0.113867 | -0.119070 | -0.445551 |
210 | South Africa | ZAF | 0.670388 | -0.217931 | 0.367380 | 0.156172 | -0.076408 | 0.084924 |
211 | Congo, Dem. Rep. | ZAR | -1.365966 | -1.808007 | -1.627429 | -1.509667 | -1.786088 | -1.538931 |
212 | Zambia | ZMB | -0.286199 | -0.102216 | -0.675215 | -0.554269 | -0.462069 | -0.640345 |
213 | Zimbabwe | ZWE | -1.141875 | -0.920179 | -1.205337 | -1.463199 | -1.257009 | -1.238796 |
214 rows × 8 columns
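The long column names can be unwieldy in later steps. The sketch below shows one possible way to rename them to the short codes defined above and to count missing values per country. It uses two rows copied from the table above so that it runs without network access. The names `codes`, `sample`, `wgi_short` and `n_missing` are illustrative and not part of the original material.

```python
import pandas

# Mapping from the long column names to the short codes defined above.
codes = {
    "Control of Corruption": "CC",
    "Government Effectiveness": "GE",
    "Political Stability and Absence of Violence/Terrorism": "PV",
    "Regulatory Quality": "RQ",
    "Rule of Law": "RL",
    "Voice and Accountability": "VA",
}

# Two rows copied from the table above (Aruba and Anguilla).
sample = pandas.DataFrame({
    "Country": ["Aruba", "Anguilla"],
    "Code": ["ABW", "AIA"],
    "Voice and Accountability": [1.294189, float("nan")],
    "Political Stability and Absence of Violence/Terrorism": [1.357372, 1.367357],
    "Government Effectiveness": [1.029933, 0.815824],
    "Regulatory Quality": [0.857360, 0.846231],
    "Rule of Law": [1.263128, 0.355737],
    "Control of Corruption": [1.217238, 1.234392],
})

# Rename the indicator columns; "Country" and "Code" are left untouched.
wgi_short = sample.rename(columns=codes)

# Number of missing indicators for each country.
n_missing = wgi_short[list(codes.values())].isna().sum(axis=1)
```

The same `rename` call should apply to the full `wgi` table, assuming its column names match those displayed above.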
You should therefore carry out the following steps:
Using the Spotify data available on this page, we want to find out whether there are clusters among these 35853 songs. Since this number is fairly large, we will perform a hybrid clustering.
import pandas
spotify = pandas.read_csv("https://fxjollois.github.io/donnees/spotify_dataset.csv")
spotify
| | track | artist | uri | danceability | energy | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms | time_signature | chorus_hit | sections | popularity | decade |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Jealous Kind Of Fella | Garland Green | spotify:track:1dtKN6wwlolkM8XZy2y9C1 | 0.417 | 0.620 | 3 | -7.727 | 1 | 0.0403 | 0.4900 | 0.000000 | 0.0779 | 0.8450 | 185.655 | 173533 | 3 | 32.94975 | 9 | 1 | 60s |
1 | Initials B.B. | Serge Gainsbourg | spotify:track:5hjsmSnUefdUqzsDogisiX | 0.498 | 0.505 | 3 | -12.475 | 1 | 0.0337 | 0.0180 | 0.107000 | 0.1760 | 0.7970 | 101.801 | 213613 | 4 | 48.82510 | 10 | 0 | 60s |
2 | Melody Twist | Lord Melody | spotify:track:6uk8tI6pwxxdVTNlNOJeJh | 0.657 | 0.649 | 5 | -13.392 | 1 | 0.0380 | 0.8460 | 0.000004 | 0.1190 | 0.9080 | 115.940 | 223960 | 4 | 37.22663 | 12 | 0 | 60s |
3 | Mi Bomba Sonó | Celia Cruz | spotify:track:7aNjMJ05FvUXACPWZ7yJmv | 0.590 | 0.545 | 7 | -12.058 | 0 | 0.1040 | 0.7060 | 0.024600 | 0.0610 | 0.9670 | 105.592 | 157907 | 4 | 24.75484 | 8 | 0 | 60s |
4 | Uravu Solla | P. Susheela | spotify:track:1rQ0clvgkzWr001POOPJWx | 0.515 | 0.765 | 11 | -3.515 | 0 | 0.1240 | 0.8570 | 0.000872 | 0.2130 | 0.9060 | 114.617 | 245600 | 4 | 21.79874 | 14 | 0 | 60s |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
41094 | Lotus Flowers | Yolta | spotify:track:4t1TljQWJ6ZuoSY67zVvBI | 0.172 | 0.358 | 9 | -14.430 | 1 | 0.0342 | 0.8860 | 0.966000 | 0.3140 | 0.0361 | 72.272 | 150857 | 4 | 24.30824 | 7 | 0 | 10s |
41095 | Calling My Spirit | Kodak Black | spotify:track:2MShy1GSSgbmGUxADNIao5 | 0.910 | 0.366 | 1 | -9.954 | 1 | 0.0941 | 0.0996 | 0.000000 | 0.2610 | 0.7400 | 119.985 | 152000 | 4 | 32.53856 | 8 | 1 | 10s |
41096 | Teenage Dream | Katy Perry | spotify:track:55qBw1900pZKfXJ6Q9A2Lc | 0.719 | 0.804 | 10 | -4.581 | 1 | 0.0355 | 0.0132 | 0.000003 | 0.1390 | 0.6050 | 119.999 | 227760 | 4 | 20.73371 | 7 | 1 | 10s |
41097 | Stormy Weather | Oscar Peterson | spotify:track:4o9npmYHrOF1rUxxTVH8h4 | 0.600 | 0.177 | 7 | -16.070 | 1 | 0.0561 | 0.9890 | 0.868000 | 0.1490 | 0.5600 | 120.030 | 213387 | 4 | 21.65301 | 14 | 0 | 10s |
41098 | Dust | Hans Zimmer | spotify:track:2khIaVUkbMmDHB596lyMG3 | 0.121 | 0.123 | 4 | -23.025 | 0 | 0.0443 | 0.9640 | 0.696000 | 0.1030 | 0.0297 | 95.182 | 341396 | 4 | 71.05343 | 15 | 0 | 10s |
41099 rows × 20 columns
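The text does not spell out what hybrid clustering involves. A common reading, assumed here, is to run k-means with a large number of clusters and then apply hierarchical clustering to the resulting centroids. The sketch below illustrates that idea on synthetic data rather than on the Spotify table. The values of `n_pre` and `n_final` are arbitrary illustrative choices, and the centroids are not weighted by cluster size, which is a simplification.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Synthetic stand-in for the real data: three well-separated groups in 2D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=center, scale=0.5, size=(1000, 2))
    for center in ([0, 0], [10, 0], [0, 10])
])

# Step 1: k-means with many small clusters to compress the data.
n_pre = 30  # illustrative value, not taken from the original text
kmeans = KMeans(n_clusters=n_pre, n_init=10, random_state=0).fit(X)

# Step 2: hierarchical clustering (Ward linkage) on the k-means centroids.
n_final = 3  # illustrative value, normally chosen from a dendrogram
hac = AgglomerativeClustering(n_clusters=n_final, linkage="ward")
center_labels = hac.fit_predict(kmeans.cluster_centers_)

# Step 3: each observation inherits the final cluster of its centroid.
labels = center_labels[kmeans.labels_]
```

On the Spotify data, `X` would presumably hold the standardized numeric audio features, and the number of final clusters would be chosen after inspecting the hierarchy.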