acp-demande

Wine

In [1]:

import pandas

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
wine = pandas.read_csv(url, header = None, sep = ",")
wine.columns = ["class", "Alcohol", "Malic acid", "Ash", "Alcalinity of ash", "Magnesium", 
                "Total phenols", "Flavanoids", "Nonflavanoid phenols", "Proanthocyanins", 
                "Color intensity", "Hue", "OD280/OD315 of diluted wines", "Proline"]
wine

Out[1]:

	class	Alcohol	Malic acid	Ash	Alcalinity of ash	Magnesium	Total phenols	Flavanoids	Nonflavanoid phenols	Proanthocyanins	Color intensity	Hue	OD280/OD315 of diluted wines	Proline
0	1	14.23	1.71	2.43	15.6	127	2.80	3.06	0.28	2.29	5.64	1.04	3.92	1065
1	1	13.20	1.78	2.14	11.2	100	2.65	2.76	0.26	1.28	4.38	1.05	3.40	1050
2	1	13.16	2.36	2.67	18.6	101	2.80	3.24	0.30	2.81	5.68	1.03	3.17	1185
3	1	14.37	1.95	2.50	16.8	113	3.85	3.49	0.24	2.18	7.80	0.86	3.45	1480
4	1	13.24	2.59	2.87	21.0	118	2.80	2.69	0.39	1.82	4.32	1.04	2.93	735
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
173	3	13.71	5.65	2.45	20.5	95	1.68	0.61	0.52	1.06	7.70	0.64	1.74	740
174	3	13.40	3.91	2.48	23.0	102	1.80	0.75	0.43	1.41	7.30	0.70	1.56	750
175	3	13.27	4.28	2.26	20.0	120	1.59	0.69	0.43	1.35	10.20	0.59	1.56	835
176	3	13.17	2.59	2.37	20.0	120	1.65	0.68	0.53	1.46	9.30	0.60	1.62	840
177	3	14.13	4.10	2.74	24.5	96	2.05	0.76	0.56	1.35	9.20	0.61	1.60	560

178 rows × 14 columns

Work to do

You must carry out the following steps:

Describe the data
Run a centered or standardized PCA (justify your choice)
Produce the plots needed for interpretation
Identify the classes on the factorial plane
What can be said overall?
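
As a starting point for the centered-versus-standardized question, the sketch below compares both variants on ten rows copied from the preview above (the first five and the last five), restricted to four columns. It is only an illustration under stated assumptions: the helper `pca` and the variable names are hypothetical, the PCA is computed directly with a NumPy SVD rather than with any particular library, and ten rows are far too few to draw conclusions about the full data set.

```python
import numpy as np
import pandas as pd

# Ten rows copied from the preview above (rows 0-4 and 173-177), four columns only.
sample = pd.DataFrame({
    "class":      [1, 1, 1, 1, 1, 3, 3, 3, 3, 3],
    "Alcohol":    [14.23, 13.20, 13.16, 14.37, 13.24, 13.71, 13.40, 13.27, 13.17, 14.13],
    "Malic acid": [1.71, 1.78, 2.36, 1.95, 2.59, 5.65, 3.91, 4.28, 2.59, 4.10],
    "Hue":        [1.04, 1.05, 1.03, 0.86, 1.04, 0.64, 0.70, 0.59, 0.60, 0.61],
    "Proline":    [1065, 1050, 1185, 1480, 735, 740, 750, 835, 840, 560],
})

def pca(X, scale):
    """Hypothetical helper: PCA via SVD.

    Returns the share of variance carried by each axis and the row coordinates.
    """
    Z = X - X.mean(axis=0)                 # centering is always applied
    if scale:
        Z = Z / X.std(axis=0, ddof=0)      # standardized PCA: unit variance per column
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)            # explained variance ratio per axis
    coords = U * s                         # coordinates of the rows on the axes
    return ratio, coords

X = sample.drop(columns="class").to_numpy(dtype=float)

ratio_centered, _ = pca(X, scale=False)
ratio_scaled, coords = pca(X, scale=True)

print("centered only :", ratio_centered.round(4))
print("standardized  :", ratio_scaled.round(4))

# Mean position of each class on the first axis of the standardized PCA
axis1 = pd.Series(coords[:, 0], index=sample.index, name="Dim1")
print(axis1.groupby(sample["class"]).mean())
```

On this sample, Proline is measured in the hundreds while Hue stays around 1, so the centered-only PCA puts almost all of the variance on a single axis driven by Proline. Whether that argument carries over to the full 13 variables is part of the justification asked for above.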

Worldwide Governance Indicators

The World Bank provides a large amount of data, including worldwide governance indicators (see here). The code below imports the 2019 data from the file WGI_Data.csv (which you must therefore download). The indicators are defined as follows:

CC : Control of Corruption
- Control of Corruption captures perceptions of the extent to which public power is exercised for private gain, including both petty and grand forms of corruption, as well as "capture" of the state by elites and private interests. Estimate gives the country's score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.
GE : Government Effectiveness
- Government Effectiveness captures perceptions of the quality of public services, the quality of the civil service and the degree of its independence from political pressures, the quality of policy formulation and implementation, and the credibility of the government's commitment to such policies. Estimate gives the country's score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.
PV : Political Stability and Absence of Violence/Terrorism
- Political Stability and Absence of Violence/Terrorism measures perceptions of the likelihood of political instability and/or politically-motivated violence, including terrorism. Estimate gives the country's score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.
RQ : Regulatory Quality
- Regulatory Quality captures perceptions of the ability of the government to formulate and implement sound policies and regulations that permit and promote private sector development. Estimate gives the country's score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.
RL : Rule of Law
- Rule of Law captures perceptions of the extent to which agents have confidence in and abide by the rules of society, and in particular the quality of contract enforcement, property rights, the police, and the courts, as well as the likelihood of crime and violence. Estimate gives the country's score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.
VA : Voice and Accountability
- Voice and Accountability captures perceptions of the extent to which a country's citizens are able to participate in selecting their government, as well as freedom of expression, freedom of association, and a free media. Estimate gives the country's score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.

In [1]:

import pandas

wgi = pandas.read_csv("https://fxjollois.github.io/donnees/WGI/wgi2019.csv")
wgi

Out[1]:

	Country	Code	Voice and Accountability	Political Stability and Absence of Violence/Terrorism	Government Effectiveness	Regulatory Quality	Rule of Law
0	Aruba	ABW	1.294189	1.357372	1.029933	0.857360	1.263128
1	Andorra	ADO	1.139154	1.615139	1.908749	1.228176	1.579939
2	Afghanistan	AFG	-0.988032	-2.649407	-1.463875	-1.120555	-1.713527
3	Angola	AGO	-0.777283	-0.311101	-1.117144	-0.893871	-1.054343
4	Anguilla	AIA	NaN	1.367357	0.815824	0.846231	0.355737
...	...	...	...	...	...	...	...
209	Serbia	SRB	0.026626	-0.091665	0.019079	0.113867	-0.119070
210	South Africa	ZAF	0.670388	-0.217931	0.367380	0.156172	-0.076408
211	Congo, Dem. Rep.	ZAR	-1.365966	-1.808007	-1.627429	-1.509667	-1.786088
212	Zambia	ZMB	-0.286199	-0.102216	-0.675215	-0.554269	-0.462069
213	Zimbabwe	ZWE	-1.141875	-0.920179	-1.205337	-1.463199	-1.257009

214 rows × 7 columns

Work to do

You must carry out the following steps:

Briefly describe the data
Run a centered or standardized PCA (justify your choice) on the data
Produce the plots needed for interpretation
Identify the countries most representative of each axis (based on their coordinates, for example)
What can be said overall?
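
The preview above contains a missing value (Anguilla has NaN for Voice and Accountability), which an SVD-based PCA cannot handle directly. The sketch below shows one possible way to deal with it and to rank countries by their coordinates on the first axis, using the ten rows visible in the preview. The short column codes, the choice to drop incomplete rows, and the use of a centered (not standardized) PCA are assumptions made for illustration, not the expected answer.

```python
import numpy as np
import pandas as pd

# Ten rows copied from the preview above; short codes are used as column names.
cols = ["VA", "PV", "GE", "RQ", "RL"]
rows = [
    ("Aruba",             1.294189,  1.357372,  1.029933,  0.857360,  1.263128),
    ("Andorra",           1.139154,  1.615139,  1.908749,  1.228176,  1.579939),
    ("Afghanistan",      -0.988032, -2.649407, -1.463875, -1.120555, -1.713527),
    ("Angola",           -0.777283, -0.311101, -1.117144, -0.893871, -1.054343),
    ("Anguilla",            np.nan,  1.367357,  0.815824,  0.846231,  0.355737),
    ("Serbia",            0.026626, -0.091665,  0.019079,  0.113867, -0.119070),
    ("South Africa",      0.670388, -0.217931,  0.367380,  0.156172, -0.076408),
    ("Congo, Dem. Rep.", -1.365966, -1.808007, -1.627429, -1.509667, -1.786088),
    ("Zambia",           -0.286199, -0.102216, -0.675215, -0.554269, -0.462069),
    ("Zimbabwe",         -1.141875, -0.920179, -1.205337, -1.463199, -1.257009),
]
sample = pd.DataFrame(rows, columns=["Country"] + cols).set_index("Country")

# Assumption: rows with any missing indicator are simply dropped.
complete = sample.dropna()

# Centered PCA via SVD (the indicators already share the same unit).
Z = complete - complete.mean()
U, s, Vt = np.linalg.svd(Z.to_numpy(), full_matrices=False)
explained = s**2 / np.sum(s**2)
coords = pd.DataFrame(
    U * s,
    index=complete.index,
    columns=[f"Dim{i + 1}" for i in range(len(s))],
)

# The sign of an axis is arbitrary, so rank countries by absolute coordinate.
top = coords["Dim1"].abs().nlargest(3).index.tolist()
print(explained.round(3))
print(top)
```

Ranking by absolute coordinate picks out the countries at both ends of an axis; looking at the signed values afterwards tells which end each one sits on.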

World temperature

Here we work with the HadCRUT4 global temperature data, provided by the Climate Research Unit. More information about these data is available at this link. The data give the history of mean monthly and annual anomalies since 1850, at the global level, relative to the 1961-1990 period.

The code below downloads the latest available data directly and loads them into a DataFrame, previewed below (the year 2021, which is incomplete, must be removed).

In [2]:

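# The file alternates anomaly rows with coverage rows; the year column becomes the index
temp = pandas.read_table(
    "https://crudata.uea.ac.uk/cru/data/temperature/HadCRUT4-gl.dat", 
    sep = r"\s+", 
    names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Annual"])
# Keep every other row, i.e. the anomaly rows only
temp = temp.iloc[::2]
temp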

Out[2]:

	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov	Dec	Annual
1850	-0.700	-0.286	-0.732	-0.563	-0.327	-0.213	-0.125	-0.237	-0.439	-0.451	-0.187	-0.257	-0.373
1851	-0.296	-0.356	-0.479	-0.441	-0.295	-0.197	-0.212	-0.157	-0.101	-0.057	-0.020	-0.051	-0.218
1852	-0.315	-0.477	-0.502	-0.557	-0.211	-0.040	-0.018	-0.202	-0.125	-0.216	-0.193	0.073	-0.228
1853	-0.182	-0.327	-0.309	-0.355	-0.268	-0.175	-0.059	-0.148	-0.404	-0.362	-0.255	-0.437	-0.269
1854	-0.365	-0.282	-0.286	-0.353	-0.233	-0.219	-0.227	-0.167	-0.119	-0.192	-0.367	-0.233	-0.248
...	...	...	...	...	...	...	...	...	...	...	...	...	...
2017	0.739	0.845	0.873	0.737	0.659	0.641	0.651	0.714	0.557	0.571	0.554	0.600	0.677
2018	0.554	0.528	0.615	0.627	0.587	0.573	0.594	0.586	0.598	0.678	0.590	0.638	0.597
2019	0.738	0.662	0.874	0.780	0.610	0.708	0.706	0.719	0.713	0.752	0.693	0.880	0.736
2020	0.982	1.001	1.017	0.800	0.714	0.682	0.695	0.735	0.714	0.617	0.761	0.516	0.768
2021	0.538	0.492	0.634	0.598	0.627	0.672	0.725	0.726	0.000	0.000	0.000	0.000	0.624

172 rows × 13 columns
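
The preview still contains the row for 2021, whose missing months are filled with 0.000. A minimal sketch of how that row could be dropped before any analysis is given below, using the last three rows of the preview. The variable names are hypothetical, and setting the Annual column aside is an assumption (it summarizes the twelve monthly columns), not something the exercise prescribes.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Last three rows of the preview above.
sample = pd.DataFrame(
    [
        [0.738, 0.662, 0.874, 0.780, 0.610, 0.708, 0.706, 0.719, 0.713, 0.752, 0.693, 0.880, 0.736],
        [0.982, 1.001, 1.017, 0.800, 0.714, 0.682, 0.695, 0.735, 0.714, 0.617, 0.761, 0.516, 0.768],
        [0.538, 0.492, 0.634, 0.598, 0.627, 0.672, 0.725, 0.726, 0.000, 0.000, 0.000, 0.000, 0.624],
    ],
    index=[2019, 2020, 2021],
    columns=months + ["Annual"],
)

# Drop the incomplete year explicitly by its index label.
complete_years = sample.drop(index=2021)

# Assumption: keep only the monthly columns as active variables.
monthly = complete_years[months]
print(monthly)
```

Dropping by label is safer than filtering on zeros, since a monthly anomaly of exactly 0.000 could in principle occur in a complete year.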

PCA - Work to do

Mastère ESD - Introduction to Machine Learning
