A programming language growing rapidly in the fields of Data Science and Big Data
Console use: a scripting language (like R, bash, SQL...)
Two typical ways of using it:
- … `.py`)
- `jupyter` (to be installed)
- `list` `[]` and `tuple` `()` (a constant list): elements are not necessarily of the same type
- `dict` `{}`: a set of key-value pairs
- Arithmetic operators: `+`, `-`, `*`, `/`, `//`, `%`, `**`
- Comparison operators: `>`, `>=`, `<`, `<=`, `==`, `!=`
- Logical operators: `|`, `&`, `not()`
- Strings: delimited by single (`''`) or double (`""`) quotes
- `len()`: length of a string
a = 'bonjour'
print(a)
len(a)
bonjour
7
a = 'bonjour'
a[1:5]
'onjo'
a[0:3]
'bon'
a[:3]
'bon'
a[3:len(a)]
'jour'
a[3:]
'jour'
a[::]
'bonjour'
a[::-1]
'ruojnob'
a[::2]
'bnor'
a[1:5:2]
'oj'
a[5:1:-2]
'uj'
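The slices above all follow the `sequence[start:stop:step]` pattern, where `stop` is excluded. As a minimal sketch (the helper `explicit_slice` is a hypothetical name introduced only for this illustration), the same results can be rebuilt with an explicit loop over `range()`:

```python
# Slicing follows sequence[start:stop:step]; the stop index is excluded.
a = 'bonjour'

def explicit_slice(s, start, stop, step):
    """Rebuild s[start:stop:step] by walking the indices given by range()."""
    return ''.join(s[i] for i in range(start, stop, step))

forward = explicit_slice(a, 1, 5, 2)     # same indices as a[1:5:2]
backward = explicit_slice(a, 5, 1, -2)   # same indices as a[5:1:-2]
print(forward, backward)
```

The helper only covers the case where all three values are given explicitly; omitted bounds and negative indices are handled by Python's own slicing rules.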
- `upper()` or `lower()`
- `capitalize()`
- `find()`: first occurrence
- `replace()`
- `count()`
- `split()`
a = 'bonjour'
a.upper()
'BONJOUR'
a.capitalize()
'Bonjour'
a.find('j')
3
a.replace('jour', 'soir')
'bonsoir'
a.count('o')
2
a.split('j')
['bon', 'our']
Tuples are defined with `()`
a = (3, 1, 9, 7)
print(a)
a[0]
(3, 1, 9, 7)
3
Lists are defined with `[]`
a = [3, 1, 9, 7]
print(a)
len(a)
[3, 1, 9, 7]
4
a[0]
3
a[1:3]
[1, 9]
- `reverse()`
- `sort()` (with the option `reverse=True` for a descending sort)
- `pop()`
- `append()`
- `insert()` (position + value)
- `remove()` (the value passed as parameter)
The list `a` must be displayed to see the result of the function
a = [3, 1, 9, 7]
a.reverse()
a
[7, 9, 1, 3]
a.sort()
a
[1, 3, 7, 9]
a.sort(reverse=True)
a
[9, 7, 3, 1]
a.pop()
1
a
[9, 7, 3]
a.append(5)
a
[9, 7, 3, 5]
a.insert(0, 6)
a
[6, 9, 7, 3, 5]
a.remove(7)
a
[6, 9, 3, 5]
Operators `+` (concatenation) and `*` (repetition)
a = [3, 1, 9, 7]
b = [1, 2]
a + b
[3, 1, 9, 7, 1, 2]
a * 2
[3, 1, 9, 7, 3, 1, 9, 7]
List comprehensions (optionally with `if`)
a = [3, 1, 9, 7]
[x**2 for x in a]
[9, 1, 81, 49]
[x**2 for x in a if x >= 4]
[81, 49]
[x**2 if x >= 4 else -x for x in a]
[-3, -1, 81, 49]
Warning: assigning a list to another name passes a reference, it does not copy the list
a = [1, 2, 3, 4]
b = a
a[0] = 5
b[1] = 9
print(a, b)
[5, 9, 3, 4] [5, 9, 3, 4]
`copy()`
a = [1, 2, 3, 4]
b = a.copy()
a[0] = 5
b[1] = 9
print(a, b)
[5, 2, 3, 4] [1, 9, 3, 4]
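One caveat worth checking: `copy()` duplicates only the outer container. The sketch below (the nested list is made up for the illustration) suggests that inner lists remain shared after `copy()`, and that `copy.deepcopy()` from the standard library is one way to duplicate them as well:

```python
import copy

a = [[1, 2], [3, 4]]       # a list that contains other lists
shallow = a.copy()         # new outer list, but the inner lists are shared
deep = copy.deepcopy(a)    # inner lists are duplicated too

a[0][0] = 99               # modify an inner list through a
print(shallow[0][0])       # the shallow copy sees the change
print(deep[0][0])          # the deep copy keeps the original value
```

The same behaviour applies to a `dict` whose values are lists or dictionaries.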
`dict`: named lists defined with `{}`
a = {
"nom": "Jollois",
"prenom": "FX",
"langues": ["R", "Python", "SQL", "SAS"],
"labo": { "nom": "LIPADE", "lieu": "CUSP"}
}
print(a)
len(a)
{'nom': 'Jollois', 'prenom': 'FX', 'langues': ['R', 'Python', 'SQL', 'SAS'], 'labo': {'nom': 'LIPADE', 'lieu': 'CUSP'}}
4
a = {
"nom": "Jollois",
"prenom": "FX",
"langues": ["R", "Python", "SQL", "SAS"],
"labo": { "nom": "LIPADE", "lieu": "CUSP"}
}
a["nom"]
'Jollois'
a["langues"]
['R', 'Python', 'SQL', 'SAS']
a["langues"][0]
'R'
a["labo"]
{'nom': 'LIPADE', 'lieu': 'CUSP'}
a["labo"]["lieu"]
'CUSP'
- `get()`
- `keys()`
- `values()`
- `popitem()`
- `pop()`
Display the dictionary `a` to see the result of the last two functions as well
a = {
"nom": "Jollois",
"prenom": "FX",
"langues": ["R", "Python", "SQL", "SAS"],
"labo": { "nom": "LIPADE", "lieu": "CUSP"}
}
a["type"] = "MCF"
a.get("nom")
'Jollois'
a.keys()
dict_keys(['nom', 'prenom', 'langues', 'labo', 'type'])
a.values()
dict_values(['Jollois', 'FX', ['R', 'Python', 'SQL', 'SAS'], {'nom': 'LIPADE', 'lieu': 'CUSP'}, 'MCF'])
a.popitem()
('type', 'MCF')
a
{'nom': 'Jollois',
'prenom': 'FX',
'langues': ['R', 'Python', 'SQL', 'SAS'],
'labo': {'nom': 'LIPADE', 'lieu': 'CUSP'}}
a.pop("nom")
'Jollois'
a
{'prenom': 'FX',
'langues': ['R', 'Python', 'SQL', 'SAS'],
'labo': {'nom': 'LIPADE', 'lieu': 'CUSP'}}
a = { "nom": "Jollois", "prenom": "FX" }
b = a
b["prenom"] = "Xavier"
print(a, b)
print(a)
{'nom': 'Jollois', 'prenom': 'Xavier'} {'nom': 'Jollois', 'prenom': 'Xavier'}
{'nom': 'Jollois', 'prenom': 'Xavier'}
`copy()`: independent copy of the original object
a = { "nom": "Jollois", "prenom": "FX" }
b = a.copy()
b["prenom"] = "Xavier"
print(a, b)
{'nom': 'Jollois', 'prenom': 'FX'} {'nom': 'Jollois', 'prenom': 'Xavier'}
Dict comprehensions (with `if` and `else` if needed)
fruits = ["pommes", "bananes", "poires", "oranges"]
nombres = [5, 2, 10, 4]
{fruits[i]:nombres[i] for i in range(4)}
{'pommes': 5, 'bananes': 2, 'poires': 10, 'oranges': 4}
`dict()` applied to the result of `zip()` on the two lists: same result.
fruits = ["pommes", "bananes", "poires", "oranges"]
nombres = [5, 2, 10, 4]
dict(zip(fruits, nombres))
{'pommes': 5, 'bananes': 2, 'poires': 10, 'oranges': 4}
`if`
- `if`, `else` and `elif`
- … `switch` or `case`
a = 3
if (a > 2):
    print("sup")
sup
if (a > 2):
    print("dans le IF")
    print("sup")
dans le IF
sup
if (a > 2):
    print("sup")
elif (a > 0):
    print("mid")
else:
    print("inf")
sup
`for`
- `range()` gives the values from 0 (by default) up to the value passed as parameter minus 1
- `i` persists after the loop (it keeps its last value)
for i in range(5):
    print(i)
print("dernière valeur de i :", i)
0
1
2
3
4
dernière valeur de i : 4
| Example of `range()` | Values taken |
|---|---|
| range(5) | 0, 1, 2, 3, 4 |
| range(5, 10) | 5, 6, 7, 8, 9 |
| range(5, 10, 2) | 5, 7, 9 |
| range(10, 5, -1) | 10, 9, 8, 7, 6 |
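The rows of this table can be checked directly by converting each `range()` to a list. A minimal sketch (the dictionary `examples` is only a convenience for the illustration):

```python
# Materialise each range() from the table as a list to inspect its values.
examples = {
    "range(5)": list(range(5)),
    "range(5, 10)": list(range(5, 10)),
    "range(5, 10, 2)": list(range(5, 10, 2)),
    "range(10, 5, -1)": list(range(10, 5, -1)),
}
for expression, values in examples.items():
    print(expression, "->", values)
```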
- Iterating over a `list` or a `tuple`: `for i in [4, 1, 10]` → 4, 1, 10
- `for l in "Bonjour"` → "B", "o", "n", "j", "o", "u", "r"
- `for l in ("jour", "soir")` → "jour", "soir"
- `enumerate()` gives both the indices of the values and the values
  - `for i, x in enumerate([3, 1, 9, 4])`
  - `i` → 0, 1, 2, 3
  - `x` → 3, 1, 9, 4
- `zip()`: works on several groups of values (here two lists)
  - `for i, j in zip([3, 1, 9, 4], ["trois", "un", "neuf"])`
  - `i` → 3, 1, 9
  - `j` → "trois", "un", "neuf"
- `for i in { 0: "lundi", 1: "mardi", 6: "dimanche"}`
  - `i` → 0, 1, 6
`while`
- `while`: tests at the start of each iteration whether a condition still holds
- `i += 1` is shorthand for `i = i + 1`
i = 0
while i < 5:
    print(i)
    i += 1
print("Valeur de i :", i)
0
1
2
3
4
Valeur de i : 5
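The behaviour of `enumerate()`, `zip()` and dictionary iteration listed for the `for` loop can be verified by collecting what each one yields. A short sketch reusing the same values (the variable names are chosen for the illustration):

```python
values = [3, 1, 9, 4]
names = ["trois", "un", "neuf"]     # one element shorter than values

indexed = list(enumerate(values))   # (index, value) pairs
paired = list(zip(values, names))   # zip() stops at the shortest input

days = {0: "lundi", 1: "mardi", 6: "dimanche"}
keys = [k for k in days]            # iterating over a dict yields its keys

print(indexed)
print(paired)
print(keys)
```

Note that the last value of the longer list (4) is silently dropped by `zip()`.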
- `def` to define a function
- `return` indicates the result to send back
def pi():
    res = 3.141593
    return res
pi()
3.141593
def afficheBonjour():
    print("Bonjour")
afficheBonjour()
Bonjour
def afficheBonjour(nom):
    print("Bonjour", nom)
afficheBonjour("Jollois")
Bonjour Jollois
def afficheBonjour(nom, prenom):
    print("Bonjour", prenom, nom)
afficheBonjour("Jollois", "FX")
afficheBonjour(nom = "Jollois", prenom = "FX")
afficheBonjour(prenom = "FX", nom = "Jollois")
Bonjour FX Jollois
Bonjour FX Jollois
Bonjour FX Jollois
def afficheBonjour(nom, prenom = "?"):
    print("Bonjour", prenom, nom)
afficheBonjour("Jollois", "FX")
afficheBonjour("Jollois")
Bonjour FX Jollois
Bonjour ? Jollois
- `try`
- `except`: what to do if an error occurs while executing the `try` block
- `finally`: instructions to run after the error has been handled (optional)
def somme(v):
    try:
        res = sum(v)
    except:
        print("Erreur : somme impossible !")
        res = None
    finally:
        return res
a = somme([1, 3, 5])
print(a)
a = somme(["un", 3, 5])
print(a)
9
Erreur : somme impossible !
None
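A bare `except:` catches every error, including ones unrelated to the sum. A possible variant, given here only as a sketch, names the exception that `sum()` raises when it meets a string (`TypeError`) and returns after the `try` block rather than from `finally`:

```python
def somme(v):
    """Return the sum of v, or None when its elements cannot be added."""
    try:
        res = sum(v)
    except TypeError as err:
        # Only the expected error is handled; anything else still propagates.
        print("Erreur : somme impossible !", err)
        res = None
    return res

print(somme([1, 3, 5]))
print(somme(["un", 3, 5]))
```

Returning from `finally` would also swallow any exception not caught by `except`, which is why this variant avoids it.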
- `numpy`: scientific computing
- `scipy`: extension of `numpy`
- `matplotlib`: visualization
- `pandas`: data manipulation
- `seaborn`: statistical data visualization
The last line makes the plots appear in the document.
import matplotlib.pyplot
import scipy.stats
import numpy
import pandas
import seaborn
%matplotlib inline
tips = pandas.read_csv("https://fxjollois.github.io/donnees/tips.csv", header = 0, sep = ",")
tips.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| Function | Comment |
|---|---|
| `tips.head()` | First rows of the table (5 by default) |
| `tips.shape` | Number of rows and columns |
| `tips.count()` | Number of non-null values in each column |
| `tips.info()` | Combination of several pieces of information |
| `tips.columns` | Column names |
| `list(tips)` | List of the column names |
How can the classic database operations be carried out?
Nota bene: some functions return a new object, which must therefore be stored in a variable (a new one or the same one). Others, by contrast, modify the object directly.
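The distinction in the note can be seen on a tiny table. The sketch below assumes `pandas` is installed and uses a three-row `DataFrame` made up for the illustration (not the `tips` data): `sort_values()` returns a new object, whereas assigning a column modifies the existing one.

```python
import pandas

# Toy data, invented for this example
df = pandas.DataFrame({"total_bill": [16.99, 10.34, 21.01], "size": [2, 3, 3]})

# sort_values() returns a new DataFrame: df keeps its original order
sorted_df = df.sort_values(by="total_bill")

# Column assignment modifies df directly
df["n_row"] = range(len(df))

print(df)
print(sorted_df)
```

`sorted_df` was created before the assignment, so it does not contain the `n_row` column.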
`query()`
tips.query('total_bill > 48') # only the bills above 48
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 59 | 48.27 | 6.73 | Male | No | Sat | Dinner | 4 |
| 156 | 48.17 | 5.00 | Male | No | Sun | Dinner | 6 |
| 170 | 50.81 | 10.00 | Male | Yes | Sat | Dinner | 3 |
| 212 | 48.33 | 9.00 | Male | No | Sat | Dinner | 4 |
tips.query('day.isin(("Sat", "Sun"))') # only the bills from a Saturday or a Sunday
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 238 | 35.83 | 4.67 | Female | No | Sat | Dinner | 3 |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |
163 rows × 7 columns
tips.query('size > 4 & sex == "Male"') # only the tables with more than 4 guests and paid by a man
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 141 | 34.30 | 6.7 | Male | No | Thur | Lunch | 6 |
| 142 | 41.19 | 5.0 | Male | No | Thur | Lunch | 5 |
| 156 | 48.17 | 5.0 | Male | No | Sun | Dinner | 6 |
| 185 | 20.69 | 5.0 | Male | No | Sun | Dinner | 5 |
| 187 | 30.46 | 2.0 | Male | Yes | Sun | Dinner | 5 |
| 216 | 28.15 | 3.0 | Male | Yes | Sat | Dinner | 5 |
`@` refers to a Python variable inside the query string
a = 48
tips.query("total_bill > @a") # same as the first query above
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 59 | 48.27 | 6.73 | Male | No | Sat | Dinner | 4 |
| 156 | 48.17 | 5.00 | Male | No | Sun | Dinner | 6 |
| 170 | 50.81 | 10.00 | Male | Yes | Sat | Dinner | 3 |
| 212 | 48.33 | 9.00 | Male | No | Sat | Dinner | 4 |
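For comparison, a `query()` using `@` can also be written as a boolean mask. The sketch below assumes `pandas` is installed and uses a small made-up `DataFrame`; the variable name `seuil` is chosen only for the illustration.

```python
import pandas

# Toy data, invented for this example
df = pandas.DataFrame({
    "total_bill": [48.27, 16.99, 50.81, 10.34],
    "sex": ["Male", "Female", "Male", "Male"],
})
seuil = 48

with_query = df.query("total_bill > @seuil")   # @ looks up the Python variable
with_mask = df[df["total_bill"] > seuil]       # equivalent boolean indexing

print(with_query)
```

Both forms keep the original row labels, which is why the index in the output is not renumbered.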
`filter()`:
- `items`: list of the names to keep
- `axis`: axis to consider (column or row; column by default for a `DataFrame`)
- `like`: comparison with a string
- `regex`: use of regular expressions
tips.filter(["sex", "total_bill"]) # only sex and total_bill
| | sex | total_bill |
|---|---|---|
| 0 | Female | 16.99 |
| 1 | Male | 10.34 |
| 2 | Male | 21.01 |
| 3 | Male | 23.68 |
| 4 | Female | 24.59 |
| ... | ... | ... |
| 239 | Male | 29.03 |
| 240 | Female | 27.18 |
| 241 | Male | 22.67 |
| 242 | Male | 17.82 |
| 243 | Female | 18.78 |
244 rows × 2 columns
tips.filter(like = "ti") # only the variables with "ti" in their name
| | tip | time |
|---|---|---|
| 0 | 1.01 | Dinner |
| 1 | 1.66 | Dinner |
| 2 | 3.50 | Dinner |
| 3 | 3.31 | Dinner |
| 4 | 3.61 | Dinner |
| ... | ... | ... |
| 239 | 5.92 | Dinner |
| 240 | 2.00 | Dinner |
| 241 | 2.00 | Dinner |
| 242 | 1.75 | Dinner |
| 243 | 3.00 | Dinner |
244 rows × 2 columns
tips.filter(regex = "t.*i")
# only the variables containing the letter "t" followed by the letter "i" (with or without characters in between)
| | total_bill | tip | time |
|---|---|---|---|
| 0 | 16.99 | 1.01 | Dinner |
| 1 | 10.34 | 1.66 | Dinner |
| 2 | 21.01 | 3.50 | Dinner |
| 3 | 23.68 | 3.31 | Dinner |
| 4 | 24.59 | 3.61 | Dinner |
| ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Dinner |
| 240 | 27.18 | 2.00 | Dinner |
| 241 | 22.67 | 2.00 | Dinner |
| 242 | 17.82 | 1.75 | Dinner |
| 243 | 18.78 | 3.00 | Dinner |
244 rows × 3 columns
`drop_duplicates()`
tips.filter(["sex", "smoker"]).drop_duplicates()
| | sex | smoker |
|---|---|---|
| 0 | Female | No |
| 1 | Male | No |
| 56 | Male | Yes |
| 67 | Female | Yes |
`sort_values()`
- `ascending=False` for a descending sort (`True` by default)
tips.sort_values(by = "total_bill") # sorted by increasing total
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 67 | 3.07 | 1.00 | Female | Yes | Sat | Dinner | 1 |
| 92 | 5.75 | 1.00 | Female | Yes | Fri | Dinner | 2 |
| 111 | 7.25 | 1.00 | Female | No | Sat | Dinner | 1 |
| 172 | 7.25 | 5.15 | Male | Yes | Sun | Dinner | 2 |
| 149 | 7.51 | 2.00 | Male | No | Thur | Lunch | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 182 | 45.35 | 3.50 | Male | Yes | Sun | Dinner | 3 |
| 156 | 48.17 | 5.00 | Male | No | Sun | Dinner | 6 |
| 59 | 48.27 | 6.73 | Male | No | Sat | Dinner | 4 |
| 212 | 48.33 | 9.00 | Male | No | Sat | Dinner | 4 |
| 170 | 50.81 | 10.00 | Male | Yes | Sat | Dinner | 3 |
244 rows × 7 columns
tips.sort_values(by = "total_bill", ascending = False) # descending sort
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 170 | 50.81 | 10.00 | Male | Yes | Sat | Dinner | 3 |
| 212 | 48.33 | 9.00 | Male | No | Sat | Dinner | 4 |
| 59 | 48.27 | 6.73 | Male | No | Sat | Dinner | 4 |
| 156 | 48.17 | 5.00 | Male | No | Sun | Dinner | 6 |
| 182 | 45.35 | 3.50 | Male | Yes | Sun | Dinner | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 149 | 7.51 | 2.00 | Male | No | Thur | Lunch | 2 |
| 111 | 7.25 | 1.00 | Female | No | Sat | Dinner | 1 |
| 172 | 7.25 | 5.15 | Male | Yes | Sun | Dinner | 2 |
| 92 | 5.75 | 1.00 | Female | Yes | Fri | Dinner | 2 |
| 67 | 3.07 | 1.00 | Female | Yes | Sat | Dinner | 1 |
244 rows × 7 columns
tips.sort_values(by = ["smoker", "total_bill"], ascending = [True, False]) # sorted by increasing smoker, then decreasing total
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 212 | 48.33 | 9.00 | Male | No | Sat | Dinner | 4 |
| 59 | 48.27 | 6.73 | Male | No | Sat | Dinner | 4 |
| 156 | 48.17 | 5.00 | Male | No | Sun | Dinner | 6 |
| 142 | 41.19 | 5.00 | Male | No | Thur | Lunch | 5 |
| 23 | 39.42 | 7.58 | Male | No | Sat | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 222 | 8.58 | 1.92 | Male | Yes | Fri | Lunch | 1 |
| 218 | 7.74 | 1.44 | Male | Yes | Sat | Dinner | 2 |
| 172 | 7.25 | 5.15 | Male | Yes | Sun | Dinner | 2 |
| 92 | 5.75 | 1.00 | Female | Yes | Fri | Dinner | 2 |
| 67 | 3.07 | 1.00 | Female | Yes | Sat | Dinner | 1 |
244 rows × 7 columns
`head()` (resp. `tail()`)
tips.head() # first 5 rows by default
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
tips.head(10) # first 10 rows
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| 5 | 25.29 | 4.71 | Male | No | Sun | Dinner | 4 |
| 6 | 8.77 | 2.00 | Male | No | Sun | Dinner | 2 |
| 7 | 26.88 | 3.12 | Male | No | Sun | Dinner | 4 |
| 8 | 15.04 | 1.96 | Male | No | Sun | Dinner | 2 |
| 9 | 14.78 | 3.23 | Male | No | Sun | Dinner | 2 |
tips.tail(3) # last 3 rows
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |
| 243 | 18.78 | 3.00 | Female | No | Thur | Dinner | 2 |
`DataFrame` modified directly
tips['n_row'] = range(244)
tips['nouv'] = "nouvelle valeur"
tips.head()
| | total_bill | tip | sex | smoker | day | time | size | n_row | nouv |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0 | nouvelle valeur |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 1 | nouvelle valeur |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | 2 | nouvelle valeur |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 3 | nouvelle valeur |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 4 | nouvelle valeur |
`assign()`: returns a modified `DataFrame` (the original is not modified)
# careful here: l.size would refer to the DataFrame attribute size (its number of elements), not to the column, hence l['size']
tips.assign(per_person = lambda l: round(l.total_bill / l['size'], 2))
| | total_bill | tip | sex | smoker | day | time | size | n_row | nouv | per_person |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 | 0 | nouvelle valeur | 8.49 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 | 1 | nouvelle valeur | 3.45 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 | 2 | nouvelle valeur | 7.00 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 | 3 | nouvelle valeur | 11.84 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 | 4 | nouvelle valeur | 6.15 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 | 239 | nouvelle valeur | 9.68 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 | 240 | nouvelle valeur | 13.59 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 | 241 | nouvelle valeur | 11.34 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 | 242 | nouvelle valeur | 8.91 |
| 243 | 18.78 | 3.00 | Female | No | Thur | Dinner | 2 | 243 | nouvelle valeur | 9.39 |
244 rows × 10 columns
`aggregate()` (or `agg()`)
tips.filter(["total_bill", "tip", "size"]).aggregate(["count", "mean"])
| | total_bill | tip | size |
|---|---|---|---|
| count | 244.000000 | 244.000000 | 244.000000 |
| mean | 19.785943 | 2.998279 | 2.569672 |
tips.filter(["total_bill", "tip", "size"]).mean()
total_bill    19.785943
tip            2.998279
size           2.569672
dtype: float64
`groupby()`
tips.filter(["sex", "total_bill", "tip", "size"]).groupby("sex").mean()
| sex | total_bill | tip | size |
|---|---|---|---|
| Female | 18.056897 | 2.833448 | 2.459770 |
| Male | 20.744076 | 3.089618 | 2.630573 |
tips.filter(["sex", "smoker", "total_bill", "tip", "size"]).groupby(["sex", "smoker"]).mean()
| sex | smoker | total_bill | tip | size |
|---|---|---|---|---|
| Female | No | 18.105185 | 2.773519 | 2.592593 |
| | Yes | 17.977879 | 2.931515 | 2.242424 |
| Male | No | 19.791237 | 3.113402 | 2.711340 |
| | Yes | 22.284500 | 3.051167 | 2.500000 |
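`groupby()` can also be combined with `agg()` to compute a different statistic per column. The sketch below assumes `pandas` is installed (named aggregation requires version 0.25 or later) and uses three made-up rows; the output column names `mean_bill`, `max_tip` and `n` are arbitrary.

```python
import pandas

# Toy data, invented for this example
df = pandas.DataFrame({
    "sex": ["Female", "Male", "Female"],
    "total_bill": [10.0, 30.0, 20.0],
    "tip": [1.0, 3.0, 2.0],
})

# Named aggregation: new_column=(source_column, function)
summary = df.groupby("sex").agg(
    mean_bill=("total_bill", "mean"),
    max_tip=("tip", "max"),
    n=("total_bill", "count"),
)
print(summary)
```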
`melt()`
- `n_row` is used because there is no unique identifier per row
tips2 = tips.melt(id_vars = "n_row")
tips2
| | n_row | variable | value |
|---|---|---|---|
| 0 | 0 | total_bill | 16.99 |
| 1 | 1 | total_bill | 10.34 |
| 2 | 2 | total_bill | 21.01 |
| 3 | 3 | total_bill | 23.68 |
| 4 | 4 | total_bill | 24.59 |
| ... | ... | ... | ... |
| 1947 | 239 | nouv | nouvelle valeur |
| 1948 | 240 | nouv | nouvelle valeur |
| 1949 | 241 | nouv | nouvelle valeur |
| 1950 | 242 | nouv | nouvelle valeur |
| 1951 | 243 | nouv | nouvelle valeur |
1952 rows × 3 columns
`pivot()`
pandas.pivot(tips2, index = "n_row", columns = "variable", values = "value")
| n_row | day | nouv | sex | size | smoker | time | tip | total_bill |
|---|---|---|---|---|---|---|---|---|
| 0 | Sun | nouvelle valeur | Female | 2 | No | Dinner | 1.01 | 16.99 |
| 1 | Sun | nouvelle valeur | Male | 3 | No | Dinner | 1.66 | 10.34 |
| 2 | Sun | nouvelle valeur | Male | 3 | No | Dinner | 3.50 | 21.01 |
| 3 | Sun | nouvelle valeur | Male | 2 | No | Dinner | 3.31 | 23.68 |
| 4 | Sun | nouvelle valeur | Female | 4 | No | Dinner | 3.61 | 24.59 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | Sat | nouvelle valeur | Male | 3 | No | Dinner | 5.92 | 29.03 |
| 240 | Sat | nouvelle valeur | Female | 2 | Yes | Dinner | 2.00 | 27.18 |
| 241 | Sat | nouvelle valeur | Male | 2 | Yes | Dinner | 2.00 | 22.67 |
| 242 | Sat | nouvelle valeur | Male | 2 | No | Dinner | 1.75 | 17.82 |
| 243 | Thur | nouvelle valeur | Female | 2 | No | Dinner | 3.00 | 18.78 |
244 rows × 8 columns
Numerical and graphical representations
`describe()` describes all the quantitative variables
tips.describe()
| | total_bill | tip | size | n_row |
|---|---|---|---|---|
| count | 244.000000 | 244.000000 | 244.000000 | 244.000000 |
| mean | 19.785943 | 2.998279 | 2.569672 | 121.500000 |
| std | 8.902412 | 1.383638 | 0.951100 | 70.580923 |
| min | 3.070000 | 1.000000 | 1.000000 | 0.000000 |
| 25% | 13.347500 | 2.000000 | 2.000000 | 60.750000 |
| 50% | 17.795000 | 2.900000 | 2.000000 | 121.500000 |
| 75% | 24.127500 | 3.562500 | 3.000000 | 182.250000 |
| max | 50.810000 | 10.000000 | 6.000000 | 243.000000 |
tips.describe().round(2)
| | total_bill | tip | size | n_row |
|---|---|---|---|---|
| count | 244.00 | 244.00 | 244.00 | 244.00 |
| mean | 19.79 | 3.00 | 2.57 | 121.50 |
| std | 8.90 | 1.38 | 0.95 | 70.58 |
| min | 3.07 | 1.00 | 1.00 | 0.00 |
| 25% | 13.35 | 2.00 | 2.00 | 60.75 |
| 50% | 17.80 | 2.90 | 2.00 | 121.50 |
| 75% | 24.13 | 3.56 | 3.00 | 182.25 |
| max | 50.81 | 10.00 | 6.00 | 243.00 |
tips.total_bill.describe()
count    244.000000
mean      19.785943
std        8.902412
min        3.070000
25%       13.347500
50%       17.795000
75%       24.127500
max       50.810000
Name: total_bill, dtype: float64
tips["total_bill"].describe()
count    244.000000
mean      19.785943
std        8.902412
min        3.070000
25%       13.347500
50%       17.795000
75%       24.127500
max       50.810000
Name: total_bill, dtype: float64
tips.total_bill.mean()
19.78594262295082
tips.total_bill.std()
8.902411954856856
tips.total_bill.var()
79.25293861397827
tips.total_bill.min()
3.07
tips.total_bill.max()
50.81
tips.total_bill.median()
17.795
tips.total_bill.quantile([.01, .1, .9, .99])
0.01     7.250
0.10    10.340
0.90    32.235
0.99    48.227
Name: total_bill, dtype: float64
scipy.stats.normaltest(tips.total_bill)
NormaltestResult(statistic=45.11781912347332, pvalue=1.5951078766352608e-10)
scipy.stats.shapiro(tips.total_bill)
ShapiroResult(statistic=0.9197188019752502, pvalue=3.3245434183371003e-10)
`pandas`: `hist()` function, or `plot()` with `kind="hist"`
tips.total_bill.hist()
<AxesSubplot:>
tips.total_bill.hist(bins = 20)
<AxesSubplot:>
tips.total_bill.plot(kind = "hist")
<AxesSubplot:ylabel='Frequency'>
tips.total_bill.plot(kind = "hist", density = True)
<AxesSubplot:ylabel='Frequency'>
tips.total_bill.plot(kind = "kde")
<AxesSubplot:ylabel='Density'>
# Run together to get the density and the histogram on the same plot
tips.total_bill.plot(kind = "hist", density = True, color = "lightgrey")
tips.total_bill.plot(kind = "kde")
<AxesSubplot:ylabel='Density'>
`seaborn`: `histplot()` function
seaborn.histplot(tips.total_bill)
<AxesSubplot:xlabel='total_bill', ylabel='Count'>
seaborn.histplot(data = tips, x = "total_bill")
<AxesSubplot:xlabel='total_bill', ylabel='Count'>
seaborn.histplot(data = tips, x = "total_bill", bins = 20)
<AxesSubplot:xlabel='total_bill', ylabel='Count'>
seaborn.histplot(data = tips, x = "total_bill", bins = [0, 10, 25, 60], stat = "density")
<AxesSubplot:xlabel='total_bill', ylabel='Density'>
seaborn.histplot(data = tips, x = "total_bill", kde = True)
<AxesSubplot:xlabel='total_bill', ylabel='Count'>
`pandas`: `boxplot()` function
tips.boxplot()
<AxesSubplot:>
tips.boxplot(column = "total_bill")
<AxesSubplot:>
tips.boxplot(column = "total_bill", grid = False)
<AxesSubplot:>
`seaborn`: `boxplot()` function
seaborn.boxplot(x = "total_bill", data = tips)
<AxesSubplot:xlabel='total_bill'>
seaborn.boxplot(y = "total_bill", data = tips)
<AxesSubplot:ylabel='total_bill'>
seaborn.boxplot(x = "total_bill", data = tips, whis = 3)
<AxesSubplot:xlabel='total_bill'>
`seaborn`
seaborn.pointplot(x = "total_bill", data = tips)
<AxesSubplot:xlabel='total_bill'>
seaborn.violinplot(x = "total_bill", data = tips)
<AxesSubplot:xlabel='total_bill'>
seaborn.stripplot(x = "total_bill", data = tips, jitter = True)
<AxesSubplot:xlabel='total_bill'>
tips.sex.describe()
count      244
unique       2
top       Male
freq       157
Name: sex, dtype: object
tips.sex.unique()
array(['Female', 'Male'], dtype=object)
tips.sex.value_counts()
Male      157
Female     87
Name: sex, dtype: int64
pandas.crosstab(tips.sex, "freq")
| col_0 | freq |
|---|---|
| sex | |
| Female | 87 |
| Male | 157 |
pandas.crosstab(tips.sex, "freq", normalize = True) # Proportion
| col_0 | freq |
|---|---|
| sex | |
| Female | 0.356557 |
| Male | 0.643443 |
t = pandas.crosstab(tips.sex, "freq", normalize=True)
scipy.stats.chisquare(t.freq)
Power_divergenceResult(statistic=0.08230314431604406, pvalue=0.774200187925369)
scipy.stats.chisquare(t.freq, (.2, .8))
Power_divergenceResult(statistic=0.1531888269282451, pvalue=0.6955064385613343)
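A point to check here: as far as the SciPy documentation describes it, `scipy.stats.chisquare` expects observed frequencies, i.e. counts. The two calls above pass proportions (`normalize=True`), which divides the statistic by the sample size and therefore changes the p-value. The sketch below recomputes the statistic by hand in plain Python, using the counts 87 and 157 from the table above and a uniform expected distribution:

```python
observed = [87, 157]                              # Female, Male counts
n = sum(observed)                                 # 244 bills in total
expected = [n / len(observed)] * len(observed)    # uniform hypothesis: 122 each

# Chi-square statistic computed on counts
stat_counts = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Same formula applied to proportions, as in the first call above
proportions = [o / n for o in observed]
stat_props = sum((p - 0.5) ** 2 / 0.5 for p in proportions)

print(stat_counts, stat_props, stat_props * n)
```

The statistic on proportions equals the statistic on counts divided by `n`, so passing the unnormalized `pandas.crosstab(tips.sex, "freq")` counts would be the safer choice for the test.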
`pandas`: `bar()` function, or `kind="bar"` for `plot()`
t = pandas.crosstab(tips.sex, "freq")
t.plot.bar()
<AxesSubplot:xlabel='sex'>
t.plot(kind = "bar")
<AxesSubplot:xlabel='sex'>
# As proportions
t = pandas.crosstab(tips.sex, "freq", normalize=True)
t.plot(kind = "bar")
<AxesSubplot:xlabel='sex'>
# As percentages
(t * 100).plot(kind = "bar")
<AxesSubplot:xlabel='sex'>
`seaborn`: `countplot()` or `barplot()` functions
seaborn.countplot(x = "sex", data = tips)
<AxesSubplot:xlabel='sex', ylabel='count'>
# As percentages
t = pandas.crosstab(tips.sex, "freq", normalize=True)
t = t.assign(sex = t.index, freq = 100 * t.freq)
seaborn.barplot(x = "sex", y = "freq", data = t)
<AxesSubplot:xlabel='sex', ylabel='freq'>
`pandas`: `plot.pie()` function
t = pandas.crosstab(tips.sex, "freq")
t.plot.pie(subplots = True, figsize = (6, 6))
array([<AxesSubplot:ylabel='freq'>], dtype=object)
`seaborn`: …
tips.corr()
| | total_bill | tip | size | n_row |
|---|---|---|---|---|
| total_bill | 1.000000 | 0.675734 | 0.598315 | 0.044526 |
| tip | 0.675734 | 1.000000 | 0.489299 | -0.026709 |
| size | 0.598315 | 0.489299 | 1.000000 | 0.008061 |
| n_row | 0.044526 | -0.026709 | 0.008061 | 1.000000 |
tips.total_bill.cov(tips.tip)
8.323501629224854
tips.total_bill.corr(tips.tip)
0.6757341092113641
scipy.stats.pearsonr(tips.total_bill, tips.tip)
(0.6757341092113647, 6.6924706468630016e-34)
scipy.stats.kendalltau(tips.total_bill, tips.tip)
KendalltauResult(correlation=0.517180972142381, pvalue=2.4455728480214792e-32)
`pandas`: `plot.scatter()` and `scatter_matrix()` functions
tips.plot.scatter("total_bill", "tip")
<AxesSubplot:xlabel='total_bill', ylabel='tip'>
pandas.plotting.scatter_matrix(tips)
array([[<AxesSubplot:xlabel='total_bill', ylabel='total_bill'>,
<AxesSubplot:xlabel='tip', ylabel='total_bill'>,
<AxesSubplot:xlabel='size', ylabel='total_bill'>,
<AxesSubplot:xlabel='n_row', ylabel='total_bill'>],
[<AxesSubplot:xlabel='total_bill', ylabel='tip'>,
<AxesSubplot:xlabel='tip', ylabel='tip'>,
<AxesSubplot:xlabel='size', ylabel='tip'>,
<AxesSubplot:xlabel='n_row', ylabel='tip'>],
[<AxesSubplot:xlabel='total_bill', ylabel='size'>,
<AxesSubplot:xlabel='tip', ylabel='size'>,
<AxesSubplot:xlabel='size', ylabel='size'>,
<AxesSubplot:xlabel='n_row', ylabel='size'>],
[<AxesSubplot:xlabel='total_bill', ylabel='n_row'>,
<AxesSubplot:xlabel='tip', ylabel='n_row'>,
<AxesSubplot:xlabel='size', ylabel='n_row'>,
<AxesSubplot:xlabel='n_row', ylabel='n_row'>]], dtype=object)
`seaborn`: `jointplot()`, `regplot()` (or `lmplot()`) and `pairplot()` functions
seaborn.jointplot(x = "total_bill", y = "tip", data = tips)
<seaborn.axisgrid.JointGrid at 0x1398803d0>
seaborn.jointplot(x = "total_bill", y = "tip", data = tips, kind = "reg")
<seaborn.axisgrid.JointGrid at 0x1398726d0>
seaborn.jointplot(x = "total_bill", y = "tip", data = tips, kind = "hex")
<seaborn.axisgrid.JointGrid at 0x139a9a9d0>
seaborn.jointplot(x = "total_bill", y = "tip", data = tips, kind = "kde")
<seaborn.axisgrid.JointGrid at 0x139be9250>
seaborn.regplot(x = "total_bill", y = "tip", data = tips)
<AxesSubplot:xlabel='total_bill', ylabel='tip'>
seaborn.regplot(x = "total_bill", y = "tip", data = tips, fit_reg = False)
<AxesSubplot:xlabel='total_bill', ylabel='tip'>
seaborn.regplot(x = "total_bill", y = "tip", data = tips, scatter = False)
<AxesSubplot:xlabel='total_bill', ylabel='tip'>
seaborn.pairplot(data = tips, vars = ["total_bill", "tip", "size"])
<seaborn.axisgrid.PairGrid at 0x139e6f6d0>
pandas.crosstab(tips.sex, tips.smoker)
| smoker | No | Yes |
|---|---|---|
| sex | ||
| Female | 54 | 33 |
| Male | 97 | 60 |
pandas.crosstab(tips.sex, tips.smoker, margins = True)
| smoker | No | Yes | All |
|---|---|---|---|
| sex | |||
| Female | 54 | 33 | 87 |
| Male | 97 | 60 | 157 |
| All | 151 | 93 | 244 |
pandas.crosstab(tips.sex, tips.smoker, normalize = True)
| smoker | No | Yes |
|---|---|---|
| sex | ||
| Female | 0.221311 | 0.135246 |
| Male | 0.397541 | 0.245902 |
pandas.crosstab(tips.sex, tips.smoker, normalize = "index")
| smoker | No | Yes |
|---|---|---|
| sex | ||
| Female | 0.620690 | 0.379310 |
| Male | 0.617834 | 0.382166 |
pandas.crosstab(tips.sex, tips.smoker, normalize = "index", margins = True)
| smoker | No | Yes |
|---|---|---|
| sex | ||
| Female | 0.620690 | 0.379310 |
| Male | 0.617834 | 0.382166 |
| All | 0.618852 | 0.381148 |
pandas.crosstab(tips.sex, tips.smoker, normalize = "columns")
| smoker | No | Yes |
|---|---|---|
| sex | ||
| Female | 0.357616 | 0.354839 |
| Male | 0.642384 | 0.645161 |
pandas.crosstab(tips.sex, tips.smoker, normalize = "columns", margins = True)
| smoker | No | Yes | All |
|---|---|---|---|
| sex | |||
| Female | 0.357616 | 0.354839 | 0.356557 |
| Male | 0.642384 | 0.645161 | 0.643443 |
t = pandas.crosstab(tips.sex, tips.smoker)
scipy.stats.chi2_contingency(t)
(0.008763290531773594,
0.925417020494423,
1,
array([[53.84016393, 33.15983607],
[97.15983607, 59.84016393]]))
`pandas`
t = pandas.crosstab(tips.sex, tips.smoker)
t.plot.bar()
<AxesSubplot:xlabel='sex'>
t = pandas.crosstab(tips.sex, tips.smoker, normalize=True)
t.plot.bar()
<AxesSubplot:xlabel='sex'>
t = pandas.crosstab(tips.sex, tips.smoker, normalize="index")
t.plot.bar(stacked=True)
<AxesSubplot:xlabel='sex'>
t = pandas.crosstab(tips.sex, tips.smoker)
t.plot.pie(subplots=True, figsize = (12, 6))
array([<AxesSubplot:ylabel='No'>, <AxesSubplot:ylabel='Yes'>],
dtype=object)
`seaborn`
seaborn.countplot(x = "sex", hue = "smoker", data = tips)
<AxesSubplot:xlabel='sex', ylabel='count'>
t = pandas.crosstab(tips.sex, tips.smoker, normalize = "columns")
t = t.assign(sex = t.index)
tm = pandas.melt(t, id_vars = "sex")
tm = tm.assign(value = 100 * tm.value)
seaborn.catplot(x = "sex", y = "value", col = "smoker", data = tm, kind = "bar")
<seaborn.axisgrid.FacetGrid at 0x13a3252e0>
tips.groupby("sex").mean()
| sex | total_bill | tip | size | n_row |
|---|---|---|---|---|
| Female | 18.056897 | 2.833448 | 2.459770 | 128.080460 |
| Male | 20.744076 | 3.089618 | 2.630573 | 117.853503 |
tips.groupby("sex")["total_bill"].agg([numpy.mean, numpy.std, numpy.median, numpy.min, numpy.max])
| sex | mean | std | median | amin | amax |
|---|---|---|---|---|---|
| Female | 18.056897 | 8.009209 | 16.40 | 3.07 | 44.30 |
| Male | 20.744076 | 9.246469 | 18.35 | 7.25 | 50.81 |
billFemale = tips.total_bill[tips.sex == "Female"]
billMale = tips.total_bill[tips.sex == "Male"]
scipy.stats.ttest_ind(billFemale, billMale)
Ttest_indResult(statistic=-2.2777940289803134, pvalue=0.0236116668468594)
billGrouped = [tips.total_bill[tips.sex == s] for s in list(tips.sex.unique())]
scipy.stats.f_oneway(*billGrouped)
F_onewayResult(statistic=5.188345638458361, pvalue=0.023611666846859697)
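With only two groups, the one-way ANOVA and the equal-variance two-sample t-test are equivalent: the F statistic is the square of the t statistic, and the p-values coincide (as they do above). A quick arithmetic check using the two statistics printed above:

```python
# Statistics copied from the ttest_ind and f_oneway outputs above
t_stat = -2.2777940289803134
f_stat = 5.188345638458361

t_squared = t_stat ** 2
print(t_squared, f_stat)   # the two values agree up to rounding error
```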
`pandas`:
tips.hist(column = "total_bill", by = "sex")
array([<AxesSubplot:title={'center':'Female'}>,
<AxesSubplot:title={'center':'Male'}>], dtype=object)
tips.boxplot(by = "sex")
array([[<AxesSubplot:title={'center':'n_row'}, xlabel='[sex]'>,
<AxesSubplot:title={'center':'size'}, xlabel='[sex]'>],
[<AxesSubplot:title={'center':'tip'}, xlabel='[sex]'>,
<AxesSubplot:title={'center':'total_bill'}, xlabel='[sex]'>]],
dtype=object)
tips.boxplot(column = "total_bill", by = "sex")
<AxesSubplot:title={'center':'total_bill'}, xlabel='sex'>
seaborn:
p = seaborn.FacetGrid(tips, row = "sex")
# histplot with kde = True replaces the deprecated distplot
p.map(seaborn.histplot, "total_bill", kde = True)
<seaborn.axisgrid.FacetGrid at 0x13a933e80>
seaborn.catplot(x = "sex", y = "total_bill", data = tips, kind = "box")
<seaborn.axisgrid.FacetGrid at 0x13a937730>
seaborn.catplot(x = "sex", y = "total_bill", data = tips, kind = "point", join = False)
<seaborn.axisgrid.FacetGrid at 0x13aa84640>
seaborn.catplot(x = "sex", y = "total_bill", data = tips, kind = "violin")
<seaborn.axisgrid.FacetGrid at 0x13ab2b370>
seaborn.catplot(x = "sex", y = "total_bill", data = tips, kind = "strip")
<seaborn.axisgrid.FacetGrid at 0x13abb1550>
seaborn
- catplot(): a single function that draws several types of plots
seaborn.catplot(y = "total_bill", data = tips, kind = "box")
<seaborn.axisgrid.FacetGrid at 0x13abeea90>
seaborn.catplot(x = "total_bill", data = tips, kind = "point")
<seaborn.axisgrid.FacetGrid at 0x13ac3d640>
seaborn.catplot(x = "total_bill", data = tips, kind = "violin")
<seaborn.axisgrid.FacetGrid at 0x13ac629d0>
seaborn.catplot(x = "total_bill", data = tips, kind = "strip", jitter = True)
<seaborn.axisgrid.FacetGrid at 0x13ad039a0>
seaborn.catplot(x = "sex", data = tips, kind = "count")
<seaborn.axisgrid.FacetGrid at 0x13ac625e0>
seaborn.catplot(x = "sex", hue = "smoker", data = tips, kind = "count")
<seaborn.axisgrid.FacetGrid at 0x13ad153d0>
seaborn.catplot(x = "sex", col = "smoker", data = tips, kind = "count")
<seaborn.axisgrid.FacetGrid at 0x13ae2e2e0>
t = pandas.crosstab(pandas.cut(tips.total_bill, bins = 6),
tips["size"],
values = tips.tip, aggfunc = numpy.mean)
seaborn.heatmap(t)
<AxesSubplot:xlabel='size', ylabel='total_bill'>
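The table passed to `heatmap` above combines two steps: `pandas.cut` bins a quantitative variable into equal-width intervals, and `crosstab` with `values` and `aggfunc` fills each cell with an aggregate instead of a count. A minimal sketch on a small made-up table (the data frame `df` is hypothetical, and two bins are used instead of six to keep the result readable):

```python
import pandas

# Hypothetical toy data, standing in for the tips columns
df = pandas.DataFrame({
    "total_bill": [5.0, 8.0, 12.0, 18.0, 25.0, 30.0],
    "size": [1, 2, 2, 2, 4, 4],
    "tip": [1.0, 1.5, 2.0, 3.0, 4.0, 5.0],
})

# Two equal-width intervals covering the range of total_bill
bins = pandas.cut(df.total_bill, bins = 2)

# Mean tip for each (bill interval, party size) pair; empty cells are NaN
t = pandas.crosstab(bins, df["size"], values = df.tip, aggfunc = "mean")
print(t)
```

Cells with no observation stay missing, and `seaborn.heatmap` leaves such cells blank.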
seaborn.lmplot(x = "total_bill", y = "tip", hue = "sex", col = "sex", data = tips)
<seaborn.axisgrid.FacetGrid at 0x13afbccd0>
p = seaborn.FacetGrid(tips, row = "sex", col = "smoker")
p.map(seaborn.histplot, "total_bill")
<seaborn.axisgrid.FacetGrid at 0x13b078df0>
seaborn.catplot(x = "sex", y = "total_bill", hue = "smoker", data = tips, kind = "box")
<seaborn.axisgrid.FacetGrid at 0x13b1df1c0>
seaborn.catplot(x = "sex", y = "total_bill", hue = "sex", col = "smoker", data = tips,
kind = "point", join = False)
<seaborn.axisgrid.FacetGrid at 0x13b052cd0>
seaborn.catplot(x = "sex", y = "total_bill", hue = "smoker", data = tips, kind = "violin")
<seaborn.axisgrid.FacetGrid at 0x13b2ee490>
seaborn.catplot(x = "sex", y = "total_bill", hue = "smoker", col = "smoker", data = tips,
kind = "strip", jitter = True)
<seaborn.axisgrid.FacetGrid at 0x13b3a8df0>
seaborn.catplot(x = "sex", row = "smoker", col = "time", data = tips, kind = "count")
<seaborn.axisgrid.FacetGrid at 0x13b4dd3a0>
seaborn.catplot(x = "sex", hue = "smoker", col = "time", data = tips, kind = "count")
<seaborn.axisgrid.FacetGrid at 0x13b62d760>
t = pandas.crosstab([tips.smoker, tips.time], tips.sex, normalize = "index")
t = t.reset_index().assign(smoker_time = lambda x: x.smoker + "_" + x.time).drop(columns = ["smoker", "time"])
tm = pandas.melt(t, id_vars = "smoker_time")
tm = tm.assign(value = 100 * tm.value)
seaborn.catplot(x = "smoker_time", y = "value", hue = "sex", data = tm, kind = "bar")
<seaborn.axisgrid.FacetGrid at 0x13aeda0d0>
seaborn
- suptitle on fig: overall title
- set_axis_labels: axis titles
- palette: choice of a color palette
- height and aspect: height of each facet and its width-to-height ratio, so width = height * aspect (only one facet here)
seaborn.set(font_scale=2, style="white")
p = seaborn.catplot(x = "size", y = "tip", hue = "sex", data = tips, kind = "box",
palette = "Set2", height = 8, aspect = 2, legend = False)
p.fig.suptitle("Party size and tip by sex")
p.set_axis_labels("Party size", "Tip")
matplotlib.pyplot.legend(title='Sex', loc='upper right')
matplotlib.pyplot.show()