No Story, No Ecstasy
[Kaggle Visualization] Python basic code
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Path of the file to read
flight_filepath = "../input/flight_delays.csv"
# Read the file into a variable flight_data
flight_data = pd.read_csv(flight_filepath, index_col="Month")
# Set the width and height of the figure
plt.figure(figsize=(16,6))
# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")
# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)
# Plot a subset of the data
# Set the width and height of the figure
plt.figure(figsize=(14,6))
# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")
# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")
# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")
# Add label for horizontal axis
plt.xlabel("Date")
# Bar Charts
# Bar chart showing average arrival delay for Spirit Airlines flights by month
# Important: "Month" is the index column, so select it with flight_data.index rather than flight_data['Month'].
sns.barplot(x=flight_data.index, y=flight_data['NK'])
# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")
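The note about the index column can be checked on a small table. The sketch below is only an illustration: the CSV text and its delay values are made up (they are not taken from flight_delays.csv), and `toy_flights` is a hypothetical name. Only the `index_col="Month"` argument mirrors the code above.

```python
import io

import pandas as pd

# Hypothetical stand-in for flight_delays.csv; the numbers are invented.
csv_text = "Month,NK,AA\n1,11.4,6.9\n2,5.2,-1.0\n3,3.3,-0.5\n"
toy_flights = pd.read_csv(io.StringIO(csv_text), index_col="Month")

# "Month" became the index, so it is no longer a regular column.
print("Month" in toy_flights.columns)  # False
print(toy_flights.index.tolist())      # [1, 2, 3]

# For a bar chart, x therefore comes from .index and y from a column.
x_values = toy_flights.index
y_values = toy_flights["NK"]
```

Because `Month` is consumed by `index_col`, `toy_flights["Month"]` would raise a `KeyError`, which is why the bar chart passes `flight_data.index` as `x`.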
# Heatmap
# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)  # annot=True writes each cell's value on the chart
# Add label for horizontal axis
plt.xlabel("Airline")
# Scatter Plots
# Scatter plot of BMI against insurance charges
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])
# Same scatter plot with a fitted regression line
sns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])
# Color-coded plots
# Color the points by smoker status
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])
# One regression line per smoker group
sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)
# Scatter plots for categorical variables
sns.swarmplot(x=insurance_data['smoker'], y=insurance_data['charges'])
# Distributions
# Histogram (histplot replaces distplot, which is deprecated since seaborn 0.11)
sns.histplot(data=iris_data['Petal Length (cm)'])
# KDE (Kernel Density Estimate) plot (fill=True replaces the deprecated shade=True)
sns.kdeplot(data=iris_data['Petal Length (cm)'], fill=True)
# 2D KDE plot
sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")
# Color-coded plots: one histogram per species, drawn on the same axes
sns.histplot(data=iris_set_data['Petal Length (cm)'], label="Iris-setosa")
sns.histplot(data=iris_ver_data['Petal Length (cm)'], label="Iris-versicolor")
sns.histplot(data=iris_vir_data['Petal Length (cm)'], label="Iris-virginica")
plt.title("Histogram of Petal Lengths, by Species")
plt.legend()
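The three color-coded histograms assume that `iris_set_data`, `iris_ver_data`, and `iris_vir_data` already exist; the post does not show how they were created. One possible way to build them, assuming a single table with a `Species` column, is to filter by species. The column name, the `iris_toy` frame, and its values are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the iris data; the values and the "Species" column are assumptions.
iris_toy = pd.DataFrame({
    "Petal Length (cm)": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "Species": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ],
})

# One DataFrame per species, selected with a boolean mask.
iris_set_data = iris_toy[iris_toy["Species"] == "Iris-setosa"]
iris_ver_data = iris_toy[iris_toy["Species"] == "Iris-versicolor"]
iris_vir_data = iris_toy[iris_toy["Species"] == "Iris-virginica"]

print(len(iris_set_data), len(iris_ver_data), len(iris_vir_data))  # 2 2 2
```

Each filtered frame keeps the `Petal Length (cm)` column, so it can be passed to the histogram calls above unchanged.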