List: Kaggle (6)
No Story, No Ecstasy
import pandas as pd pd.plotting.register_matplotlib_converters() import matplotlib.pyplot as plt %matplotlib inline import seaborn as sns # Path of the file to read flight_filepath = "../input/flight_delays.csv" # Read the file into a variable flight_data flight_data = pd.read_csv(flight_filepath, index_col="Month") plt.figure(figsize=(16,6)) # Add title plt.title("Daily Global Streams of Popul..
# Pandas import pandas as pd # Creating pd.DataFrame({'Yes': [50,21], 'No': [131, 2]}) df = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']}, index=['Product A', 'Product B']) print(df) print(pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')) # Reading df = pd.read_csv("asdf", index_col=0) # Choosing between loc..
1. Feature Engineering # Feature Engineering # Example) improve performance through feature engineering X = df.copy() y = X.pop("CompressiveStrength") # Train and score baseline model (criterion "mae" was renamed "absolute_error" in scikit-learn 1.0) baseline = RandomForestRegressor(criterion="absolute_error", random_state=0) baseline_score = cross_val_score(baseline, X, y, cv=5, scoring="neg_mean_absolute_error") baseline_score = -1 * baseline_score.mean() print(f"MAE Ba..
# Data Cleaning import pandas as pd import numpy as np df = pd.DataFrame() # 1. Handling Missing Values # Check missing values count missing_values_count = df.isnull().sum() total_cells = np.prod(df.shape) missing_cells = missing_values_count.sum() percent_missing = missing_cells / total_cells * 100 print(percent_missing) # Drop missing values # Row df.dropna() # drop rows if it has at least..
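The excerpt above runs its missing-value check on an empty `pd.DataFrame()`, so the percentage it prints is not meaningful. As a minimal sketch, the same steps can be tried on a small toy table. The column names and values below are hypothetical and are not taken from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only to illustrate the steps in the excerpt
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Seoul", "Busan", None, "Incheon"],
    "score": [88.0, 92.5, np.nan, np.nan],
})

# Count missing values per column, then across the whole table
missing_values_count = df.isnull().sum()
total_cells = np.prod(df.shape)
missing_cells = missing_values_count.sum()
percent_missing = missing_cells / total_cells * 100
print(missing_values_count)
print(f"{percent_missing:.1f}% of cells are missing")

# Drop rows that contain at least one missing value
rows_dropped = df.dropna()

# Drop columns that contain at least one missing value
cols_dropped = df.dropna(axis=1)

print(rows_dropped.shape, cols_dropped.shape)
```

On this toy table every column has at least one gap, so `dropna(axis=1)` removes all of them. That is a quick reminder to check how much data survives before choosing between dropping and imputing.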
1. Basics import pandas as pd import numpy as np from sklearn.model_selection import train_test_split # Read the data X_full = pd.read_csv('../input/train.csv', index_col='Id') X_test_full = pd.read_csv('../input/test.csv', index_col='Id') # Remove rows with missing target, separate target from predictors X_full.dropna(axis=0, subset=['SalePrice'], inplace=True) y = X_full.SalePrice X_full.drop(['Sa..
# Basic Data Exploration import pandas as pd data = pd.read_csv('melb_data.csv') print(data.describe()) print(data.dtypes) print(data.head()) # Selecting Data for Modeling print(data.columns) data = data.dropna(axis=0) X = data.copy() # Selecting the prediction target y = X.pop('Price') # print(y.head()) # Choosing "Features" cand_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitud..