No Story, No Ecstasy
[Kaggle Intro to Machine Learning] Python basic code
Data Science Series
heave_17 · 2021. 4. 28. 21:47

# Basic Data Exploration
import pandas as pd
data = pd.read_csv('melb_data.csv')
print(data.describe())
print(data.dtypes)
print(data.head())
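To see what these three exploration calls report without the real melb_data.csv, here is a tiny hypothetical frame (the column names are illustrative only). Note that `describe()` counts only non-missing values per numeric column.

```python
import pandas as pd

# A tiny stand-in frame (hypothetical) to show what the exploration calls return.
df = pd.DataFrame({'Rooms': [2, 3, 4], 'Price': [500000.0, 750000.0, None]})

print(df.describe())  # count/mean/std/min/quartiles/max; Price count is 2 (NaN excluded)
print(df.dtypes)      # Price becomes float64 because of the missing value
print(df.head())      # first rows (up to 5 by default)
```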
# Selecting Data for Modeling
print(data.columns)
data = data.dropna(axis=0)
X = data.copy()
# Selecting the prediction target
y = X.pop('Price')
# print(y.head())
# Choosing "Features"
cand_features = ['Rooms', 'Bathroom', 'Landsize', 'Lattitude', 'Longtitude']
X = X[cand_features]
# print(X.describe())
# Building your model
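The `X.pop('Price')` step above does two things at once: it removes the column from the frame and returns it as a Series, which is a compact way to split features from the target. A minimal sketch on a hypothetical two-row frame:

```python
import pandas as pd

# DataFrame.pop removes the column in place and returns it as a Series.
df = pd.DataFrame({'Rooms': [2, 3], 'Price': [500000, 750000]})
y = df.pop('Price')

print(list(df.columns))  # ['Rooms'] — 'Price' is gone from the frame
print(y.tolist())        # [500000, 750000]
```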
from sklearn.tree import DecisionTreeRegressor
home_model = DecisionTreeRegressor(random_state=1)
home_model.fit(X, y)
print(y.head())
print(home_model.predict(X.head()))
# Model Validation
# There are many metrics for summarizing model quality, but we'll start with one called Mean Absolute Error (also called MAE).
# error = actual - predicted
from sklearn.metrics import mean_absolute_error
predicted_home_prices = home_model.predict(X)
print(mean_absolute_error(y, predicted_home_prices))
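As a quick sanity check on the definition above (error = actual − predicted; MAE averages the absolute errors), here it is on toy numbers:

```python
from sklearn.metrics import mean_absolute_error

actual = [100, 200, 300]
predicted = [110, 190, 330]
# |100-110| + |200-190| + |300-330| = 10 + 10 + 30 = 50; 50 / 3 ≈ 16.67
print(mean_absolute_error(actual, predicted))
```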
# Split validation data
from sklearn.model_selection import train_test_split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
home_model = DecisionTreeRegressor()
home_model.fit(train_X, train_y)
val_predictions = home_model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
# Underfitting and Overfitting
# Experimenting with Different Models
def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds_val = model.predict(val_X)
    mae = mean_absolute_error(val_y, preds_val)
    return mae

for max_leaf_nodes in [5, 50, 500, 5000]:
    my_mae = get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y)
    print("Max leaf nodes: %d \t\t Mean Absolute Error: %d" % (max_leaf_nodes, my_mae))
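The loop above compares tree sizes; a common follow-up is to keep the size with the lowest validation MAE and refit at that size on all the data before final predictions. A self-contained sketch (the synthetic data here is a hypothetical stand-in for the Melbourne columns):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical synthetic regression data standing in for melb_data.csv.
rng = np.random.RandomState(0)
X_demo = rng.rand(400, 3)
y_demo = X_demo @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=400)
tr_X, va_X, tr_y, va_y = train_test_split(X_demo, y_demo, random_state=0)

def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))

# Score each candidate size, keep the one with the lowest validation MAE.
scores = {size: get_mae(size, tr_X, va_X, tr_y, va_y) for size in [5, 50, 500, 5000]}
best_size = min(scores, key=scores.get)

# Refit at the best size on ALL the data before making final predictions.
final_model = DecisionTreeRegressor(max_leaf_nodes=best_size, random_state=0)
final_model.fit(np.vstack([tr_X, va_X]), np.concatenate([tr_y, va_y]))
print(best_size)
```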
# Random Forests
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
forest_model = RandomForestRegressor(random_state=1)
forest_model.fit(train_X, train_y)
preds = forest_model.predict(val_X)
print(mean_absolute_error(val_y, preds))
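One knob worth knowing on `RandomForestRegressor` is `n_estimators`, the number of trees (the default is 100 in recent scikit-learn); more trees generally reduce variance at the cost of fit time. A minimal sketch on hypothetical synthetic data, since the real melb_data.csv is not bundled here:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Hypothetical synthetic data standing in for the Melbourne features.
rng = np.random.RandomState(1)
X_demo = rng.rand(500, 4)
y_demo = 10 * X_demo[:, 0] + 5 * np.sin(6 * X_demo[:, 1]) + rng.normal(scale=0.2, size=500)
tr_X, va_X, tr_y, va_y = train_test_split(X_demo, y_demo, random_state=1)

# Compare validation MAE at two forest sizes.
for n in [10, 100]:
    forest = RandomForestRegressor(n_estimators=n, random_state=1)
    forest.fit(tr_X, tr_y)
    print(n, mean_absolute_error(va_y, forest.predict(va_X)))
```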