Statsmodel _ 1

김지윤 · April 15, 2023

abalone data

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import statsmodels.api as sm
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'
pd_data = pd.read_csv(url, header=None)
#print(pd_data.head())
np_data = pd_data.to_numpy()
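
The file has no header row, so the columns are addressed by position later on. As a small sketch of what those positions mean, the block below attaches column names to a few rows in the same comma-separated layout. The column names follow the UCI abalone description as I understand it, and the three rows are only illustrative stand-ins for the downloaded file; both are assumptions, not part of the original code.

```python
import io
import pandas as pd

# Column order assumed from the UCI abalone dataset description.
columns = ["Sex", "Length", "Diameter", "Height", "Whole weight",
           "Shucked weight", "Viscera weight", "Shell weight", "Rings"]

# Illustrative rows in the dataset's layout (a stand-in for the URL download).
sample = io.StringIO(
    "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
    "M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7\n"
    "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9\n"
)
df = pd.read_csv(sample, header=None, names=columns)

# Access by name and access by position select the same column.
length_by_name = df["Length"].to_numpy()
length_by_pos = df.to_numpy()[:, 1].astype(float)   # index 1 = Length
rings_by_pos = df.to_numpy()[:, -1].astype(float)   # last column = Rings
```

With this layout, index 1 is Length, index 2 is Diameter, index 3 is Height, and the last column is Rings, which matches the slicing used in the rest of the post.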

.
.

🛻 Length: independent variable / Diameter: dependent variable

⌨️ plot, regression line

» indep_var (independent variable, x) : Length
» dep_var (dependent variable, y) : Diameter

x = np_data[:, 1].astype(np.float64)  # Length
y = np_data[:, 2].astype(np.float64)  # Diameter

fit_line = np.polyfit(x, y, 1)  # estimate the regression line: returns [slope, intercept]
f = np.poly1d(fit_line)   # wrap the coefficients as a printable, callable polynomial
print(f)

# result 
# 0.8155 x - 0.01941
_, axe = plt.subplots()
axe.scatter(x,y)
axe.plot(x, fit_line[0]*x + fit_line[1],color="y")
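
For a degree-1 fit, `np.polyfit` should give the same answer as the textbook least-squares formulas (slope = covariance / variance, intercept from the means). The sketch below checks this on synthetic data; `x_demo`, `y_demo`, and the chosen true slope and intercept are made up for illustration and are not the abalone values.

```python
import numpy as np

# Synthetic data (hypothetical): y is roughly 0.8 * x - 0.02 plus small noise.
rng = np.random.default_rng(0)
x_demo = rng.uniform(0.1, 0.8, size=200)
y_demo = 0.8 * x_demo - 0.02 + rng.normal(0.0, 0.01, size=200)

# polyfit returns coefficients from the highest degree down: [slope, intercept].
slope_fit, intercept_fit = np.polyfit(x_demo, y_demo, 1)

# Closed-form least-squares estimates for a single predictor.
x_mean, y_mean = x_demo.mean(), y_demo.mean()
slope_manual = (np.sum((x_demo - x_mean) * (y_demo - y_mean))
                / np.sum((x_demo - x_mean) ** 2))
intercept_manual = y_mean - slope_manual * x_mean

same = np.allclose([slope_fit, intercept_fit],
                   [slope_manual, intercept_manual])
print(same)
```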

โŒจ๏ธ statsmodels

  • OLS : Ordinary Least Square ํ•จ์ˆ˜
x = sm.add_constant(x)   # constant๋ฅผ ๊ณ„์‚ฐํ•  ๊ณต๊ฐ„์„ ํ•˜๋‚˜ ์ค˜์•ผํ•จ.
print(x)

# result
# [[1.    0.455]
#  [1.    0.35 ]
#  [1.    0.53 ]
#  ...
#  [1.    0.6  ]
#  [1.    0.625]
#  [1.    0.71 ]]
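
To see why the column of ones matters, here is a NumPy-only sketch of the same idea: build the design matrix by hand and solve the least-squares problem directly. It is an illustration of the concept, not how statsmodels is implemented internally; `x_demo` and `y_demo` are hypothetical, noise-free values.

```python
import numpy as np

# Hypothetical, noise-free data on the line y = 0.8 * x - 0.02.
x_demo = np.array([0.2, 0.35, 0.5, 0.65])
y_demo = 0.8 * x_demo - 0.02

# Prepend a column of ones, like sm.add_constant does by default.
X = np.column_stack([np.ones_like(x_demo), x_demo])

# Least-squares solution; the first entry multiplies the ones column,
# so it is the intercept, and the second entry is the slope.
beta, *_ = np.linalg.lstsq(X, y_demo, rcond=None)
intercept, slope = beta
print(X.shape, intercept, slope)
```

Without the ones column the fitted line would be forced through the origin, because there would be no term left to carry the intercept.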
reg_model = sm.OLS(y,x)
reg_result = reg_model.fit()

reg_result.summary()
reg_result.params

# result 
# array([-0.01941371,  0.81546069]) : [y-intercept, x coefficient]

reg_result.rsquared   
# result : 0.9737971035056835
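
As a rough sketch of what `rsquared` measures: it is one minus the ratio of the residual sum of squares to the total sum of squares. The block below computes it by hand on synthetic data (the data and the names `x_demo`, `y_demo` are assumptions for illustration), and compares it with the squared Pearson correlation, which should agree for a one-predictor model with an intercept.

```python
import numpy as np

# Synthetic data (hypothetical), not the abalone measurements.
rng = np.random.default_rng(1)
x_demo = rng.uniform(0.1, 0.8, size=300)
y_demo = 0.8 * x_demo - 0.02 + rng.normal(0.0, 0.02, size=300)

# Fit a line and compute the fitted values.
slope, intercept = np.polyfit(x_demo, y_demo, 1)
y_hat = slope * x_demo + intercept

# R^2 = 1 - SS_res / SS_tot
ss_res = np.sum((y_demo - y_hat) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Squared correlation between x and y, for comparison.
r = np.corrcoef(x_demo, y_demo)[0, 1]
print(np.isclose(r_squared, r ** 2))
```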

.
.

🛻 Length, Diameter, Height: independent variables / Rings (age): dependent variable

» indep_var (independent variables, x1, x2, x3) : Length(x1), Diameter(x2), Height(x3)
» dep_var (dependent variable, y) : Rings

x = np_data[:,1:4].astype(np.float64) # Length, Diameter, Height
y = np_data[:,-1].astype(np.float64)  # Rings (age)

x = sm.add_constant(x)
reg_result = sm.OLS(y, x).fit()
reg_result.summary()

‣ Estimated regression equation:
y = 2.8365 - 11.9327×Length + 25.7661×Diameter + 20.3582×Height
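
The estimated equation can be turned into a small helper for predicting Rings from the three measurements. This is only a sketch: `predict_rings` is a hypothetical name, the coefficients are copied from the equation above, and the sample measurements are illustrative input values.

```python
def predict_rings(length, diameter, height):
    """Evaluate the estimated regression equation for one abalone."""
    return (2.8365
            - 11.9327 * length
            + 25.7661 * diameter
            + 20.3582 * height)

# Illustrative measurements (Length, Diameter, Height).
estimate = predict_rings(0.455, 0.365, 0.095)
print(round(estimate, 2))
```

The negative Length coefficient may look odd at first. Length and Diameter are very strongly correlated (R² of about 0.97 in the first regression), so the individual coefficients are probably hard to interpret on their own.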
