I tried preprocessing the Titanic data.
The 'Age' column contains NaN values.
A common approach is to drop the 'Age' column.
Instead, I thought about building a model that predicts the 'Age' value,
because the Titanic dataset is missing some of its values.
So I tried a model (LinearRegression) that predicts 'Age',
and substituted the model's predictions for the NaN values.
But I am somewhat worried about whether it was right to apply predicted values to the NaN entries.
train_data.isna().sum()
[result]
Survived        0
Sex             0
Age           177
Fare            0
Pclass_2        0
Pclass_3        0
Embarked_Q      0
Embarked_S      0
dtype: int64
train_temp = train_data.loc[train_data['Age'].notnull()]
temp_x = train_temp.drop(columns='Age')
temp_y = train_temp['Age']
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
model = LinearRegression()
model.fit(temp_x, temp_y)
pred = model.predict(temp_x)
print(r2_score(temp_y, pred))
[result]
0.20847354768995863 😅
Fill the NaN values with the predictions
xx = train_data.loc[train_data['Age'].isna()]
xxx = xx.drop(columns='Age')
save_index = xxx.reset_index()['index']
pred = model.predict(xxx)
temp_df = pd.DataFrame({'index':save_index, 'Age':pred})
temp_df
[result]
index	Age
0	5	28.772607
1	17	27.621829
2	19	18.663342
3	26	25.665526
4	28	21.794739
...	...	...
172	859	25.665349
173	863	23.592961
174	868	27.083844
175	878	27.151203
176	888	25.528662
177 rows × 2 columns
for i in temp_df['index'].values:
    train_data.loc[i, 'Age'] = temp_df.loc[temp_df['index'] == i, 'Age'].values[0]
I am worried that this loop will take a long time to run.
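One way to avoid the row-by-row loop is to assign all predictions at once through a boolean mask. This is only a minimal sketch: the small `train_data` frame below holds made-up values (not the real Titanic rows), and it assumes the predictions come back in the same row order as the rows selected by the mask, which is how `predict` behaves.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy stand-in for train_data (made-up values, not the real Titanic rows).
train_data = pd.DataFrame({
    'Fare': [7.25, 71.28, 7.92, 53.10, 8.05, 8.46],
    'Pclass_3': [1, 0, 1, 0, 1, 1],
    'Age': [22.0, 38.0, np.nan, 35.0, np.nan, 54.0],
})

# Boolean mask of the rows whose 'Age' is missing.
missing = train_data['Age'].isna()

# Fit only on the rows where 'Age' is known.
model = LinearRegression()
model.fit(train_data.loc[~missing].drop(columns='Age'),
          train_data.loc[~missing, 'Age'])

# Predict for the missing rows and assign everything in one step.
# The predictions line up with the masked rows, so no loop is needed.
train_data.loc[missing, 'Age'] = model.predict(
    train_data.loc[missing].drop(columns='Age'))

print(train_data['Age'].isna().sum())
```

With this pattern there is no need to save the index separately, because the mask keeps track of which rows are being filled.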
I hope to build a better-performing model.
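The R² score earlier was computed on the same rows the model was fitted on, so it may look better than the model really is. When comparing a future model against this one, a held-out score is probably a fairer yardstick. The sketch below uses `cross_val_score` from scikit-learn on synthetic numbers; `temp_x` and `temp_y` here are random stand-ins, not the Titanic columns, and 5 folds is just an assumed choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for temp_x / temp_y (random numbers, not Titanic data).
rng = np.random.default_rng(0)
temp_x = rng.normal(size=(200, 3))
temp_y = 30 + 5 * temp_x[:, 0] + rng.normal(scale=10, size=200)

# R^2 on each of 5 held-out folds: every fold is scored by a model
# that was fitted without seeing that fold.
scores = cross_val_score(LinearRegression(), temp_x, temp_y, cv=5, scoring='r2')

# The mean over folds is a rough estimate of performance on unseen rows.
print(scores.mean())
```

Swapping `LinearRegression()` for another estimator in the same call would give a like-for-like comparison between models.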
2022.10.21. first commit