Feature Engineering

San Sung 'Paul' Park · April 13, 2022

Feature = a column (dimension) of a DataFrame

Feature Engineering = combining or restructuring existing data to create a new feature
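As a minimal sketch of that definition, the snippet below builds two new features from existing columns. The DataFrame, its column names (revenue, cost) and its values are all made up for illustration.

```python
import pandas as pd

# Hypothetical data: column names and values are invented for this example.
df = pd.DataFrame({
    "revenue": [120, 200, 90],
    "cost": [80, 150, 100],
})

# New feature created by combining two existing columns.
df["profit"] = df["revenue"] - df["cost"]

# Another derived feature: profit as a share of revenue.
df["margin"] = df["profit"] / df["revenue"]
```

Both profit and margin are columns that did not exist in the raw data, which is all "feature engineering" means here.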

Important Steps:

  • dtypes/df.info()
    - Can check data type and other basic information
  • Missing Values
    - Appear as NaN, None, Null, NA, etc.
    - Ways to Deal with Missing Values:
      1. isnull() - returns True/False Booleans marking missing values
      2. notnull() - the opposite of isnull()
      3. dropna() - drops rows (or columns) that contain missing values
      4. fillna() - replaces missing values with another value (e.g. 0, mean, mode, max)
      5. sum() - chained after isnull() to count the missing values in each column
  • Strings -> Numerics:
    - 25,970 + 82,524 should equal 108,494, but Python returns "25,97082,524" because both values are strings, so + concatenates them
    - Ways to Convert into Numerical Data:
      1. String replace - string_variable.replace("delete", "") (replaces the text to delete with an empty string, not white space)
      2. Type-casting - uses built-in functions:
         - int() = returns an integer
         - str() = returns a string
         - float() = returns a float
      3. As Functions
         - Make your own function to convert the data type
         - Then apply it to the column with apply()
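def toInt(string):
  # Remove thousands separators, then cast: "25,970" -> 25970
  return int(string.replace(',',''))

# apply() returns a new Series, so assign it back to keep the result
df['income'] = df['income'].apply(toInt)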
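The steps above can be sketched end to end on a tiny DataFrame. This is only an illustration: the data and the column names (name, age, income) are made up, and the last step uses pandas' vectorized string methods as an alternative to apply(), not as a replacement for the function above.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [31, np.nan, 45, np.nan],
    "income": ["25,970", "82,524", None, "1,000"],
})

# 1. Inspect the data types: 'income' holds strings, not numbers.
column_types = df.dtypes
df.info()  # prints dtypes, non-null counts and memory usage

# 2. Count missing values per column:
#    isnull() marks them as True, sum() counts the True values.
missing_counts = df.isnull().sum()

# 3a. Drop every row that contains at least one missing value.
complete_rows = df.dropna()

# 3b. Or fill missing ages with the column mean instead of dropping.
age_filled = df["age"].fillna(df["age"].mean())

# 4. Strings -> numerics: remove the commas, then convert.
#    Missing entries stay missing (NaN) after the conversion.
income_num = pd.to_numeric(df["income"].str.replace(",", "", regex=False))

# Now the addition behaves numerically instead of concatenating strings.
total = income_num[0] + income_num[1]
```

Whether to drop or fill depends on the data: dropping keeps only one of the four rows here, while filling keeps every row at the cost of introducing estimated values.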