Exploratory Data Analysis

San Sung 'Paul' Park · April 4, 2022

Exploratory Data Analysis (EDA) is the process of reordering and restructuring data so that it is fit for analysis. It is an essential step that helps analysts understand the data they are working with. It is also useful for forming hypotheses and for providing visual aids that describe the data at hand.

Main Processes:
1. Data Visualization -- Finding Patterns
2. Checking for Anomalies -- missing values, duplicates etc.
3. Setting Hypotheses -- using statistics and graphical aids.
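
The anomaly check in step 2 can be sketched with nothing but the Python standard library. This is a minimal illustration, not a full data-quality routine: the `rows` records and their column names are made up for the example, and `None` is assumed to be the marker for a missing value.

```python
from collections import Counter

# Hypothetical records, invented for illustration; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "city": "Seoul"},
    {"id": 2, "age": None, "city": "Busan"},
    {"id": 3, "age": 29, "city": None},
    {"id": 1, "age": 34, "city": "Seoul"},  # exact duplicate of the first row
]

# Missing values: count, per column, how many rows hold None.
missing = Counter(
    key for row in rows for key, value in row.items() if value is None
)

# Duplicates: turn each row into a hashable tuple and count repeats.
row_counts = Counter(tuple(sorted(row.items())) for row in rows)
duplicates = sum(count - 1 for count in row_counts.values())

print("missing per column:", dict(missing))
print("duplicate rows:", duplicates)
```

In practice a data-frame library would usually do this in a line or two; the point here is only what "checking for anomalies" means at the row level.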

Two Types of EDA:
1. Graphical - checks/explains data using charts or figures
2. Non-Graphical - checks/explains data using summary statistics
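
To make the contrast concrete, here is a small sketch that looks at the same data both ways. The `values` list is hypothetical, and the text histogram stands in for a real chart only so that the example needs no plotting library.

```python
import statistics
from collections import Counter

values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9]  # hypothetical measurements

# Non-graphical: describe the data with summary statistics.
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),
    "min": min(values),
    "max": max(values),
}
print(summary)

# Graphical: a crude text histogram with bins of width 2.
bin_width = 2
bins = Counter((v // bin_width) * bin_width for v in values)
for start in sorted(bins):
    print(f"{start}-{start + bin_width - 1}: {'#' * bins[start]}")
```

The summary gives exact numbers, while the histogram shows the shape of the distribution at a glance; the two views usually complement each other.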

The Objective of EDA:

  • To measure or observe relationships within a single variable and between multiple variables
  1. Univariate - 1 variable

    • Numerical Data: Summary Statistics (Mean, Mode, etc.), Outlier Detection
    • Categorical Data: Frequency Distributions, Cross Tabulations etc.
    • Charts/Graphs Used: Histogram, Pie Chart, Box Plot etc.
  2. Multivariate - multiple variables

    • Cross Tabulations, Covariance, Correlation, etc.
    • Charts/Graphs Used: Box Plots, Stacked Bar Charts etc.

The Cycle of Data Preprocessing:
1. Data Cleaning - Handling Missing Values, Noise, Anomalies etc.
2. Data Merging - Combining Data
3. Data Transformation - Scaling, Converting from One Format to Another
4. Dimensionality Reduction - Reducing the number of variables while keeping the most informative structure (e.g. Principal Component Analysis)
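
A toy pass through the four steps might look like the sketch below. Everything in it is an assumption made for illustration: the tables, the choice of the median for filling gaps, min-max scaling as the transformation, and a hand-written two-variable PCA that uses the closed-form eigenvalue of a 2×2 covariance matrix. Real projects would normally rely on a numerical library for the last step.

```python
import math
import statistics

# Hypothetical tables keyed by "id"; None marks a missing value.
measurements = [
    {"id": 1, "height": 160.0, "weight": 55.0},
    {"id": 2, "height": 170.0, "weight": None},
    {"id": 3, "height": 180.0, "weight": 80.0},
    {"id": 4, "height": 175.0, "weight": 70.0},
]
cities = {1: "Seoul", 2: "Busan", 3: "Seoul", 4: "Daegu"}

# 1. Data cleaning: fill missing weights with the median of observed ones.
observed = [r["weight"] for r in measurements if r["weight"] is not None]
median_weight = statistics.median(observed)
cleaned = [
    {**r, "weight": median_weight if r["weight"] is None else r["weight"]}
    for r in measurements
]

# 2. Data merging: attach the city to each row by joining on "id".
merged = [{**r, "city": cities[r["id"]]} for r in cleaned]

# 3. Data transformation: min-max scale a column into [0, 1].
def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = min_max([r["height"] for r in merged])
weights = min_max([r["weight"] for r in merged])

# 4. Dimensionality reduction: project two columns onto their first
#    principal component (assumes the covariance between them is nonzero).
def first_principal_component(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    n = len(xs) - 1
    a = sum(d * d for d in dx) / n              # var(x)
    c = sum(d * d for d in dy) / n              # var(y)
    b = sum(p * q for p, q in zip(dx, dy)) / n  # cov(x, y)
    # Largest eigenvalue of [[a, b], [b, c]] and its unit eigenvector.
    eigenvalue = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    vx, vy = b, eigenvalue - a
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    projected = [p * vx + q * vy for p, q in zip(dx, dy)]
    return eigenvalue, projected

eigenvalue, projected = first_principal_component(heights, weights)
print("merged rows:", merged)
print("scaled heights:", heights)
print("1-D projection:", projected)
```

The two scaled columns end up as a single column, `projected`, whose variance equals the largest eigenvalue; that is the sense in which PCA keeps the most informative direction rather than picking individual rows or columns.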

