📑 [Project] Olist E-Commerce Data Analysis Project Review

thisk336 · October 28, 2023

Project Overview

  • Team: 3 members
  • Schedule: 10/17 ~ 10/27
  • Dataset: Brazilian E-Commerce Public Dataset by Olist
    Kaggle: Olist Dataset
  • Languages: SQL, Python; visualization tools: Python, Excel, Power BI

Dataset Introduction

  • The dataset is the Olist E-Commerce dataset on Kaggle, which contains Brazilian e-commerce data. It covers about 100,000 orders placed on the marketplace between 2016 and 2018, split across several tables.

Data Preprocessing

▶ olist_orders_dataset table
1-1. Delete rows where order_status = 'canceled'
1-2. Delete data that does not fit the sales cycle

SELECT * FROM olist_orders_dataset WHERE order_status = 'canceled';
DELETE FROM olist_orders_dataset WHERE order_status = 'canceled';

▶ olist_sellers_dataset table
2. Issue: some seller_city values do not belong to the recorded seller_state

SELECT SD.seller_id, SD.seller_zip_code_prefix, GD.geolocation_state
FROM olist_sellers_dataset SD
JOIN 
(SELECT 
	DISTINCT geolocation_zip_code_prefix AS zip_code, 
    geolocation_state FROM olist_geolocation_dataset) GD 
ON SD.seller_zip_code_prefix = GD.zip_code

-> Issue resolved
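
The SQL above looks up each seller's state from the geolocation table by zip code prefix. A minimal pandas sketch of the same idea follows, assuming toy stand-ins for the two tables and assuming that, when a prefix appears with more than one state, the most frequent state wins (that tie-breaking rule is an assumption, not something the post states).

```python
import pandas as pd

# Toy stand-ins for olist_geolocation_dataset and olist_sellers_dataset (made-up rows).
geolocation_df = pd.DataFrame({
    "geolocation_zip_code_prefix": [1000, 1000, 1000, 2000],
    "geolocation_state": ["SP", "SP", "RJ", "RJ"],
})
sellers_df = pd.DataFrame({
    "seller_id": ["s1", "s2"],
    "seller_zip_code_prefix": [1000, 2000],
    "seller_state": ["RJ", "RJ"],  # s1 is deliberately wrong
})

# One state per zip prefix: the most frequent state among its geolocation rows.
zip_to_state = (
    geolocation_df
    .groupby("geolocation_zip_code_prefix")["geolocation_state"]
    .agg(lambda s: s.value_counts().idxmax())
)

# Overwrite seller_state where a mapping exists; keep the old value otherwise.
mapped = sellers_df["seller_zip_code_prefix"].map(zip_to_state)
sellers_df["seller_state"] = mapped.fillna(sellers_df["seller_state"])
```

Collapsing to one state per prefix also avoids duplicating seller rows, which a join on a DISTINCT (zip, state) list can do when a prefix is recorded under two states.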

▶ Join all the data

import pandas as pd

# Merge the tables into a single DataFrame
olist_df = pd.merge(orders_df, order_payments_df, on = 'order_id')
olist_df = olist_df.merge(customer_df, on = 'customer_id')
olist_df = olist_df.merge(order_items_df, on = 'order_id')
olist_df = olist_df.merge(products_df, on = 'product_id')
olist_df = olist_df.merge(category_name_df, on = 'product_category_name')
olist_df = olist_df.merge(order_reviews_df, on = 'order_id')
olist_df = olist_df.merge(sellers_df, on = 'seller_id')

olist_df.info()

# Remove duplicate rows (only the first row of each order_id is kept)
olist_df.duplicated(subset = 'order_id').value_counts()
olist_df = olist_df[~olist_df.duplicated(['order_id'])]

# Reset the index
olist_df = olist_df.reset_index(drop = True)
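
To see why the duplicate removal matters, here is a small sketch with made-up rows: merging one-row-per-order data with one-row-per-item data repeats each order once per item, and dropping duplicates on order_id then keeps only the first item. The table and column names mirror the post; the values are illustrative.

```python
import pandas as pd

orders_df = pd.DataFrame({"order_id": ["o1", "o2"], "customer_id": ["c1", "c2"]})
order_items_df = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "product_id": ["p1", "p2", "p3"],
    "price": [10.0, 20.0, 5.0],
})

# Order o1 has two items, so it appears twice after the merge.
merged = orders_df.merge(order_items_df, on="order_id")

# Keep one row per order; the second item of o1 (p2) is discarded.
deduped = merged.drop_duplicates(subset="order_id").reset_index(drop=True)
```

One consequence worth keeping in mind: after this step, any column that varies per item (product, category, price, freight) describes only the first item of each order.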

▶ Convert the date column and split it into year, month, and day of week

# Split the purchase timestamp into a date part and a time part
date_time = olist_df['order_purchase_timestamp'].str.split(expand = True)

olist_df['purchase_date'], olist_df['purchase_time'] = date_time[0], date_time[1]
olist_df = olist_df.drop(columns = ['order_purchase_timestamp'])

# Convert the purchase date to datetime
olist_df['purchase_date'] = pd.to_datetime(olist_df['purchase_date'])

# Split the purchase date into year, month, and day of week
olist_df['year'] = olist_df['purchase_date'].dt.year
olist_df['month'] = olist_df['purchase_date'].dt.month
olist_df['day_of_week'] = olist_df['purchase_date'].dt.day_name()

# Keep only the hour in olist_df['purchase_time']
olist_df['purchase_time'] = olist_df['purchase_time'].str.slice(0, 2)
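
The same columns can also be derived without string splitting by parsing the full timestamp once and using the `.dt` accessor. This is an alternative sketch on two made-up timestamps, not the code the post ran.

```python
import pandas as pd

df = pd.DataFrame({
    "order_purchase_timestamp": ["2017-11-24 21:15:03", "2018-01-02 08:05:59"],
})

# Parse once, then read every component from the datetime values.
ts = pd.to_datetime(df["order_purchase_timestamp"])
df["purchase_date"] = ts.dt.normalize()       # midnight of the purchase day
df["year"] = ts.dt.year
df["month"] = ts.dt.month
df["day_of_week"] = ts.dt.day_name()
df["purchase_time"] = ts.dt.strftime("%H")    # two-digit hour as a string
```

Keeping the hour as a zero-padded string matches the post's `purchase_time` column, so later pivot tables sort the hours from "00" to "23".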

▶ Store the Korean name of each Brazilian state in a separate column (to make visualization easier)

# Lookup table: state code -> Korean state name
KOR_STATE_NAMES = {
    'AC' : '아크리주',            # Acre
    'AL' : '알라고아스주',        # Alagoas
    'AP' : '아마파주',            # Amapá
    'AM' : '아마조나스주',        # Amazonas
    'BA' : '바이아주',            # Bahia
    'CE' : '세아라주',            # Ceará
    'DF' : '연방구',              # Distrito Federal
    'ES' : '이스피리투산투주',    # Espírito Santo
    'GO' : '고이아스주',          # Goiás
    'MA' : '마라냥주',            # Maranhão
    'MT' : '마투그로수주',        # Mato Grosso
    'MG' : '미나스제라이스주',    # Minas Gerais
    'PA' : '파라주',              # Pará
    'PB' : '파라이바주',          # Paraíba
    'PR' : '파라나주',            # Paraná
    'PE' : '페르남부쿠주',        # Pernambuco
    'PI' : '피아우이주',          # Piauí
    'RJ' : '리우데자네이루주',    # Rio de Janeiro
    'RN' : '히우그란지두노르치주', # Rio Grande do Norte
    'RS' : '히우그란지두술주',    # Rio Grande do Sul
    'RO' : '혼도니아주',          # Rondônia
    'RR' : '호라이마주',          # Roraima
    'SC' : '산타카타리나주',      # Santa Catarina
    'SP' : '상파울루주',          # São Paulo
    'SE' : '세르지피주',          # Sergipe
    'MS' : '마투그로수두술주',    # Mato Grosso do Sul
    'TO' : '토칸칭스주',          # Tocantins
}

# Function that converts a state code into its Korean name
def get_kor_state(state) :
    # Any code not listed above falls back to Tocantins
    return KOR_STATE_NAMES.get(state, '토칸칭스주')

# Add the Korean state name column
olist_df['kor_state'] = olist_df['customer_state'].map(get_kor_state)

▶ Group the states into regions by location and store the region in new columns

# Function that returns the region for a given state
def get_region(state) :
    if state in ('SP', 'MG', 'ES', 'RJ') :
        region = 'Southeast'
    elif state in ('PR', 'SC', 'RS') :
        region = 'South'
    elif state in ('BA', 'PE', 'CE', 'RN', 'PI', 'MA', 'SE', 'AL', 'PB') :
        region = 'Northeast'
    elif state in ('GO', 'MT', 'MS', 'DF') :
        region = 'Central-West'
    else :
        region = 'North'
    return region

# Add the seller region column
olist_df['seller_region'] = olist_df['seller_state'].map(get_region)

# Add the customer region column
olist_df['customer_region'] = olist_df['customer_state'].map(get_region)
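
A variant worth considering is an explicit lookup table that also lists the northern states, so that an unexpected or missing code becomes NaN instead of silently landing in the North. The sketch below assumes English region labels and a three-row toy DataFrame.

```python
import pandas as pd

# Region -> states; the North is listed explicitly rather than used as a fallback.
REGION_STATES = {
    "Southeast": ["SP", "MG", "ES", "RJ"],
    "South": ["PR", "SC", "RS"],
    "Northeast": ["BA", "PE", "CE", "RN", "PI", "MA", "SE", "AL", "PB"],
    "Central-West": ["GO", "MT", "MS", "DF"],
    "North": ["AC", "AM", "AP", "PA", "RO", "RR", "TO"],
}

# Invert to state -> region for use with Series.map.
STATE_TO_REGION = {
    state: region
    for region, states in REGION_STATES.items()
    for state in states
}

df = pd.DataFrame({
    "customer_state": ["SP", "BA", "TO"],
    "seller_state": ["SP", "PR", "XX"],  # "XX" is an invalid code on purpose
})

for side in ("customer", "seller"):
    df[f"{side}_region"] = df[f"{side}_state"].map(STATE_TO_REGION)
```

Counting the NaN values in the new columns afterwards is a quick check that every state code in the data was recognized.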

EDA

Customer distribution by state

import plotly.express as px

# Count customers per state; the column names are set explicitly so the result is the same across pandas versions
customer_state_df = olist_df['customer_state'].value_counts().rename_axis('state').reset_index(name = 'count')

px.pie(customer_state_df, values='count', names='state', 
       color_discrete_sequence=px.colors.qualitative.Pastel1 + px.colors.qualitative.Pastel2)

  • The state with the most customers is São Paulo, where 42% of customers live. Rio de Janeiro, Minas Gerais, and Rio Grande do Sul follow, which roughly mirrors the population of each state.

Orders by category

# Count orders per category; the column names are set explicitly so the result is the same across pandas versions
category_df = olist_df['product_category_name_english'].value_counts().rename_axis('category').reset_index(name = 'count')
category_df

# The concatenated palettes must supply exactly one color per row of category_df
px.bar(category_df, x="category", y='count',
       labels={"category":"Category","count":"Total orders"}, 
       title='Orders by category', color = px.colors.qualitative.Pastel1 + px.colors.qualitative.Pastel2 + px.colors.qualitative.Pastel + px.colors.qualitative.Light24 + px.colors.qualitative.Safe + px.colors.qualitative.Set2, 
       color_discrete_map='identity')

  • This graph shows the number of orders per category. The five categories with the most orders are, in order, 'bed_bath_table', 'health_beauty', 'sports_leisure', 'computers_accessories', and 'furniture_decor'. The five categories with the fewest orders are 'home_comfort_2', 'la_cuisine', 'cds_dvds_musicals', 'fashion_childrens_clothes', and 'security_and_services'.

Sales trend

date_purchase_df = olist_df[['payment_value', 'purchase_date', 'year', 'month', 'day_of_week']]
date_purchase_df['purchase_date'].describe()

  • Excluding canceled orders, the data runs from September 4, 2016 to September 3, 2018.
date_purchase_pt = pd.pivot_table(data = date_purchase_df, 
                                   index = 'purchase_date', 
                                   values = 'payment_value', 
                                   aggfunc = 'sum').reset_index()
date_purchase_pt

  • Setting the pivot table's aggfunc to sum gives the daily sales.
px.line(date_purchase_pt, x = 'purchase_date', y = 'payment_value',
        labels={"purchase_date":"Date","payment_value":"Sales"},
        title='Sales trend')

  • The graph shows the sales trend. Sales were low from the service launch until early 2017 and started to rise around February to March 2017. One date stands out with abnormally high sales: November 24, 2017, which was Black Friday.
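
The peak day can also be located programmatically rather than by eye. The sketch below uses four made-up daily totals in the shape of `date_purchase_pt`; the rule "more than three times the median" is an arbitrary threshold chosen for illustration.

```python
import pandas as pd

# Toy daily totals in the same shape as date_purchase_pt (values are made up).
date_purchase_pt = pd.DataFrame({
    "purchase_date": pd.to_datetime(
        ["2017-11-22", "2017-11-23", "2017-11-24", "2017-11-25"]
    ),
    "payment_value": [100.0, 120.0, 900.0, 150.0],
})

# Row with the single highest daily total.
peak = date_purchase_pt.loc[date_purchase_pt["payment_value"].idxmax()]

# Days whose total exceeds three times the median daily total.
threshold = 3 * date_purchase_pt["payment_value"].median()
spikes = date_purchase_pt[date_purchase_pt["payment_value"] > threshold]
```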

Sales by year

# Pivot table of total sales per year
year_purchase_pt = pd.pivot_table(data = date_purchase_df, 
                                  index = 'year',
                                  values = 'payment_value',
                                  aggfunc = 'sum').reset_index()
# Pivot table of the number of orders per year
year_purchase_pt2 = pd.pivot_table(data = date_purchase_df, 
                                  index = 'year',
                                  values = 'payment_value',
                                  aggfunc = 'count').reset_index()
year_purchase_pt = pd.merge(year_purchase_pt, year_purchase_pt2, on = 'year')
year_purchase_pt = year_purchase_pt.rename(columns = {'payment_value_x' : 'value_sum', 'payment_value_y' : 'value_count'})
year_purchase_pt

# Bar chart = orders per year, line chart = sales per year
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

matplotlib.rc_file_defaults()
sns.set_style(style=None, rc=None)

fig, ax1 = plt.subplots(figsize=(12,6))

sns.lineplot(data = year_purchase_pt['value_sum'], marker='o', sort = False, ax=ax1)
ax2 = ax1.twinx()

sns.barplot(year_purchase_pt, x='year', y='value_count', alpha=0.5, ax=ax2)

  • This graph visualizes sales and the number of orders per year. Both increased over time from 2016 to 2018.
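
The sum and count tables that the post builds separately and then merges can be produced in one step with named aggregation. A minimal sketch on made-up rows, keeping the post's `value_sum` and `value_count` column names:

```python
import pandas as pd

# Toy rows in the shape of date_purchase_df (values are made up).
date_purchase_df = pd.DataFrame({
    "year": [2016, 2017, 2017, 2018, 2018, 2018],
    "payment_value": [10.0, 20.0, 30.0, 5.0, 5.0, 10.0],
})

# One pass: total sales and number of orders per year.
year_purchase_pt = (
    date_purchase_df
    .groupby("year")["payment_value"]
    .agg(value_sum="sum", value_count="count")
    .reset_index()
)
```

The same pattern applies to the month, day-of-week, and hour breakdowns by changing the grouping column.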

Sales by month

# Pivot table of total sales per month
month_purchase_pt = pd.pivot_table(data = date_purchase_df,
                                   index = 'month',
                                   values = 'payment_value',
                                   aggfunc = 'sum').reset_index()
# Pivot table of the number of orders per month
month_purchase_pt2 = pd.pivot_table(data = date_purchase_df, 
                                  index = 'month',
                                  values = 'payment_value',
                                  aggfunc = 'count').reset_index()
month_purchase_pt = pd.merge(month_purchase_pt, month_purchase_pt2, on = 'month')
month_purchase_pt = month_purchase_pt.rename(columns = {'payment_value_x' : 'value_sum', 'payment_value_y' : 'value_count'})
month_purchase_pt

# Bar chart = orders per month, line chart = sales per month
matplotlib.rc_file_defaults()
sns.set_style(style=None, rc=None)

fig, ax1 = plt.subplots(figsize=(12,6))

sns.lineplot(data = month_purchase_pt['value_sum'], marker='o', sort = False, ax=ax1)
ax2 = ax1.twinx()

sns.barplot(month_purchase_pt, x='month', y='value_count', alpha=0.5, ax=ax2)

  • This graph visualizes sales and the number of orders per month. The figures for September to December and for January to February come only from 2016 and 2017, shortly after the service launched, so these months appear lower than the others. Setting those aside, May and August show higher sales and order counts than other months.

Sales by day of week

# Pivot table of total sales per day of week
dayofweek_purchase_pt = pd.pivot_table(data = date_purchase_df,
                                       index = 'day_of_week',
                                       values = 'payment_value',
                                       aggfunc = 'sum').reset_index()
# Pivot table of the number of orders per day of week
dayofweek_purchase_pt2 = pd.pivot_table(data = date_purchase_df, 
                                  index = 'day_of_week',
                                  values = 'payment_value',
                                  aggfunc = 'count').reset_index()
dayofweek_purchase_pt = pd.merge(dayofweek_purchase_pt, dayofweek_purchase_pt2, on = 'day_of_week')
dayofweek_purchase_pt = dayofweek_purchase_pt.rename(columns = {'payment_value_x' : 'value_sum', 'payment_value_y' : 'value_count'})

# Reorder the rows from Sunday to Saturday (the pivot table sorts the day names alphabetically)
dayofweek_purchase_pt = dayofweek_purchase_pt.reindex([3, 1, 5, 6, 4, 0, 2])
dayofweek_purchase_pt = dayofweek_purchase_pt.reset_index(drop = True)
dayofweek_purchase_pt

# Bar chart = orders per day of week, line chart = sales per day of week
matplotlib.rc_file_defaults()
sns.set_style(style=None, rc=None)

fig, ax1 = plt.subplots(figsize=(12,6))

sns.lineplot(data = dayofweek_purchase_pt['value_sum'], marker='o', sort = False, ax=ax1)
ax2 = ax1.twinx()

sns.barplot(dayofweek_purchase_pt, x='day_of_week', y='value_count', alpha=0.5, ax=ax2)

Sales by hour of day

time_purchase_df = olist_df[['payment_value', 'purchase_time']]

# Pivot table of total sales per hour
time_purchase_pt = pd.pivot_table(data = time_purchase_df, 
                                   index = 'purchase_time', 
                                   values = 'payment_value', 
                                   aggfunc = 'sum').reset_index()
# Pivot table of the number of orders per hour
time_purchase_pt2 = pd.pivot_table(data = time_purchase_df, 
                                  index = 'purchase_time',
                                  values = 'payment_value',
                                  aggfunc = 'count').reset_index()
time_purchase_pt = pd.merge(time_purchase_pt, time_purchase_pt2, on = 'purchase_time')
time_purchase_pt = time_purchase_pt.rename(columns = {'payment_value_x' : 'value_sum', 'payment_value_y' : 'value_count'})
time_purchase_pt

# Bar chart = orders per hour, line chart = sales per hour
matplotlib.rc_file_defaults()
sns.set_style(style=None, rc=None)

fig, ax1 = plt.subplots(figsize=(12,6))

sns.lineplot(data = time_purchase_pt['value_sum'], marker='o', sort = False, ax=ax1)
ax2 = ax1.twinx()

sns.barplot(time_purchase_pt, x='purchase_time', y='value_count', alpha=0.5, ax=ax2)

  • This graph shows sales and the number of orders by hour of day. Both are low from 11 p.m. to 8 a.m., when people are asleep, and shopping peaks between 10 a.m. and 4 p.m.

Average delivery time by category

category_delivery_df = olist_df[['product_category_name_english', 'purchase_date', 'order_delivered_customer_date']].dropna(axis = 0).reset_index(drop = True)

# Keep only the date part of the delivery timestamp and store it as datetime
category_delivery_df['order_delivered_customer_date'] = pd.to_datetime(category_delivery_df['order_delivered_customer_date'].str.split().str[0])

# Store the difference between the purchase date and the delivery date
category_delivery_df['delivered_date'] = category_delivery_df['order_delivered_customer_date'] - category_delivery_df['purchase_date']
category_delivery_df = category_delivery_df.drop(columns = ['order_delivered_customer_date', 'purchase_date'])

# Pivot table of the average time to delivery per category
category_delivery_pt = pd.pivot_table(data = category_delivery_df, 
                                   index = 'product_category_name_english', 
                                   values = 'delivered_date', 
                                   aggfunc = 'mean').reset_index()
                                   
# Sort in ascending order of time to delivery
category_delivery_pt = category_delivery_pt.sort_values(by = 'delivered_date', ascending=True)
category_delivery_pt['delivered_date'] = category_delivery_pt['delivered_date'].dt.days

px.line(category_delivery_pt, 
        x = 'product_category_name_english', 
        y = 'delivered_date',
        labels={'product_category_name_english' : 'Category', 'delivered_date' : 'Average delivery time (days)'}, 
        title = 'Average delivery time by category')

  • This graph visualizes the average delivery time per category. The fastest category is 'arts_and_craftmanship', followed by 'la_cuisine', 'books_imported', 'party_supplies', and 'fashion_childrens_clothes'. The slowest category is 'office_furniture', with an average delivery time of 20 days, followed by 'fashion_shoes', 'home_comfort_2', 'christmas_supplies', and 'security_and_services'.

Average review score by delivery time

delivery_score_df = olist_df[['order_delivered_customer_date', 'purchase_date', 'review_score']].dropna(axis = 0).reset_index(drop = True)

# Keep only the date part of the delivery timestamp and store it as datetime
delivery_score_df['order_delivered_customer_date'] = pd.to_datetime(delivery_score_df['order_delivered_customer_date'].str.split().str[0])

# Store the difference between the purchase date and the delivery date in days, then drop the other columns
delivery_score_df['delivered_date'] = delivery_score_df['order_delivered_customer_date'] - delivery_score_df['purchase_date']
delivery_score_df = delivery_score_df.drop(columns = ['order_delivered_customer_date', 'purchase_date'])
delivery_score_df['delivered_date'] = delivery_score_df['delivered_date'].dt.days
delivery_score_df

# Bin the number of days to delivery into categories
delivery_dic = {1 : '0~5days', 2 : '5~10days', 3 : '10~15days', 4 : '15~20days', 5 : '20days~'}

delivery_legend_list = []
for date in delivery_score_df['delivered_date'] :
    if 0 <= date < 5 :
        delivery_legend_list.append(1)
    elif 5 <= date < 10 :
        delivery_legend_list.append(2)
    elif 10 <= date < 15 :
        delivery_legend_list.append(3)
    elif 15 <= date < 20 :
        delivery_legend_list.append(4)
    else :
        delivery_legend_list.append(5)
        
delivery_score_df['delivery_legend'] = delivery_legend_list

# Pivot table of the average review score per delivery-time category
delivery_score_pt = pd.pivot_table(data = delivery_score_df, 
                                   index = 'delivery_legend', 
                                   values = 'review_score', 
                                   aggfunc = 'mean').reset_index()

# Replace the category codes with the labels in delivery_dic
delivery_score_pt['delivery_legend'] = delivery_score_pt['delivery_legend'].map(delivery_dic)
delivery_score_pt

px.bar(delivery_score_pt, x = 'delivery_legend', y = 'review_score', 
       labels = {'delivery_legend' : 'Delivery time', 'review_score' : 'Average review score'}, 
       title='Average review score by delivery time', color=["rgb(239,209,159)", "rgb(255,185,144)", "rgb(255,190,159)", "rgb(255,163,139)", "rgb(255,179,171)"], 
       color_discrete_map="identity")

  • This graph shows the review score by delivery time. The longer the delivery takes, the more dissatisfied customers become and the more likely the order is to receive a negative review.
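
The delivery-time bins can also be built with `pd.cut`, which attaches the labels directly and keeps them in order. This is an alternative sketch on made-up rows, not the post's code. One difference to be aware of: a negative day count becomes NaN here, whereas the loop in the post would place it in the '20days~' bin.

```python
import pandas as pd

# Toy rows in the shape of delivery_score_df (values are made up).
delivery_score_df = pd.DataFrame({
    "delivered_date": [0, 4, 5, 9, 10, 19, 20, 45],
    "review_score": [5, 5, 4, 4, 4, 3, 2, 1],
})

# Left-closed bins: [0, 5), [5, 10), [10, 15), [15, 20), [20, inf)
bins = [0, 5, 10, 15, 20, float("inf")]
labels = ["0~5days", "5~10days", "10~15days", "15~20days", "20days~"]
delivery_score_df["delivery_legend"] = pd.cut(
    delivery_score_df["delivered_date"], bins=bins, labels=labels, right=False
)

# Average review score per bin, in bin order.
delivery_score_pt = (
    delivery_score_df
    .groupby("delivery_legend", observed=False)["review_score"]
    .mean()
    .reset_index()
)
```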

Analysis

  • The EDA showed that review scores fall as delivery time grows. This analysis therefore started from the question "How can delivery time be reduced?"
  • The first step was to look at several factors related to delivery time.

Comparison of the average buyer-seller distance for deliveries that took 20 days or more versus deliveries that took under 20 days

import math

# Function that computes the distance (in km) from latitude and longitude
def haversine_distance(lat1, lon1, lat2, lon2):
    # Radius of the Earth (the mean radius is about 6,371 km)
    R = 6371.0

    # Convert latitude and longitude to radians
    lat1 = math.radians(lat1)
    lon1 = math.radians(lon1)
    lat2 = math.radians(lat2)
    lon2 = math.radians(lon2)

    # Differences in latitude and longitude
    dlat = lat2 - lat1
    dlon = lon2 - lon1

    # Compute the distance with the Haversine formula
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = R * c

    return distance

With this function, the distance between each buyer and seller was computed from their latitudes and longitudes.

# The columns of distance_dif_df have Korean names: '배송 시간' = delivery time, '평균 거리(km)' = average distance (km)
px.bar(distance_dif_df, x = '배송 시간', y = '평균 거리(km)', color = '배송 시간')

  • The average distance differs by about 500 km between the group whose delivery took 20 days or more and the group whose delivery took under 20 days, which is close to a twofold difference.
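
The post does not show how the per-group averages were built, so the following is only a sketch of one way to do it: compute the Haversine distance per order, split at 20 days, and average each group. The three orders and their approximate city coordinates are made up for illustration.

```python
import math
from statistics import mean

def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    R = 6371.0
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Approximate city-center coordinates, used only as illustrative inputs.
SAO_PAULO = (-23.5505, -46.6333)
RIO = (-22.9068, -43.1729)
FORTALEZA = (-3.7319, -38.5267)

orders = [
    {"delivery_days": 25, "customer": FORTALEZA, "seller": SAO_PAULO},
    {"delivery_days": 6, "customer": RIO, "seller": SAO_PAULO},
    {"delivery_days": 3, "customer": SAO_PAULO, "seller": SAO_PAULO},
]

# Split the distances at the 20-day threshold used in the post.
groups = {"20 days or more": [], "under 20 days": []}
for order in orders:
    km = haversine_distance(*order["customer"], *order["seller"])
    key = "20 days or more" if order["delivery_days"] >= 20 else "under 20 days"
    groups[key].append(km)

mean_distance = {key: mean(values) for key, values in groups.items()}
```

Haversine gives straight-line distance over the sphere, so it understates actual road distance; it is still a reasonable proxy for comparing the two groups.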

Average freight cost by region

# Keep only the customer state and freight value columns
state_freight_value_df = olist_df[['customer_state', 'freight_value']]

# Pivot table of the average freight value per customer state
state_freight_value_pt = pd.pivot_table(data = state_freight_value_df, 
                                  values = 'freight_value', index = 'customer_state', aggfunc='mean')
state_freight_value_pt = state_freight_value_pt.reset_index()

map_freight_value = folium.Map(location=[-15.7801, -47.9292], tiles="cartodbpositron", zoom_start=4)

folium.Choropleth(
        geo_data=geo,
        data=state_freight_value_pt,
        columns=['customer_state', 'freight_value'],
        key_on='id',
        fill_color='PuRd',
        fill_opacity=0.7,
        line_opacity=0.5,
).add_to(map_freight_value)

map_freight_value

  • The average freight cost is low in and around the South and Southeast and high in the North, Northeast, and Central-West. We attributed this to most sellers being located in the South and Southeast.

Breakdown (average freight cost per state, visualized by seller region)

  • This graph visualizes the average freight cost per state according to the seller's location. It shows that the freight cost rises with distance from the seller.

# Mask for orders where the seller's state and the customer's state are the same
mask_same = (olist_df['seller_state'] == olist_df['customer_state'])

# Orders where the seller's state and the customer's state are the same
cus_sell_same = olist_df[mask_same].copy()
# Orders where the seller's state and the customer's state differ
cus_sell_dif = olist_df[~mask_same].copy()
cus_sell_same['state'] = 'Shipped within the same state'
cus_sell_dif['state'] = 'Shipped to a different state'

cus_sell_df = pd.concat([cus_sell_same, cus_sell_dif], ignore_index = True)
px.histogram(cus_sell_df, x = 'freight_value', color = 'state')

  • This graph compares the distribution of freight costs for shipments within the same state and shipments to a different state. When the seller and buyer are in the same state, the average freight cost is 13.4 reais; when they are in different states, it is 23.6 reais. That is close to a twofold difference.

Therefore, when there is no seller in the buyer's region, the freight cost rises, the customer pays more, delivery takes longer, and the buyer's satisfaction with the service is likely to fall.

To address this, we set out to analyze regions and their preferred categories, so that where a preferred category lacks sellers, marketing can be run to bring such sellers in. We also analyzed where logistics centers would best be located to minimize delivery time.
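
The two averages quoted above can be obtained with a single grouped mean. The sketch below uses four made-up orders, so its numbers are illustrative and are not the post's 13.4 and 23.6 reais.

```python
import numpy as np
import pandas as pd

# Toy orders (values are made up).
df = pd.DataFrame({
    "seller_state": ["SP", "SP", "RJ", "SP"],
    "customer_state": ["SP", "BA", "RJ", "AM"],
    "freight_value": [10.0, 25.0, 14.0, 35.0],
})

# Label each order by whether seller and customer share a state.
df["state"] = np.where(
    df["seller_state"] == df["customer_state"],
    "Same state",
    "Different state",
)

# Average freight cost per label.
mean_freight = df.groupby("state")["freight_value"].mean()
```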

Preferred categories by region

  • In the South and Southeast, 'bed_bath_table' products appear to be preferred in every state, and 'health_beauty' and 'sports_leisure' products are also popular.
  • In the Central-West, as in the South and Southeast, 'bed_bath_table', 'health_beauty', and 'sports_leisure' products are the most preferred.
  • In the North, 'health_beauty', 'computers_accessories', and 'sports_leisure' products are the most preferred.
  • Finally, the Northeast also prefers 'bed_bath_table', 'health_beauty', and 'sports_leisure' products, and 'watches_gifts' and 'telephony' products are preferred in some states.

Leaving out the South and Southeast, which have many sellers, we also looked at the number of sellers in the preferred categories for the remaining regions.

Overall, the number of sellers is low relative to the number of orders.

Therefore, considering the number of orders and the number of sellers:
1. In the Northeast, marketing to attract sellers of health_beauty products is expected to be the most effective.
2. In the Central-West, marketing to attract sellers of health_beauty and sports_leisure products is expected to be the most effective.
3. In the North, marketing to attract sellers of health_beauty products is expected to be the most effective.

Logistics center location analysis

  • Idea: "If a logistics center were placed where the midpoints between buyers and sellers with slow deliveries (20 days or more) are concentrated, would delivery time go down?"

# Keep only the needed columns
geo_seller_customer_df = olist_df[['customer_id', 'seller_id', 'order_approved_at', 'order_delivered_customer_date', 'delivery_date', 'customer_zip_code_prefix', 'seller_zip_code_prefix']]

geo_df = geolocation_df[['geolocation_zip_code_prefix', 'geolocation_lat', 'geolocation_lng']]
geo_df = geo_df.rename(columns = {'geolocation_zip_code_prefix' : 'zip_code_prefix'})

# Remove duplicate rows
geo_df = geo_df[~geo_df.duplicated(['zip_code_prefix'])]
# Reset the index
geo_df = geo_df.reset_index(drop = True)

# Filter for deliveries that took 20 days or more
mask_delivery = geo_seller_customer_df['delivery_date'] >= 20

# Split the data into deliveries of 20 days or more and the rest
geo_seller_customer_df_20 = geo_seller_customer_df[mask_delivery]
geo_seller_customer_df_0 = geo_seller_customer_df[~mask_delivery]

  • This separates the buyer-seller pairs whose delivery took 20 days or more from those whose delivery took under 20 days.

# Join with the geo dataset to get the customer's latitude and longitude
# (geo_df's key column was renamed to 'zip_code_prefix', so the join keys are given per side)
geo_seller_customer_df_20 = geo_seller_customer_df_20.merge(geo_df, how = 'inner', left_on = 'customer_zip_code_prefix', right_on = 'zip_code_prefix')

# Join with the geo dataset to get the seller's latitude and longitude
geo_seller_customer_df_20 = geo_seller_customer_df_20.merge(geo_df, how = 'inner', left_on = 'seller_zip_code_prefix', right_on = 'zip_code_prefix')

# Rename the columns
geo_seller_customer_df_20 = geo_seller_customer_df_20.rename(columns = {'geolocation_lat_x' : 'customer_lat', 'geolocation_lng_x' : 'customer_lng', 
                                                                  'geolocation_lat_y' : 'seller_lat', 'geolocation_lng_y' : 'seller_lng'})

# Compute the midpoint between the buyer and the seller
geo_seller_customer_df_20['middle_location_lat'] = (geo_seller_customer_df_20['customer_lat'] + geo_seller_customer_df_20['seller_lat']) / 2
geo_seller_customer_df_20['middle_location_lng'] = (geo_seller_customer_df_20['customer_lng'] + geo_seller_customer_df_20['seller_lng']) / 2

  • The midpoint of two points (x1, y1) and (x2, y2) on a coordinate plane is ((x1 + x2) / 2, (y1 + y2) / 2), so the buyer-seller midpoint latitude and longitude were computed this way.

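
Averaging latitude and longitude treats the map as a flat plane. As a hedged aside, the sketch below compares that simple average with the great-circle midpoint on a sphere; the function names are hypothetical and introduced only for this comparison. The two agree on the equator and drift apart at higher latitudes, so the simple average is a coarser approximation the longer the route and the farther it lies from the equator.

```python
import math

def simple_midpoint(lat1, lon1, lat2, lon2):
    """Arithmetic mean of the coordinates, as used in the post."""
    return (lat1 + lat2) / 2, (lon1 + lon2) / 2

def great_circle_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint along the great circle between two points (degrees in, degrees out)."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlam = lam2 - lam1
    bx = math.cos(phi2) * math.cos(dlam)
    by = math.cos(phi2) * math.sin(dlam)
    phi_m = math.atan2(
        math.sin(phi1) + math.sin(phi2),
        math.hypot(math.cos(phi1) + bx, by),
    )
    lam_m = lam1 + math.atan2(by, math.cos(phi1) + bx)
    return math.degrees(phi_m), math.degrees(lam_m)

# On the equator both methods give the same point.
equator_simple = simple_midpoint(0, 0, 0, 90)
equator_sphere = great_circle_midpoint(0, 0, 0, 90)

# At 60 degrees latitude the great-circle midpoint lies farther from the equator.
high_simple = simple_midpoint(60, 0, 60, 90)
high_sphere = great_circle_midpoint(60, 0, 60, 90)
```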
import folium
import json

# Create the map
brazil_map = folium.Map(location=[-15.7801, -47.9292], tiles="cartodbpositron", zoom_start=4)

with open('./brazil_geo.json') as f :
    geo = json.load(f)

# Create the FeatureGroups
customer_group = folium.FeatureGroup(name='Customer Locations')
seller_group = folium.FeatureGroup(name='Seller Locations')
middle_location_group = folium.FeatureGroup(name='Middle Locations')
pax_group = folium.FeatureGroup(name='Pax Hubs')

# Plot the customer locations
for index, row in geo_seller_customer_df_20.iterrows():
    customer_lat = row['customer_lat']
    customer_lng = row['customer_lng']
    folium.Circle(location=[customer_lat, customer_lng], radius=1, color = 'rgb(6, 224, 208)').add_to(customer_group)

# Plot the seller locations
for index, row in geo_seller_customer_df_20.iterrows():
    seller_lat = row['seller_lat']
    seller_lng = row['seller_lng']
    folium.Circle(location=[seller_lat, seller_lng], radius=1, color='rgb(255, 0, 255)').add_to(seller_group)

# Plot the midpoints between buyers and sellers
for index, row in geo_seller_customer_df_20.iterrows():
    middle_lat = row['middle_location_lat']
    middle_lng = row['middle_location_lng']
    folium.Circle(location=[middle_lat, middle_lng], radius=1, color='rgb(0, 0, 128)').add_to(middle_location_group)

# Mark the Pax hub locations
for index, row in pax_hub_df.iterrows():
    pax_lat = row['pax_hub_lat']
    pax_lng = row['pax_hub_lng']
    folium.Marker(location=[pax_lat, pax_lng], icon=folium.Icon(color='beige')).add_to(pax_group)

# Draw the state borders
folium.Choropleth(geo_data = geo, line_color='black', line_opacity = 1, fill_opacity=0).add_to(brazil_map)
    
customer_group.add_to(brazil_map)
seller_group.add_to(brazil_map)
middle_location_group.add_to(brazil_map)
pax_group.add_to(brazil_map)


# Attach the layer control to the map
folium.LayerControl().add_to(brazil_map)

brazil_map

The map above shows the locations of buyers and sellers whose delivery took 20 days or more.

The corresponding midpoints are shown above.

Zooming in on each region other than the South and Southeast,

the midpoints are concentrated in the following states.

  • In the North, Northeast, and Central-West, the cities where the midpoints are concentrated are as follows.
    - Central-West -

    (Sinop: site of Sinop Airport; connected to Brasília, the North, and the Northeast via highway 163)
    - North -

    (Paranã: connected to Mato Grosso state via highway 242, and to Brasília and Palmas Airport via highway 010)
    - Northeast -

    (Vitória da Conquista: connected to the city of Salvador via highway 116, to Tocantins state via highway BA262, and to Minas Gerais state via highway 415)

Validation

  • To check whether choosing logistics center locations from the midpoints is actually effective, we also plotted the centers of Pax, the logistics company that Olist actually operates.

The orange markers are the actual Pax logistics centers. Comparing within the same regions, the locations are similar in the areas circled in red.

Conclusion

  • Putting the analysis results together, and prioritizing by the population and number of orders of each Brazilian state:

1. Northeast: attract sellers of health_beauty products
Proposed hub location: Vitória da Conquista, in Bahia state in the Northeast

2. Central-West: attract sellers of bed_bath_table and sports_leisure products
Proposed hub location: Sinop, in Mato Grosso state in the Central-West

3. North: attract sellers of health_beauty products
Proposed hub location: Paranã, in Tocantins state in the North

Project retrospective

  • It was a satisfying project that produced very meaningful results.
    The project flowed smoothly: we explored the characteristics of the data through EDA, analyzed them in depth, and arrived at results that could actually be proposed.
    It was a chance to grow a lot, and also a chance to reflect on myself.

  • Here is what I took away from that reflection:

    1. Even a single dataset can be interpreted differently depending on the point of view. I need to look with a wider view (and widen it further even when it already seems wide).
    2. Data visualization has to be tailored to the audience. It is not only about choosing the plot: even small details such as units must clearly show what they mean, so everything has to be checked carefully.
    3. It is worth trying several tools for data visualization. Each tool has its strengths, and how well a chart communicates varies by data and by tool. (I used to insist on Python alone for visualization.)
    4. Data really does not come in the shape I would like.
    5. Cooperation seems to matter most in team work. When we cooperate, teammates show me what I missed and I can point out what they missed, and the quality goes up as a result.

4 comments

December 18, 2023

Hello! Thank you for the helpful information. I am working on a project at a data bootcamp and could not find any information about the logistics centers, so I am leaving a question. If you do not mind, could I ask where you found the information on the Pax logistics centers?

March 22, 2024

Hello, the material you put together is a great help :) It looks like an additional dataset was used for the "Preferred categories by region" part near the end of the analysis. If so, would it be possible to share the file?