Sqoop SQL Server to Hive

Lยท2022๋…„ 7์›” 21์ผ
0

๋…์ž ๋Œ€์ƒ๐Ÿ“ฃ

๐Ÿ‘‰ SQL Server ๋ฐ์ดํ„ฐ๋ฅผ Hive๋กœ ์˜ฎ๊ธฐ์‹œ๋ ค๋Š” ๋ถ„


์ถœ๊ฐ„ ์ด์œ โ“

์ฒซ ๋ฒˆ์งธ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ์ดํ›„, ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ„์† ๋ฎ์–ด์“ฐ๋Š” ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•˜๋‹ค๊ฐ€, ๋„ˆ๋ฌด ๋‹ต๋‹ตํ•ด์„œ ์ด์–ด์“ฐ๋Š” ๋ฐฉ์‹์„ ์ฑ„ํƒ

์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ๊ฐ€ ๊ธฐ๋ฐ˜์ผ ๋•Œ, ์ •๋ง ์œ ์šฉํ•˜๋‹คโ—โ— ๊ฐ•์ถ” โ—โ—โ—โ—


๋ฌธ์ œ ํŒŒ์•…๐Ÿ”

Q: Sqoop์„ ์‚ฌ์šฉํ•˜๋ฉด ๋Œ€์ค‘์ ์ธ RDB์—์„œ HDFS๋กœ 
   ๋ฐ์ดํ„ฐ๋ฅผ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜์„ ํ•  ์ˆ˜ ์žˆ๋Š” ๊ฑด ์•„๋Š”๋ฐ... 
   ๋งค์ผ ๋ฎ์–ด์“ฐ๊ธฐ๋ฅผ ํ•˜๋‹ˆ๊นŒ ์‹œ๊ฐ„ ์†Œ์š”๊ฐ€ ๋„ˆ๋ฌด ๊ธธ์–ด์š”๐Ÿ˜ฅ
A: ์•„~ ์šธ์ง€ ๋งˆ์‹œ๊ณ ~ ์ž ์ผ๋‹จ Sqoop ์ปค๋งจ๋”ฉ ์‚ฌ์šฉ ํ›„, 
   Hive์—์„œ ํ…Œ์ด๋ธ”์„ ์ƒ์„ฑํ•˜๋Š” ๋ฐฉ๋ฒ•๊นŒ์ง„ ์•„์‹œ์ฃ ? 
   ์„ค๋งˆ ํ…Œ์ด๋ธ” ์ƒ์„ฑํ•˜๊ณ  'load data inpath~' ์ปค๋งจ๋”ฉ๊นŒ์ง€ ์“ฐ์‹œ์ง„ ์•Š์ฃ ?๐Ÿ™‚
Q: ์Œ... 'load data inpath'๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  
   HDFS ๋ฐ์ดํ„ฐ๋ฅผ Hive์— ์ ์žฌํ•  ์ˆ˜๊ฐ€ ์žˆ๋‚˜์š”??๐Ÿ˜ฒ
A: Oh... jesus cr.. ์•„.. ๊ทธ๋Ÿด ์ˆ˜ ์žˆ์ฃ ๐Ÿ˜Œ 
   ๊ทธ๋Ÿผ 'load data inpath~' ์ปค๋งจ๋”ฉ ์—†์ด Hive ํ…Œ์ด๋ธ”๋กœ 
   ๋ฐ์ดํ„ฐ๋ฅผ ์ ์žฌํ•˜๋Š” ๋ฐฉ๋ฒ•๊ณผ 
   'Sqoop์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์ด์–ด์“ฐ๊ธฐ' ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด 
   ํ•œ ๋ฒˆ ๋„์ ์—ฌ๋“œ๋ฆฌ๊ฒ ์Šต๋‹ˆ๋‹ค.

๋ฌธ์ œ ํ•ด๊ฒฐ๐ŸŽŠ

  • 'load data inpath' ์—†์ด Hive ํ…Œ์ด๋ธ”๊ณผ ์—ฐ๋™
sqoop import 
      --username xxxx 
      --password xxxxx
      --num-mappers 1 
      --connect "jdbc:sqlserver://118.219.11.126:1566;database=ctf_spc" 
      --query "select a.xxxx, count(a.xx) as xxx  
               from aaaaa a with (nolock)
                    inner join bbbbb b with (nolock) 
                    on a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy
                    inner join ccccc c 
                    on b.yyy = c.zzz and
                       b.yyyy = c.zzzz and
                       b.yyyyy = c.zzzzz 
                    inner join ddddd d 
                    on b.yyy = d.qqq and
                       b.yyy = d.qqq and
                       b.yyy = d.qqq and
                       b.yyy = d.qqq and
                       b.yyy = d.qqq 
               where \$CONDITIONS 
               group by a.xxx" 
      --delete-target-dir
      --target-dir /date_data_count/ 
      --as-textfile
create external table seokjin_date_data_count (F_WORKDATE DATE, data_count INT) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
LINES TERMINATED BY '\n' 
stored as textfile 
location '/date_data_count';
A: ์šฐ์„  ์œ„ ์ฒซ ๋ฒˆ์งธ ์ปค๋งจ๋”ฉ์—์„œ '--delete-target-dir' ๊ณผ 
   '--target-dir' ์˜ต์…˜์„ ๋ด์ฃผ์„ธ์š”. 
   
   '--delete-target-dir'์€ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ํ•˜๊ธฐ ์ „, 
   ๋Œ€์ƒ ๋””๋ ‰ํ„ฐ๋ฆฌ๊ฐ€ ๊ธฐ ์กด์žฌํ•˜๋Š”์ง€ ํ™•์ธํ•˜์—ฌ ์‚ญ์ œํ•ด์ฃผ๋Š” ์˜ต์…˜์ด์—์š”. 
   
   ๊ทธ ํ›„, '--target-dir'์„ ์‚ฌ์šฉํ•˜์—ฌ ๋Œ€์ƒ ๋””๋ ‰ํ„ฐ๋ฆฌ๋ฅผ ์ƒ์„ฑํ•˜๊ณ 
   ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค๐Ÿ˜
A: ๋‘ ๋ฒˆ์งธ ์ปค๋งจ๋”ฉ์ด ์งˆ๋ฌธ์ž ๋ถ„์ด ๊ถ๊ธˆํ•ดํ•˜์‹œ๋Š” 
   'load data inpath~' ์—†์ด 
   ๋‹ค์ด๋ ‰ํŠธ๋กœ Hive ํ…Œ์ด๋ธ”์—์„œ HDFS ๋‚ด ๋ฐ์ดํ„ฐ๋ฅผ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋Š” ์ž‘์—…๋‹ˆ๋‹ค. 
   
   ๋งˆ์ง€๋ง‰์— 'location~' ๊ตฌ๋ฌธ์„ ์ถ”๊ฐ€ํ•˜๋Š”๋ฐ 
   ์ด ๋•Œ, ๋กœ์ผ€์ด์…˜์€ 
   ์ฒซ ๋ฒˆ์งธ ์ปค๋งจ๋”ฉ์˜ '--target-dir'์—์„œ ์„ค์ •ํ•œ ๋””๋ ‰ํ„ฐ๋ฆฌ๋ช…๊ณผ
   ๋™์ผํ•˜๊ฒŒ ์„ค์ •ํ•ด์•ผํ•ฉ๋‹ˆ๋‹ค๐Ÿ˜๐Ÿ˜
  • ์ฒซ ๋งˆ์ด๊ทธ๋ ˆ์ด์…˜ ์ดํ›„, ๋ฐ์ดํ„ฐ ์ด์–ด์“ฐ๊ธฐ
sqoop import 
      --username xxxx 
      --password xxxxx
      --num-mappers 1 
      --connect "jdbc:sqlserver://118.219.11.126:1566;database=ctf_spc" 
      --query "select a.xxxx, count(a.xx) as xxx  
               from aaaaa a with (nolock)
                    inner join bbbbb b with (nolock) 
                    on a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy and
                       a.xxx = b.yyy
                    inner join ccccc c 
                    on b.yyy = c.zzz and
                       b.yyyy = c.zzzz and
                       b.yyyyy = c.zzzzz 
                    inner join ddddd d 
                    on b.yyy = d.qqq and
                       b.yyy = d.qqq and
                       b.yyy = d.qqq and
                       b.yyy = d.qqq and
                       b.yyy = d.qqq 
               where a.DATE = '2022-07-21' and
               \$CONDITIONS 
               group by a.xxx" 
      --target-dir /date_data_count/ 
      --as-textfile
      --append
A: ์ถœ๊ฐ„ ์ด์œ ์—์„œ ๋ณด๋ฉด ์‹œ๊ณ„์—ด ๋ฐ์ดํ„ฐ ์‚ฌ์šฉ ์‹œ, ์ถ”์ฒœํ•œ๋‹ค๊ณ  ํ–ˆ์—ˆ์ฅฌ? 
   
   'where' ๊ตฌ๋ฌธ์„ ๋ณด์‹œ๋ฉด ํŠน์ • ๋‚ ์งœ๋ฅผ ์ถ”๊ฐ€ํ•˜๋ ค๊ณ  ํ•˜๋Š” ๋ชจ์Šต์ด ๋ณด์ž…๋‹ˆ๋‹ค. 
   
   ๊ทธ๋ฆฌ๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ์ด์–ด์“ฐ๋ ค๋Š” ๋ชฉ์ ์ด๊ธฐ ๋•Œ๋ฌธ์— 
   '--delete-target-dir' ์˜ต์…˜์ด ์‚ฌ๋ผ์ง€๊ณ , 
   
   ๋งˆ์ง€๋ง‰์— '--append' ์˜ต์…˜์ด ์ถ”๊ฐ€๋œ ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์–ด์š”๐Ÿ˜

์œ„์˜ ๋ฐฉ์‹์„ ๋”ฐ๋ฅด๋ฉด, ๋”์ด์ƒ ๋ฐ์ดํ„ฐ ๋ฎ์–ด์“ฐ๊ธฐ๋กœ ์ธํ•ด ์†Œ์š”ํ•œ ์‹œ๊ฐ„๋“ค์„ ์•„๋‚„ ์ˆ˜ ์žˆ๋‹ต๋‹ˆ๋‹ค๐Ÿ˜
์‹œ๊ฐ„์€ ๊ธˆ์ด๋‹ˆ๊นŒ์š”~~ Time is Gold

profile
๋ฐ์ดํ„ฐ ์š”๋ฆฌ์‚ฌ

0๊ฐœ์˜ ๋Œ“๊ธ€