PySpark cannot run with different minor versions.

August 3, 2022

Target audience📣

👉 Anyone using PySpark with Docker
👉 Anyone who wants to store data in HDFS using PySpark


Why this post❓

Ran into an error while saving data to HDFS with PySpark.


Identifying the problem🔍

RuntimeError: Python in worker has different version 3.6 than that in driver 3.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
Q: I built an Apache Spark cluster with one master and three workers,
   and while processing Hive (HDFS) data and loading it,
   the error above occurs.😥

Solving the problem🎊

A: This error occurs because the Python version used by the Spark master
   and worker containers (SC) differs from the Python version used by the
   container running PySpark (PSC).😀
   
   All you need to do is align the Python versions.
   Judging from the error above, the workers (SC) are running Python 3.6
   while the driver (PSC) is running Python 3.7.
   
   SC์˜ ๋ชจ๋“  ์ปจํ…Œ์ด๋„ˆ์—์„œ ํŒŒ์ด์ฌ ๋ฒ„์ „์„ ๋ณ€๊ฒฝํ•˜๋Š” ๊ฒƒ์€ ๋ฒˆ๊ฑฐ๋กœ์šฐ๋‹ˆ,
   PSC์—์„œ ํŒŒ์ด์ฌ ๋ฒ„์ „์„ ์žฌ์„ค์ •ํ•˜๋Š” ๊ฒƒ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค.
   
   ํ•˜๋‹จ์˜ ๋‚ด์šฉ์„ ์ฐธ๊ณ ํ•˜์—ฌ ์ด Disgustingํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์‹ญ์‹œ์˜คโ—โ—
  1. ํŒŒ์ด์ฌ 3.7 ์„ค์น˜
    $ apt install python3.7
  2. PySpark๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๋ชจ๋“ˆ์— ํ•˜๊ธฐ ์ฝ”๋“œ ์ถ”๊ฐ€
    ~
    import os
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.7"
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/usr/bin/python3.7"
    ~
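One caveat: the two environment variables must be set before the SparkSession (or SparkContext) is created, because PySpark reads them at startup and ships the executor path to the workers. A minimal sketch, assuming the /usr/bin/python3.7 path from step 1 (the session-building lines are illustrative and left commented out, with a hypothetical master URL and app name):

```python
import os

# Path from step 1 above; adjust if your image installs Python elsewhere.
PYTHON_PATH = "/usr/bin/python3.7"

# Set these BEFORE creating the SparkSession: PySpark reads them at startup.
os.environ["PYSPARK_PYTHON"] = PYTHON_PATH         # executor side (SC)
os.environ["PYSPARK_DRIVER_PYTHON"] = PYTHON_PATH  # driver side (PSC)

# from pyspark.sql import SparkSession
# spark = (SparkSession.builder
#          .master("spark://spark-master:7077")  # hypothetical master URL
#          .appName("hdfs-load")                 # hypothetical app name
#          .getOrCreate())
```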
데이터 요리사 (Data Chef)