A checklist for running PySpark

Note:

This is only a personal note. I haven't had time to make this post pretty; it's just enough for me to understand.
I may improve the post in the future, but this is enough for now.


Before running PySpark, set these environment variables first:

To tell Spark which Python executable to use (this applies to both the driver and the executors, unless PYSPARK_DRIVER_PYTHON overrides it for the driver):
PYSPARK_PYTHON=/usr/bin/python3.5

Location of the YARN / Hadoop configuration directories (values left blank here; point them at your cluster's config files):
YARN_CONF_DIR=
HADOOP_CONF_DIR=
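Putting the checklist together, a minimal sketch of what this looks like in a shell before launching PySpark. The config paths below are examples (a common default on Hadoop distributions), not values from this note; adjust them for your cluster.

```shell
#!/bin/sh
# Python executable Spark should use (driver and executors).
export PYSPARK_PYTHON=/usr/bin/python3.5

# Example paths only; point these at your cluster's actual config directory.
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Sanity check: print what Spark will see.
echo "PYSPARK_PYTHON=$PYSPARK_PYTHON"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

After exporting these, `pyspark` (or `spark-submit`) launched from the same shell will pick them up.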


