A checklist for running PySpark

Note:

This is only a personal note. I haven't had time to make this post pretty; it's just enough for me to understand.
I may improve the post in the future, but this is enough for now.


Before running PySpark, set these environment variables first:

To tell Spark which Python executable to use (this applies to both the driver and the executors, unless PYSPARK_DRIVER_PYTHON overrides it for the driver):
PYSPARK_PYTHON=/usr/bin/python3.5

Location of the YARN / Hadoop configuration directories (values left blank here; point them at your cluster's config files):
YARN_CONF_DIR=
HADOOP_CONF_DIR=
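Putting the checklist together, a minimal sketch of what this looks like in a shell before launching PySpark. The config paths below are examples (a common default on Hadoop distributions), not values from this note; adjust them for your cluster.

```shell
#!/bin/sh
# Python executable Spark should use (driver and executors).
export PYSPARK_PYTHON=/usr/bin/python3.5

# Example paths only; point these at your cluster's actual config directory.
export YARN_CONF_DIR=/etc/hadoop/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Sanity check: print what Spark will see.
echo "PYSPARK_PYTHON=$PYSPARK_PYTHON"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
```

After exporting these, `pyspark` (or `spark-submit`) launched from the same shell will pick them up.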


