Problem
While creating a Spark SQL session, I received the following error message:
Exception: Java gateway process exited before sending its port number
Environment
- OS: RHEL 7
- Jupyter Notebook
Steps to reproduce
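- Configure Jupyter Notebook.
- Start Jupyter.
- Open the Jupyter web page.
- Enter a program like the following in a notebook cell:
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession
# The error is raised here, when the session (and its Java gateway) is created
spark = SparkSession.builder.getOrCreate()
- Click "Run".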
Solution
Add the following to .bashrc, then start Jupyter Notebook again from a new shell (or after running source ~/.bashrc) so that it picks up the variable:
# For Jupyter Notebook
export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"
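If editing .bashrc is not convenient, the same variable can be set from inside the notebook. This is a minimal sketch, assuming it runs in the first cell before any SparkContext or SparkSession exists in the kernel; the helper name `set_submit_args` is hypothetical, and the value simply mirrors the .bashrc line above.

```python
import os

# Assumed value: the same string as the .bashrc line. Change the master to
# suit your cluster, but keep "pyspark-shell" as the last token.
SUBMIT_ARGS = "--master yarn-client pyspark-shell"


def set_submit_args(args=SUBMIT_ARGS):
    """Set PYSPARK_SUBMIT_ARGS for this kernel unless it is already set.

    This only helps if it runs before the first SparkContext or SparkSession
    is created in the kernel, because PySpark reads the variable when it
    launches the Java gateway.
    """
    os.environ.setdefault("PYSPARK_SUBMIT_ARGS", args)
    return os.environ["PYSPARK_SUBMIT_ARGS"]


print(set_submit_args())
```

After this cell, create the session as usual with `SparkSession.builder.getOrCreate()`. Using `setdefault` means a value already exported from .bashrc takes precedence over the one in the notebook.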
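Root Cause Analysis
The excerpt below is from launch_gateway in PySpark's java_gateway.py. It raises this exception when the connection-info file has not been created in the temporary directory by the time the gateway process exits. In my case, the issue was that the environment variable "PYSPARK_SUBMIT_ARGS" was not set, so the Java gateway process started as "proc" exited without creating that file.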
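# Falls back to "pyspark-shell" when PYSPARK_SUBMIT_ARGS is not set
submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
...
# The submit arguments are appended to the spark-submit command line
command = command + shlex.split(submit_args)
...
# Launch the JVM (the Java gateway) as a child process
proc = Popen(command, **popen_kwargs)
...
# Wait for the file to appear, or for the process to exit, whichever happens first.
while not proc.poll() and not os.path.isfile(conn_info_file):
    time.sleep(0.1)
# If the child exited without writing the file, the port number is unknown
if not os.path.isfile(conn_info_file):
    raise Exception("Java gateway process exited before sending its port number")
...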
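To see why a gateway that exits early ends in this exception, the wait logic can be reproduced outside Spark. This is a minimal sketch, not PySpark's code: `wait_for_conn_info` is a hypothetical helper, the child processes are plain Python stand-ins for the JVM, and a timeout is added. One deliberate difference: the loop tests `proc.poll() is None` rather than `not proc.poll()`, because `poll()` returns 0 for a child that exited successfully, and `not 0` is true, so the excerpt's condition would keep waiting in that case.

```python
import os
import subprocess
import sys
import tempfile
import time


def wait_for_conn_info(command, conn_info_file, timeout=10.0):
    """Start `command`, then wait until it creates `conn_info_file` or exits.

    Returns True if the file exists when waiting stops, False otherwise.
    """
    proc = subprocess.Popen(command)
    deadline = time.monotonic() + timeout
    # poll() returns None while the child runs, and its exit code afterwards.
    while proc.poll() is None and not os.path.isfile(conn_info_file):
        if time.monotonic() > deadline:
            proc.kill()
            proc.wait()
            break
        time.sleep(0.1)
    return os.path.isfile(conn_info_file)


conn_info_file = os.path.join(tempfile.mkdtemp(), "conn_info")

# Stand-in for a gateway that fails to start: exits without writing the file.
failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
# Stand-in for a healthy gateway: writes the file, then exits.
healthy = [sys.executable, "-c", f"open({conn_info_file!r}, 'w').close()"]

if not wait_for_conn_info(failing, conn_info_file):
    print("Java gateway process exited before sending its port number")
print("healthy child created the file:", wait_for_conn_info(healthy, conn_info_file))
```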
Comments
Where should the above code be written?
Inside java_gateway.py?
You are getting this error because the environment variable HADOOP_CONF_DIR has not been set.
Inside a Jupyter notebook, try:
%env HADOOP_CONF_DIR=/{path_to_hadoop}/etc/hadoop
JAVA_HOME should also point to your JDK installation directory.
import os
# No shell-style escaping of the space is needed inside a Python string
os.environ["JAVA_HOME"] = "C:/Program Files/Java/jdk1.8.0_60"
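The three variables that come up on this page (PYSPARK_SUBMIT_ARGS in the post, HADOOP_CONF_DIR and JAVA_HOME in the comments) can be checked in a notebook cell before creating the session. This is a minimal sketch: `check_spark_env` is a hypothetical helper, and treating an unset HADOOP_CONF_DIR as a problem assumes a YARN master such as the one in the .bashrc line.

```python
import os


def check_spark_env(env=None):
    """Return a list of problems with the variables discussed on this page.

    `env` defaults to os.environ; pass a dict to check other values.
    """
    env = os.environ if env is None else env
    problems = []

    # PySpark uses "pyspark-shell" when this is unset; a custom value is
    # expected to keep "pyspark-shell" as its last token.
    submit_args = env.get("PYSPARK_SUBMIT_ARGS")
    if submit_args is not None and submit_args.split()[-1:] != ["pyspark-shell"]:
        problems.append("PYSPARK_SUBMIT_ARGS does not end with 'pyspark-shell'")

    # Both variables should name existing directories.
    for name in ("JAVA_HOME", "HADOOP_CONF_DIR"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} is not a directory: {value}")
    return problems


for problem in check_spark_env():
    print(problem)
```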