Sunday, June 23, 2019

Java gateway process exited before sending its port number

Problem
While creating a Spark SQL session, I received the following error message.

Exception: Java gateway process exited before sending its port number

Environment

  • OS - RHEL 7
  • Jupyter notebook


Steps to reproduce

  • Configure Jupyter notebook
  • Start Jupyter
  • Access the Jupyter web page
  • Run a program in the notebook like the one below


from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession


  • Now click "Run"

Solution
Add the following to .bashrc and restart Jupyter notebook.

#For jupyter notebook:
export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"
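
If editing .bashrc is not convenient, the same variable can probably be set from inside the notebook, as long as this happens before the first SparkContext or SparkSession is created. PySpark reads PYSPARK_SUBMIT_ARGS from os.environ when it launches the gateway. The sketch below is only an illustration: the helper name ensure_submit_args is hypothetical, and the default value simply repeats the one from the export line above.

```python
import os


def ensure_submit_args(default="--master yarn-client pyspark-shell"):
    """Set PYSPARK_SUBMIT_ARGS only if it is not already defined.

    Hypothetical helper; the default mirrors the .bashrc export above.
    """
    # setdefault leaves an existing value untouched and returns the effective one.
    return os.environ.setdefault("PYSPARK_SUBMIT_ARGS", default)


# Run this cell before importing or creating any Spark context.
submit_args = ensure_submit_args()
print("PYSPARK_SUBMIT_ARGS =", submit_args)
```

A value set this way applies only to the current notebook kernel, so the .bashrc entry is still the more permanent fix.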
Root Cause Analysis
The PySpark code below shows that the exception is raised when the connection file is not created in the tmp directory. In my case, the issue was that the environment variable "PYSPARK_SUBMIT_ARGS" was not set, so the Java gateway process ("proc") was not started correctly.

submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
...
command = command + shlex.split(submit_args)
...
proc = Popen(command, **popen_kwargs)

...
# Wait for the file to appear, or for the process to exit, whichever happens first.
while not proc.poll() and not os.path.isfile(conn_info_file):
    time.sleep(0.1)

if not os.path.isfile(conn_info_file):
    raise Exception("Java gateway process exited before sending its port number")
...
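
To see the failure mode in isolation, the wait loop can be imitated without Spark. This is a minimal sketch, not PySpark's actual implementation: wait_for_conn_file, the RuntimeError message, and the dummy child commands are all assumptions made for illustration. It also checks proc.poll() is None explicitly, which differs slightly from the excerpt above.

```python
import os
import sys
import tempfile
import time
from subprocess import Popen


def wait_for_conn_file(command, conn_info_file, poll_interval=0.1):
    """Launch `command`, then wait until the file exists or the child exits.

    Hypothetical stand-in for the gateway wait loop shown above.
    """
    proc = Popen(command)
    # poll() returns None while the child is still running.
    while proc.poll() is None and not os.path.isfile(conn_info_file):
        time.sleep(poll_interval)
    if not os.path.isfile(conn_info_file):
        raise RuntimeError("child exited before writing its connection file")
    return proc


conn_file = os.path.join(tempfile.mkdtemp(), "conn_info")

# Failure case: the child exits without ever creating the file.
try:
    wait_for_conn_file([sys.executable, "-c", "pass"], conn_file)
    failed = False
except RuntimeError as exc:
    failed = True
    message = str(exc)

# Success case: the child creates the file given as its first argument.
writer = "import sys; open(sys.argv[1], 'w').close()"
proc = wait_for_conn_file([sys.executable, "-c", writer, conn_file], conn_file)
proc.wait()
succeeded = os.path.isfile(conn_file)
```

The first call fails for the same structural reason as the error in this post: the child process ended, and the file the parent was waiting for never appeared.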