Sunday, June 23, 2019

Java gateway process exited before sending its port number

Problem
While creating a Spark SQL session, I received the following error message.

Exception: Java gateway process exited before sending its port number

Environment

  • OS - RHEL 7
  • Jupyter notebook


Steps to reproduce

  • Configure Jupyter notebook
  • Start Jupyter
  • Access the Jupyter webpage
  • Run a program like the one below in the notebook


from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession


  • Now click "Run"
Solution
Add the following to .bashrc and restart Jupyter notebook.

#For jupyter notebook:
export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"
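
If editing .bashrc is not convenient, the same variable can probably be set from inside the notebook, as long as that happens before the first Spark session is created. The snippet below is only a sketch under that assumption: the helper name ensure_submit_args is made up for illustration, and the default value is simply the one from the .bashrc line above.

```python
import os

# Same value as the .bashrc line above; adjust it for your cluster.
DEFAULT_SUBMIT_ARGS = "--master yarn-client pyspark-shell"


def ensure_submit_args(default=DEFAULT_SUBMIT_ARGS):
    """Set PYSPARK_SUBMIT_ARGS for this process if it is not already set.

    Hypothetical helper, not part of pyspark.
    """
    # setdefault leaves an existing value (e.g. one exported in .bashrc) untouched.
    return os.environ.setdefault("PYSPARK_SUBMIT_ARGS", default)


submit_args = ensure_submit_args()
print("PYSPARK_SUBMIT_ARGS =", submit_args)

# Only after this point create the session, for example:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```

On newer Spark releases the "yarn-client" master is, as far as I know, no longer accepted, and "--master yarn --deploy-mode client" would be the equivalent setting.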
Root Cause Analysis
The code below, from pyspark's java_gateway.py, raises this exception when the connection info file has not been created in the temporary directory. In my case the environment variable "PYSPARK_SUBMIT_ARGS" was not set, and as a result the gateway process ("proc") did not come up.

submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "pyspark-shell")
...
command = command + shlex.split(submit_args)
...
proc = Popen(command, **popen_kwargs)

...
# Wait for the file to appear, or for the process to exit, whichever happens first.
while not proc.poll() and not os.path.isfile(conn_info_file):
    time.sleep(0.1)

if not os.path.isfile(conn_info_file):
    raise Exception("Java gateway process exited before sending its port number")
...
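
To see the failure path in isolation, here is a small stand-alone sketch that needs neither Spark nor Java. It is not the pyspark source: the function name wait_for_conn_file is hypothetical, and a short Python child process that exits immediately stands in for the JVM launcher. Unlike the excerpt above, the loop checks "proc.poll() is None" so that it also stops when the child exits with code 0.

```python
import os
import subprocess
import sys
import tempfile
import time


def wait_for_conn_file(proc, conn_info_file, poll_interval=0.1):
    """Wait until the file appears or the child exits, then check for the file."""
    while proc.poll() is None and not os.path.isfile(conn_info_file):
        time.sleep(poll_interval)
    if not os.path.isfile(conn_info_file):
        raise Exception("Java gateway process exited before sending its port number")


# Stand-in for the JVM launcher: a child that exits at once without writing the file.
conn_info_file = os.path.join(tempfile.mkdtemp(), "conn_info")
proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])

try:
    wait_for_conn_file(proc, conn_info_file)
    error_message = None
except Exception as exc:
    error_message = str(exc)

print(error_message)
```

Anything that makes the launcher exit early (bad submit arguments, a missing Java installation, missing Hadoop configuration) ends up in the same exception, which is why the message alone does not point to a single cause.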

4 comments:

  1. where to write the above code?
    inside java_gateway.py?

  2. You are getting this error because the environment variable HADOOP_CONF_DIR has not been set.
    Inside jupyter notebook try:
    %env HADOOP_CONF_DIR=/{path_to_hadoop}/etc/hadoop

  3. JAVA_HOME should also point to your JDK directory.

    import os
    os.environ["JAVA_HOME"] = "C:/Program Files/Java/jdk1.8.0_60"

  4. Thanks for the great article!

    Regards,
    BroadMind - IELTS coaching in Madurai
