

Hi, I'm starting a new series of short posts that will describe the process of me learning Apache Spark.

## Choice of the programming language

Apache Spark offers high-level APIs for Java, Scala, Python and R. Python seems the best choice for me at the moment because I use it both at work and in my private projects. I was wondering about Scala as well, but I only had a short episode with this language in one of my previous jobs and I haven't used it since then. Sticking with Python seems most reasonable.

Python's binding for Apache Spark is called PySpark and is available on PyPI. Therefore, installing it with pip should be pretty straightforward. There is one dependency, though, that you'll need to take care of to make it work: Java.
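
For reference, the installation itself is a single pip command run inside the project's virtualenv (a minimal sketch; pin a specific version if you need reproducible installs):

```sh
# with the virtualenv activated, install PySpark from PyPI
pip install pyspark
```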

Once you have Java installed on your machine, you need to set the JAVA_HOME environment variable. I'm working on an Ubuntu 20.04 system with the zsh shell, so I set this variable in my ~/.zshrc file:
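
A minimal sketch of what that entry can look like, assuming OpenJDK 11 installed from Ubuntu's repositories; the exact path depends on which JDK package and version you have:

```sh
# ~/.zshrc -- point JAVA_HOME at the installed JDK (path is an example)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```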

PySpark's documentation (available here for the 3.1.1 version) advises setting the PYSPARK_DRIVER_PYTHON variable to ipython, so that PySpark starts with the IPython shell instead of the default Python one. When you start your virtualenv on UNIX-based systems, you usually do that by running source /path/to/venv/bin/activate, and any variables that you define inside the activate file will be set and available for you after your environment is started. I added PYSPARK_DRIVER_PYTHON=ipython there, but that did not work as expected: when running the pyspark command, PySpark would again start inside the default Python shell.
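
In practice this means appending a single line to the activate script, something like the sketch below (adjust the path to your own virtualenv):

```sh
# /path/to/venv/bin/activate -- appended at the end
export PYSPARK_DRIVER_PYTHON=ipython
```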

After some trial and error I came to the conclusion that PySpark would load with the IPython shell if I executed the pyspark file directly from the installed package location (and not from path/to/venv/bin/pyspark) while setting PYSPARK_DRIVER_PYTHON at the same time. /home/kuba/.virtualenvs/pyspark-learning is my virtualenv main directory; when I launched it through the virtualenv's bin/pyspark script, PySpark would start with Python's default shell again.
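
A hedged example of what such an invocation can look like; the site-packages path, including the Python version segment, is an assumption and will differ between environments:

```sh
# run the pyspark script from the installed package location,
# not from the virtualenv's bin/ directory (paths are illustrative)
PYSPARK_DRIVER_PYTHON=ipython \
  /home/kuba/.virtualenvs/pyspark-learning/lib/python3.8/site-packages/pyspark/bin/pyspark
```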
