Note: Migrate to Debian before Ubuntu 20.04 reaches end of standard support (April 2025).
WSL: Ubuntu 18.04.1
Prepare
Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+.
For the Scala API, Spark 2.4.1 uses Scala 2.12.
You will need to use a compatible Scala version (2.12.x).
1. Java
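A sketch, assuming OpenJDK 8 from the Ubuntu repositories (any Java 8+ JDK works):
sudo apt-get install openjdk-8-jdk
java -version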
2. Python
sudo apt-get install python
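This installs Python 2.7 on Ubuntu 18.04. For Python 3 instead (a sketch; PYSPARK_PYTHON tells Spark which interpreter to use):
sudo apt-get install python3
export PYSPARK_PYTHON=python3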
3. R language
1) add PPA
sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update
2) install
sudo apt-get install r-base r-base-dev
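Verify that the installed version meets the 3.1+ requirement:
R --version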
Installation
1. unzip
Extract to ~/spark-2.4.1-bin-hadoop2.7.
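A sketch, assuming the prebuilt tarball spark-2.4.1-bin-hadoop2.7.tgz was downloaded from an Apache mirror into the home directory:
tar -xzf ~/spark-2.4.1-bin-hadoop2.7.tgz -C ~/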
2. test
cd ~/spark-2.4.1-bin-hadoop2.7
1) pi
./bin/run-example SparkPi 10
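The argument 10 is the number of partitions; the job should end with a line like:
Pi is roughly 3.14...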
2) Scala shell
./bin/spark-shell --master local[2]
input:
for (i <- 1 to 3; j <- 1 to 3) print(10 * i + j + "\t")
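output (tab separated):
11	12	13	21	22	23	31	32	33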
quit:
:q
3) pyspark
./bin/spark-submit examples/src/main/python/pi.py 10
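For an interactive Python shell, analogous to the Scala shell above:
./bin/pyspark --master local[2]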
4) sparkR
./bin/sparkR --master local[2]
input:
print(matrix(c(.3, .6, .9, .3 + .6)), digits = 18)
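With digits = 18 this exposes binary floating point: 0.3 + 0.6 is stored as a value slightly below 0.9, so the last entry does not print as exactly 0.9.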
quit:
q()
5) R example
./bin/spark-submit examples/src/main/r/dataframe.R
Run
cd ~/spark-2.4.1-bin-hadoop2.7
./sbin/start-master.sh
Web UI: http://localhost:8080/
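To attach a worker (a sketch; the actual master URL, usually spark://<hostname>:7077, is shown at the top of the web UI):
./sbin/start-slave.sh spark://localhost:7077
Stop when finished:
./sbin/stop-slave.sh
./sbin/stop-master.sh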