Install Spark

Ubuntu

Note: Migrate to Debian before Ubuntu 20.04 ends standard support (April 2025).

WSL: Ubuntu 18.04.1

Prepare

Java 8+, Python 2.7+/3.4+ and R 3.1+

Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+.
For the Scala API, Spark 2.4.1 uses Scala 2.12.
You will need to use a compatible Scala version (2.12.x).
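A quick way to confirm the prerequisites are on PATH (assuming they are already installed):

java -version
python --version
R --version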

1. Java

Set up OpenJDK - Ubuntu
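If you do not follow a separate guide, one option on Ubuntu 18.04 is OpenJDK 8 from the default repositories (a sketch):

sudo apt-get install openjdk-8-jdk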

2. Python

sudo apt-get install python
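On releases that no longer package Python 2 (see the Ubuntu 20.04 note above), install Python 3 instead:

sudo apt-get install python3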

3. R language

1) Add PPA

sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update

2) Install

sudo apt-get install r-base r-base-dev

Installation

1. Unzip

Extract the Spark release to:

~/spark-2.4.1-bin-hadoop2.7
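If the tarball has not been downloaded yet, one way to fetch and extract it (a sketch; the URL assumes the Apache archive still hosts 2.4.1):

wget https://archive.apache.org/dist/spark/spark-2.4.1/spark-2.4.1-bin-hadoop2.7.tgz
tar -xzf spark-2.4.1-bin-hadoop2.7.tgz -C ~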

2. Test

cd ~/spark-2.4.1-bin-hadoop2.7

1) Pi

./bin/run-example SparkPi 10
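If the job succeeds, the driver log ends with a line of the form "Pi is roughly 3.14..." (the exact value varies per run).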

2) Scala shell

./bin/spark-shell --master local[2]

input:

for (i <- 1 to 3; j <- 1 to 3) print(10 * i + j + "\t")
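This prints the nine values 11 12 13 21 22 23 31 32 33, tab-separated.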

quit:

:q

3) pyspark

./bin/spark-submit examples/src/main/python/pi.py 10
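The Python shell can also be run interactively, like the Scala shell above:

./bin/pyspark --master local[2]

quit:

exit()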

4) sparkR

./bin/sparkR --master local[2]

input:

print(matrix(c(.3, .6, .9, .3 + .6)), digits = 18)
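digits = 18 forces enough precision to expose double rounding: the entry computed as .3 + .6 prints slightly differently from the literal .9.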

quit:

q()

5) R example

./bin/spark-submit examples/src/main/r/dataframe.R

Run

cd ~/spark-2.4.1-bin-hadoop2.7
./sbin/start-master.sh

Web UI: http://localhost:8080/ shows the Spark master status.
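To attach a worker to this master (a sketch; replace localhost with the hostname shown at the top of the web UI if they differ):

./sbin/start-slave.sh spark://localhost:7077

To stop:

./sbin/stop-slave.sh
./sbin/stop-master.sh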
