Revision as of 18:06, 13 November 2023

Apache Spark: Framework for Distributed Computations

Apache Spark is a framework for distributed computations. Natively it works in Python, Scala, and Java.

Apart from embarrassingly parallel computations, the Spark framework is also suited to in-memory and/or iterative computations, which makes it usable even for machine learning and complex data processing. (Spark shares some underlying implementation with Hadoop, but the two are quite different: the Hadoop framework does not offer in-memory computations and has only limited support for iterative ones.)

The Spark framework can run either locally using a single thread, locally using multiple threads, or distributed across a cluster.

The current version as of November 2023 is Spark 3.5.0.

Initial Configuration

To use Spark on AIC, you need to add the following to your .profile:

export PATH="/lnet/aic/data/spark/bin:/lnet/aic/data/spark/slurm:/lnet/aic/data/spark/sbt/bin:$PATH"
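After editing .profile, a quick way to check that the configuration took effect is to reload the file and confirm that the Spark binaries resolve to the AIC paths above (this is a sketch of a sanity check, not an official step from the documentation):

```shell
# Reload the profile in the current shell and verify the PATH entries.
source ~/.profile
which spark-submit   # expected to resolve under /lnet/aic/data/spark/bin
spark-submit --version
```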