Apache Spark: Framework for Distributed Computations
Apache Spark is a framework for distributed computations. Natively it works in Python, Scala, and Java.
Apart from embarrassingly parallel computations, the Spark framework is well suited to in-memory and/or iterative computations, which makes it usable even for machine learning and complex data processing. (Spark shares some underlying implementation with Hadoop, but the two are quite different: the Hadoop framework does not offer in-memory computations and has only limited support for iterative computations.)
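As an illustration of an iterative, in-memory workload, here is a minimal PySpark sketch that caches a small dataset and reuses it across several passes. It assumes only that the pyspark package is importable in your Python environment; the application name and the number of passes are arbitrary choices for the example.

from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; the available run modes are described below.
spark = SparkSession.builder.appName("iterative-example").getOrCreate()

# A small toy dataset; in practice this would be read from a file or a table.
data = spark.sparkContext.parallelize(range(1, 1001))

# cache() keeps the dataset in memory, so the repeated passes below
# do not recompute it from scratch each time.
data.cache()

# A toy "iterative" computation: several aggregation passes over the cached data.
total = 0
for _ in range(5):
    total += data.map(lambda x: x * x).sum()

print("result:", total)
spark.stop()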
The Spark framework can run locally using a single thread, locally using multiple threads, or in a distributed fashion on a cluster.
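The run mode is selected through the Spark master setting. The sketch below shows the standard master URLs; the application name is arbitrary and the cluster host and port are only placeholders.

from pyspark.sql import SparkSession

# Run locally with a single thread.
spark = SparkSession.builder.master("local").appName("demo").getOrCreate()

# Run locally with as many threads as there are cores ("local[4]" would use 4 threads).
# spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# Run against a standalone Spark cluster (placeholder host and port).
# spark = SparkSession.builder.master("spark://master-host:7077").appName("demo").getOrCreate()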
The current version as of November 2023 is Spark 3.5.0.
Initial Configuration
To use Spark on AIC, you need to add the following to your .profile:
export PATH="/lnet/aic/data/spark/bin:/lnet/aic/data/spark/slurm:/lnet/aic/data/spark/sbt/bin:$PATH"
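Once the PATH is updated (log out and back in, or source ~/.profile, for it to take effect), you can check that Spark works with a trivial local job. The sketch below assumes the pyspark package is importable in the Python environment you use on AIC; the application name is arbitrary.

from pyspark.sql import SparkSession

# Start a throw-away local session and print the Spark version as a sanity check.
spark = SparkSession.builder.master("local[*]").appName("sanity-check").getOrCreate()
print("Spark version:", spark.version)

# A tiny computation to confirm that jobs actually run; should print 100.
print(spark.range(100).count())

spark.stop()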