Spark
Apache Spark: Framework for Distributed Computations
Apache Spark is a framework for distributed computations. It natively supports Python, Scala, and Java.
Apart from embarrassingly parallel computations, Spark is well suited to in-memory and iterative computations, which makes it useful for machine learning and complex data processing. (Spark shares some underlying implementation with Hadoop, but the two are quite different: Hadoop does not offer in-memory computations and has only limited support for iterative computations.)
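As an illustration, here is a minimal PySpark sketch (the application name and data are made up for this example) that caches a dataset in memory and reuses it across several iterations, which is the access pattern that benefits from Spark's in-memory model:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iterative-example").getOrCreate()
data = spark.sparkContext.parallelize(range(1, 1001)).cache()  # keep the data in memory

total = 0
for _ in range(10):                                            # each pass reuses the cached data
    total += data.map(lambda x: x * 2).sum()
print(total)
spark.stop()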
Spark can run either locally using a single thread, locally using multiple threads, or in a distributed fashion.
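The execution mode is selected via the master URL when the Spark session is created; a sketch (the cluster URL below is only a placeholder, not an AIC address):

from pyspark.sql import SparkSession

# "local"             : run locally using one thread
# "local[*]"          : run locally using all available cores
# "spark://host:7077" : connect to a standalone cluster (placeholder host)
spark = SparkSession.builder.master("local[*]").appName("example").getOrCreate()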
The current version as of November 2023 is Spark 3.5.0.
Initial Configuration
To use Spark on AIC, add the following to your .profile:
export PATH="/lnet/aic/data/spark/bin:/lnet/aic/data/spark/slurm:/lnet/aic/data/spark/sbt/bin:$PATH"
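After re-logging in (or sourcing the profile), the Spark tools should be available on your PATH; a quick sanity check, assuming the installation above:

source ~/.profile
spark-submit --version   # should report Spark 3.5.0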