Spark: Framework for Distributed Computations

Apache Spark is a framework for distributed computations. It natively supports Python, Scala, and Java.

Beyond embarrassingly parallel computations, Spark is well suited to in-memory and iterative computations, which makes it usable for machine learning and complex data processing. (Spark shares some underlying implementation with Hadoop, but the two differ considerably: Hadoop does not offer in-memory computations and has only limited support for iterative ones.)

Spark can run locally using one thread, locally using multiple threads, or in a distributed fashion.
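
The two local modes are selected through the Spark master URL: "local" for one thread, "local[N]" for N threads, and "local[*]" for one thread per logical core. The following Python sketch illustrates this. The helper names master_url and make_session are hypothetical and not part of Spark, and the sketch assumes that pyspark is importable. The distributed mode depends on how the cluster is set up, so it is not covered here.

```python
def master_url(threads=1):
    """Return a Spark master URL for local execution.

    threads=1    -> "local"     (single thread)
    threads=N    -> "local[N]"  (N worker threads)
    threads=None -> "local[*]"  (one thread per logical core)
    """
    if threads is None:
        return "local[*]"
    if threads < 1:
        raise ValueError("threads must be a positive integer or None")
    return "local" if threads == 1 else f"local[{threads}]"


def make_session(master, app_name="master-url-demo"):
    """Create a SparkSession for the given master URL (requires pyspark)."""
    # Imported lazily so that master_url() works without Spark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.master(master).appName(app_name).getOrCreate()
```

For example, make_session(master_url(4)) would start a local session with four worker threads, and make_session(master_url(None)) would use all available cores.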

As of November 2023, the current version is Spark 3.5.0.

Initial Configuration

To use Spark on AIC, add the following line to your .profile:

export PATH="/lnet/aic/data/spark/bin:/lnet/aic/data/spark/slurm:/lnet/aic/data/spark/sbt/bin:$PATH"
