Apache Spark: Framework for Distributed Computations
Apache Spark is a framework for distributed computations. Natively it works in Python, Scala, and Java.
Apart from embarrassingly parallel computations, the Spark framework is well suited to in-memory and/or iterative computations, which makes it usable even for machine learning and complex data processing. (Spark shares some underlying implementation with Hadoop, but the two are quite different: the Hadoop framework does not offer in-memory computations and has only limited support for iterative computations.)
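As an illustration of an iterative, in-memory workload, here is a minimal PySpark sketch that caches a small dataset and reuses it across several passes. It assumes only that the pyspark package is importable in your Python environment; the application name and the number of passes are arbitrary choices for the example.

from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; the available run modes are described below.
spark = SparkSession.builder.appName("iterative-example").getOrCreate()

# A small toy dataset; in practice this would be read from a file or a table.
data = spark.sparkContext.parallelize(range(1, 1001))

# cache() keeps the dataset in memory, so the repeated passes below
# do not recompute it from scratch each time.
data.cache()

# A toy "iterative" computation: several aggregation passes over the cached data.
total = 0
for _ in range(5):
    total += data.map(lambda x: x * x).sum()

print("result:", total)
spark.stop()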
The Spark framework can run locally using a single thread, locally using multiple threads, or in a distributed fashion on a cluster.
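The run mode is selected through the Spark master setting. The sketch below shows the standard master URLs; the application name is arbitrary and the cluster host and port are only placeholders.

from pyspark.sql import SparkSession

# Run locally with a single thread.
spark = SparkSession.builder.master("local").appName("demo").getOrCreate()

# Run locally with as many threads as there are cores ("local[4]" would use 4 threads).
# spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()

# Run against a standalone Spark cluster (placeholder host and port).
# spark = SparkSession.builder.master("spark://master-host:7077").appName("demo").getOrCreate()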
The current version as of November 2023 is Spark 3.5.0.
Initial Configuration
To use Spark on AIC, you need to add the following to your .profile:
export PATH="/lnet/aic/data/spark/bin:/lnet/aic/data/spark/slurm:/lnet/aic/data/spark/sbt/bin:$PATH"
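Once the PATH is updated (log out and back in, or source ~/.profile, for it to take effect), you can check that Spark works with a trivial local job. The sketch below assumes the pyspark package is importable in the Python environment you use on AIC; the application name is arbitrary.

from pyspark.sql import SparkSession

# Start a throw-away local session and print the Spark version as a sanity check.
spark = SparkSession.builder.master("local[*]").appName("sanity-check").getOrCreate()
print("Spark version:", spark.version)

# A tiny computation to confirm that jobs actually run; should print 100.
print(spark.range(100).count())

spark.stop()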