Saturday, July 25, 2015

Simple Example of Using Accumulators in Python Spark

Here is a simple example of accumulator variable that is used by multiple workers and return an accumulative value at the end.


>>> acc = sc.accumulator(5)
>>> def f(x):
...     global acc
...     acc += x
>>> rdd = sc.parallelize([1,2,4,1])
>>> rdd.foreach(f)
>>> acc.value
13


A couple of notes:

  • Tasks in the worker can not access the Accumulator values
  • Tasks see accumulator as write only
  • Mostly used for debugging puerpose

No comments:

Post a Comment