Here is a simple example of accumulator variable that is used by multiple workers and return an accumulative value at the end.
A couple of notes:
>>> acc = sc.accumulator(5)
>>> def f(x):
... global acc
... acc += x
>>> rdd = sc.parallelize([1,2,4,1])
>>> rdd.foreach(f)
>>> acc.value
13
A couple of notes:
- Tasks in the worker can not access the Accumulator values
- Tasks see accumulator as write only
- Mostly used for debugging puerpose
No comments:
Post a Comment