If your results fit into a single node's memory, you can use coalesce(number_of_partitions) to bring all the data onto one node, so that the output is written as one single file.
Look at this example: I want to parse a text file, filter out a few lines based on a criterion, get the distinct values of the first field in each line, and finally store the list of ids as one single output.
a.filter(lambda x : len(x.split('|')) >3 ).filter(lambda x : x.split('|')[2]=='SD').map(lambda x : x.split('|')[0]).distinct().coalesce(1).saveAsTextFile('/Downloads/spark-1.4.0-bin-hadoop2.6/ciks1')
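For context, here is a runnable sketch of the same pipeline. The post does not show how the RDD a is created, so the SparkContext setup and the input path are assumptions; the pipe-delimited format, the 'SD' filter value, and the output path are taken from the example above.

from pyspark import SparkContext

# Assumed setup: local Spark context and a hypothetical input path,
# since the original post does not show how `a` was loaded.
sc = SparkContext("local[*]", "single-output-example")
a = sc.textFile('/Downloads/input.txt')  # hypothetical pipe-delimited input

ids = (a.filter(lambda x: len(x.split('|')) > 3)    # keep lines with more than 3 fields
        .filter(lambda x: x.split('|')[2] == 'SD')  # keep lines whose third field is 'SD'
        .map(lambda x: x.split('|')[0])             # take the first field (the id)
        .distinct())                                # de-duplicate the ids

# coalesce(1) pulls everything into a single partition, so the output
# directory contains exactly one part file. This is only safe when the
# result fits in one node's memory.
ids.coalesce(1).saveAsTextFile('/Downloads/spark-1.4.0-bin-hadoop2.6/ciks1')

Note that the output path is a directory: Spark writes the single partition as one part-00000 file inside it rather than as a bare file with that name.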