Class RDDAggregateUtils
- java.lang.Object
  - org.apache.sysds.runtime.instructions.spark.utils.RDDAggregateUtils

public class RDDAggregateUtils extends Object

Collection of utility methods for aggregating binary-block RDDs. As a general policy, always call the stable algorithms, which maintain corrections over blocks per key. The performance overhead over a simple reduceByKey is roughly 7-10%, which is considered acceptable.
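
The class description does not spell out how the corrections work. As a rough illustration only, the sketch below uses Kahan (compensated) summation, a well-known scheme that carries a correction term next to the running sum. `KahanSumSketch` and its methods are hypothetical names, plain `double[]` arrays stand in for matrix blocks, and nothing here is taken from the SystemDS implementation.

```java
import java.util.Arrays;

// Illustrative sketch, NOT the SystemDS implementation (assumption: Kahan-style correction).
final class KahanSumSketch {

    /** Plain left-to-right summation; small addends can be lost to rounding. */
    static double naiveSum(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    /** Compensated summation: the correction term recovers the low-order bits lost in each add. */
    static double kahanSum(double[] values) {
        double sum = 0.0;
        double corr = 0.0;              // running compensation for lost low-order bits
        for (double v : values) {
            double y = v - corr;        // apply the correction to the next addend
            double t = sum + y;         // low-order bits of y may be rounded away here
            corr = (t - sum) - y;       // recover what was rounded away
            sum = t;
        }
        return sum;
    }

    public static void main(String[] args) {
        // One large value followed by a million tiny ones.
        double[] values = new double[1_000_001];
        values[0] = 1.0;
        Arrays.fill(values, 1, values.length, 1e-16);
        System.out.println("naive: " + naiveSum(values));
        System.out.println("kahan: " + kahanSum(values));
    }
}
```

In this input every tiny addend is below half the spacing of doubles near 1.0, so the naive loop never moves off 1.0, while the compensated loop ends close to the true sum of 1 + 1e-10. The extra arithmetic per element is the kind of cost the overhead figure above refers to.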

Constructor Summary
- RDDAggregateUtils()

Method Summary
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, boolean deepCopyCombiner)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, int numPartitions, boolean deepCopyCombiner)
- static MatrixBlock aggStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
  Single block aggregation over pair RDDs with corrections for numerical stability.
- static MatrixBlock aggStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in, AggregateOperator aop)
  Single block aggregation over RDDs with corrections for numerical stability.
- static TensorBlock aggStableTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> in, AggregateOperator aop)
  Single block aggregation over pair RDDs with corrections for numerical stability.
- static TensorBlock aggStableTensor(org.apache.spark.api.java.JavaRDD<TensorBlock> in, AggregateOperator aop)
  Single block aggregation over RDDs with corrections for numerical stability.
- static double max(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
  Merges disjoint data of all blocks per key.
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
  Merges disjoint data of all blocks per key.
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
  Merges disjoint data of all blocks per key.
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeRowsByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,RowMatrixBlock> in)
  Merges disjoint data of all blocks per key.
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in)
- static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in, int numParts)
- static MatrixBlock sumStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
- static MatrixBlock sumStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in)

Method Detail

sumStable
public static MatrixBlock sumStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)

sumStable
public static MatrixBlock sumStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in)

sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)

sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)

sumByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> sumByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)

sumCellsByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in)

sumCellsByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> sumCellsByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,Double> in, int numParts)

aggStable
public static MatrixBlock aggStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)
Single block aggregation over pair RDDs with corrections for numerical stability.
- Parameters:
  - in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
  - aop - aggregate operator
- Returns:
  - matrix block

aggStable
public static MatrixBlock aggStable(org.apache.spark.api.java.JavaRDD<MatrixBlock> in, AggregateOperator aop)
Single block aggregation over RDDs with corrections for numerical stability.
- Parameters:
  - in - matrix as JavaRDD<MatrixBlock>
  - aop - aggregate operator
- Returns:
  - matrix block

aggStableTensor
public static TensorBlock aggStableTensor(org.apache.spark.api.java.JavaPairRDD<TensorIndexes,TensorBlock> in, AggregateOperator aop)
Single block aggregation over pair RDDs with corrections for numerical stability.
- Parameters:
  - in - tensor as JavaPairRDD<TensorIndexes, TensorBlock>
  - aop - aggregate operator
- Returns:
  - tensor block

aggStableTensor
public static TensorBlock aggStableTensor(org.apache.spark.api.java.JavaRDD<TensorBlock> in, AggregateOperator aop)
Single block aggregation over RDDs with corrections for numerical stability.
- Parameters:
  - in - tensor as JavaRDD<TensorBlock>
  - aop - aggregate operator
- Returns:
  - tensor block

aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop)

aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, boolean deepCopyCombiner)

aggByKeyStable
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> aggByKeyStable(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, AggregateOperator aop, int numPartitions, boolean deepCopyCombiner)

max
public static double max(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)

mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
- Parameters:
  - in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
- Returns:
  - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
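
To make the disjointness assumption concrete, here is a minimal sketch of a per-key merge over partial blocks. Everything in it is hypothetical: `DisjointMergeSketch`, the string keys, and the use of `double[]` with zero meaning "no data" stand in for the real index and block types. Unlike the documented method, whose behavior is undefined on overlap, this sketch chooses to fail loudly.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch with stand-in types; NOT the SystemDS implementation.
final class DisjointMergeSketch {

    /** Merges two partial blocks whose non-zero cells are assumed not to overlap. */
    static double[] mergeDisjoint(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("block length mismatch");
        }
        double[] out = a.clone();
        for (int i = 0; i < b.length; i++) {
            if (b[i] != 0.0) {
                // Assumption of this sketch: overlap is reported instead of left undefined.
                if (out[i] != 0.0) {
                    throw new IllegalStateException("blocks overlap at cell " + i);
                }
                out[i] = b[i];
            }
        }
        return out;
    }

    /** Groups partial blocks by key and merges all blocks that share a key. */
    static Map<String, double[]> mergeByKey(List<Map.Entry<String, double[]>> in) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : in) {
            out.merge(e.getKey(), e.getValue().clone(), DisjointMergeSketch::mergeDisjoint);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, double[]>> in = new ArrayList<>();
        in.add(new SimpleEntry<>("1,1", new double[] {1, 0, 0, 0}));
        in.add(new SimpleEntry<>("1,1", new double[] {0, 0, 3, 0}));
        in.add(new SimpleEntry<>("1,2", new double[] {0, 5, 0, 0}));
        mergeByKey(in).forEach((k, v) -> System.out.println(k + " -> " + Arrays.toString(v)));
    }
}
```

If two partial blocks for the same key both carry a value in the same cell, there is no single correct merged result, which is why callers need to guarantee disjoint inputs.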
 

mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, boolean deepCopyCombiner)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
- Parameters:
  - in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
  - deepCopyCombiner - indicates whether the createCombiner function needs to deep copy the input block
- Returns:
  - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
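
The page does not say why a deep copy might be needed. One plausible reading, shown in the sketch below, is that a combiner which accumulates in place would otherwise modify the first input block it receives. The class `CombinerCopySketch`, its method names, and the `double[]` blocks are hypothetical stand-ins; this is an assumption about the intent of the flag, not a description of the actual implementation.

```java
import java.util.Arrays;

// Illustrative sketch of in-place combining; NOT the SystemDS implementation.
final class CombinerCopySketch {

    /** Starts the per-key accumulator from the first block, optionally copying it. */
    static double[] createCombiner(double[] first, boolean deepCopy) {
        return deepCopy ? first.clone() : first;
    }

    /** Folds the next block into the accumulator in place. */
    static double[] mergeValue(double[] acc, double[] next) {
        for (int i = 0; i < acc.length; i++) {
            acc[i] += next[i];
        }
        return acc;
    }

    /** Combines all blocks of one key, as a combine-by-key style aggregation would. */
    static double[] combine(double[][] blocks, boolean deepCopy) {
        double[] acc = createCombiner(blocks[0], deepCopy);
        for (int i = 1; i < blocks.length; i++) {
            acc = mergeValue(acc, blocks[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        double[][] shared = {{1, 2}, {10, 20}};
        combine(shared, false);
        System.out.println("without copy, first input is now " + Arrays.toString(shared[0]));

        double[][] kept = {{1, 2}, {10, 20}};
        combine(kept, true);
        System.out.println("with copy, first input is still " + Arrays.toString(kept[0]));
    }
}
```

Under this reading, skipping the copy saves an allocation per key but is only safe when no other consumer reads the input blocks afterwards, for example when the input is not cached or reused.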
 

mergeByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> in, int numPartitions, boolean deepCopyCombiner)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
- Parameters:
  - in - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
  - numPartitions - number of output partitions
  - deepCopyCombiner - indicates whether the createCombiner function needs to deep copy the input block
- Returns:
  - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>

mergeRowsByKey
public static org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,MatrixBlock> mergeRowsByKey(org.apache.spark.api.java.JavaPairRDD<MatrixIndexes,RowMatrixBlock> in)
Merges disjoint data of all blocks per key. Note: The behavior of this method is undefined for both sparse and dense data if the assumption of disjoint data is violated.
- Parameters:
  - in - matrix as JavaPairRDD<MatrixIndexes, RowMatrixBlock>
- Returns:
  - matrix as JavaPairRDD<MatrixIndexes, MatrixBlock>
 