Class CompressedSizeEstimatorSample
- java.lang.Object
  - org.apache.sysds.runtime.compress.estim.CompressedSizeEstimator
    - org.apache.sysds.runtime.compress.estim.CompressedSizeEstimatorSample
 
 
public class CompressedSizeEstimatorSample extends CompressedSizeEstimator

Constructor Summary
- CompressedSizeEstimatorSample(MatrixBlock data, CompressionSettings cs, int sampleSize, int k)
  Samples from the input data and estimates the size of the compressed matrix.

Method Summary
- CompressedSizeInfoColGroup getColGroupInfo(int[] colIndexes, int estimate, int maxDistinct)
  Extracts the compressed size info for a given list of columns. The estimated number of unique values is limited to maxDistinct, because the estimate for the combined columns can come out higher than the number estimated in sub-groups of colIndexes.
- CompressedSizeInfoColGroup getDeltaColGroupInfo(int[] colIndexes, int estimate, int maxDistinct)
  Same as getColGroupInfo, except that the values are extracted as delta values from the matrix block input.
- static int[] getSortedSample(int range, int sampleSize, long seed, int k)
- String toString()

Methods inherited from class org.apache.sysds.runtime.compress.estim.CompressedSizeEstimator
- clearNNZ, combine, combine, computeCompressedSizeInfos, getColGroupInfo, getDeltaColGroupInfo
 
Constructor Detail

CompressedSizeEstimatorSample
public CompressedSizeEstimatorSample(MatrixBlock data, CompressionSettings cs, int sampleSize, int k)
Samples from the input data and estimates the size of the compressed matrix.
- Parameters:
- data - The input data to sample from.
- cs - The settings used for sampling and compression; contains information such as the seed.
- sampleSize - The size of the sample to draw from the data.
- k - The degree of parallelism allowed.
 
 
Method Detail

getColGroupInfo
public CompressedSizeInfoColGroup getColGroupInfo(int[] colIndexes, int estimate, int maxDistinct)
Description copied from class: CompressedSizeEstimator
Extracts the compressed size info for a given list of columns. The estimated number of unique values is limited to maxDistinct, because the estimate for the combined columns can come out higher than the number estimated in sub-groups of colIndexes.
- Specified by:
- getColGroupInfo in class CompressedSizeEstimator
- Parameters:
- colIndexes - The columns to extract compression information from.
- estimate - An estimate of the number of unique elements in these columns.
- maxDistinct - The upper bound on the number of unique elements allowed in the estimate. It can be calculated by multiplying together the numbers of unique elements estimated for the sub-columns. The bound is flexible: if the sample is small, it can be adjusted manually, as is done in CoCodeCostMatrixMult.
- Returns:
- The CompressedSizeInfoColGroup for the given column indexes.
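
The maxDistinct description above can be illustrated with a minimal, self-contained sketch. The class and helper names below (DistinctBoundSketch, maxDistinct, clampEstimate) are hypothetical and not part of the SystemDS API. Capping the product at the number of rows is an added assumption, not something the description states.

```java
// Hypothetical illustration only; these helpers are not SystemDS methods.
final class DistinctBoundSketch {

    // Upper bound on distinct tuples in a combined column group:
    // the product of the distinct-count estimates of its sub-groups.
    // Assumption: the bound is additionally capped at numRows, since a
    // group cannot hold more distinct tuples than it has rows.
    static int maxDistinct(int[] subGroupDistinct, int numRows) {
        long bound = 1;
        for (int d : subGroupDistinct) {
            bound *= Math.max(d, 1);
            if (bound >= numRows)
                return numRows;
        }
        return (int) bound;
    }

    // Limit a sample-based estimate so it never exceeds the bound.
    static int clampEstimate(int estimate, int maxDistinct) {
        return Math.min(estimate, maxDistinct);
    }

    public static void main(String[] args) {
        // Two sub-groups with 4 and 5 estimated distinct values.
        int bound = maxDistinct(new int[] {4, 5}, 1000);
        System.out.println(clampEstimate(37, bound));
    }
}
```

In this sketch, an estimate of 37 for the combined group is reduced to the bound derived from the sub-groups, which is the limiting behavior the description refers to.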
 
getDeltaColGroupInfo
public CompressedSizeInfoColGroup getDeltaColGroupInfo(int[] colIndexes, int estimate, int maxDistinct)
Description copied from class: CompressedSizeEstimator
Extracts the compressed size info for a given list of columns. The estimated number of unique values is limited to maxDistinct, because the estimate for the combined columns can come out higher than the number estimated in sub-groups of colIndexes. Unlike getColGroupInfo, this method extracts the values as delta values from the matrix block input.
- Specified by:
- getDeltaColGroupInfo in class CompressedSizeEstimator
- Parameters:
- colIndexes - The columns to extract compression information from.
- estimate - An estimate of the number of unique delta elements in these columns.
- maxDistinct - The upper bound on the number of unique elements allowed in the estimate. It can be calculated by multiplying together the numbers of unique elements estimated for the sub-columns. The bound is flexible: if the sample is small, it can be adjusted manually, as is done in CoCodeCostMatrixMult.
- Returns:
- The CompressedSizeInfoColGroup for the given column indexes.
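
To show why the number of unique delta elements can differ from the number of unique raw values, here is a small standalone sketch. It assumes that a delta value is the difference between a row and the previous row of the same column, with the first row taken relative to zero. That definition and the class DeltaDistinctSketch are illustrative assumptions, not taken from the SystemDS implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not part of the SystemDS API.
final class DeltaDistinctSketch {

    // Number of distinct raw values in a single column.
    static int countDistinct(double[] column) {
        Set<Double> seen = new HashSet<>();
        for (double v : column)
            seen.add(v);
        return seen.size();
    }

    // Number of distinct delta values in a single column.
    // Assumption: delta = current row minus previous row,
    // with the first row compared against 0.
    static int countDistinctDeltas(double[] column) {
        Set<Double> seen = new HashSet<>();
        double prev = 0.0;
        for (double v : column) {
            seen.add(v - prev);
            prev = v;
        }
        return seen.size();
    }

    public static void main(String[] args) {
        double[] increasing = {10, 11, 12, 13, 14};
        System.out.println(countDistinct(increasing));
        System.out.println(countDistinctDeltas(increasing));
    }
}
```

For a steadily increasing column, every raw value is distinct while the deltas repeat, so a delta-based estimate can be far smaller than the raw one.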
 
getSortedSample
public static int[] getSortedSample(int range, int sampleSize, long seed, int k)
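
This page gives no description for getSortedSample, so the following is only a guess at the kind of result its signature suggests: sampleSize distinct indices from [0, range), in ascending order, reproducible for a given seed. The sketch uses selection sampling (Knuth's Algorithm S), which may not be what SystemDS does, and it omits the parallelization parameter k. The class SortedSampleSketch and its method are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not the SystemDS implementation.
final class SortedSampleSketch {

    // Draws sampleSize distinct indices from [0, range) in ascending order.
    // Assumption: this is the contract; the documented method does not state it.
    static int[] sortedSample(int range, int sampleSize, long seed) {
        if (sampleSize < 0 || sampleSize > range)
            throw new IllegalArgumentException("sampleSize must be in [0, range]");
        Random rng = new Random(seed);
        int[] sample = new int[sampleSize];
        int chosen = 0;
        for (int i = 0; i < range && chosen < sampleSize; i++) {
            int remaining = range - i;        // candidates not yet visited
            int needed = sampleSize - chosen; // slots still to fill
            // Select index i with probability needed / remaining.
            if (rng.nextInt(remaining) < needed)
                sample[chosen++] = i;
        }
        return sample;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedSample(100, 10, 7L)));
    }
}
```

Because candidates are visited in increasing order, the output is sorted without a separate sort step, at the cost of a scan that is linear in range.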
 