Class CompressedSizeEstimator
- java.lang.Object
  - org.apache.sysds.runtime.compress.estim.CompressedSizeEstimator

Direct Known Subclasses: CompressedSizeEstimatorExact, CompressedSizeEstimatorSample
 
public abstract class CompressedSizeEstimator extends Object

Main abstract class for estimating the compressed size of columns.
Method Summary
- void clearNNZ()
  Clear the pointer to the materialized list of nnz (number of non-zeros) in columns.
- CompressedSizeInfoColGroup combine(int[] combinedColumns, CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
  Combine two analyzed column groups together.
- CompressedSizeInfoColGroup combine(CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
  Combine two analyzed column groups together.
- CompressedSizeInfo computeCompressedSizeInfos(int k)
  Multi-threaded version of extracting compression size info.
- CompressedSizeInfoColGroup getColGroupInfo(int[] colIndexes)
  Extract the compressed size info of the specified columns, grouped together in a single ColGroup.
- abstract CompressedSizeInfoColGroup getColGroupInfo(int[] colIndexes, int estimate, int nrUniqueUpperBound)
  Extract the compressed size info for a given list of columns. This method additionally limits the estimated number of unique values, since in some cases the estimated number of uniques is higher than the bound implied by the estimates for sub-groups of the given colIndexes.
- CompressedSizeInfoColGroup getDeltaColGroupInfo(int[] colIndexes)
  Extract the info of the specified columns as delta encodings (the delta from the previous row's values).
- abstract CompressedSizeInfoColGroup getDeltaColGroupInfo(int[] colIndexes, int estimate, int nrUniqueUpperBound)
  Extract the compressed size info for a given list of columns as delta values. This method additionally limits the estimated number of unique values, since in some cases the estimated number of uniques is higher than the bound implied by the estimates for sub-groups of the given colIndexes.
 
Method Detail
computeCompressedSizeInfos
public final CompressedSizeInfo computeCompressedSizeInfos(int k)
Multi-threaded version of extracting compression size info.
- Parameters:
- k - The concurrency degree.
- Returns:
- The compression size info of each column, with each column compressed in isolation.
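As a rough illustration of "each column compressed in isolation", the sketch below analyzes every column as its own task on a pool of k threads and counts its distinct values. It does not use SystemDS classes: ParallelColumnStats and distinctPerColumn are hypothetical names, the input is assumed to be a dense row-major double[][], and the distinct count merely stands in for the much richer CompressedSizeInfo the real method returns.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: every column analyzed on its own, spread over k worker threads. */
final class ParallelColumnStats {

    /** Returns the number of distinct values of each column, analyzed in isolation. */
    static int[] distinctPerColumn(double[][] data, int nCols, int k) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, k));
        try {
            List<Future<Integer>> tasks = new ArrayList<>();
            for (int c = 0; c < nCols; c++) {
                final int col = c;
                // One task per column; tasks only read the shared input.
                tasks.add(pool.submit(() -> {
                    Set<Double> seen = new HashSet<>();
                    for (double[] row : data)
                        seen.add(row[col]);
                    return seen.size();
                }));
            }
            int[] result = new int[nCols];
            for (int c = 0; c < nCols; c++)
                result[c] = tasks.get(c).get(); // waits for that column's task
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while analyzing columns", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("column analysis failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Columns never depend on each other here, so the work parallelizes without locking; k only caps how many columns are analyzed at once.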
 
getColGroupInfo
public final CompressedSizeInfoColGroup getColGroupInfo(int[] colIndexes)
Extract the compressed size info of the specified columns, grouped together in a single ColGroup.
- Parameters:
- colIndexes - The columns to group together inside a ColGroup.
- Returns:
- The compressed size information associated with the selected ColGroup.
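A minimal sketch of what grouping columns "together in a single ColGroup" means for the number of unique values, assuming a dense row-major double[][] input: the distinct count is taken over value tuples across the selected columns rather than per column. GroupDistinct and distinctTuples are hypothetical helpers, not SystemDS API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: distinct value tuples of several columns treated as one group. */
final class GroupDistinct {

    static int distinctTuples(double[][] data, int[] colIndexes) {
        Set<List<Double>> tuples = new HashSet<>();
        for (double[] row : data) {
            Double[] key = new Double[colIndexes.length];
            for (int i = 0; i < colIndexes.length; i++)
                key[i] = row[colIndexes[i]]; // autoboxed cell value of this column
            tuples.add(Arrays.asList(key)); // List equality compares element-wise
        }
        return tuples.size();
    }
}
```

For rows (1,1), (1,2), (1,1), (2,2), each column alone has two distinct values, while the grouped columns have three distinct tuples.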
 
getColGroupInfo
public abstract CompressedSizeInfoColGroup getColGroupInfo(int[] colIndexes, int estimate, int nrUniqueUpperBound)
Extract the compressed size info for a given list of columns. This method additionally limits the estimated number of unique values, since in some cases the estimated number of uniques is higher than the bound implied by the estimates for sub-groups of the given colIndexes.
- Parameters:
- colIndexes - The columns to extract compression information from.
- estimate - An estimate of the number of unique elements in these columns.
- nrUniqueUpperBound - The upper bound on the number of unique elements allowed in the estimate. It can be calculated by multiplying together the numbers of unique elements estimated in the sub-columns. The bound is flexible: if the sample is small, it can be adjusted manually, as in CoCodeCostMatrixMult.
- Returns:
- The CompressedSizeInfoColGroup for the given column indexes.
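The bound described for nrUniqueUpperBound can be sketched as follows: if two sub-groups have 3 and 4 distinct values, the combined group can have at most 3 * 4 = 12 distinct tuples, so any larger estimate is clamped. UniqueBound, upperBound and limit are hypothetical names, and saturating at Integer.MAX_VALUE is an assumption made here to avoid int overflow.

```java
/** Hypothetical sketch of deriving and applying an upper bound on unique values. */
final class UniqueBound {

    /** Product of the sub-group distinct counts, saturated at Integer.MAX_VALUE. */
    static int upperBound(int... subGroupUniques) {
        long bound = 1;
        for (int u : subGroupUniques) {
            bound *= u; // both factors fit in an int, so the long product cannot overflow
            if (bound > Integer.MAX_VALUE)
                return Integer.MAX_VALUE; // saturate instead of wrapping around
        }
        return (int) bound;
    }

    /** An estimate above the bound is clamped down to the bound. */
    static int limit(int estimate, int nrUniqueUpperBound) {
        return Math.min(estimate, nrUniqueUpperBound);
    }
}
```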
 
getDeltaColGroupInfo
public final CompressedSizeInfoColGroup getDeltaColGroupInfo(int[] colIndexes)
Extract the info of the specified columns as delta encodings (the delta from the previous row's values).
- Parameters:
- colIndexes - The columns to group together inside a ColGroup.
- Returns:
- The compressed size information, assuming delta encoding of the columns.
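A sketch of how delta encoding changes the distinct-value count, assuming each row is replaced by its element-wise difference from the previous row and that the first row is kept as-is (a delta from an implicit zero row). DeltaDistinct is a hypothetical name and the first-row handling is an assumption, not taken from SystemDS.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: distinct tuples after replacing each row by its delta to the previous row. */
final class DeltaDistinct {

    static int distinctDeltaTuples(double[][] data, int[] colIndexes) {
        Set<List<Double>> tuples = new HashSet<>();
        for (int r = 0; r < data.length; r++) {
            Double[] key = new Double[colIndexes.length];
            for (int i = 0; i < colIndexes.length; i++) {
                double cur = data[r][colIndexes[i]];
                // Assumption: the first row is compared against an implicit zero row.
                double prev = r == 0 ? 0.0 : data[r - 1][colIndexes[i]];
                key[i] = cur - prev;
            }
            tuples.add(Arrays.asList(key));
        }
        return tuples.size();
    }
}
```

For a steadily increasing column such as 10, 11, 12, 13, the plain values are all distinct, while the deltas 10, 1, 1, 1 contain only two unique values.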
 
getDeltaColGroupInfo
public abstract CompressedSizeInfoColGroup getDeltaColGroupInfo(int[] colIndexes, int estimate, int nrUniqueUpperBound)
Extract the compressed size info for a given list of columns. This method additionally limits the estimated number of unique values, since in some cases the estimated number of uniques is higher than the bound implied by the estimates for sub-groups of the given colIndexes. The difference from getColGroupInfo is that this method extracts the values from the matrix block input as delta values.
- Parameters:
- colIndexes - The columns to extract compression information from.
- estimate - An estimate of the number of unique delta elements in these columns.
- nrUniqueUpperBound - The upper bound on the number of unique elements allowed in the estimate. It can be calculated by multiplying together the numbers of unique elements estimated in the sub-columns. The bound is flexible: if the sample is small, it can be adjusted manually, as in CoCodeCostMatrixMult.
- Returns:
- The CompressedSizeInfoColGroup for the given column indexes.
 
combine
public final CompressedSizeInfoColGroup combine(CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
Combine two analyzed column groups without materializing the dictionaries of either side. If the product of the numbers of distinct elements on both sides is larger than Integer.MAX_VALUE, return null. If either side was constructed without analysis, fall back to the default materialization of double arrays.
- Parameters:
- g1 - First group
- g2 - Second group
- Returns:
- A combined compressed size estimation for the group.
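One way to combine two analyzed groups "without materializing the dictionaries" is sketched below, assuming each analyzed group keeps a per-row mapping to dictionary ids 0..nUnique-1. Each row's pair of ids is folded into a single joint key a * nUniqueB + b, and the joint keys are then renumbered densely; the product check mirrors the documented null result. MappingCombine and its parameters are hypothetical and do not reflect the actual SystemDS implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: combine two per-row id mappings without touching the dictionaries. */
final class MappingCombine {

    /** Returns dense combined per-row ids, or null if the joint id space does not fit in an int. */
    static int[] combine(int[] idsA, int nUniqueA, int[] idsB, int nUniqueB) {
        if ((long) nUniqueA * nUniqueB > Integer.MAX_VALUE)
            return null; // too many possible (a, b) pairs
        Map<Integer, Integer> dense = new HashMap<>();
        int[] out = new int[idsA.length];
        for (int r = 0; r < idsA.length; r++) {
            int joint = idsA[r] * nUniqueB + idsB[r]; // unique per (a, b) pair
            Integer id = dense.get(joint);
            if (id == null) {
                id = dense.size(); // next unused dense id
                dense.put(joint, id);
            }
            out[r] = id;
        }
        return out;
    }
}
```

Because the product check runs first, a * nUniqueB + b is at most nUniqueA * nUniqueB - 1 and cannot overflow an int; the number of distinct combined values is the size of the dense map.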
 
combine
public final CompressedSizeInfoColGroup combine(int[] combinedColumns, CompressedSizeInfoColGroup g1, CompressedSizeInfoColGroup g2)
Combine two analyzed column groups without materializing the dictionaries of either side. If the product of the numbers of distinct elements on both sides is larger than Integer.MAX_VALUE, return null. If either side was constructed without analysis, fall back to the default materialization of double arrays.
- Parameters:
- combinedColumns - The combined column indexes.
- g1 - First group
- g2 - Second group
- Returns:
- A combined compressed size estimation for the specified columns, using the combining algorithm.
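This overload takes the combined column indexes explicitly. Assuming combinedColumns is the sorted union of the column indexes of g1 and g2, a caller could build it as sketched below; ColumnUnion is a hypothetical helper, and the sorted-union convention is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical helper: build a combinedColumns argument as the sorted union of two index sets. */
final class ColumnUnion {

    static int[] combinedColumns(int[] colsA, int[] colsB) {
        return IntStream.concat(Arrays.stream(colsA), Arrays.stream(colsB))
                .distinct() // a column shared by both groups appears once
                .sorted()
                .toArray();
    }
}
```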
 
clearNNZ
public void clearNNZ()
Clear the pointer to the materialized list of nnz (number of non-zeros) in columns.
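clearNNZ suggests that the per-column non-zero counts are materialized once and cached. A minimal sketch of that lazy-cache pattern follows, assuming a dense double[][] input; NnzCache, getNnzCols and isMaterialized are hypothetical names. Clearing only drops the reference, so the array can be garbage collected and is recomputed on the next request.

```java
/** Hypothetical sketch of a lazily materialized per-column non-zero count that can be released. */
final class NnzCache {
    private final double[][] data;
    private int[] nnzCols; // built on first use; null after clearNNZ()

    NnzCache(double[][] data) {
        this.data = data;
    }

    int[] getNnzCols() {
        if (nnzCols == null) {
            int nCols = data.length == 0 ? 0 : data[0].length;
            nnzCols = new int[nCols];
            for (double[] row : data)
                for (int c = 0; c < nCols; c++)
                    if (row[c] != 0)
                        nnzCols[c]++;
        }
        return nnzCols;
    }

    /** Drops the cached array; the next getNnzCols() call rebuilds it. */
    void clearNNZ() {
        nnzCols = null;
    }

    boolean isMaterialized() {
        return nnzCols != null;
    }
}
```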
 