Class CompressionSettings
- java.lang.Object
  - org.apache.sysds.runtime.compress.CompressionSettings

public class CompressionSettings
extends Object

Compression settings class, used as a bundle of parameters inside the compression framework. See CompressionSettingsBuilder for the default values of the non-static parameters.
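The split between this class and CompressionSettingsBuilder follows the familiar pattern of an immutable parameter bundle created by a mutable builder. The following is a minimal, self-contained sketch of that pattern only, not the SystemDS API: the classes SettingsSketch and SettingsSketchBuilder, their setter names, and the default values are hypothetical.

```java
// Hypothetical sketch of an immutable parameter bundle plus builder; NOT the SystemDS API.
final class SettingsSketch {
    // Final fields: once built, the bundle cannot change (like most CompressionSettings fields).
    final int seed;
    final boolean lossy;
    final double samplePower;

    SettingsSketch(int seed, boolean lossy, double samplePower) {
        this.seed = seed;
        this.lossy = lossy;
        this.samplePower = samplePower;
    }
}

final class SettingsSketchBuilder {
    // Placeholder defaults for illustration only, not the values used by CompressionSettingsBuilder.
    private int seed = -1;
    private boolean lossy = false;
    private double samplePower = 0.5;

    SettingsSketchBuilder setSeed(int seed) {
        this.seed = seed;
        return this;
    }

    SettingsSketchBuilder setLossy(boolean lossy) {
        this.lossy = lossy;
        return this;
    }

    SettingsSketchBuilder setSamplePower(double samplePower) {
        this.samplePower = samplePower;
        return this;
    }

    // Freeze the current builder state into an immutable settings object.
    SettingsSketch create() {
        return new SettingsSketch(seed, lossy, samplePower);
    }
}
```

For example, `new SettingsSketchBuilder().setSeed(42).setLossy(true).create()` yields a bundle whose values can no longer be modified, which is what makes it safe to pass one settings object through all compression phases.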

Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| boolean | allowSharedDictionary | Share DDC dictionaries between ColGroups. |
| static int | BITMAP_BLOCK_SZ | Size of the blocks used in a blocked bitmap representation. |
| double | coCodePercentage | A co-coding parameter whose behavior differs by compression method; in general, it reflects how aggressively co-coding is applied. |
| CoCoderFactory.PartitionerType | columnPartitioner | The selected method for column partitioning used when co-coding compressed columns. |
| CostEstimatorFactory.CostType | costComputationType | The cost computation type for the compression. |
| SampleEstimatorFactory.EstimationType | estimationType | The sample type used for sampling. |
| boolean | isInSparkInstruction | True if the compression runs inside a Spark instruction. |
| boolean | lossy | True if lossy compression is enabled. |
| int | maxColGroupCoCode | The maximum number of columns allowed to be co-coded together. |
| int | maxSampleSize | The maximum size of the extracted sample. |
| double | minimumCompressionRatio | The minimum compression ratio to achieve. |
| int | minimumSampleSize | The minimum size of the extracted sample. |
| static int | PAR_DDC_THRESHOLD | Parallelization threshold for DDC compression. |
| double | samplePower | The sampling ratio power to use when choosing the sample size. |
| double | samplingRatio | The sampling ratio used when choosing ColGroups. |
| InsertionSorterFactory.SORT_TYPE | sdcSortType | The sorting type used when sorting/joining offsets to create SDC groups. |
| int | seed | If the seed is -1, the system uses the current system time in milliseconds and the class hash for seeding. |
| boolean | sortTuplesByFrequency | Sorting values by physical length helps by 10-20%, especially for serial execution, but slightly decreases performance for parallel (including multi-threaded) execution. It is therefore not applied for distributed operations, also because compression time and garbage collection increase. |
| boolean | transposed | Transpose the input matrix to optimize access when extracting bitmaps. |
| String | transposeInput | String specifying which transpose setting is used: auto, true, or false. |
| EnumSet<AColGroup.CompressionType> | validCompressions | Valid compressions list, containing the ColGroup CompressionTypes that are allowed to be used for the compression. By default, the uncompressed ColGroup is always allowed. |
 
- 
- 
- 

Field Detail

PAR_DDC_THRESHOLD
public static int PAR_DDC_THRESHOLD
Parallelization threshold for DDC compression.
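The description does not say what quantity is compared against this threshold. Assuming it is the number of rows, a DDC-style encoding step could switch to a parallel stream only for large inputs, as in this minimal sketch. The class ParDdcSketch and its placeholder threshold value are hypothetical; the field is modeled as static but not final, matching the signature above.

```java
import java.util.Map;
import java.util.stream.IntStream;

final class ParDdcSketch {
    // Hypothetical stand-in for PAR_DDC_THRESHOLD; placeholder value, not SystemDS's.
    // Static and non-final, so it can be tuned at runtime like the real field.
    static int parDdcThreshold = 10_000;

    // Assumption: the threshold is compared against the number of rows.
    static boolean useParallel(int nRows) {
        return nRows > parDdcThreshold;
    }

    // Encode a column as DDC-style dictionary indexes (assumes every value is in dict).
    static int[] encode(double[] column, Map<Double, Integer> dict) {
        IntStream rows = IntStream.range(0, column.length);
        if (useParallel(column.length))
            rows = rows.parallel();
        // toArray() respects encounter order, so index i still corresponds to row i.
        return rows.map(i -> dict.get(column[i])).toArray();
    }
}
```

Because `IntStream.range` is ordered and `toArray` preserves encounter order, the parallel and sequential paths produce the same index array; the threshold only decides whether the fork/join overhead is worth paying.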

BITMAP_BLOCK_SZ
public static final int BITMAP_BLOCK_SZ
Size of the blocks used in a blocked bitmap representation. Note that it is exactly Character.MAX_VALUE (65535). It is not Character.MAX_VALUE + 1, because that value breaks the offsets in cases with fully dense values.
See Also: Constant Field Values
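Sizing the blocks to Character.MAX_VALUE means every offset inside a block can be stored in a 16-bit char. The sketch below, with a hypothetical BlockedOffsetSketch class, shows one way a global row index could be split into a block number and a char offset and reconstructed; the exact offset layout used by SystemDS is an assumption here.

```java
final class BlockedOffsetSketch {
    // Mirrors the documented value: exactly Character.MAX_VALUE (65535), not 65536.
    static final int BITMAP_BLOCK_SZ = Character.MAX_VALUE;

    // Which block a row falls into.
    static int block(int row) {
        return row / BITMAP_BLOCK_SZ;
    }

    // Offset of the row inside its block; always in [0, BITMAP_BLOCK_SZ - 1], so it fits in a char.
    static char offset(int row) {
        return (char) (row % BITMAP_BLOCK_SZ);
    }

    // Reconstruct the global row index from a block number and an in-block offset.
    static int row(int block, char offset) {
        return block * BITMAP_BLOCK_SZ + offset;
    }
}
```

For instance, row 70000 falls into block 1 at offset 4465, since 70000 = 1 * 65535 + 4465.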
 

sortTuplesByFrequency
public final boolean sortTuplesByFrequency
Sorting values by physical length helps by 10-20%, especially for serial execution, but slightly decreases performance for parallel (including multi-threaded) execution. It is therefore not applied for distributed operations, also because compression time and garbage collection increase.
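Reading "physical length" as the number of occurrences of a distinct value (an assumption), sorting tuples by frequency amounts to ordering the distinct values by descending count. A minimal sketch with a hypothetical FrequencySortSketch class that counts with a HashMap; breaking ties by value is an arbitrary choice made here for determinism.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FrequencySortSketch {
    // Return the distinct values of a column, most frequent first (ties: smaller value first).
    static List<Double> distinctByFrequency(double[] column) {
        Map<Double, Integer> counts = new HashMap<>();
        for (double v : column)
            counts.merge(v, 1, Integer::sum); // count occurrences per distinct value

        List<Double> values = new ArrayList<>(counts.keySet());
        values.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // descending count
            return byCount != 0 ? byCount : Double.compare(a, b);
        });
        return values;
    }
}
```

The extra counting and sorting pass is the kind of overhead the description cites as the reason for not applying this in distributed operations.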

samplingRatio
public final double samplingRatio
The sampling ratio used when choosing ColGroups. Note that the default behavior is to use the exact estimator if the number of elements is below 1000. Deprecated.

samplePower
public final double samplePower
The sampling ratio power to use when choosing the sample size. It is used in the function sampleSize += nRows^samplePower. The value is bounded to the range 0 to 1: a value of 1 gives a sample containing all rows, and a value of 0 adds 1.
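The update rule above can be written directly with Math.pow. This is a minimal sketch with a hypothetical SamplePowerSketch class; rounding the term up to an int and rejecting out-of-range powers are assumptions, not documented behavior.

```java
final class SamplePowerSketch {
    // The nRows^samplePower term; rounding up to an int is an assumption.
    static int powerTerm(int nRows, double samplePower) {
        if (samplePower < 0 || samplePower > 1)
            throw new IllegalArgumentException("samplePower must be in [0, 1]: " + samplePower);
        return (int) Math.ceil(Math.pow(nRows, samplePower));
    }

    // Apply the documented update: sampleSize += nRows^samplePower.
    static int addPowerTerm(int sampleSize, int nRows, double samplePower) {
        return sampleSize + powerTerm(nRows, samplePower);
    }
}
```

The two endpoints match the description: with samplePower = 1 the term equals nRows (everything), and with samplePower = 0 it equals 1, because n^0 = 1.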

allowSharedDictionary
public final boolean allowSharedDictionary
Share DDC dictionaries between ColGroups.

transposeInput
public final String transposeInput
String specifying which transpose setting is used: auto, true, or false.

seed
public final int seed
If the seed is -1, the system uses the current system time in milliseconds and the class hash for seeding.
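A minimal sketch of the seeding rule, using a hypothetical SeedSketch class. The description names the two sources (system time in milliseconds and the class hash) but not how they are combined, so the XOR below is an assumption.

```java
import java.util.Random;

final class SeedSketch {
    // -1 means "no fixed seed": derive one from the clock and the class hash.
    static int resolveSeed(int seed) {
        if (seed != -1)
            return seed;
        // XOR-combining the two sources is an assumption, not necessarily SystemDS's formula.
        return (int) (System.currentTimeMillis() ^ SeedSketch.class.hashCode());
    }

    // A sampler built from the resolved seed is reproducible whenever seed != -1.
    static Random samplerRandom(int seed) {
        return new Random(resolveSeed(seed));
    }
}
```

Fixing the seed to any value other than -1 makes sampling, and therefore the chosen column groups, repeatable across runs.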

lossy
public final boolean lossy
True if lossy compression is enabled.

columnPartitioner
public final CoCoderFactory.PartitionerType columnPartitioner
The selected method for column partitioning used when co-coding compressed columns.

costComputationType
public final CostEstimatorFactory.CostType costComputationType
The cost computation type for the compression.

maxColGroupCoCode
public final int maxColGroupCoCode
The maximum number of columns allowed to be co-coded together.

coCodePercentage
public final double coCodePercentage
A co-coding parameter whose behavior differs by compression method; in general, it reflects how aggressively co-coding is applied.

validCompressions
public final EnumSet<AColGroup.CompressionType> validCompressions
Valid compressions list, containing the ColGroup CompressionTypes that are allowed to be used for the compression. By default, the uncompressed ColGroup is always allowed.
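Using an EnumSet makes "always allow the uncompressed group" a single add on a copy of the requested set. The sketch below uses a hypothetical CompressionTypeSketch enum with an illustrative subset of constants in place of the real AColGroup.CompressionType, and a hypothetical ValidCompressionsSketch class.

```java
import java.util.EnumSet;

// Hypothetical, illustrative subset of column-group types; the real enum is AColGroup.CompressionType.
enum CompressionTypeSketch { UNCOMPRESSED, DDC, SDC, RLE, OLE, CONST, EMPTY }

final class ValidCompressionsSketch {
    // Copy the requested set (so the caller's set is untouched) and always keep UNCOMPRESSED as a fallback.
    static EnumSet<CompressionTypeSketch> validCompressions(EnumSet<CompressionTypeSketch> requested) {
        EnumSet<CompressionTypeSketch> valid = EnumSet.copyOf(requested); // works for empty EnumSets too
        valid.add(CompressionTypeSketch.UNCOMPRESSED);
        return valid;
    }
}
```

Keeping the uncompressed type in the set guarantees that every column can still be represented, even when no requested encoding fits it.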

minimumSampleSize
public final int minimumSampleSize
The minimum size of the extracted sample.

maxSampleSize
public final int maxSampleSize
The maximum size of the extracted sample.
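minimumSampleSize and maxSampleSize bound the size of the extracted sample. A minimal sketch of applying both bounds, with a hypothetical SampleSizeSketch class; additionally capping the result at the number of rows, and the order in which the bounds are applied, are assumptions.

```java
final class SampleSizeSketch {
    // Clamp a proposed sample size into [minimumSampleSize, maxSampleSize], never exceeding nRows.
    // Applying the minimum first, then the maximum, then nRows is an assumption.
    static int clampSampleSize(int proposed, int nRows, int minimumSampleSize, int maxSampleSize) {
        int size = Math.max(proposed, minimumSampleSize);
        size = Math.min(size, maxSampleSize);
        return Math.min(size, nRows);
    }
}
```

With this order, a tiny input simply uses all of its rows, even if that is fewer than minimumSampleSize.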

estimationType
public final SampleEstimatorFactory.EstimationType estimationType
The sample type used for sampling.

transposed
public boolean transposed
Transpose the input matrix to optimize access when extracting bitmaps. This setting is changed inside the script based on the transposeInput setting. It is intentionally left mutable, since the transposition of the input matrix is decided in phase 3.
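A minimal sketch of how the String transposeInput could be resolved into the mutable boolean transposed. The class TransposeDecisionSketch is hypothetical, and so is passing the "auto" decision in as a BooleanSupplier: the description does not say how "auto" is decided.

```java
import java.util.function.BooleanSupplier;

final class TransposeDecisionSketch {
    // Resolve the string setting ("auto", "true", "false") into the transposed flag.
    static boolean resolveTransposed(String transposeInput, BooleanSupplier autoHeuristic) {
        switch (transposeInput) {
            case "true":
                return true;
            case "false":
                return false;
            case "auto":
                // Deferred to a caller-supplied heuristic (hypothetical), evaluated when the decision is made.
                return autoHeuristic.getAsBoolean();
            default:
                throw new IllegalArgumentException("Invalid transposeInput: " + transposeInput);
        }
    }
}
```

Deferring the "auto" case to a supplier mirrors why the field is mutable: the final value is only known once the later phase has enough information to decide.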

minimumCompressionRatio
public final double minimumCompressionRatio
The minimum compression ratio to achieve.
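Assuming the ratio is measured as uncompressed size divided by compressed size (the usual definition, assumed here), the check against minimumCompressionRatio could look like this sketch. The class CompressionRatioSketch is hypothetical, and falling back to the uncompressed representation when the ratio is too low is also an assumption.

```java
final class CompressionRatioSketch {
    // Compression ratio = original size / compressed size (both in bytes).
    static double ratio(long originalBytes, long compressedBytes) {
        return (double) originalBytes / compressedBytes;
    }

    // Keep the compressed result only if it reaches the minimum ratio; otherwise fall back (assumed behavior).
    static boolean keepCompressed(long originalBytes, long compressedBytes, double minimumCompressionRatio) {
        return ratio(originalBytes, compressedBytes) >= minimumCompressionRatio;
    }
}
```

For example, shrinking 1000 bytes to 400 bytes gives a ratio of 2.5, which passes a minimum of 1.0, while a result of 1100 bytes would be rejected.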

isInSparkInstruction
public final boolean isInSparkInstruction
True if the compression runs inside a Spark instruction.

sdcSortType
public final InsertionSorterFactory.SORT_TYPE sdcSortType
The sorting type used when sorting/joining offsets to create SDC groups.
 