Package org.apache.sysds.utils
Class DataAugmentation
java.lang.Object
    org.apache.sysds.utils.DataAugmentation

public class DataAugmentation extends Object
Constructor Summary
- DataAugmentation()
Method Summary
All methods are static.
- static FrameBlock dataCorruption(FrameBlock input, double pTypo, double pMiss, double pDrop, double pOut, double pSwap): Returns a new frame block with errors introduced into the data: typos in string values, null values, outliers in numeric data, and swapped elements.
- static FrameBlock miss(FrameBlock frame, double pMiss, double pDrop): Modifies the given, preprocessed frame block to add missing values to some of the rows, marking them with the label missing.
- static FrameBlock outlier(FrameBlock frame, List<Integer> numerics, double pOut, double pPos, int times): Modifies the given, preprocessed frame block to add outliers to some of the numeric data, adding or subtracting several times the standard deviation, and marking them with the label outlier.
- static FrameBlock preprocessing(FrameBlock frame, List<Integer> numerics, List<Integer> strings, List<Integer> swappable): Returns a new frame block with a labels column added, and builds the lists of column indexes for the different types of data.
- static FrameBlock swap(FrameBlock frame, List<Integer> swappable, double pSwap): Modifies the given, preprocessed frame block to swap consecutive fields of the same ValueType, marking them with the label swap.
- static FrameBlock typos(FrameBlock frame, List<Integer> strings, double pTypo): Modifies the given, preprocessed frame block to add typos to the string values, marking them with the label typos.
 
Method Detail
dataCorruption
public static FrameBlock dataCorruption(FrameBlock input, double pTypo, double pMiss, double pDrop, double pOut, double pSwap)
Returns a new frame block with errors introduced into the data: typos in string values, null values, outliers in numeric data, and swapped elements.
Parameters:
- input - Original frame block
- pTypo - Probability of introducing a typo in a row
- pMiss - Probability of introducing missing values in a row
- pDrop - Probability of dropping a value inside a row
- pOut - Probability of introducing outliers in a row
- pSwap - Probability of swapping two elements in a row
Returns:
- A new frame block with corrupted elements
 
preprocessing
public static FrameBlock preprocessing(FrameBlock frame, List<Integer> numerics, List<Integer> strings, List<Integer> swappable)
Returns a new frame block with a labels column added, and builds the lists of column indexes for the different types of data. The three lists are output parameters: pass them in empty and the method fills them.
Parameters:
- frame - Original frame block
- numerics - Empty list that receives the positions of the numeric columns
- strings - Empty list that receives the positions of the string columns
- swappable - Empty list that receives the positions of the swappable columns
Returns:
- A new frame block with a labels column
 
typos
public static FrameBlock typos(FrameBlock frame, List<Integer> strings, double pTypo)
Modifies the given, preprocessed frame block to add typos to the string values, marking them with the label typos.
Parameters:
- frame - Preprocessed frame block
- strings - List of the string-typed columns that can be changed, generated during preprocessing or selected manually
- pTypo - Probability of adding a typo to a row
Returns:
- A new frame block with typos
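
To make the row-level probability concrete, the following stand-alone sketch applies one kind of typo, an adjacent-character transposition, to plain String rows. It is not the SystemDS implementation: the class TypoSketch, its methods, the String[][] representation, and the choice of transposition as the typo are all assumptions for illustration, and labeling of the affected rows is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names exist in SystemDS.
final class TypoSketch {

    // Swaps two adjacent characters at a random position (one assumed kind of typo).
    static String transpose(String s, Random rnd) {
        if (s == null || s.length() < 2)
            return s; // nothing to transpose
        int i = rnd.nextInt(s.length() - 1);
        char[] cs = s.toCharArray();
        char tmp = cs[i];
        cs[i] = cs[i + 1];
        cs[i + 1] = tmp;
        return new String(cs);
    }

    // With probability pTypo per row, corrupts one randomly chosen string column.
    // Returns a copy; the input rows are left untouched.
    static String[][] injectTypos(String[][] rows, List<Integer> strings, double pTypo, Random rnd) {
        String[][] out = new String[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = rows[r].clone();
            if (strings.isEmpty() || rnd.nextDouble() >= pTypo)
                continue; // row not selected
            int c = strings.get(rnd.nextInt(strings.size()));
            out[r][c] = transpose(out[r][c], rnd);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = { { "alice", "30" }, { "bob", "41" } };
        String[][] noisy = injectTypos(rows, Arrays.asList(0), 0.5, new Random(7));
        System.out.println(Arrays.deepToString(noisy));
    }
}
```

Passing a seeded Random keeps the corruption reproducible, which is convenient when the corrupted data is used as a test fixture.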
 
miss
public static FrameBlock miss(FrameBlock frame, double pMiss, double pDrop)
Modifies the given, preprocessed frame block to add missing values to some of the rows, marking them with the label missing.
Parameters:
- frame - Preprocessed frame block
- pMiss - Probability of adding missing values to a row
- pDrop - Probability of dropping a value inside a selected row
Returns:
- A new frame block with missing values
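
The two probabilities act at different levels: pMiss selects rows, and pDrop then decides per value inside a selected row. The sketch below shows that two-stage sampling on plain String rows. It is not the SystemDS implementation: the class MissSketch, the method injectMissing, the String[][] representation, and the assumption that the label sits in the last cell of each row are all made up for the example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; none of these names exist in SystemDS.
final class MissSketch {

    // Assumes the last cell of each row is the label cell added by a
    // preprocessing step. Returns a copy; the input rows are left untouched.
    static String[][] injectMissing(String[][] rows, double pMiss, double pDrop, Random rnd) {
        String[][] out = new String[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = rows[r].clone();
            if (rnd.nextDouble() >= pMiss)
                continue; // stage 1: row not selected
            int labelCol = out[r].length - 1;
            boolean dropped = false;
            for (int c = 0; c < labelCol; c++) {
                if (rnd.nextDouble() < pDrop) { // stage 2: per-value decision
                    out[r][c] = null;
                    dropped = true;
                }
            }
            if (dropped)
                out[r][labelCol] = "missing"; // mark the row
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = { { "alice", "30", "" }, { "bob", "41", "" } };
        String[][] noisy = injectMissing(rows, 0.5, 0.5, new Random(7));
        System.out.println(Arrays.deepToString(noisy));
    }
}
```

Under this scheme the expected fraction of dropped values is roughly pMiss times pDrop, so both parameters need to be set with that product in mind.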
 
outlier
public static FrameBlock outlier(FrameBlock frame, List<Integer> numerics, double pOut, double pPos, int times)
Modifies the given, preprocessed frame block to add outliers to some of the numeric data, adding or subtracting several times the standard deviation, and marking them with the label outlier.
Parameters:
- frame - Preprocessed frame block
- numerics - List of the numeric-typed columns that can be changed, generated during preprocessing or selected manually
- pOut - Probability of introducing an outlier in a row
- pPos - Probability of using a positive deviation (adding rather than subtracting)
- times - Multiple of the standard deviation that is added or subtracted
Returns:
- A new frame block with outliers
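
The outlier rule can be read as value ± times × standard deviation, with pPos choosing the sign. The following stand-alone sketch applies that rule to a single double[] column. It is not the SystemDS implementation: the class OutlierSketch, the method injectOutliers, the array representation, and the use of the population standard deviation are assumptions for illustration, and labeling is left out.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; none of these names exist in SystemDS.
final class OutlierSketch {

    // With probability pOut per value, shifts the value by times * std,
    // upward with probability pPos and downward otherwise.
    // Returns a copy; the input column is left untouched.
    static double[] injectOutliers(double[] column, double pOut, double pPos, int times, Random rnd) {
        double mean = 0;
        for (double v : column)
            mean += v;
        mean /= column.length;

        double var = 0;
        for (double v : column)
            var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / column.length); // population std (assumption)

        double[] out = column.clone();
        for (int i = 0; i < out.length; i++) {
            if (rnd.nextDouble() < pOut) {
                double sign = rnd.nextDouble() < pPos ? 1.0 : -1.0;
                out[i] += sign * times * std;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] col = { 1, 2, 3, 4, 5 };
        double[] noisy = injectOutliers(col, 0.4, 0.5, 3, new Random(7));
        System.out.println(Arrays.toString(noisy));
    }
}
```

The standard deviation is computed from the clean column before any value is shifted, so earlier outliers do not inflate the size of later ones.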
 
swap
public static FrameBlock swap(FrameBlock frame, List<Integer> swappable, double pSwap)
Modifies the given, preprocessed frame block to swap consecutive fields of the same ValueType, marking them with the label swap.
Parameters:
- frame - Preprocessed frame block
- swappable - List of the columns that are swappable, generated during preprocessing
- pSwap - Probability of swapping two fields in a row
Returns:
- A new frame block with swapped elements
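
A swap of consecutive fields can be pictured as exchanging the values in columns c and c + 1 of a selected row. The sketch below does this on plain String rows. It is not the SystemDS implementation: the class SwapSketch, the method swapFields, and the String[][] representation are hypothetical, and the reading that each entry c of swappable means "column c may be exchanged with column c + 1" is an assumption, not something the documentation above confirms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; none of these names exist in SystemDS.
final class SwapSketch {

    // With probability pSwap per row, picks one entry c from swappable and
    // exchanges the values in columns c and c + 1. Assumes c + 1 is a valid
    // column index. Returns a copy; the input rows are left untouched.
    static String[][] swapFields(String[][] rows, List<Integer> swappable, double pSwap, Random rnd) {
        String[][] out = new String[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = rows[r].clone();
            if (swappable.isEmpty() || rnd.nextDouble() >= pSwap)
                continue; // row not selected
            int c = swappable.get(rnd.nextInt(swappable.size()));
            String tmp = out[r][c];
            out[r][c] = out[r][c + 1];
            out[r][c + 1] = tmp;
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = { { "Jane", "Doe", "30" }, { "John", "Roe", "41" } };
        String[][] noisy = swapFields(rows, Arrays.asList(0), 0.5, new Random(7));
        System.out.println(Arrays.deepToString(noisy));
    }
}
```

Restricting swaps to neighboring columns of the same type keeps each corrupted row type-valid, which mimics a common data-entry mistake such as exchanging first and last name.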
 
 