Package org.apache.sysds.runtime.util
Class DataConverter
- java.lang.Object
  - org.apache.sysds.runtime.util.DataConverter

public class DataConverter extends Object

This class provides methods to read and write matrix blocks from and to HDFS using different data formats. These functionalities are used especially for CP read/write and for exporting in-memory matrices to HDFS (before executing MR jobs).

Constructor Summary
- DataConverter()

Method Summary
- static org.apache.commons.math3.linear.Array2DRowRealMatrix convertToArray2DRowRealMatrix(MatrixBlock mb): Helper method that converts a SystemDS matrix variable (varname) into an Array2DRowRealMatrix format, which is useful in invoking Apache Commons Math.
- static org.apache.commons.math3.linear.BlockRealMatrix convertToBlockRealMatrix(MatrixBlock mb)
- static boolean[] convertToBooleanVector(MatrixBlock mb)
- static DenseBlock convertToDenseBlock(MatrixBlock mb)
- static DenseBlock convertToDenseBlock(MatrixBlock mb, boolean deep)
- static List<Double> convertToDoubleList(MatrixBlock mb)
- static double[][] convertToDoubleMatrix(MatrixBlock mb): Creates a two-dimensional double matrix of the input matrix block.
- static double[] convertToDoubleVector(MatrixBlock mb)
- static double[] convertToDoubleVector(MatrixBlock mb, boolean deep)
- static double[] convertToDoubleVector(MatrixBlock mb, boolean deep, boolean allowNull)
- static FrameBlock convertToFrameBlock(String[][] data): Converts a two-dimensional string array into a frame block of value type string.
- static FrameBlock convertToFrameBlock(String[][] data, Types.ValueType[] schema)
- static FrameBlock convertToFrameBlock(String[][] data, Types.ValueType[] schema, String[] colnames)
- static FrameBlock convertToFrameBlock(MatrixBlock mb): Converts a matrix block into a frame block of value type double.
- static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType vt): Converts a matrix block into a frame block of a given value type.
- static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType[] schema)
- static int[] convertToIntVector(MatrixBlock mb)
- static long[] convertToLongVector(MatrixBlock mb)
- static MatrixBlock convertToMatrixBlock(double[][] data): Creates a dense Matrix Block and copies the given double matrix into it.
- static MatrixBlock convertToMatrixBlock(double[] data, boolean columnVector): Creates a dense Matrix Block and copies the given double vector into it.
- static MatrixBlock convertToMatrixBlock(int[][] data): Converts an integer matrix to a MatrixBlock.
- static MatrixBlock convertToMatrixBlock(HashMap<MatrixIndexes,Double> map)
- static MatrixBlock convertToMatrixBlock(HashMap<MatrixIndexes,Double> map, int rlen, int clen): NOTE: this method also ensures the specified matrix dimensions.
- static MatrixBlock convertToMatrixBlock(org.apache.commons.math3.linear.RealMatrix rm)
- static MatrixBlock convertToMatrixBlock(CTableMap map)
- static MatrixBlock convertToMatrixBlock(CTableMap map, int rlen, int clen): NOTE: this method also ensures the specified matrix dimensions.
- static MatrixBlock convertToMatrixBlock(FrameBlock frame): Converts a frame block with arbitrary schema into a matrix block.
- static MatrixBlock[] convertToMatrixBlockPartitions(MatrixBlock mb, boolean colwise)
- static String[][] convertToStringFrame(FrameBlock frame): Converts a frame block with arbitrary schema into a two-dimensional string array.
- static TensorBlock convertToTensorBlock(MatrixBlock mb, Types.ValueType vt, boolean toBasicTensor)
- static int[] convertVectorToIndexList(MatrixBlock mb)
- static void copyToDoubleVector(MatrixBlock mb, double[] dest, int destPos)
- static int[] getTensorDimensions(ExecutionContext ec, CPOperand dims)
- static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen)
- static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, boolean localFS)
- static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz)
- static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, boolean localFS)
- static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, FileFormatProperties formatProperties)
- static MatrixBlock readMatrixFromHDFS(ReadProperties prop): Core method for reading matrices in format textcell, matrixmarket, binarycell, or binaryblock from HDFS into main memory.
- static TensorBlock readTensorFromHDFS(String dir, Types.FileFormat fmt, long[] dims, int blen, Types.ValueType[] schema)
- static BitSet toBitSet(double[] data)
- static double[] toDouble(float[] data)
- static double[] toDouble(int[] data)
- static double[] toDouble(long[] data)
- static double[] toDouble(String[] data)
- static double[] toDouble(BitSet data, int len)
- static float[] toFloat(double[] data)
- static int[] toInt(double[] data)
- static long[] toLong(double[] data)
- static String[] toString(double[] data)
- static String toString(TensorBlock tb)
- static String toString(TensorBlock tb, boolean sparse, String separator, String lineseparator, String leftBorder, String rightBorder, int rowsToPrint, int colsToPrint, int decimal): Returns a string representation of a tensor.
- static String toString(ListObject list, int rows, int cols, boolean sparse, String separator, String lineSeparator, int rowsToPrint, int colsToPrint, int decimal)
- static String toString(FrameBlock fb)
- static String toString(FrameBlock fb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
- static String toString(MatrixBlock mb)
- static String toString(MatrixBlock mb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal): Returns a string representation of a matrix.
- static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc)
- static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties)
- static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties, boolean diag)
- static void writeTensorToHDFS(TensorBlock tensor, String dir, Types.FileFormat fmt, DataCharacteristics dc)
 
Method Detail

writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc) throws IOException
- Throws:
- IOException

writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties) throws IOException
- Throws:
- IOException

writeMatrixToHDFS
public static void writeMatrixToHDFS(MatrixBlock mat, String dir, Types.FileFormat fmt, DataCharacteristics dc, int replication, FileFormatProperties formatProperties, boolean diag) throws IOException
- Throws:
- IOException

writeTensorToHDFS
public static void writeTensorToHDFS(TensorBlock tensor, String dir, Types.FileFormat fmt, DataCharacteristics dc) throws IOException
- Throws:
- IOException

readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, boolean localFS) throws IOException
- Throws:
- IOException

readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen) throws IOException
- Throws:
- IOException

readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz) throws IOException
- Throws:
- IOException

readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, boolean localFS) throws IOException
- Throws:
- IOException

readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(String dir, Types.FileFormat fmt, long rlen, long clen, int blen, long expectedNnz, FileFormatProperties formatProperties) throws IOException
- Throws:
- IOException

readTensorFromHDFS
public static TensorBlock readTensorFromHDFS(String dir, Types.FileFormat fmt, long[] dims, int blen, Types.ValueType[] schema) throws IOException
- Throws:
- IOException
 
readMatrixFromHDFS
public static MatrixBlock readMatrixFromHDFS(ReadProperties prop) throws IOException
Core method for reading matrices in format textcell, matrixmarket, binarycell, or binaryblock from HDFS into main memory. For expected dense matrices we directly copy value- or block-at-a-time into the target matrix. In contrast, for sparse matrices, we append (column-value)-pairs and do a final sort if required in order to prevent large reorg overheads and increased memory consumption in case of unordered inputs.

DENSE MxN input:
* best/average/worst: O(M*N)

SPARSE MxN input:
* best (ordered, or binary block w/ clen<=blen): O(M*N)
* average (unordered): O(M*N*log(N))
* worst (descending order per row): O(M * N^2)

NOTE: providing an exact estimate of 'expected sparsity' can prevent a full copy of the result matrix block (required for changing sparse->dense, or vice versa).
- Parameters:
- prop - read properties
- Returns:
- matrix block
- Throws:
- IOException - if IOException occurs
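The append-then-sort strategy described for sparse inputs can be illustrated with a minimal sketch. This is not the SystemDS implementation: the class `SparseRowAppendSketch`, its nested `SparseRow` type, and all field and method names are hypothetical and use only the Java standard library. Each (column, value) pair is appended at the end of the row without maintaining order, and a single sort by column index runs at the end only if the input turned out to be unordered.

```java
import java.util.Arrays;

public class SparseRowAppendSketch {
    /** Hypothetical sparse row: parallel arrays of column indexes and values. */
    static final class SparseRow {
        int[] cols = new int[4];
        double[] vals = new double[4];
        int size = 0;
        boolean sorted = true; // remains true while appended columns are non-decreasing

        /** Appends a (column, value) pair at the end without restoring order. */
        void append(int col, double val) {
            if (size == cols.length) { // grow both arrays geometrically
                cols = Arrays.copyOf(cols, 2 * size);
                vals = Arrays.copyOf(vals, 2 * size);
            }
            if (size > 0 && cols[size - 1] > col)
                sorted = false; // remember that a final sort is required
            cols[size] = col;
            vals[size] = val;
            size++;
        }

        /** Sorts the row by column index once, and only if the input was unordered. */
        void sortIfRequired() {
            if (sorted)
                return;
            Integer[] order = new Integer[size];
            for (int i = 0; i < size; i++)
                order[i] = i;
            Arrays.sort(order, (a, b) -> Integer.compare(cols[a], cols[b]));
            int[] c = new int[size];
            double[] v = new double[size];
            for (int i = 0; i < size; i++) {
                c[i] = cols[order[i]];
                v[i] = vals[order[i]];
            }
            cols = c;
            vals = v;
            sorted = true;
        }
    }

    public static void main(String[] args) {
        SparseRow row = new SparseRow();
        row.append(5, 50.0); // unordered input cells of one row
        row.append(0, 1.0);
        row.append(2, 20.0);
        row.sortIfRequired();
        System.out.println(Arrays.toString(Arrays.copyOf(row.cols, row.size)));
        System.out.println(Arrays.toString(Arrays.copyOf(row.vals, row.size)));
    }
}
```

With ordered input the `sorted` flag never flips, so the final sort is skipped entirely, which corresponds to the best case listed above.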
 
convertToDoubleMatrix
public static double[][] convertToDoubleMatrix(MatrixBlock mb)
Creates a two-dimensional double matrix of the input matrix block.
- Parameters:
- mb - matrix block
- Returns:
- 2d double array

convertToBooleanVector
public static boolean[] convertToBooleanVector(MatrixBlock mb)

convertVectorToIndexList
public static int[] convertVectorToIndexList(MatrixBlock mb)

convertToIntVector
public static int[] convertToIntVector(MatrixBlock mb)

convertToLongVector
public static long[] convertToLongVector(MatrixBlock mb)

convertToDenseBlock
public static DenseBlock convertToDenseBlock(MatrixBlock mb)

convertToDenseBlock
public static DenseBlock convertToDenseBlock(MatrixBlock mb, boolean deep)

convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb)

convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb, boolean deep)

convertToDoubleVector
public static double[] convertToDoubleVector(MatrixBlock mb, boolean deep, boolean allowNull)

convertToDoubleList
public static List<Double> convertToDoubleList(MatrixBlock mb)

convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(double[][] data)
Creates a dense Matrix Block and copies the given double matrix into it.
- Parameters:
- data - 2d double array
- Returns:
- matrix block

convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(int[][] data)
Converts an integer matrix to a MatrixBlock.
- Parameters:
- data - int matrix input that is converted to a double MatrixBlock
- Returns:
- the constructed matrix block

convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(double[] data, boolean columnVector)
Creates a dense Matrix Block and copies the given double vector into it.
- Parameters:
- data - double array
- columnVector - if true, create a matrix with a single column; if false, create a matrix with a single row
- Returns:
- matrix block
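The effect of the columnVector flag on the resulting shape can be sketched with plain arrays. The class `VectorShapeSketch` and its `toMatrix` method are hypothetical stand-ins that return a `double[][]` instead of a MatrixBlock; they only illustrate the documented shape rule (n x 1 for true, 1 x n for false) and make no claim about how SystemDS stores the data internally.

```java
public class VectorShapeSketch {
    /**
     * Copies a vector into a 2d array: a single column (n x 1) if
     * columnVector is true, otherwise a single row (1 x n).
     */
    static double[][] toMatrix(double[] data, boolean columnVector) {
        int rows = columnVector ? data.length : 1;
        int cols = columnVector ? 1 : data.length;
        double[][] out = new double[rows][cols];
        for (int i = 0; i < data.length; i++) {
            if (columnVector)
                out[i][0] = data[i]; // one value per row
            else
                out[0][i] = data[i]; // all values in the first row
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.0, 3.0};
        double[][] col = toMatrix(v, true);
        double[][] row = toMatrix(v, false);
        System.out.println(col.length + " x " + col[0].length);
        System.out.println(row.length + " x " + row[0].length);
    }
}
```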
 
convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(HashMap<MatrixIndexes,Double> map)

convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(HashMap<MatrixIndexes,Double> map, int rlen, int clen)
NOTE: this method also ensures the specified matrix dimensions.
- Parameters:
- map - map of matrix index keys and double values
- rlen - number of rows
- clen - number of columns
- Returns:
- matrix block

convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(CTableMap map)

convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(CTableMap map, int rlen, int clen)
NOTE: this method also ensures the specified matrix dimensions.
- Parameters:
- map - ?
- rlen - number of rows
- clen - number of columns
- Returns:
- matrix block

convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(FrameBlock frame)
Converts a frame block with arbitrary schema into a matrix block. Since matrix block only supports value type double, we do a best effort conversion of non-double types which might result in errors for non-numerical data.
- Parameters:
- frame - frame block
- Returns:
- matrix block
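Why a best-effort conversion can fail for non-numerical data is easiest to see with string cells. The following is a sketch under assumptions, not the SystemDS conversion logic: the class `BestEffortConversionSketch` and its `toDoubles` method are hypothetical, and the only library behavior relied on is that `Double.parseDouble` throws a `NumberFormatException` for text that is not a number. Which exception type SystemDS itself raises is not stated in this reference.

```java
public class BestEffortConversionSketch {
    /** Parses every cell as a double; non-numerical text raises NumberFormatException. */
    static double[][] toDoubles(String[][] cells) {
        double[][] out = new double[cells.length][];
        for (int i = 0; i < cells.length; i++) {
            out[i] = new double[cells[i].length];
            for (int j = 0; j < cells[i].length; j++)
                out[i][j] = Double.parseDouble(cells[i][j].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] ok = toDoubles(new String[][] {{"1", "2.5"}, {"-3", "4e2"}});
        System.out.println(ok[1][1]);
        try {
            toDoubles(new String[][] {{"1", "abc"}});
        } catch (NumberFormatException e) {
            System.out.println("conversion failed for a non-numerical cell");
        }
    }
}
```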
 
convertToStringFrame
public static String[][] convertToStringFrame(FrameBlock frame)
Converts a frame block with arbitrary schema into a two-dimensional string array.
- Parameters:
- frame - frame block
- Returns:
- 2d string array

convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data)
Converts a two-dimensional string array into a frame block of value type string. If the given array is null or of length 0, we return an empty frame block.
- Parameters:
- data - 2d string array
- Returns:
- frame block

convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data, Types.ValueType[] schema)

convertToFrameBlock
public static FrameBlock convertToFrameBlock(String[][] data, Types.ValueType[] schema, String[] colnames)

convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb)
Converts a matrix block into a frame block of value type double.
- Parameters:
- mb - matrix block
- Returns:
- frame block of type double

convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType vt)
Converts a matrix block into a frame block of a given value type.
- Parameters:
- mb - matrix block
- vt - value type
- Returns:
- frame block

convertToFrameBlock
public static FrameBlock convertToFrameBlock(MatrixBlock mb, Types.ValueType[] schema)

convertToTensorBlock
public static TensorBlock convertToTensorBlock(MatrixBlock mb, Types.ValueType vt, boolean toBasicTensor)

convertToMatrixBlockPartitions
public static MatrixBlock[] convertToMatrixBlockPartitions(MatrixBlock mb, boolean colwise)

convertToArray2DRowRealMatrix
public static org.apache.commons.math3.linear.Array2DRowRealMatrix convertToArray2DRowRealMatrix(MatrixBlock mb)
Helper method that converts a SystemDS matrix variable (varname) into an Array2DRowRealMatrix format, which is useful in invoking Apache Commons Math.
- Parameters:
- mb - matrix object
- Returns:
- matrix as a commons-math3 Array2DRowRealMatrix

convertToBlockRealMatrix
public static org.apache.commons.math3.linear.BlockRealMatrix convertToBlockRealMatrix(MatrixBlock mb)

convertToMatrixBlock
public static MatrixBlock convertToMatrixBlock(org.apache.commons.math3.linear.RealMatrix rm)

copyToDoubleVector
public static void copyToDoubleVector(MatrixBlock mb, double[] dest, int destPos)

toString
public static String toString(MatrixBlock mb)

toString
public static String toString(MatrixBlock mb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)
Returns a string representation of a matrix.
- Parameters:
- mb - matrix block
- sparse - if true, the string will contain a table with row index, col index, value (where value != 0.0); otherwise it will be a rectangular string with all values of the matrix block
- separator - separator string between each element in a row, or between the columns in sparse format
- lineseparator - separator string between each row
- rowsToPrint - maximum number of rows to print, -1 for all
- colsToPrint - maximum number of columns to print, -1 for all
- decimal - number of decimal places to print, -1 for default
- Returns:
- matrix as a string
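The difference between the dense and sparse output modes can be sketched on a plain `double[][]`. The class `MatrixToStringSketch` and its `render` method are hypothetical and do not reproduce the SystemDS formatting: the sketch omits the decimal parameter, prints values with Java's default double formatting, and assumes 1-based row and column indexes in sparse mode, which this reference does not specify.

```java
public class MatrixToStringSketch {
    /**
     * Renders a matrix either as a rectangular block (sparse = false) or as
     * "row col value" triples for non-zero cells (sparse = true).
     * A negative rowsToPrint or colsToPrint means "print all".
     */
    static String render(double[][] m, boolean sparse, String separator,
            String lineseparator, int rowsToPrint, int colsToPrint) {
        int rows = (rowsToPrint < 0) ? m.length : Math.min(rowsToPrint, m.length);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            int cols = (colsToPrint < 0) ? m[i].length : Math.min(colsToPrint, m[i].length);
            if (sparse) {
                // one line per non-zero cell; 1-based indexes are an assumption
                for (int j = 0; j < cols; j++) {
                    if (m[i][j] != 0.0) {
                        sb.append(i + 1).append(separator)
                          .append(j + 1).append(separator)
                          .append(m[i][j]).append(lineseparator);
                    }
                }
            } else {
                // one line per row with all values, zeros included
                for (int j = 0; j < cols; j++) {
                    if (j > 0)
                        sb.append(separator);
                    sb.append(m[i][j]);
                }
                sb.append(lineseparator);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        double[][] m = {{1.0, 0.0}, {0.0, 2.5}};
        System.out.print(render(m, false, " ", "\n", -1, -1));
        System.out.print(render(m, true, " ", "\n", -1, -1));
    }
}
```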
 
toString
public static String toString(TensorBlock tb)

toString
public static String toString(TensorBlock tb, boolean sparse, String separator, String lineseparator, String leftBorder, String rightBorder, int rowsToPrint, int colsToPrint, int decimal)
Returns a string representation of a tensor.
- Parameters:
- tb - tensor block
- sparse - if true, the string will contain a table with row index, col index, value (where value != 0.0); otherwise it will be a rectangular string with all values of the tensor block
- separator - separator string between each element in a row, or between the columns in sparse format
- lineseparator - separator string between each row
- leftBorder - characters placed at the start of a new dimension level
- rightBorder - characters placed at the end of a new dimension level
- rowsToPrint - maximum number of rows to print, -1 for all
- colsToPrint - maximum number of columns to print, -1 for all
- decimal - number of decimal places to print, -1 for default
- Returns:
- tensor as a string

toString
public static String toString(FrameBlock fb)

toString
public static String toString(FrameBlock fb, boolean sparse, String separator, String lineseparator, int rowsToPrint, int colsToPrint, int decimal)

toString
public static String toString(ListObject list, int rows, int cols, boolean sparse, String separator, String lineSeparator, int rowsToPrint, int colsToPrint, int decimal)

getTensorDimensions
public static int[] getTensorDimensions(ExecutionContext ec, CPOperand dims)

toDouble
public static double[] toDouble(float[] data)

toDouble
public static double[] toDouble(long[] data)

toDouble
public static double[] toDouble(int[] data)

toDouble
public static double[] toDouble(BitSet data, int len)

toDouble
public static double[] toDouble(String[] data)

toFloat
public static float[] toFloat(double[] data)

toInt
public static int[] toInt(double[] data)

toLong
public static long[] toLong(double[] data)

toBitSet
public static BitSet toBitSet(double[] data)

toString
public static String[] toString(double[] data)