Package groovy.util
Class CharsetToolkit
java.lang.Object
groovy.util.CharsetToolkit
public class CharsetToolkit
extends java.lang.Object
Utility class to guess the encoding of a given text file.
 
Unicode files encoded in UTF-16 (little or big endian), and UTF-8 files with a byte order mark (BOM), are detected correctly. For UTF-8 files without a BOM, the charset should also be detected, provided the buffer is large enough.
A 4 KB byte buffer is used to guess the encoding.
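The guess for files without a BOM can be pictured with the sketch below. It is only an illustration under stated assumptions, not the CharsetToolkit implementation: the class NoBomGuess and its guess method are hypothetical names, and strict decoding with the standard CharsetDecoder stands in for whatever byte inspection the toolkit actually performs. A buffer with no byte above 127 is treated as US-ASCII (or as the default charset when the enforce flag is set), a buffer that decodes cleanly as UTF-8 is treated as UTF-8, and anything else falls back to the default charset.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of groovy.util.
final class NoBomGuess {
    private NoBomGuess() {}

    // Guesses a charset for a buffer that carries no byte order mark.
    static Charset guess(byte[] buffer, Charset defaultCharset, boolean enforce8Bit) {
        boolean highBitSeen = false;
        for (byte b : buffer) {
            if ((b & 0x80) != 0) {
                highBitSeen = true;
                break;
            }
        }
        if (!highBitSeen) {
            // Only 7-bit bytes: US-ASCII, unless the caller never wants that answer.
            return enforce8Bit ? defaultCharset : StandardCharsets.US_ASCII;
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // UTF-8 never yields more chars than input bytes, so this output buffer is large enough.
        CharBuffer out = CharBuffer.allocate(buffer.length);
        // endOfInput = false: a multi-byte sequence cut off at the end of the buffer is not an error.
        CoderResult result = decoder.decode(ByteBuffer.wrap(buffer), out, false);
        return result.isError() ? defaultCharset : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1;
        byte[] utf8 = "caf\u00e9 au lait".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guess(utf8, fallback, false));
        System.out.println(guess(latin1, fallback, false));
    }
}
```

The size of the buffer matters in this sketch for the same reason it does in the class description: if the first non-ASCII byte of a file lies beyond the sampled bytes, the sample alone cannot distinguish UTF-8 from US-ASCII.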
Usage:
 CharsetToolkit toolkit = new CharsetToolkit(file);
 // guess the encoding
 Charset guessedCharset = toolkit.getCharset();
 // create a reader with the correct charset
 BufferedReader reader = toolkit.getReader();
 // read the file content
 String line;
 while ((line = reader.readLine()) != null)
 {
     System.out.println(line);
 }

Constructor Summary
Constructors
Constructor | Description
CharsetToolkit(java.io.File file) | Constructor of the CharsetToolkit utility class.

Method Summary
Modifier and Type | Method | Description
static java.nio.charset.Charset[] | getAvailableCharsets() | Retrieves all the Charsets available on the platform, including the default charset.
java.nio.charset.Charset | getCharset() |
java.nio.charset.Charset | getDefaultCharset() | Retrieves the default Charset.
static java.nio.charset.Charset | getDefaultSystemCharset() | Retrieves the default charset of the system.
boolean | getEnforce8Bit() | Gets the enforce8Bit flag, in case we never want to get a US-ASCII encoding.
java.io.BufferedReader | getReader() | Gets a BufferedReader (in fact a LineNumberReader) for the File specified in the CharsetToolkit constructor, using the detected charset, or the default charset if an 8-bit Charset is encountered.
boolean | hasUTF16BEBom() | Has a byte order mark for UTF-16 Big Endian (utf-16 and ucs-2).
boolean | hasUTF16LEBom() | Has a byte order mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).
boolean | hasUTF8Bom() | Has a byte order mark for UTF-8 (used by Microsoft's Notepad and other editors).
void | setDefaultCharset(java.nio.charset.Charset defaultCharset) | Defines the default Charset used in case the buffer represents an 8-bit Charset.
void | setEnforce8Bit(boolean enforce) | If US-ASCII is recognized, forces the default encoding to be returned rather than US-ASCII.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
CharsetToolkit
public CharsetToolkit(java.io.File file) throws java.io.IOException
Constructor of the CharsetToolkit utility class.
Parameters:
file - the file whose encoding we want to know.
Throws:
java.io.IOException

Method Details
setDefaultCharset
public void setDefaultCharset(java.nio.charset.Charset defaultCharset)
Defines the default Charset used in case the buffer represents an 8-bit Charset.
Parameters:
defaultCharset - the default Charset to be returned if an 8-bit Charset is encountered.

getCharset
public java.nio.charset.Charset getCharset()

setEnforce8Bit
public void setEnforce8Bit(boolean enforce)
If US-ASCII is recognized, forces the default encoding to be returned rather than US-ASCII. A file without any special character in the range 128-255 looks like US-ASCII, but it may be, or may later become, a file encoded with the default charset.
Parameters:
enforce - true to return the default charset in place of US-ASCII, false to allow US-ASCII.

getEnforce8Bit
public boolean getEnforce8Bit()
Gets the enforce8Bit flag, in case we never want to get a US-ASCII encoding.
Returns:
the value of the enforce8Bit flag: true if the default charset is returned in place of US-ASCII.

getDefaultCharset
public java.nio.charset.Charset getDefaultCharset()
Retrieves the default Charset.

getDefaultSystemCharset
public static java.nio.charset.Charset getDefaultSystemCharset()
Retrieves the default charset of the system.
Returns:
the default Charset.

hasUTF8Bom
public boolean hasUTF8Bom()
Checks whether the buffer has a byte order mark for UTF-8 (used by Microsoft's Notepad and other editors).
Returns:
true if the buffer has a BOM for UTF-8.

hasUTF16LEBom
public boolean hasUTF16LEBom()
Checks whether the buffer has a byte order mark for UTF-16 Little Endian (ucs-2le, ucs-4le, and ucs-16le).
Returns:
true if the buffer has a BOM for UTF-16 Little Endian.

hasUTF16BEBom
public boolean hasUTF16BEBom()
Checks whether the buffer has a byte order mark for UTF-16 Big Endian (utf-16 and ucs-2).
Returns:
true if the buffer has a BOM for UTF-16 Big Endian.

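The three BOM checks above come down to comparing the first bytes of the buffer with fixed signatures: EF BB BF for UTF-8, FF FE for UTF-16 little endian, and FE FF for UTF-16 big endian. The sketch below shows that comparison on a plain byte array. BomSniffer and its method names are hypothetical and chosen for this illustration; they are not the CharsetToolkit internals.

```java
// Hypothetical helper for illustration; not part of groovy.util.
final class BomSniffer {
    private BomSniffer() {}

    // True if the buffer starts with the given unsigned byte values.
    private static boolean startsWith(byte[] buffer, int... prefix) {
        if (buffer.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((buffer[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean hasUtf8Bom(byte[] buffer) {
        return startsWith(buffer, 0xEF, 0xBB, 0xBF);
    }

    static boolean hasUtf16LeBom(byte[] buffer) {
        return startsWith(buffer, 0xFF, 0xFE);
    }

    static boolean hasUtf16BeBom(byte[] buffer) {
        return startsWith(buffer, 0xFE, 0xFF);
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println("UTF-8 BOM: " + hasUtf8Bom(sample));
        System.out.println("UTF-16LE BOM: " + hasUtf16LeBom(sample));
        System.out.println("UTF-16BE BOM: " + hasUtf16BeBom(sample));
    }
}
```

Note that a UTF-32 little endian BOM (FF FE 00 00) also begins with FF FE, so a two-byte check like the one in this sketch cannot tell the two apart on its own.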
getReader
public java.io.BufferedReader getReader() throws java.io.FileNotFoundException
Gets a BufferedReader (in fact a LineNumberReader) for the File specified in the CharsetToolkit constructor, using the detected charset, or the default charset if an 8-bit Charset is encountered.
Returns:
a BufferedReader
Throws:
java.io.FileNotFoundException - if the file is not found.

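A reader of the kind getReader() describes, a LineNumberReader that decodes a file with a known charset, can be assembled from standard java.io classes. The sketch below shows one way to do so once a charset has been chosen. CharsetReaderSketch, open, and firstLine are hypothetical names used only here, and the sketch makes no claim about how CharsetToolkit builds its reader or whether it skips a leading BOM.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper for illustration; not part of groovy.util.
final class CharsetReaderSketch {
    private CharsetReaderSketch() {}

    // Opens the file with an explicit charset. LineNumberReader extends BufferedReader.
    // FileInputStream throws FileNotFoundException (an IOException) if the file is missing.
    static BufferedReader open(File file, Charset charset) throws IOException {
        return new LineNumberReader(new InputStreamReader(new FileInputStream(file), charset));
    }

    // Reads the first line and closes the reader.
    static String firstLine(File file, Charset charset) throws IOException {
        try (BufferedReader reader = open(file, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("charset-sketch", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstLine(tmp, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly is the point of the sketch: a reader built without one decodes with the platform default, which is the situation the class is meant to avoid.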
getAvailableCharsets
public static java.nio.charset.Charset[] getAvailableCharsets()
Retrieves all the Charsets available on the platform, including the default charset.
Returns:
an array of Charsets.
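The standard library exposes the same information through java.nio.charset.Charset, so a result comparable to getAvailableCharsets() and getDefaultSystemCharset() can be obtained without the toolkit. The sketch below is an illustration using only JDK calls. AvailableCharsetsSketch and its methods are hypothetical names, and the sketch does not assert that the Groovy methods are implemented this way.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collection;

// Hypothetical helper for illustration; not part of groovy.util.
final class AvailableCharsetsSketch {
    private AvailableCharsetsSketch() {}

    // Copies the charsets known to this JVM into an array.
    static Charset[] availableCharsets() {
        Collection<Charset> all = Charset.availableCharsets().values();
        return all.toArray(new Charset[0]);
    }

    // Linear search, enough for a one-off membership check.
    static boolean contains(Charset[] charsets, Charset wanted) {
        for (Charset charset : charsets) {
            if (charset.equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Charset[] all = availableCharsets();
        System.out.println("Charsets available: " + all.length);
        System.out.println("Platform default: " + Charset.defaultCharset());
        System.out.println("UTF-8 listed: " + contains(all, StandardCharsets.UTF_8));
    }
}
```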