Encode#
NAME#
Kernel::System::Encode - character encodings
DESCRIPTION#
This module will use Perl’s Encode module (Perl 5.8.0 or higher is required).
PUBLIC INTERFACE#
new()#
Don’t use the constructor directly, use the ObjectManager instead:
my $EncodeObject = $Kernel::OM->Get('Kernel::System::Encode');
Convert()#
Convert a string from one charset to another charset.
my $utf8 = $EncodeObject->Convert(
Text => $iso_8859_1_string,
From => 'iso-8859-1',
To => 'utf-8',
);
my $iso_8859_1 = $EncodeObject->Convert(
Text => $utf8_string,
From => 'utf-8',
To => 'iso-8859-1',
);
There is also a Force => 1 option to force the conversion of an already converted string, and a Check => 1 option to check that the resulting string is valid (e.g. a valid utf-8 string).
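The conversion described above can be illustrated with Perl's core Encode module alone. This is a minimal sketch, not the module's implementation: ConvertBytes is a hypothetical helper, it works purely on byte strings, and Convert() itself may differ in details such as error handling or how the result is flagged.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Kernel::System::Encode): converts a byte
# string from one charset to another by decoding and re-encoding it.
sub ConvertBytes {
    my (%Param) = @_;

    # Decode the source bytes into Perl characters, then encode them in the
    # target charset. FB_CROAK makes Encode die on input it cannot map.
    my $Characters = Encode::decode( $Param{From}, $Param{Text}, Encode::FB_CROAK );
    return Encode::encode( $Param{To}, $Characters, Encode::FB_CROAK );
}

# "cafe" with an e-acute, as iso-8859-1 bytes (0xE9 is the e-acute).
my $Latin1 = "caf\xE9";

my $UTF8      = ConvertBytes( Text => $Latin1, From => 'iso-8859-1', To => 'utf-8' );
my $RoundTrip = ConvertBytes( Text => $UTF8,   From => 'utf-8',      To => 'iso-8859-1' );
```

In UTF-8 the e-acute occupies two bytes, so the converted string is one byte longer than the iso-8859-1 input, and converting it back restores the original bytes.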
Convert2CharsetInternal()#
Converts a string from the given charset into the internally used charset (utf-8). Should be used on all I/O interfaces.
my $String = $EncodeObject->Convert2CharsetInternal(
Text => $String,
From => $SourceCharset,
);
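As a rough illustration of what converting to the internal charset means, the following sketch decodes bytes from a source charset into a Perl character string using the core Encode module. The sample data and variable values are assumptions chosen for the example; the method's own behavior may differ in details.

```perl
use strict;
use warnings;
use Encode ();

# Bytes as they might arrive from an external source: "10 " followed by the
# euro sign, which iso-8859-15 stores as the single byte 0xA4.
my $SourceCharset = 'iso-8859-15';
my $Bytes         = "10 \xA4";

# Decode the bytes into a Perl character string.
my $String = Encode::decode( $SourceCharset, $Bytes );

# The last character is now the code point of the euro sign.
my $LastCodePoint = ord substr( $String, -1 );
printf "U+%04X\n", $LastCodePoint;
```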
EncodeInput()#
By default, this function assumes that incoming bytes are well-formed UTF-8 and
sets the utf8 flag so that Perl treats them as such. It should be used on
all I/O interfaces if and only if the data is already utf-8 (see the warning
for _utf8_on in the documentation of Perl's Encode module). It modifies the
scalar or array referenced by its $What parameter in place!
$EncodeObject->EncodeInput( \$String );
$EncodeObject->EncodeInput( \@Array );
If there is a possibility that strings may not be UTF-8, simply setting the
UTF-8 flag will probably lead to crashes down the road. In this case, set the
$Safe argument to a true value to make the function use Encode::decode.
This is a bit slower and will produce mojibake if the input has already been
decoded from UTF-8, but it will always yield safe results.
There are four possible values for $Safe:
undef: Backwards-compatible behavior. No safety measures are applied; the UTF-8 flag is simply turned on.
1: A simple 1 will decode UTF-8 and replace malformed sequences with an escape code containing the hex byte values, e.g. \x{0d}.
A coderef will be passed to Encode::decode to format your own replacement codes.
Anything else will be interpreted as the name of an alternative charset that should be tried in case UTF-8 decoding fails, falling back to the \x{XX} escapes as a last resort.
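The fallback chain described for $Safe (strict UTF-8 first, then an alternative charset, then escapes) can be sketched with the core Encode module. DecodeSafely is a hypothetical helper written for this illustration and is not the module's code; the exact escape format produced by EncodeInput() may differ from what Encode's FB_PERLQQ mode emits here.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decodes bytes as UTF-8, optionally retries with a
# fallback charset, and escapes malformed bytes as a last resort.
sub DecodeSafely {
    my ( $Bytes, $FallbackCharset ) = @_;

    # First attempt: strict UTF-8. FB_CROAK dies on malformed input and
    # LEAVE_SRC keeps $Bytes intact for the later attempts.
    my $String = eval {
        Encode::decode( 'UTF-8', $Bytes, Encode::FB_CROAK | Encode::LEAVE_SRC );
    };
    return $String if defined $String;

    # Second attempt: the alternative charset, if one was given.
    if ( defined $FallbackCharset ) {
        $String = eval {
            Encode::decode( $FallbackCharset, $Bytes, Encode::FB_CROAK | Encode::LEAVE_SRC );
        };
        return $String if defined $String;
    }

    # Last resort: decode as UTF-8 and escape every malformed byte.
    return Encode::decode( 'UTF-8', $Bytes, Encode::FB_PERLQQ | Encode::LEAVE_SRC );
}

my $FromUTF8   = DecodeSafely("caf\xC3\xA9");             # valid UTF-8
my $FromLatin1 = DecodeSafely( "caf\xE9", 'iso-8859-1' ); # invalid UTF-8, valid latin-1
my $Escaped    = DecodeSafely("caf\xE9");                 # invalid UTF-8, no fallback
```

The first two calls yield the same four-character string, while the third keeps the readable prefix and replaces the stray byte with a textual escape.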
EncodeOutput()#
Convert utf-8 to a sequence of bytes. All possible characters have a UTF-8 representation so this function cannot fail.
This should be used for the output of utf-8 characters.
$EncodeObject->EncodeOutput( \$String );
$EncodeObject->EncodeOutput( \@Array );
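To make the difference between characters and bytes concrete, the sketch below encodes a scalar or an array in place using Perl's built-in utf8::encode. EncodeInPlace is a hypothetical helper that only mirrors the calling convention shown above; it is an assumption about the general approach, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: encodes the referenced scalar, or every defined
# element of the referenced array, from characters to UTF-8 bytes in place.
sub EncodeInPlace {
    my ($What) = @_;

    if ( ref $What eq 'SCALAR' ) {
        utf8::encode( ${$What} ) if defined ${$What};
    }
    elsif ( ref $What eq 'ARRAY' ) {
        for my $Element ( @{$What} ) {
            utf8::encode($Element) if defined $Element;
        }
    }
    return;
}

my $String = "\x{263A}";              # one character (WHITE SMILING FACE)
my @Array  = ( "\x{E9}", 'plain' );   # e-acute and an ASCII-only string

EncodeInPlace( \$String );
EncodeInPlace( \@Array );
```

After the calls, $String holds the three bytes 0xE2 0x98 0xBA, the e-acute has become two bytes, and the ASCII-only string is unchanged.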
ConfigureOutputFileHandle()#
Switches the output file handle to utf-8 output.
$EncodeObject->ConfigureOutputFileHandle( FileHandle => \*STDOUT );
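The effect of switching a file handle to utf-8 output can be observed with plain Perl and an in-memory file handle. This sketch assumes the switch is comparable to applying an encoding layer with binmode; the layer the module actually applies is not stated here.

```perl
use strict;
use warnings;

# Write to an in-memory buffer so the emitted bytes can be inspected.
my $Buffer = '';
open my $FileHandle, '>', \$Buffer or die "Cannot open in-memory handle: $!";

# Apply a UTF-8 encoding layer: characters printed to the handle are
# converted to UTF-8 bytes on output.
binmode $FileHandle, ':encoding(UTF-8)';

print {$FileHandle} "\x{263A}";    # one character
close $FileHandle or die "Cannot close in-memory handle: $!";

printf "%d bytes written\n", length $Buffer;
```

Without such a layer, printing a character above 0xFF triggers Perl's "Wide character" warning.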
EncodingIsAsciiSuperset()#
Checks if an encoding is a super-set of ASCII, that is, encodes the codepoints from 0 to 127 the same way as ASCII.
my $IsSuperset = $EncodeObject->EncodingIsAsciiSuperset(
Encoding => 'UTF-8',
);
FindAsciiSupersetEncoding()#
From a list of character encodings, returns the first that
is a super-set of ASCII. If none matches, ASCII is returned.
my $Encoding = $EncodeObject->FindAsciiSupersetEncoding(
Encodings => [ 'UTF-16LE', 'UTF-8' ],
);
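One way to test for the ASCII super-set property is to encode all 128 ASCII code points and compare the result with the input. The two functions below are hypothetical stand-ins that sketch this idea with the core Encode module; they are not taken from the module's source.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical check: an encoding is treated as an ASCII super-set if
# encoding the code points 0..127 yields exactly the same 128 bytes.
sub IsAsciiSuperset {
    my ($Encoding) = @_;

    my $Ascii   = join '', map { chr } 0 .. 127;
    my $Encoded = eval {
        # LEAVE_SRC keeps $Ascii intact so it can be compared afterwards.
        Encode::encode( $Encoding, $Ascii, Encode::FB_CROAK | Encode::LEAVE_SRC );
    };

    # Unknown encodings or unmappable characters count as "not a super-set".
    return 0 if !defined $Encoded;
    return $Encoded eq $Ascii ? 1 : 0;
}

# Hypothetical lookup: returns the first ASCII super-set from the list,
# or 'ASCII' if none matches.
sub FirstAsciiSuperset {
    my (@Encodings) = @_;

    for my $Encoding (@Encodings) {
        return $Encoding if IsAsciiSuperset($Encoding);
    }
    return 'ASCII';
}

my $Found    = FirstAsciiSuperset( 'UTF-16LE', 'UTF-8' );
my $NotFound = FirstAsciiSuperset( 'UTF-16LE', 'UTF-16BE' );
```

UTF-16LE fails the check because it encodes every ASCII character as two bytes, so the first call skips it and picks UTF-8.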
RemoveUTF8BOM()#
Removes the UTF-8 BOM (if present) from the start of the given string.
my $StringWithoutBOM = $EncodeObject->RemoveUTF8BOM(
String => '<BOM>....',
);
Returns given string without BOM.
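For illustration, removing a BOM comes down to a single anchored substitution. StripUTF8BOM is a hypothetical helper; as an assumption, it handles both the raw byte form (0xEF 0xBB 0xBF) and the decoded character U+FEFF, which may be broader than what the method itself does.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the string without a leading
# UTF-8 BOM, in either its byte form or its decoded character form.
sub StripUTF8BOM {
    my ($String) = @_;

    # \A anchors at the very start, so a BOM sequence later in the
    # string is left untouched.
    $String =~ s/\A(?:\xEF\xBB\xBF|\x{FEFF})//;
    return $String;
}

my $FromBytes      = StripUTF8BOM("\xEF\xBB\xBFabc");
my $FromCharacters = StripUTF8BOM("\x{FEFF}abc");
my $Unchanged      = StripUTF8BOM('abc');
```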