Encode#

NAME#

Kernel::System::Encode - character encodings

DESCRIPTION#

This module will use Perl’s Encode module (Perl 5.8.0 or higher is required).

PUBLIC INTERFACE#

new()#

Don’t use the constructor directly, use the ObjectManager instead:

my $EncodeObject = $Kernel::OM->Get('Kernel::System::Encode');

Convert()#

Convert a string from one charset to another charset.

my $utf8 = $EncodeObject->Convert(
    Text => $iso_8859_1_string,
    From => 'iso-8859-1',
    To   => 'utf-8',
);

my $iso_8859_1 = $EncodeObject->Convert(
    Text => $utf8_string,
    From => 'utf-8',
    To   => 'iso-8859-1',
);

There is also a Force => 1 option to force conversion of a string that has already been converted, and a Check => 1 option to check that the resulting string is valid (e.g. a valid utf-8 string).
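The conversion described above can be sketched with Perl's core Encode module alone. This is only an illustration of the decode-then-encode idea, not the implementation of Convert(); the helper name convert_bytes is hypothetical, and the Force and Check options are not modelled.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of Kernel::System::Encode: converts a byte
# string from one charset to another by decoding to Perl characters first.
sub convert_bytes {
    my (%Param) = @_;
    my $Characters = decode( $Param{From}, $Param{Text} );
    return encode( $Param{To}, $Characters );
}

# "cafe" with an e-acute as ISO-8859-1 bytes: the accented letter is byte 0xE9.
my $Latin1Bytes = "caf\xE9";

my $UTF8Bytes = convert_bytes(
    Text => $Latin1Bytes,
    From => 'iso-8859-1',
    To   => 'utf-8',
);

# In UTF-8 the same character takes two bytes (0xC3 0xA9).
printf "%d bytes in, %d bytes out\n", length $Latin1Bytes, length $UTF8Bytes;
```

Note that this sketch returns raw bytes, whereas the module works with Perl's internal utf-8 strings.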

Convert2CharsetInternal()#

Converts text from the given charset into the internally used charset (utf-8). Should be used on all I/O interfaces.

my $String = $EncodeObject->Convert2CharsetInternal(
    Text => $String,
    From => $SourceCharset,
);

EncodeInput()#

By default, this function assumes incoming bytes to be well-formed UTF-8 and will set the utf8 flag to make Perl treat them as such. It should be used on all I/O interfaces if and only if (see the warning in Encode/_utf8_on!) the data is already utf-8. It modifies the scalar or array referenced by its $What parameter in place!

$EncodeObject->EncodeInput( \$String );
$EncodeObject->EncodeInput( \@Array );

If there is a possibility that strings may not be UTF-8, simply setting the UTF-8 flag will probably lead to crashes down the road. In this case, set the $Safe argument to a true value to make the function use Encode/decode. This is a bit slower and will produce mojibake if the input is decoded UTF-8 already but will always yield safe results.

There are four possible values for $Safe:

  • undef: Backwards-compatible behavior—don’t use any safety measures, just turn on the UTF-8 flag and call it a day.

  • 1: A simple 1 will decode UTF-8 and replace malformed sequences with an escape code containing the hex byte values, e.g. \x{0d}.

  • A coderef will be passed to Encode/decode to format your own replacement codes

  • Anything else will be interpreted as the name of an alternative charset that should be tried in case UTF-8 decoding fails, falling back to the \x{XX} escapes as a last resort.
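The three safe modes listed above roughly correspond to different CHECK arguments of Encode::decode. The following is a minimal sketch using only Perl's core Encode module; it does not call EncodeInput() itself, and the variable names and the '<%02X>' replacement format are illustrative assumptions.

```perl
use strict;
use warnings;
use Encode ();

# ISO-8859-1 bytes: 0xE9 on its own is not a valid UTF-8 sequence.
my $Bytes = "caf\xE9";

# Analogue of $Safe = 1: malformed bytes are replaced by \xHH escape text.
my $Escaped = Encode::decode( 'UTF-8', $Bytes, Encode::FB_PERLQQ | Encode::LEAVE_SRC );

# Analogue of a coderef: the callback receives the offending byte value
# and returns the replacement text (format chosen here for illustration).
my $Copy   = $Bytes;
my $Custom = Encode::decode( 'UTF-8', $Copy, sub { sprintf '<%02X>', shift } );

# Analogue of a charset name: try strict UTF-8 first and, if that dies,
# decode with the alternative charset instead.
my $Fallback = eval {
    Encode::decode( 'UTF-8', $Bytes, Encode::FB_CROAK | Encode::LEAVE_SRC );
} // Encode::decode( 'iso-8859-1', $Bytes );

printf "%d, %d and %d characters\n", length $Escaped, length $Custom, length $Fallback;
```

LEAVE_SRC is passed because Encode::decode otherwise modifies its input when a CHECK value is given.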

EncodeOutput()#

Convert utf-8 to a sequence of bytes. All possible characters have a UTF-8 representation so this function cannot fail.

This should be used for output of utf-8 characters.

$EncodeObject->EncodeOutput( \$String );

$EncodeObject->EncodeOutput( \@Array );
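To illustrate what turning characters into bytes means, here is a small sketch using the Perl core function utf8::encode, which also works in place. Whether EncodeOutput() uses this function internally is not stated here, so treat it as an analogy only.

```perl
use strict;
use warnings;

# A character string holding one character, the euro sign U+20AC.
my $String         = "\x{20AC}";
my $CharacterCount = length $String;

# utf8::encode() rewrites the scalar in place as UTF-8 bytes, similar to
# the in-place, by-reference style of the calls shown above.
utf8::encode($String);
my $ByteCount = length $String;

print "$CharacterCount character became $ByteCount bytes\n";
```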

ConfigureOutputFileHandle()#

Switches the output file handle to utf-8 output.

$EncodeObject->ConfigureOutputFileHandle( FileHandle => \*STDOUT );
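In plain Perl, a comparable effect can be had by adding an encoding layer to the handle with binmode. The sketch below assumes that this is the kind of switch meant here; it writes to an in-memory buffer instead of STDOUT so the resulting bytes can be inspected.

```perl
use strict;
use warnings;

# Write to an in-memory buffer so the emitted bytes can be checked afterwards.
my $Buffer = '';
open my $FileHandle, '>', \$Buffer or die "Cannot open in-memory handle: $!";

# Add a UTF-8 encoding layer: characters printed from now on leave as UTF-8 bytes.
binmode $FileHandle, ':encoding(UTF-8)';

# Print the euro sign (U+20AC) as a single character.
print {$FileHandle} "\x{20AC}";
close $FileHandle;

printf "wrote %d bytes\n", length $Buffer;
```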

EncodingIsAsciiSuperset()#

Checks if an encoding is a super-set of ASCII, that is, encodes the codepoints from 0 to 127 the same way as ASCII.

my $IsSuperset = $EncodeObject->EncodingIsAsciiSuperset(
    Encoding => 'UTF-8',
);

FindAsciiSupersetEncoding()#

From a list of character encodings, returns the first that is a super-set of ASCII. If none matches, ASCII is returned.

my $Encoding = $EncodeObject->FindAsciiSupersetEncoding(
    Encodings => [ 'UTF-16LE', 'UTF-8' ],
);
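The superset check and the first-match lookup can be sketched with core modules as follows. The helper names is_ascii_superset and find_ascii_superset are hypothetical and only mirror the behavior described for EncodingIsAsciiSuperset() and FindAsciiSupersetEncoding(); the real methods may be implemented differently.

```perl
use strict;
use warnings;
use Encode ();
use List::Util qw(first);

# Hypothetical helper: true if $Encoding encodes the code points 0..127
# to exactly the same bytes as ASCII does.
sub is_ascii_superset {
    my ($Encoding) = @_;
    my $Ascii   = join '', map { chr($_) } 0 .. 127;
    my $Encoded = eval { Encode::encode( $Encoding, $Ascii ) };
    return 0 if !defined $Encoded;    # unknown encoding name
    return $Encoded eq $Ascii ? 1 : 0;
}

# Hypothetical helper: the first ASCII-compatible candidate in the list,
# or 'ASCII' if none of them qualifies.
sub find_ascii_superset {
    my (@Encodings) = @_;
    return ( first { is_ascii_superset($_) } @Encodings ) // 'ASCII';
}

print find_ascii_superset( 'UTF-16LE', 'UTF-8' ), "\n";
```

UTF-16LE is rejected because it encodes every ASCII character as two bytes.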

RemoveUTF8BOM()#

Removes UTF-8 BOM (if present) from start of given string.

my $StringWithoutBOM = $EncodeObject->RemoveUTF8BOM(
    String => '<BOM>....',
);

Returns given string without BOM.
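A minimal sketch of BOM removal, assuming the BOM is present as the three raw bytes 0xEF 0xBB 0xBF at the very start of a byte string. The helper name remove_utf8_bom is hypothetical; whether RemoveUTF8BOM() expects bytes or decoded characters is not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of a UTF-8 byte string without a
# leading byte order mark. A BOM anywhere else is left untouched.
sub remove_utf8_bom {
    my ($String) = @_;
    $String =~ s/\A\xEF\xBB\xBF//;
    return $String;
}

my $WithBOM    = "\xEF\xBB\xBFHello";
my $WithoutBOM = remove_utf8_bom($WithBOM);

printf "%d bytes before, %d bytes after\n", length $WithBOM, length $WithoutBOM;
```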