There are two main charsets: UTF-8 (an encoding of Unicode) and ISO-8859-1. UTF-8 lets one handle more characters than ISO-8859-1 (such as Arabic and Chinese characters).
HTML files that should handle Unicode characters must declare the charset in the document head:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
Here one could also use 'iso-8859-1' instead.
If you need to convert between ISO-8859-1 and UTF-8 in PHP, the functions utf8_encode and utf8_decode are useful. utf8_encode converts from ISO-8859-1 to UTF-8, and utf8_decode converts from UTF-8 to ISO-8859-1. The functions are used like this:
// PHP variable names cannot contain hyphens, so the ISO-8859-1 string is named $iso88591string.
$utf8string = utf8_encode($iso88591string);
$iso88591string = utf8_decode($utf8string);
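Note that utf8_encode and utf8_decode are deprecated as of PHP 8.2. The same round trip can be sketched with mb_convert_encoding, assuming the mbstring extension is installed. The sample byte and the variable names below are illustrative only.

```php
<?php
// "é" encoded as ISO-8859-1 is the single byte 0xE9 (sample data).
$latin1 = "\xE9";

// ISO-8859-1 -> UTF-8, the same direction as utf8_encode.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// UTF-8 -> ISO-8859-1, the same direction as utf8_decode.
$roundTrip = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";      // c3a9
echo bin2hex($roundTrip), "\n"; // e9
```

In UTF-8 the character takes two bytes, and converting back gives the original single byte.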
In addition to UTF-8 there are two other Unicode encodings, UTF-16 and UTF-32. The difference is how many bytes are used to store a character: UTF-8 uses one to four bytes, UTF-16 uses two or four, and UTF-32 always uses four. All three can represent every Unicode character. UTF-32 is in little use, and so is UTF-16, but UTF-16 is used in more places.
To convert between these Unicode charsets use iconv. It is used like this:
$utf16string = iconv("utf-8", "utf-16", $utf8str);
or
$utf8string = iconv("utf-16", "utf-8", $utf16str);
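To make the byte-width difference concrete, here is a small sketch that converts one short string and compares the lengths. It assumes PHP 7 or later (for the "\u{...}" escape) and the iconv extension. The explicit big-endian names UTF-16BE and UTF-32BE are chosen here because plain "utf-16" may prepend a byte order mark, depending on the iconv implementation.

```php
<?php
// "A" followed by the euro sign (U+20AC), written as a UTF-8 string.
$utf8 = "A\u{20AC}";

// Explicit byte order, so no byte order mark is added to the output.
$utf16 = iconv('UTF-8', 'UTF-16BE', $utf8);
$utf32 = iconv('UTF-8', 'UTF-32BE', $utf8);

// strlen() counts bytes, not characters.
echo strlen($utf8), "\n";  // 4  (1 byte for "A" + 3 bytes for the euro sign)
echo strlen($utf16), "\n"; // 4  (2 bytes per character)
echo strlen($utf32), "\n"; // 8  (4 bytes per character)
```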
iconv can also be used to convert from other character sets such as ISO-8859-1, like this:
$utf16string = iconv("iso-8859-1", "utf-16", $iso88591string);
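iconv returns false when the input is not valid in the source charset, so it can be worth checking the result before using it. The helper below is a hypothetical wrapper (the name convertOrNull is not part of PHP) that sketches one way to do that. It suppresses the notice iconv emits on failure and returns null instead.

```php
<?php
// Hypothetical helper: convert $text from charset $from to charset $to,
// returning null instead of false when iconv cannot convert the input.
function convertOrNull(string $text, string $from, string $to): ?string
{
    // @ silences the notice iconv raises for illegal input bytes.
    $result = @iconv($from, $to, $text);
    return $result === false ? null : $result;
}

// Valid ISO-8859-1 input ("é" as byte 0xE9) converts cleanly.
$ok = convertOrNull("\xE9", 'ISO-8859-1', 'UTF-8');

// 0xFF can never appear in valid UTF-8, so this conversion fails.
$bad = convertOrNull("\xFF", 'UTF-8', 'ISO-8859-1');

var_dump($ok !== null);  // bool(true)
var_dump($bad === null); // bool(true)
```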
Solutions found at: http://www.php.net/manual/en/function.utf8-decode.php, http://php.net/manual/en/function.utf8-encode.php, http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21916597.html, and http://www.php.net/manual/en/function.iconv.php.