PHP allows treating a string as an array, so that you can use indexing syntax to
get or set a single position in the string:
$foo = 'bar';
$foo[0] = 'z'; // $foo is now 'zar'
This seems handy, but you should never do it.
The reason to avoid string indexing like this is that PHP strings are not
sequences of characters. They are just sequences of bytes, and indexing
addresses a single byte.
You won't notice this with plain ASCII strings like the one above, because each
ASCII character is encoded as exactly one byte, so byte offsets and character
offsets coincide.
As soon as you get a multi-byte string, which you will now that nearly
everything is UTF-8 and internationalised, that kind of naive string indexing
breaks:
echo '葛修远'[0];
// You might expect to get '葛' here, but you won't, as it's multi-byte.
// Instead you get a single byte (0xE8), the first byte of the three-byte UTF-8
// encoding of '葛'. On its own it is not valid UTF-8, so it is typically
// displayed as the mangled '�'.
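To make the byte-versus-character mismatch concrete, here is a minimal sketch that compares the two views of the same string. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$name = '葛修远';

// strlen() counts bytes; mb_strlen() counts characters.
$byteLength = strlen($name);             // each of these characters takes 3 bytes
$charLength = mb_strlen($name, 'UTF-8');

// Indexing returns one byte, not one character.
$firstByte = $name[0];
$firstByteHex = bin2hex($firstByte);

// A lone lead byte is not a valid UTF-8 sequence.
$firstByteIsValid = mb_check_encoding($firstByte, 'UTF-8');

printf(
    "%d bytes, %d characters, first byte 0x%s, valid UTF-8: %s\n",
    $byteLength,
    $charLength,
    $firstByteHex,
    $firstByteIsValid ? 'yes' : 'no'
);
// prints "9 bytes, 3 characters, first byte 0xe8, valid UTF-8: no"
```

The invalid lone byte is why terminals and browsers usually render the result as a replacement character rather than anything recognisable.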
The correct way to handle this in PHP is to never use naive string indexing, and
instead use the mb_ functions, in this case mb_substr:
echo mb_substr('葛修远', 0, 1);
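If you want something close to the convenience of indexing, one option is to wrap mb_substr in small helpers. The sketch below is only an illustration: charAt and withCharAt are hypothetical names, not part of PHP or mbstring, and the code assumes UTF-8 input and a non-negative index that lies within the string.

```php
<?php
// Hypothetical helpers (not built into PHP). Assumes UTF-8 strings,
// ext-mbstring, and a non-negative, in-range $index.

// Return the character (not the byte) at the given character offset.
function charAt(string $string, int $index): string
{
    return mb_substr($string, $index, 1, 'UTF-8');
}

// Return a copy of $string with the character at $index replaced by $char.
function withCharAt(string $string, int $index, string $char): string
{
    return mb_substr($string, 0, $index, 'UTF-8')
        . $char
        . mb_substr($string, $index + 1, null, 'UTF-8');
}

echo charAt('葛修远', 0), "\n";          // 葛
echo withCharAt('bar', 0, 'z'), "\n";    // zar
echo withCharAt('葛修远', 1, '明'), "\n"; // 葛明远
```

Returning a new string rather than modifying in place is a deliberate choice here: it avoids the byte-level offset assignment that caused the problem in the first place.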
Unfortunately, neither of the most popular linters for PHP, PHPMD and PHPCS,
seems to have a standard rule for banning naive string indexing. Such a rule
would be handy for automatically rejecting it in a codebase.
Avoid naive string indexing in PHP