Can't search for special character "／" in PHP 5

Swift&Smart · 10-25-2007, 11:34 AM

Hello, everyone.

I am working with some Chinese characters and would like to find a specific character so I can split the text into 2 strings. For example, "你／我" should become 2 strings: str1=你, str2=我. However, the search doesn't work correctly if I grab the text from the web and then search it with strpos or preg_match. It does work if I copy and paste the same words into my PHP page. I suspect the problem is with the encoding, but I don't know exactly how to solve it.

Do you guys have any clue? Please drop me a line if you do.

Guttorm · 10-26-2007, 05:02 AM

Hi

Those are UTF-8, right? Then you can't rely on the regular strpos or regular expression functions, since they work on bytes rather than characters. Take a look at
http://www.php.net/manual/en/ref.mbstring.php

For strpos, you can use mb_strpos
http://www.php.net/manual/en/function.mb-strpos.php
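
To illustrate the difference, here is a minimal sketch, assuming the text is already UTF-8 and the script file itself is saved as UTF-8. The sample string and variable names are only for illustration. strpos reports a byte offset, while mb_strpos reports a character offset:

```php
<?php
$text = '你／我'; // UTF-8 literal (assumes this file is saved as UTF-8)

// strpos counts bytes; "你" takes 3 bytes in UTF-8.
$bytePos = strpos($text, '／');
// mb_strpos counts characters when told the encoding.
$charPos = mb_strpos($text, '／', 0, 'UTF-8');

var_dump($bytePos); // int(3)
var_dump($charPos); // int(1)

// Use the character offset together with mb_substr, not substr.
$left  = mb_substr($text, 0, $charPos, 'UTF-8');        // "你"
$right = mb_substr($text, $charPos + 1, null, 'UTF-8'); // "我"
```

Mixing the two families (for example, a byte offset from strpos passed to mb_substr) is what usually cuts a character in half.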

There is no mb_preg_match, but there are several other multibyte regular expression functions you can use, such as mb_ereg and mb_split.

PHP 6 will have native Unicode strings, but until then we are stuck with the mbstring functions.

Swift&Smart · 10-26-2007, 06:51 AM

Guttorm, thanks for your reply.

I have looked at the links you gave. However, I still don't know exactly how to apply them. My situation is this:

1. I retrieved some Chinese characters from a website (it uses BIG-5 encoding).
2. Then I try to find the string "／", but the search fails.

However, as I said, if I copy those strings into the PHP file and search for that special character, it works perfectly. I don't know why. If I did, I could use the same method to solve my problem.

Again, thanks for the help.

Guttorm · 10-26-2007, 07:10 AM

Hi

Maybe you need to convert the page you got from a website from BIG-5 encoding to UTF-8? I don't even know what BIG-5 is, but I've used the iconv function in similar situations to convert everything to UTF-8, before doing stuff with the mbstring functions.

http://www.php.net/manual/en/function.iconv.php

The manual says: "These are examples of character encodings that are unlikely to work with PHP: JIS, SJIS, ISO-2022-JP, BIG-5"

Maybe the copy and paste converts it to UTF-8? To find out which encoding a text file has, you can use the "file" command in the shell.
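
As a rough sketch of the convert-then-search idea, assuming the fetched page really is BIG-5 and the script file is saved as UTF-8: the sample text below is made up, and it is built with mb_convert_encoding only so the example is self-contained. iconv('BIG5', 'UTF-8', $text) would be the analogous call, though encoding names depend on the iconv implementation.

```php
<?php
// Hypothetical stand-in for text fetched from a BIG-5 page.
$big5 = mb_convert_encoding('你好／我', 'BIG-5', 'UTF-8');
$needle = '／'; // UTF-8 bytes, because this file is saved as UTF-8

// The raw BIG-5 bytes do not contain the UTF-8 byte sequence of the needle.
$rawHit = strpos($big5, $needle);
var_dump($rawHit); // bool(false)

// Convert to UTF-8 first, then search with an explicit encoding.
$utf8 = mb_convert_encoding($big5, 'UTF-8', 'BIG-5');
$pos  = mb_strpos($utf8, $needle, 0, 'UTF-8');

// int(2) when the delimiter survives the round trip; otherwise false.
// Whether "／" maps back to exactly the same code point depends on the
// converter's BIG-5 table, so this is checked rather than assumed.
var_dump($pos);
```

This would also explain why pasting the text into the PHP file works: the haystack and the needle then share the file's encoding.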

Swift&Smart · 10-26-2007, 07:44 AM

Guttorm, thanks for your swift reply.

I have just rewritten the code as you suggested. Fortunately, mb_strpos can detect the special character after I use mb_convert_encoding to convert from BIG-5 to UTF-8. In my tests that worked better than iconv. However, the task still isn't complete once I use mb_split. The first part of the string comes out fine, but the second part becomes garbled text. I guess mb_split does not split the string cleanly. As the manual says, if the split fails, for example because a character does not get all the bytes it needs, the output becomes garbled.

I do think handling multibyte characters is the biggest problem I have found in PHP.
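
One possible cause, offered as a guess rather than a diagnosis: mb_split takes its encoding from mb_regex_encoding(), not from the earlier mb_convert_encoding call, so the two can disagree. Here is a minimal sketch, assuming the text has already been converted to UTF-8 and the script file is saved as UTF-8 (the sample string is only for illustration):

```php
<?php
// Assumes the text has already been converted to UTF-8.
$text = '你／我';

// Tell the mb_ereg family (including mb_split) which encoding to use.
mb_regex_encoding('UTF-8');
$parts = mb_split('／', $text);

// With a fixed delimiter, explode() is a simpler alternative: it compares
// bytes, and a complete UTF-8 sequence cannot match inside another character.
list($str1, $str2) = explode('／', $text, 2);

var_dump($parts);
var_dump($str1, $str2);
```

If the second half still comes out garbled, it may be worth checking that the delimiter in the script and the converted text are both really UTF-8 at the point of the split.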