LinuxQuestions.org
10-25-2007, 11:34 AM   #1
Swift&Smart
Member
 
Registered: Jan 2003
Location: Hong Kong,China
Distribution: Slackware,OpenSUSE
Posts: 472

Rep: Reputation: 30
Can't search special character "/" in PHP 5


Hello, everyone.

I am working with some Chinese characters and would like to find a specific character to split the text into 2 strings. For example, "你/我" should become 2 strings: str1=你, str2=我. However, the search doesn't work correctly if I grab the text from the web and then search with strpos or preg_match. It does work if I copy and paste the same text into my PHP page. I suspect the problem is the encoding, but I don't know exactly how to solve it.

Do you guys have any clue? Please drop me a line if you do.
 
10-26-2007, 05:02 AM   #2
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,453

Rep: Reputation: 448
Hi

Those are UTF-8, right? Then the regular strpos and regular expression functions won't treat them as characters, because they work on bytes. Take a look at
http://www.php.net/manual/en/ref.mbstring.php

For strpos, you can use mb_strpos
http://www.php.net/manual/en/function.mb-strpos.php

There is no mb_preg_match, but there are several other multibyte regular expression functions you can use, such as mb_ereg and mb_split.

PHP6 will have native Unicode strings, but until then we are stuck with the mbstring functions.
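
To make the byte-versus-character difference concrete, here is a minimal sketch. It assumes the mbstring extension is enabled and that the script file itself is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$text = "你/我";

// strpos() counts bytes: "你" is 3 bytes in UTF-8, so "/" is at byte offset 3.
$bytePos = strpos($text, "/");

// mb_strpos() counts characters when told the encoding: "/" is character 1.
$charPos = mb_strpos($text, "/", 0, "UTF-8");

// Cutting by character offsets keeps each multibyte character whole.
$str1 = mb_substr($text, 0, $charPos, "UTF-8");
$str2 = mb_substr($text, $charPos + 1, null, "UTF-8");

echo "$bytePos $charPos $str1 $str2\n"; // prints "3 1 你 我"
```

The two offsets differ only in their unit. Mixing them up, for example passing a byte offset from strpos to mb_substr, is one way to end up cutting in the wrong place.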
 
10-26-2007, 06:51 AM   #3
Swift&Smart
Member
 
Registered: Jan 2003
Location: Hong Kong,China
Distribution: Slackware,OpenSUSE
Posts: 472

Original Poster
Rep: Reputation: 30
Guttorm, thanks for your reply.

I have looked at the links you gave. However, I still don't know exactly how to do it. My situation is this:

1. I retrieve some Chinese characters from a website (it uses BIG-5 encoding).
2. Then I try to find a certain string, "/", but the search fails.

However, as I said, if I copy those strings into the PHP file and search for that special character, it works perfectly. I don't know why. If I did, I could use the same method to solve my problem.

Again, thanks for the help.
 
10-26-2007, 07:10 AM   #4
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,453

Rep: Reputation: 448
Hi

Maybe you need to convert the page you got from a website from BIG-5 encoding to UTF-8? I don't even know what BIG-5 is, but I've used the iconv function in similar situations to convert everything to UTF-8, before doing stuff with the mbstring functions.

http://www.php.net/manual/en/function.iconv.php

The manual says: "These are examples of character encodings that are unlikely to work with PHP: JIS, SJIS, ISO-2022-JP, BIG-5"

Maybe copying and pasting converts it to UTF-8? To find out which encoding a text file has, you can use the "file" command in the shell.
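
A rough sketch of the convert-first idea follows. It assumes the iconv and mbstring extensions are enabled and that the script is saved as UTF-8. No real site is contacted: the "fetched" text is simulated by converting a literal to Big5, and the needle 我 is chosen only for illustration.

```php
<?php
// Assumes the iconv and mbstring extensions are enabled and this file is saved as UTF-8.

// Stand-in for text fetched from a Big5 page (hypothetical; nothing is downloaded).
$fetched = mb_convert_encoding("你/我", "BIG-5", "UTF-8");

// The same text is a different byte sequence in each encoding.
$big5Hex = bin2hex($fetched);
$utf8Hex = bin2hex("你/我");

// A needle typed into this UTF-8 file is not found in the Big5 bytes.
$before = strpos($fetched, "我");

// Convert the fetched bytes to UTF-8 first, then search by character.
$converted = iconv("BIG5", "UTF-8", $fetched);
$after = mb_strpos($converted, "我", 0, "UTF-8");

var_dump($big5Hex !== $utf8Hex, $before, $after);
```

If the PHP file really is saved as UTF-8, this would also explain why pasted text matches: the haystack and the needle are then in the same encoding.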
 
10-26-2007, 07:44 AM   #5
Swift&Smart
Member
 
Registered: Jan 2003
Location: Hong Kong,China
Distribution: Slackware,OpenSUSE
Posts: 472

Original Poster
Rep: Reputation: 30
Guttorm, thanks for your swift reply.

I have just rewritten the code as you suggested. Fortunately, mb_strpos can find the special character once I convert the text from BIG-5 to UTF-8 with mb_convert_encoding, which worked better for me than iconv. However, the task still isn't complete: after I use mb_split, the first part of the string comes out fine, but the second part turns into garbled characters. I guess mb_split isn't splitting the string cleanly. As the manual says, if the split fails, for example because a character doesn't get all the bytes it needs, the output becomes garbled.

I do think handling multibyte characters is the biggest problem I have found in PHP.
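
One possible cause of the garbled second half, offered only as a guess since the code was not posted, is that mb_split uses its own regex encoding setting, which may not be UTF-8 by default. A minimal sketch that sets it explicitly with mb_regex_encoding, assuming mbstring with regex support and a script saved as UTF-8 (the Big5 input is again simulated):

```php
<?php
// Assumes the mbstring extension (with regex support) is enabled and this file is saved as UTF-8.

// Stand-in for the fetched Big5 text, then the conversion described in the post above.
$fetched = mb_convert_encoding("你/我", "BIG-5", "UTF-8");
$utf8 = mb_convert_encoding($fetched, "UTF-8", "BIG-5");

// Tell the multibyte regex functions which encoding the subject string is in.
mb_regex_encoding("UTF-8");
$parts = mb_split("/", $utf8);

// Once the text is UTF-8 and the separator is ASCII, plain explode() also works,
// because no byte of a multibyte UTF-8 character falls in the ASCII range.
$same = explode("/", $utf8);

print_r($parts);
```

If the separator is not an ASCII character, explode still works on UTF-8 text as long as the separator literal is UTF-8 too.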
 
  

