I would like to explain my problem in more detail.
I want to match and count strings like "intel", but only when "intel" is not part of another word such as "intelligence" (likewise "cd", but not inside "lcd").
If I grep with a pattern like "[^a-zA-Z]?intel[^a-zA-Z]?", the output may include ",intel", ".intel", "@cd" or "~cd". Because my document is in Chinese, any Chinese characters next to "intel" or "cd" should be allowed by the pattern.
What I actually want is only "intel" and "cd", not ",intel" or ".cd", so that I can count the number of lines for each pattern using sort|uniq -c.
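To make the issue concrete, here is a minimal sketch that reproduces it. The sample lines are made up for illustration (the real document is not shown), `grep -E` is used as the equivalent of egrep, and `LC_ALL=C` is an assumption added only to keep the ASCII ranges predictable:

```shell
# Hypothetical sample lines; the real document.txt is not shown in this post.
sample=$(printf '%s\n' 'price,intel' 'www.intel' 'mail@cd' 'path~cd')

# The pattern from the post, extended to cover both example keywords.
pattern='[^a-zA-Z]?(intel|cd)[^a-zA-Z]?'

# -o prints only the matched text, which includes the neighbouring character.
matches=$(printf '%s\n' "$sample" | LC_ALL=C grep -E -o "$pattern")
printf '%s\n' "$matches"

# Counting now treats ",intel" and ".intel" as different keys.
printf '%s\n' "$matches" | sort | uniq -c
```

Each punctuated variant ends up with its own count, which is why the matches cannot be counted directly.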
Now I use two steps to do the task:
egrep -o -f pattern1.txt document.txt >temp
egrep -o -f pattern2.txt temp|sort|uniq -c >keywords.txt
where pattern1 contains the patterns I described above ("[^a-zA-Z]?intel[^a-zA-Z]?"), and pattern2 contains only the core words ("intel").
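For anyone who wants to try the two-step workflow, the following is a self-contained sketch of it. The file contents are hypothetical stand-ins for my real files, the temporary directory is only there so nothing is overwritten, and `grep -E` with `LC_ALL=C` is assumed in place of plain egrep:

```shell
# Hypothetical stand-ins for document.txt, pattern1.txt and pattern2.txt.
workdir=$(mktemp -d)
printf '%s\n' 'price,intel' 'www.intel' 'mail@cd' 'path~cd' > "$workdir/document.txt"
printf '%s\n' '[^a-zA-Z]?intel[^a-zA-Z]?' '[^a-zA-Z]?cd[^a-zA-Z]?' > "$workdir/pattern1.txt"
printf '%s\n' 'intel' 'cd' > "$workdir/pattern2.txt"

# Step 1: extract each keyword together with its optional neighbouring character.
LC_ALL=C grep -E -o -f "$workdir/pattern1.txt" "$workdir/document.txt" > "$workdir/temp"

# Step 2: reduce each match to the bare keyword, then count the occurrences.
LC_ALL=C grep -E -o -f "$workdir/pattern2.txt" "$workdir/temp" | sort | uniq -c > "$workdir/keywords.txt"

cat "$workdir/keywords.txt"
```

With this sample, the second pass folds ",intel" and ".intel" into a single "intel" key, which is the result I want but at the cost of the extra file and pattern list.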
I hope this can be done with a single pattern, without creating a temp file and using two pattern files, because I need to do this for every English word, which I expect consumes a lot of resources.
Thanks for your ideas on this problem.