LinuxQuestions.org
Old 05-23-2012, 06:45 PM   #1
masavini
Member
 
Registered: Jun 2008
Posts: 285

Rep: Reputation: 6
proper syntax for grep "a[;$]" - endline in charset...


hi,
how to put "endline" in a regexp charset?
i mean, something like:
Code:
$ echo "a" | grep "a[;$]"
$ echo "a" | grep "a[;\$]"
$ echo "a" | grep "a[;\n]"
thank you so much, always...
 
Old 05-23-2012, 07:00 PM   #2
jhwilliams
Senior Member
 
Registered: Apr 2007
Location: Portland, OR
Distribution: Debian, Android, LFS
Posts: 1,168

Rep: Reputation: 211
You don't; the line ending isn't a possibility, it's a certainty. All that matters is what happens before it.

If you omit the '$' at the end, there could be additional text after your match. Putting a '$' at the end means there is no additional text on the line to match.

grep only operates on single lines -- you'd need to use something like awk to operate across newlines.
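
For what it's worth, "a followed by either a semicolon or the end of the line" can still be written as one pattern, just not inside a bracket expression, where $ is only a literal dollar sign. A minimal sketch using extended-regex alternation (grep -E); the function name match_a_end and the sample inputs are made up for illustration:

```shell
# Inside [...] the $ is a literal dollar sign, not an anchor.
# Alternation places the end-of-line anchor next to the semicolon instead.
match_a_end() {
    grep -E 'a(;|$)'
}

printf '%s\n' 'a' 'a;b' 'ab' | match_a_end
# prints:
# a
# a;b
```

The line "ab" is dropped because its "a" is followed by neither a semicolon nor the end of the line.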
 
Old 05-23-2012, 07:30 PM   #3
masavini
Member
 
Registered: Jun 2008
Posts: 285

Original Poster
Rep: Reputation: 6
solved in a "rude" way:

Code:
$ echo "a" | sed 's/$/ /' | grep "a[; ]" | sed 's/ $//'
a
ugly, but working for me...
 
Old 05-23-2012, 07:33 PM   #4
uhelp
Member
 
Registered: Nov 2011
Location: Germany, Bavaria, Nueremberg area
Distribution: openSUSE, Debian, LFS
Posts: 205

Rep: Reputation: 43
What jhwilliams said is correct, but there are situations where you want that anyway.

Try this:
Code:
echo $'\n'
echo $"\n"
printf $'\n'
printf $"\n"
And your solution does not do what you have described.
It works differently.
The "$" in sed is an anchor: a "virtual", non-existent character that marks the end of the line. No actual newline character is matched...

Last edited by uhelp; 05-23-2012 at 07:38 PM.
 
Old 05-24-2012, 05:23 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Quote:
Originally Posted by masavini
solved in a "rude" way:

Code:
$ echo "a" | sed 's/$/ /' | grep "a[; ]" | sed 's/ $//'
a
ugly, but working for me...
Could you explain exactly what your purpose is for doing this? If you gave us some actual details about what you want to do, we could perhaps come up with something less "rude" for you.


Perhaps you want something more like this?

Code:
grep "a[; ]*$"
That's "a", followed by zero or more (*) spaces or semicolons, followed by the line-ending anchor. If it's not the end of the line you're worried about, but the end of the word (i.e. it can appear anywhere in the line), then use "\>", the word-ending anchor, instead.


And I hope you realize that it's almost never necessary to use grep and sed together like this. sed can do line filtering on its own.
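
As a rough illustration of that last point: sed's -n option together with the p command prints only the lines that match an address, so the pattern suggested above needs no grep at all. The function name filter_a_end and the input lines are invented for the example:

```shell
# -n suppresses the automatic printing of every line;
# /regex/p then prints only the lines that match the regex.
filter_a_end() {
    sed -n '/a[; ]*$/p'
}

printf '%s\n' 'a' 'a; ' 'ab' | filter_a_end
# prints the lines "a" and "a; " but not "ab"
```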
 
Old 05-24-2012, 05:30 PM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
BTW, uhelp, FYI, your use of $"" has no significance here. Despite the visual appearance, it has no direct relationship to $''. It's for setting up strings that can be translated according to different locales:

http://mywiki.wooledge.org/BashFAQ/098
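
A quick way to see the difference for yourself, as a sketch that assumes bash and no translation catalog for the string (the variable names are arbitrary):

```shell
# bash only: $'...' expands backslash escapes, $"..." requests locale translation.
nl=$'\n'      # one character: an actual newline
tr=$"\n"      # untranslated, so two characters: a backslash and the letter n

printf '%s' "$nl" | od -c
printf '%s' "$tr" | od -c

echo "${#nl} ${#tr}"    # prints: 1 2
```

The od -c dumps show a single newline byte for the first string and a backslash followed by "n" for the second.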
 
Old 05-26-2012, 03:13 AM   #7
masavini
Member
 
Registered: Jun 2008
Posts: 285

Original Poster
Rep: Reputation: 6
ok... i have a file with sequences of [A-Z] or [0-9]:
DV 2000 ACER-TRAVELMATE 44
DV 6000-ACER TRAVELMATE 55
DV/9000 ACER TRAVELMATE 77

i want to grep within these lines with these rules:
if the test string is a sequence of letters, then grep "[^A-Z]$testString[^A-Z]"
if the test string is a number, then grep "[^0-9]$testString[^0-9]"

in this way test strings "DV", "44", "55" and "77" won't give any grep output, because the beginning of a line does not count as [^A-Z] and the end of a line does not count as [^0-9].
so i need a way to tell grep to accept the beginning and the end of the line as well as the admitted characters.

what i do now is simply add a space at the first and last positions of the string. so the [^A-Z] and [^0-9] conditions are satisfied...
is there a smarter way to do the same?
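
One possible answer, sketched under the assumption that extended regular expressions (grep -E) are acceptable: anchors cannot go inside a bracket expression, but they can sit next to one in an alternation. The sample lines are the ones quoted above; the variable name lines is arbitrary:

```shell
# Sample data copied from the post above.
lines='DV 2000 ACER-TRAVELMATE 44
DV 6000-ACER TRAVELMATE 55
DV/9000 ACER TRAVELMATE 77'

# (^|[^A-Z]) = start of line OR a non-letter
# ([^A-Z]|$) = a non-letter OR end of line
printf '%s\n' "$lines" | grep -E '(^|[^A-Z])DV([^A-Z]|$)'    # matches all three lines
printf '%s\n' "$lines" | grep -E '(^|[^0-9])44([^0-9]|$)'    # matches the first line only
```

With this form the padding spaces at the start and end of each line should no longer be needed.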
 
Old 05-26-2012, 03:43 AM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by masavini
ok...

i want to grep within these lines with these rules:
if the test string is a sequence of letters, then grep "[^A-Z]$testString[^A-Z]"
if the test string is a number, then grep "[^0-9]$testString[^0-9]"
this is very confusing---it implies that you need to first test the "test string", and then build the grep syntax based on that.

How about just telling us the desired output after operating on various lines in the file?---i.e. show us an input file, and then the desired output.
 
Old 05-26-2012, 03:53 AM   #9
masavini
Member
 
Registered: Jun 2008
Posts: 285

Original Poster
Rep: Reputation: 6
this is a running example:

Code:
#!/bin/bash
searchEngine () { # "$string" "$/path/to/file"
	if [[ -z $2 ]]; then
		echo "function usage:" > $global__functionErrorLog
		echo "$FUNCNAME \$item "\$string" \"\$/path/to/file\"" >> $global__functionErrorLog
		functionErrorNotify $FUNCNAME $BASH_SOURCE
	fi
	
	_query=$(echo "$1" | sed -e 's/[^0-9A-Za-z]/ /g' -e 's_\([0-9]\)\([A-Za-z]\)_\1 \2_g' -e 's_\([A-Za-z]\)\([0-9]\)_\1 \2_g')
	
	if [[ ! -e $2 ]]; then
		echo "file $2 does not exist" > $global__functionErrorLog
		functionErrorNotify $FUNCNAME $BASH_SOURCE
	fi
	
	_out=$(sed -e 's/^/ /' -e 's/$/ /' "$2")

	for _string in $_query; do
		if [[ $_string =~ [0-9] ]]; then
			_sep="[^0-9]"
		else
			_sep="[^A-Z]"
		fi
		_out=$(echo "$_out" | grep -i "${_sep}${_string}${_sep}")
	done
	
	echo "$_out" | sed -e 's/^ //g' -e 's/ $//g'
}

echo "DV 2000 ACER-TRAVELMATE 44" > file1
echo "DV 6000-ACER TRAVELMATE 55" >> file1
echo "DV/9000 ACER TRAVELMATE 77" >> file1

searchEngine "ACER DV2000" file1
 
Old 05-26-2012, 04:13 AM   #10
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
If all that code was intended to answer my question, then I'm afraid you lost me.........

I just asked for some sample lines, and the desired output from the command syntax---eg:

Code:
line       desired output
ABC         A
123         1
ABC123      A1

etc

Last edited by pixellany; 05-26-2012 at 04:14 AM.
 
Old 05-26-2012, 04:30 AM   #11
masavini
Member
 
Registered: Jun 2008
Posts: 285

Original Poster
Rep: Reputation: 6
the code i posted is a working script...
you can launch it and see what it does...

i'd like to do exactly the same thing, but with simpler code: as you can see, now $_out needs to be processed with sed twice: once to add a space at the beginning and at the end of every line, and once to remove those spaces before displaying the output...

this could be obtained by letting $_sep contain "beginning and end of lines"...
 
Old 05-26-2012, 04:53 AM   #12
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
Quote:
Originally Posted by masavini
the code i posted is a working script...
you can launch it and see what it does...
Sorry--I'm not going to do that.....

You've now marked this thread "SOLVED", so you don't need to answer my simple question.....
 
Old 05-27-2012, 04:49 PM   #13
masavini
Member
 
Registered: Jun 2008
Posts: 285

Original Poster
Rep: Reputation: 6
sorry, maybe i didn't explain properly...
i believe forums are made to let people who know more help people who don't know or know less. i surely belong to one of these last categories, and i will never complain to anyone about anything. i can hardly believe that so many wonderful helpers exist. so thanks, thanks and thanks again. always.

that said, the thread has been marked as solved because i was told there is no solution other than my "ugly fix"... the fix works, but since i feel like a baby outside the greatest cake shop in the world, i'd REALLY like to learn if my fix is the best that can be done... and when you showed interest after the thread was closed, i smelled the flavour of one of those wonderful cakes...

i didn't post the expected output because i thought it wouldn't have helped. and if i was wrong, posting the whole code would have helped you much more. but it seems like i was wrong twice...

the function is used to simulate how a search engine works.
if i search for a string, the text is split into the minimum number of substrings made only of numbers or only of letters: "ACER DV2001EL" is split into the substrings "ACER DV 2001 EL". other chars ([-/ ;,.-]) are interpreted as substring separators. so the string "ACER DV2-001EL" is split into the substrings "ACER DV 2 001 EL".
in my code, this is done by:
Code:
_query=$(echo "$1" | sed -e 's/[^0-9A-Za-z]/ /g' -e 's_\([0-9]\)\([A-Za-z]\)_\1 \2_g' -e 's_\([A-Za-z]\)\([0-9]\)_\1 \2_g')
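
The two examples given above can be checked with a small wrapper around that sed command. This is only a sketch: split_query is a hypothetical helper name and is not part of the script posted earlier.

```shell
# split_query is a hypothetical wrapper around the sed command quoted above.
split_query() {
    printf '%s\n' "$1" | sed -e 's/[^0-9A-Za-z]/ /g' \
        -e 's_\([0-9]\)\([A-Za-z]\)_\1 \2_g' \
        -e 's_\([A-Za-z]\)\([0-9]\)_\1 \2_g'
}

split_query "ACER DV2001EL"     # ACER DV 2001 EL
split_query "ACER DV2-001EL"    # ACER DV 2 001 EL
```

The first expression turns every separator into a space; the other two insert a space wherever a digit touches a letter, in either order.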
after that, the search engine looks for the lines in the input file that contain all of the substrings. but every substring must not be preceded or followed by a character of the same charset. if the input file contains the line "COMPUTER ACER DV2001EL NEW", the second search string ("ACER DV2-001EL") gives no results, since the substring "2" is followed by "0" and does not match.
in my code this is done by adding a custom "separator" before and after every substring:
Code:
for _string in $_query; do
		if [[ $_string =~ [0-9] ]]; then
			_sep="[^0-9]"
		else
			_sep="[^A-Z]"
		fi
		_out=$(echo "$_out" | grep -i "${_sep}${_string}${_sep}")
	done
this works fine in the previous example, but would not work if one of the substrings is the first or the last "word" of the line in the input file. if the line is "ACER DV2001EL NEW", the search string "ACER DV2001EL" would give no results, since grep looks for a NON [A-Z] character before the substring "ACER".

i guess it could be useful for many purposes, so i'm asking anyone who knows more than me (and i know there are a LOT of you) if this can be done in a better way than adding a space at the beginning and at the end of every line of the input file, so that grep finds a separator before and after each substring.
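
One way this might be done without the padding spaces is to let the separator on each side be "either a character outside the set, or the start/end of the line", using grep -E alternation. The following is only a sketch: search_lines is a hypothetical, simplified stand-in for searchEngine that expects substrings which are already split, and it leaves out the error handling.

```shell
# search_lines is a hypothetical, simplified stand-in for searchEngine.
# usage: search_lines "ACER DV 2000" /path/to/file
search_lines() {
    _out=$(cat "$2")
    for _string in $1; do      # unquoted on purpose: split on whitespace
        case $_string in
            *[0-9]*) _set='0-9' ;;
            *)       _set='A-Za-z' ;;
        esac
        # start of line or a character outside the set, then the substring,
        # then a character outside the set or the end of the line
        _out=$(printf '%s\n' "$_out" |
            grep -i -E "(^|[^${_set}])${_string}([^${_set}]|\$)")
    done
    printf '%s\n' "$_out"
}

tmp=$(mktemp)
printf '%s\n' 'DV 2000 ACER-TRAVELMATE 44' 'DV 6000-ACER TRAVELMATE 55' > "$tmp"
search_lines "DV 44" "$tmp"     # DV 2000 ACER-TRAVELMATE 44
rm -f "$tmp"
```

Here "DV" matches at the very start of the line and "44" at the very end, which is exactly the case that needed the extra spaces before.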

thank you, always

Last edited by masavini; 05-27-2012 at 04:51 PM.
 
Old 05-27-2012, 05:19 PM   #14
whizje
Member
 
Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 594

Rep: Reputation: 141
Code:
$ echo "ACER DV2001EL NEW" | grep -w "ACER DV2001EL"
ACER DV2001EL NEW
or
$ echo "ACER DV2001EL NEW" | grep -wo "ACER DV2001EL"
ACER DV2001EL
$ echo "ACER DV2001EL3 NEW" | grep -wo "ACER DV2001EL"
$ echo "ACER DV2001EL3 NEW" | grep -o "ACER DV2001EL"
ACER DV2001EL
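
One caveat worth checking against the rules in post #13: grep -w treats letters, digits and the underscore all as word characters, so it sees no boundary between "DV" and "2001". A quick sketch of the difference, with an arbitrary variable name and the sample line from above:

```shell
line='ACER DV2001EL NEW'

# -w: "DV" is followed by the word character "2", so nothing matches.
printf '%s\n' "$line" | grep -cw 'DV' || true                      # prints 0

# Letter-set boundaries: "DV" is preceded by a space and followed by a digit.
printf '%s\n' "$line" | grep -cE '(^|[^A-Za-z])DV([^A-Za-z]|$)'    # prints 1
```

So -w works for whole search strings, but not for the split substrings that the function greps for one at a time.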

Last edited by whizje; 05-27-2012 at 05:24 PM.
 
  

