[SOLVED] proper syntax for grep "a[;$]"

masavini · 05-23-2012, 06:45 PM

hi,
how to put "endline" in a regexp charset?
i mean, something like:

Code:

$ echo "a" | grep "a[;$]"
$ echo "a" | grep "a[;\$]"
$ echo "a" | grep "a[;\n]"

thank you so much, always...

jhwilliams · 05-23-2012, 07:00 PM

You don't; the line ending isn't a possibility, it's a certainty. All that matters is what happens before it.

If you omit the '$' at the end, there could be additional text you're not matching. Putting a '$' at the end means there is no additional text on the line to match.

grep only operates on single lines -- you'd need to use something like awk to operate across newlines.
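A minimal sketch of the point above, assuming a grep that supports -E (extended regular expressions): inside a bracket expression, `$` is just a literal dollar sign, so "semicolon or end of line" has to be written as an alternation outside the brackets. The helper name `match_a_end` is made up for illustration.

```shell
# Hypothetical helper: print input lines in which "a" is followed
# either by ";" or by the end of the line.
match_a_end() {
    grep -E 'a(;|$)'
}

printf '%s\n' 'a' 'a;b' 'ab' | match_a_end
# prints:
# a
# a;b

# Inside brackets the "$" is literal, so this only matches "a$" or "a;":
printf '%s\n' 'a' 'a$' | grep 'a[;$]'
# prints:
# a$
```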

masavini · 05-23-2012, 07:30 PM

solved in a "rude" way:

Code:

$ echo "a" | sed 's/$/ /' | grep "a[; ]" | sed 's/ $//'
a

ugly, but working for me...

uhelp · 05-23-2012, 07:33 PM

Although what jhwilliams said is correct, there are situations where you want that anyway.

Try this:

Code:

echo $'\n'
echo $"\n"
printf $'\n'
printf $"\n"

And your solution does not do what you have described.
It works differently.
The "$" in sed means "a virtual, non-existent character" marking the end of the line. There is no such thing as a newline...

David the H. · 05-24-2012, 05:23 PM

Quote:

Originally Posted by masavini

solved in a "rude" way:

Code:

$ echo "a" | sed 's/$/ /' | grep "a[; ]" | sed 's/ $//'
a

ugly, but working for me...

Could you explain exactly what your purpose is for doing this? If you gave us some actual details about what you want to do, we could perhaps come up with something less "rude" for you.

Perhaps you want something more like this?

Code:

grep "a[; ]*$"

That's "a", followed by zero or more (*) spaces or semicolons, followed by the line-ending anchor. If it's not the newline you're worried about, but the word ending (i.e. it can appear anywhere in the line), then use "\>", the word-ending anchor, instead.

And I hope you realize that it's almost never necessary to use grep and sed together like this. sed can do line filtering on its own.
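As a rough illustration of that last point, assuming a POSIX sed: with -n the default output is suppressed, and `/regex/p` prints only the matching lines, so sed can play the role of grep here. The helper name `filter_a` and the sample input are made up.

```shell
# Hypothetical helper: keep only lines where "a" is followed by nothing
# but spaces/semicolons up to the end of the line.
filter_a() {
    sed -n '/a[; ]*$/p'
}

printf '%s\n' 'a' 'a;;' 'ab' | filter_a
# prints:
# a
# a;;
```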

David the H. · 05-24-2012, 05:30 PM

BTW, uhelp, FYI, your use of $"" has no significance here. Despite the visual appearance, it has no direct relationship to $''. It's for setting up strings that can be translated according to different locales:

http://mywiki.wooledge.org/BashFAQ/098

masavini · 05-26-2012, 03:13 AM

ok... i have a file with sequences of [A-Z] or [0-9]:
DV 2000 ACER-TRAVELMATE 44
DV 6000-ACER TRAVELMATE 55
DV/9000 ACER TRAVELMATE 77

i want to grep within these lines with these rules:
if the test string is a sequence of letters, then grep "[^A-Z]$testString[^A-Z]"
if the test string is a number, then grep "[^0-9]$testString[^0-9]"

in this way test strings "DV", "44", "55" and "77" won't give any grep output, because beginning of line is not [^A-Z] and end of line is not [^0-9].
so i need a way to tell grep to include beginning and end of line to the admitted characters.

what i do now is simply add a space at the first and last positions of the string. so the [^A-Z] and [^0-9] conditions are satisfied...
is there a smarter way to do the same?
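One possible sketch of such a pattern, assuming grep -E: since `^` and `$` lose their anchor meaning inside a bracket expression, each separator can be written as an alternation instead, `(^|[^A-Z])` before the string and `([^A-Z]|$)` after it. The helper name `match_token` is hypothetical, the token is assumed to be purely alphanumeric, and file1 is recreated here from the three sample lines above.

```shell
# Hypothetical helper: match_token TOKEN FILE
# Letters must not touch other letters, digits must not touch other digits;
# the start and end of the line count as valid separators.
match_token() {
    case $1 in
        *[0-9]*) class='0-9' ;;
        *)       class='A-Z' ;;
    esac
    grep -E "(^|[^$class])$1([^$class]|\$)" "$2"
}

printf '%s\n' 'DV 2000 ACER-TRAVELMATE 44' \
              'DV 6000-ACER TRAVELMATE 55' \
              'DV/9000 ACER TRAVELMATE 77' > file1

match_token 44 file1    # prints: DV 2000 ACER-TRAVELMATE 44
match_token DV file1    # prints all three lines
```

A token such as "00" still gives no output with this pattern, because in "2000" it is surrounded by other digits.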

pixellany · 05-26-2012, 03:43 AM

Quote:

Originally Posted by masavini

ok...

i want to grep within these lines with these rules:
if the test string is a sequence of letters, then grep "[^A-Z]$testString[^A-Z]"
if the test string is a number, then grep "[^0-9]$testString[^0-9]"

this is very confusing---it implies that you need to first test the "test string", and then build the grep syntax based on that.

How about just telling us the desired output after operating on various lines in the file?---i.e. show us an input file, and then the desired output.

masavini · 05-26-2012, 03:53 AM

this is a running example:

Code:

#!/bin/bash
searchEngine () { # "$string" "$/path/to/file"
	if [[ -z $2 ]]; then
		echo "function usage:" > $global__functionErrorLog
		echo "$FUNCNAME \$item \"\$string\" \"\$/path/to/file\"" >> $global__functionErrorLog
		functionErrorNotify $FUNCNAME $BASH_SOURCE
	fi
	
	_query=$(echo "$1" | sed -e 's/[^0-9A-Za-z]/ /g' -e 's_\([0-9]\)\([A-Za-z]\)_\1 \2_g' -e 's_\([A-Za-z]\)\([0-9]\)_\1 \2_g')
	
	if [[ ! -e $2 ]]; then
		echo "file $2 does not exist" > $global__functionErrorLog
		functionErrorNotify $FUNCNAME $BASH_SOURCE
	fi
	
	_out=$(sed -e 's/^/ /g' -e 's/$/ /g' $2)

	for _string in $_query; do
		if [[ $_string =~ [0-9] ]]; then
			_sep="[^0-9]"
		else
			_sep="[^A-Z]"
		fi
		_out=$(echo "$_out" | grep -i "${_sep}${_string}${_sep}")
	done
	
	echo "$_out" | sed -e 's/^ //g' -e 's/ $//g'
}

echo "DV 2000 ACER-TRAVELMATE 44" > file1
echo "DV 6000-ACER TRAVELMATE 55" >> file1
echo "DV/9000 ACER TRAVELMATE 77" >> file1

searchEngine "ACER DV2000" file1

pixellany · 05-26-2012, 04:13 AM

If all that code was intended to answer my question, then I'm afraid you lost me.........

I just asked for some sample lines, and the desired output from the command syntax---eg:

Code:

line       desired output
ABC         A
123         1
ABC123      A1

etc

masavini · 05-26-2012, 04:30 AM

the code i posted is a working script...
you can launch it and see what it does...

i'd like to do exactly the same thing, but with simpler code: as you can see, now $_out needs to be processed with sed: once to add a space at the beginning and at the end of every line, and once to remove those spaces before displaying the output...

this could be obtained by letting $_sep contain "beginning and end of lines"...
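A sketch of how that might look, not a drop-in replacement for the script above: the separator is split into a "before" and an "after" alternation that also accept the start and end of the line, so the two sed passes that add and remove spaces are no longer needed. The function name `search_lines` is made up, the tokens are assumed to be already split and purely alphanumeric, and grep -E is assumed to be available.

```shell
# Sketch only: search_lines FILE TOKEN...
# Prints the lines of FILE that contain every TOKEN, where a letter token
# may not touch another letter and a digit token may not touch another digit.
search_lines() {
    _file=$1
    shift
    _out=$(cat "$_file")
    for _string in "$@"; do
        case $_string in
            *[0-9]*) _class='0-9' ;;
            *)       _class='A-Za-z' ;;
        esac
        # Start and end of line are accepted as separators via alternation.
        _out=$(printf '%s\n' "$_out" |
               grep -iE "(^|[^$_class])$_string([^$_class]|\$)") || true
    done
    printf '%s\n' "$_out"
}

printf '%s\n' 'DV 2000 ACER-TRAVELMATE 44' \
              'DV 6000-ACER TRAVELMATE 55' \
              'DV/9000 ACER TRAVELMATE 77' > file1

search_lines file1 ACER DV 2000
# prints: DV 2000 ACER-TRAVELMATE 44
```

The letter class is spelled out as A-Za-z here rather than relying on -i to fold the case inside the brackets.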

pixellany · 05-26-2012, 04:53 AM

Quote:

Originally Posted by masavini

the code i posted is a working script...
you can launch it and see what it does...

Sorry--I'm not going to do that.....

You've now marked this thread "SOLVED", so you don't need to answer my simple question.....

masavini · 05-27-2012, 04:49 PM

sorry, maybe i didn't explain properly...
i believe forums are made to let people who know more help people who don't know or know less. i surely belong to one of these last categories, and i will never complain about anyone for anything. i can hardly believe that so many wonderful helpers exist. so thanks, thanks and thanks again. always.

that said, the thread has been marked as solved because i was told there is no solution other than my "ugly fix"... the fix works, but since i feel like a baby outside the greatest cake shop in the world, i'd REALLY like to learn if my fix is the best that can be done... and when you showed interest after the thread was closed, i smelled the flavour of one of those wonderful cakes...

i didn't post the expected output because i thought it wouldn't have helped. and if i was wrong, posting the whole code would have helped you much more. but it seems like i was wrong twice...

the function is used to simulate how a search engine works.
if i search for a string, the text is split into the minimum number of substrings made only of numbers or only of letters: "ACER DV2001EL" is split into the substrings "ACER DV 2001 EL". other chars ([-/ ;,.-]) are interpreted as substring separators. so the string "ACER DV2-001EL" is split into the substrings "ACER DV 2 001 EL".
in my code, this is done by:

Code:

_query=$(echo "$1" | sed -e 's/[^0-9A-Za-z]/ /g' -e 's_\([0-9]\)\([A-Za-z]\)_\1 \2_g' -e 's_\([A-Za-z]\)\([0-9]\)_\1 \2_g')
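To make the splitting concrete, here is the same sed pipeline wrapped in a hypothetical helper called `tokenize` (the name is not from the original script) and run on the two search strings mentioned above.

```shell
# Hypothetical wrapper around the sed expressions from the script:
# 1) every non-alphanumeric character becomes a space,
# 2) a space is inserted between a digit and a following letter,
# 3) a space is inserted between a letter and a following digit.
tokenize() {
    printf '%s\n' "$1" | sed -e 's/[^0-9A-Za-z]/ /g' \
        -e 's_\([0-9]\)\([A-Za-z]\)_\1 \2_g' \
        -e 's_\([A-Za-z]\)\([0-9]\)_\1 \2_g'
}

tokenize 'ACER DV2001EL'     # prints: ACER DV 2001 EL
tokenize 'ACER DV2-001EL'    # prints: ACER DV 2 001 EL
```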

after that, the search engine looks for the lines in the input file that contain all of the substrings. but every substring must not be preceded or followed by a character of the same charset. if the input file contains the line "COMPUTER ACER DV2001EL NEW", the second search string ("ACER DV2-001EL") gives no results, since the substring "2" is followed by "0" and does not match.
in my code this is done by adding a custom "separator" before and after every substring:

Code:

for _string in $_query; do
		if [[ $_string =~ [0-9] ]]; then
			_sep="[^0-9]"
		else
			_sep="[^A-Z]"
		fi
		_out=$(echo "$_out" | grep -i "${_sep}${_string}${_sep}")
	done

this works fine in the previous example, but would not work if one of the substrings is the first or the last "word" of the line in the input file. if the line is "ACER DV2001EL NEW", the search string "ACER DV2001EL" would give no results, since grep looks for a NON [A-Z] character before the substring "ACER".

i guess it could be useful for many purposes, so i'm asking anyone who knows more than me (and i know YOU are a LOT) if this can be done in a better way than adding a space at the beginning and at the end of every line of the input file, so that grep finds a separator before and after each substring.

thank you, always

whizje · 05-27-2012, 05:19 PM

Code:

echo "ACER DV2001EL NEW" | grep -w "ACER DV2001EL"
ACER DV2001EL NEW
or
bash-4.2$ echo "ACER DV2001EL NEW" | grep -wo "ACER DV2001EL"
ACER DV2001EL
bash-4.2$ echo "ACER DV2001EL3 NEW" | grep -wo "ACER DV2001EL"
bash-4.2$ echo "ACER DV2001EL3 NEW" | grep -o "ACER DV2001EL"
ACER DV2001EL
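A short sketch of how -w relates to the rules described earlier in the thread, assuming GNU grep, where word constituents are letters, digits and the underscore. -w accepts the start and end of the line as boundaries, but it does not treat a letter/digit boundary as a separator, so it is stricter than the [^A-Z] / [^0-9] rules. The sample lines are made up.

```shell
# "DV" directly followed by digits: -w sees the single word "DV2000".
printf '%s\n' 'DV2000 ACER' | grep -cw 'DV' || true
# prints: 0

# The letter/digit rule from the thread, with the line edges allowed:
printf '%s\n' 'DV2000 ACER' | grep -cE '(^|[^A-Z])DV([^A-Z]|$)'
# prints: 1

# Both approaches agree when the token sits at the end of the line:
printf '%s\n' 'ACER DV' | grep -cw 'DV'
# prints: 1
```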