LinuxQuestions.org
Old 09-28-2013, 04:42 PM   #1
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Rep: Reputation: Disabled
grep for pattern following the nth occurrence of a character in a file


Hello everyone,

After days of searching articles, forums etc. I still can't get grep to do what I want. I have some files that contain data in the following format and I am interested in my_string and my_string_2 as shown below:

data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc

Things to consider:
- "data" may contain anything and the length may vary
- as it is clearly shown the data strings are separated by ; or ;; or ;;;;
- sometimes I want grep to look for the 2nd "my_string", sometimes for "my_string_2" as they will represent user input in a script, something like: "Enter [my_string] or leave blank" and "Enter [my_string_2] or leave blank"

So basically I want to grep for the 2nd "my_string" or "my_string_2". The only constant, non-changing markers I have in all this is the ";" character. So what I know for sure is that after the 7th ";" the 2nd "my_string" will always follow and after the 15th ";" "my_string_2" will always follow.

Is it possible to do the above with grep?

Thank you in advance.

Last edited by cosminel; 09-28-2013 at 04:45 PM.
 
Old 09-28-2013, 04:56 PM   #2
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
I really don't understand what you are trying to do

why do you want the n'th match?

can you post your script so I can get an idea of what you want

I have a feeling you really want awk, but full(er) context will help

Code:
awk -F\; '{printf "%s %s\n",$8,$16}' InputFile
example on posting code

[code]
awk -F\; '{printf "%s %s\n",$8,$16}' InputFile
[/code]

Last edited by Firerat; 09-28-2013 at 04:58 PM.
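
To see why $8 and $16 are the fields of interest, here is a minimal sketch run against the sample line from the first post. The variable names `line` and `fields` are only for illustration. With a single ";" as the separator, awk keeps empty fields and counts them, so the text after the 7th ";" is field 8 and the text after the 15th ";" is field 16.

```shell
# Sample line copied from the first post.
line='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

# -F';' splits on every single semicolon, so empty fields still count.
fields=$(printf '%s\n' "$line" | awk -F';' '{ printf "%s %s\n", $8, $16 }')

echo "$fields"   # prints: my_string my_string_2
```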
 
Old 09-28-2013, 08:24 PM   #3
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,371

Rep: Reputation: 2750
grep is probably not the tool you want to use, as the regular expression matching is 'greedy'.
Another alternative is the 'cut' command.
Code:
bash-4.2$ echo 'data;data;data;data;data;;my_string;data;data;data;data;;;;my_string_2;;;;' | cut -d';' -f7,15
my_string;my_string_2
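
Note that the sample line in the cut example above is laid out slightly differently from the one in the first post, which is why fields 7 and 15 work there. A small sketch against the original sample line, assuming that layout is the real one, would use fields 8 and 16 instead. The names `line` and `picked` are made up for the example.

```shell
# Sample line copied from the first post.
line='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

# cut also counts empty fields, and joins the selected fields with the delimiter.
picked=$(printf '%s\n' "$line" | cut -d';' -f8,16)

echo "$picked"   # prints: my_string;my_string_2
```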
 
Old 09-29-2013, 02:54 AM   #4
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
Thank you for your replies.

I was hoping that grep has the ability to do what I want using a more complicated extended regexp which I can't determine at this point.

Firerat, the position of my_string changes its significance, this is why I want grep to match it at precisely that position. Furthermore, in several cases my_string = my_string_2 and as I said, depending on the user input, the meaning of the value differs.

If what I need grep to do is not possible, I will try the awk instead.
 
Old 09-29-2013, 06:09 AM   #5
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
Quote:
Originally Posted by cosminel View Post
Thank you for your replies.

I was hoping that grep has the ability to do what I want using a more complicated extended regexp which I can't determine at this point.

Firerat, the position of my_string changes its significance, this is why I want grep to match it at precisely that position. Furthermore, in several cases my_string = my_string_2 and as I said, depending on the user input, the meaning of the value differs.

If what I need grep to do is not possible, I will try the awk instead.

with awk you can test each field, you can then report which field it is
but
at the moment I still do not understand what you want from your description

show us your code and some input data, multiple lines.
so we have some context


but here I give you an awk ( not certain it fits with what you want/need )
Code:
awk -F\; -v string1="my_string" -v string2="my_string_2"  '{for (i=1;i<=NF;i++)
    {
     {if ( $i == string1) print "String1 found at field "i}
     {if ( $i == string2) print "String2 found at field "i}
    }
}' Input
 
Old 09-29-2013, 08:01 AM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
I think Firerat is on the mark, my only addition would be to alter the separator to include one or more semicolons:
Code:
awk -F";+" ...
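
One thing to keep in mind with -F";+" is that a run of semicolons becomes a single separator, so the empty fields disappear and the field numbers shift. A quick sketch on the sample line from the first post (the names `line`, `single` and `multi` are only illustrative):

```shell
# Sample line copied from the first post.
line='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

# One ';' per separator: 21 fields, the strings sit in $8 and $16.
single=$(printf '%s\n' "$line" | awk -F';' '{ print NF, $8, $16 }')

# One or more ';' per separator: 14 fields, the same strings are now $7 and $12.
multi=$(printf '%s\n' "$line" | awk -F';+' '{ print NF, $7, $12 }')

echo "$single"   # prints: 21 my_string my_string_2
echo "$multi"    # prints: 14 my_string my_string_2
```

If a field can be empty on some lines and filled on others, the numbering with ";+" will differ from line to line, so counting single semicolons is probably the safer choice when the position is what matters.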
 
Old 09-29-2013, 08:58 AM   #7
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
an alternative might be to put your data into an array

e.g.
Code:
MyArray=( $(sed -e 's/^/"/' -e 's/;/" "/g' -e s/$/\"/ Input ))
Edit: Forget the above
Code:
while IFS= read -r -d\; Element;do MyArray+=("$Element");done < Input
Code:
echo "Number of elements in MyArray= ${#MyArray[@]}"
echo -e "Array :-\n${MyArray[@]}"
echo "Note: Arrays start at 0 "
for ((i=0;i<${#MyArray[@]};i++));do
    echo "${i} = ${MyArray[$i]}"
done

echo "remove all \""
for ((i=0;i<${#MyArray[@]};i++));do
    echo "${i} = ${MyArray[$i]//\"}"
done
http://www.tldp.org/LDP/Bash-Beginners-Guide/html/
http://www.tldp.org/LDP/abs/html/
http://mywiki.wooledge.org/BashGuide
http://www.gnu.org/software/bash/manual/bashref.html

specifically
http://www.tldp.org/LDP/abs/html/arrays.html

Last edited by Firerat; 09-29-2013 at 09:07 AM. Reason: no need for the sed , use while read
 
Old 09-30-2013, 12:14 AM   #8
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
Thank you for your help. I will try to see which proposed solution returns the desired result.

To tell you the truth I thought it would be easier to write instructions for returning the whole line if the searched string is found at nth semicolon (which is used as a separator).

Firerat, the information I have in those files is written in such a way that "my_string = received data" and "my_string_2 = sent data", and this can be determined solely on where they are positioned inside the line, having the semicolons as separators for all the data strings.

Also note that my_string and my_string_2 are interchangeable.

All I want is to extend a script that I made in order to contain these prompts:

"Enter received data string or leave blank:"
"Enter sent data string or leave blank:"

As the searched string may be positioned at the "received data" location or at the "sent data" location (which is determined by the nth semicolon), I want the returned results to conform to the user's choices when using grep to search the files based on the above prompts.

I hope this clarifies what I aim to do.
 
Old 09-30-2013, 12:48 AM   #9
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
I think either would work


if you are still stuck,

post a sample script, with sample data
along with some user input to test it
 
Old 10-07-2013, 03:07 PM   #10
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
I finally found some time to investigate your solutions. I found the command string that I was looking for:

grep string file* | awk -F";+" '$13 ~ "string" {print $0}'

Now, the trick is to pass the string which is a user input variable into the awk command. This is where I'm currently stuck. I looked over Firerat's command, searched the web but for the life of me I cannot figure out how to pass the script variable into awk. I do not understand the syntax. Here is part of my script:

Code:
#!/bin/bash

cd /root

read -p "Enter received data or leave blank: " rcvdata
read -p "Enter sent data or leave blank: " sntdata

if [ -z $sntdata ]; then
	grep $rcvdata testfile* | awk -F";+" '$13 ~ "$rcvdata" {print $0}'
fi
As you can see "rcvdata" and "sntdata" are user generated variables. Now, from what I understand I need to pass the script variable "rcvdata" to awk with -v (and here is the point where I get completely lost)
 
Old 10-07-2013, 03:43 PM   #11
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Rep: Reputation: 783
awk --help
Code:
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:		GNU long options: (standard)
	-f progfile		--file=progfile
	-F fs			--field-separator=fs
	-v var=val		--assign=var=val
Short options:		GNU long options: (extensions)
	-b			--characters-as-bytes
	-c			--traditional
	-C			--copyright
	-d[file]		--dump-variables[=file]
	-e 'program-text'	--source='program-text'
	-E file			--exec=file
	-g			--gen-pot
	-h			--help
	-L [fatal]		--lint[=fatal]
	-n			--non-decimal-data
	-N			--use-lc-numeric
	-O			--optimize
	-p[file]		--profile[=file]
	-P			--posix
	-r			--re-interval
	-S			--sandbox
	-t			--lint-old
	-V			--version
man awk
Code:
......
       -v var=val
       --assign var=val
              Assign the value val to the variable var, before execution of the program begins.  Such variable values are available to the BEGIN block of an AWK program.
......
Code:
#!/bin/bash

cd /root

read -p "Enter received data or leave blank: " rcvdata
read -p "Enter sent data or leave blank: " sntdata

if [ -z $sntdata ]; then
	grep $rcvdata testfile* | awk -F";+" '$13 ~ "$rcvdata" {print $0}'
        #^^^ You do not need this,            ^^^^^^^^^^^^ that does it
fi
Code:
#!/bin/bash

cd /root

read -p "Enter received data or leave blank: " rcvdata
read -p "Enter sent data or leave blank: " sntdata

if [ -z "$sntdata" ]; then
	awk -v Foo="$rcvdata" -F";+" '$13 ~ Foo {print $0}' testfile*
     #or
     #  awk -F";+" '$13 ~ "'"$rcvdata"'" {print $0}' testfile*
     # the text inside the '' (coloured seaGreen in the original post) is protected from shell expansion
     # echo awk -F";+" '$13 ~ "$rcvdata" {print $0}' testfile*
     # echo awk -F";+" '$13 ~ "'"$rcvdata"'" {print $0}' testfile*
     # see the difference the '' makes
     # don't think of them as being around "$rcvdata", think "$rcvdata" as being outside the ''
fi

Last edited by Firerat; 10-07-2013 at 03:45 PM. Reason: switched to Foo=$rcvdata , I think better example than rcvdata=$rcvdata
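
A small follow-up sketch on the -v approach, since it is easy to miss that `~` treats the variable as a regular expression while `==` compares it literally. The test file and the variable names (`tmp`, `pat`, `regex_count`, `exact_count`) are made up for the example, and it uses field 3 of a tiny three-line file rather than the real data.

```shell
# Hypothetical test data: three lines whose third field differs.
tmp=$(mktemp)
printf '%s\n' 'a;b;foo' 'a;b;foobar' 'a;b;f.o' > "$tmp"

# Pretend this came from: read -p "Enter received data or leave blank: " rcvdata
rcvdata='f.o'

# '~' is a regex match: the '.' matches any character, so all three lines hit.
regex_count=$(awk -v pat="$rcvdata" -F';' '$3 ~ pat { n++ } END { print n+0 }' "$tmp")

# '==' is a literal comparison of the whole field: only the line with "f.o" hits.
exact_count=$(awk -v pat="$rcvdata" -F';' '$3 == pat { n++ } END { print n+0 }' "$tmp")

rm -f "$tmp"
echo "regex: $regex_count, exact: $exact_count"   # prints: regex: 3, exact: 1
```

As far as I know, -v also interprets backslash escapes in the value, so input containing backslashes may not arrive in awk exactly as typed.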
 
Old 10-07-2013, 03:59 PM   #12
GazL
LQ Veteran
 
Registered: May 2008
Posts: 6,897

Rep: Reputation: 5019
Quote:
Originally Posted by cosminel View Post
So basically I want to grep for the 2nd "my_string" or "my_string_2". The only constant, non-changing markers I have in all this is the ";" character. So what I know for sure is that after the 7th ";" the 2nd "my_string" will always follow and after the 15th ";" "my_string_2" will always follow.

Is it possible to do the above with grep?

Thank you in advance.
If I'm understanding your requirements correctly then this grep string looks like it does what you're asking.
Code:
gazl@ws1:/tmp$ cat testdata
matchboth;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc
nomatch;data;data;data;;data;my_string;other_string;data;data;data;data;;;;other_string_2;;;;etc;etc
match8th;data;data;data;;data;my_string;my_string;data;data;data;data;;;;other_string_2;;;;etc;etc
match16th;data;data;data;;data;my_string;other_string;data;data;data;data;;;;my_string_2;;;;etc;etc
gazl@ws1:/tmp$ string1="my_string"
gazl@ws1:/tmp$ string2="my_string_2"
gazl@ws1:/tmp$ grep "\(^\([^;]*;\)\{7\}${string1};.*\)\|\(^\([^;]*;\)\{15\}${string2};.*\)" < testdata
matchboth;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc
match8th;data;data;data;;data;my_string;my_string;data;data;data;data;;;;other_string_2;;;;etc;etc
match16th;data;data;data;;data;my_string;other_string;data;data;data;data;;;;my_string_2;;;;etc;etc
gazl@ws1:/tmp$

Last edited by GazL; 10-07-2013 at 04:11 PM.
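
For comparison, here is a rough sketch of the same positional test written two ways on the test data above: once with grep -E (the same idea as the pattern above, without the backslashes) and once with awk comparing fields 8 and 16 directly. The variable names and the temporary file are only for illustration, and both commands print just the first field so the results are easy to compare.

```shell
# Test data copied from the post above, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
matchboth;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc
nomatch;data;data;data;;data;my_string;other_string;data;data;data;data;;;;other_string_2;;;;etc;etc
match8th;data;data;data;;data;my_string;my_string;data;data;data;data;;;;other_string_2;;;;etc;etc
match16th;data;data;data;;data;my_string;other_string;data;data;data;data;;;;my_string_2;;;;etc;etc
EOF

string1='my_string'
string2='my_string_2'

# grep -E: skip 7 (or 15) ';'-terminated fields, then require the string followed by ';'.
grep_matches=$(grep -E "^([^;]*;){7}${string1};|^([^;]*;){15}${string2};" "$tmp" | cut -d';' -f1)

# awk: compare field 8 and field 16 literally.
awk_matches=$(awk -F';' -v s1="$string1" -v s2="$string2" '$8 == s1 || $16 == s2 { print $1 }' "$tmp")

rm -f "$tmp"
echo "$awk_matches"
```

Both should list matchboth, match8th and match16th. The grep version still treats the search strings as regular expressions, so user input with characters like "." or "*" would behave differently there than in the awk comparison.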
 
Old 10-07-2013, 04:16 PM   #13
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
Oh I see now Firerat, I needed to define the variable for awk for the defined variable in the script. Either do this or use the ' ' to separate. The syntax format is killing me since I am a total beginner.

I already knew that I could grab the data without grep but I had this impression that using solely awk would slow down the search considerably. I didn't get the chance to test this in the working environment (a server with loads of data). So I just temporarily thought of letting grep (or fgrep) grab the data and then passing the results to awk.

Thank you for your input GazL. I have to say, awk looks cleaner at this point.

I will test grep/fgrep against awk on the production server to see which is the fastest and by what amount.

Thank you guys for your help. After further testing, if I don't get stuck somewhere, I will mark the thread as solved, as I understand it's a good thing to do.

Last edited by cosminel; 10-07-2013 at 07:38 PM.
 
Old 10-07-2013, 04:24 PM   #14
GazL
LQ Veteran
 
Registered: May 2008
Posts: 6,897

Rep: Reputation: 5019
Quote:
Originally Posted by cosminel View Post
Thank you for your input GazL. I have to say, awk looks cleaner at this point
it usually does. Regexes never look pretty.

I'd be interested to see the results of your benchmarking of awk v grep if you'd be kind enough to come back and let us know.
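
In case it helps with the benchmarking, here is a rough sketch of how the two approaches could be compared on generated data. Everything in it is an assumption for illustration: the 20000-line file, the "needle" string in field 8, and the variable names. It only measures whole seconds with date +%s, so on real data the shell's time command would give finer numbers.

```shell
# Hypothetical test data: 20000 lines, every 10th line has "needle" in field 8.
tmp=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 20000; i++) {
        f8 = (i % 10 == 0) ? "needle" : "other"
        print "a;b;c;d;;e;x;" f8 ";g;h"
    }
}' > "$tmp"

# awk on its own.
start=$(date +%s)
awk_count=$(awk -F';' -v s="needle" '$8 == s { n++ } END { print n+0 }' "$tmp")
mid=$(date +%s)

# grep -F as a cheap pre-filter, awk only checks the position on the survivors.
pipe_count=$(grep -F "needle" "$tmp" | awk -F';' -v s="needle" '$8 == s { n++ } END { print n+0 }')
end=$(date +%s)

rm -f "$tmp"
echo "awk alone:  $awk_count matches in $((mid - start))s"
echo "grep | awk: $pipe_count matches in $((end - mid))s"
```

The two counts should always agree; which variant is faster will likely depend on how many lines contain the string at all, so it is worth testing with data that looks like the production files.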
 
Old 10-07-2013, 04:58 PM   #15
cosminel
Member
 
Registered: Sep 2013
Posts: 31

Original Poster
Rep: Reputation: Disabled
Sure thing! Once things settle down around here I will begin testing and get back to you with my findings.

I am also wondering how much the speed of awk is affected by the complexity of the command that involves it.

But I will begin with a plain string search.
 
  


All times are GMT -5. The time now is 08:09 AM.
