[SOLVED] grep for pattern following the nth occurrence of a character in a file
Hello everyone,
After days of searching articles, forums etc I still can't get grep to do what I want. I have some files that contain data in the following format and I am interested in the my_string and my_string_2 as shown below:
Things to consider:
- "data" may contain anything and the length may vary
- as it is clearly shown the data strings are separated by ; or ;; or ;;;;
- sometimes I want grep to look for the 2nd "my_string", sometimes for "my_string_2" as they will represent user input in a script, something like: "Enter [my_string] or leave blank" and "Enter [my_string_2] or leave blank"
So basically I want to grep for the 2nd "my_string" or "my_string_2". The only constant, non-changing marker I have in all this is the ";" character. So what I know for sure is that after the 7th ";" the 2nd "my_string" will always follow and after the 15th ";" "my_string_3" will always follow.
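A minimal sketch of the kind of positional grep pattern being asked about here; it is not a command taken from the thread. It assumes every single ";" counts as one separator (so ";;" advances the count by two) and that the searched string contains no regex metacharacters. The function name match_after_nth and the sample lines are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample data: the thread's real input format is not shown.
sample=$(mktemp)
cat > "$sample" <<'EOF'
a;b;;c;d;e;f;my_string;g
my_string;b;;c;d;e;f;other;g
EOF

# ([^;]*;){N} consumes exactly N semicolons counted from the start of the line,
# so a line only matches when the text right after the Nth ";" is the target.
# The target is interpolated into the regex as-is (assumption: plain text).
match_after_nth() {  # usage: match_after_nth N STRING FILE...
    local n=$1 target=$2
    shift 2
    grep -E "^([^;]*;){${n}}${target}(;|\$)" "$@"
}

match_after_nth 7 my_string "$sample"   # prints only the first sample line
```

Calling it with 15 instead of 7 would test the later position in the same way.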
I was hoping that grep has the ability to do what I want using a more complicated extended regexp which I can't determine at this point.
Firerat, the position of my_string changes its significance, this is why I want grep to match it at precisely that position. Furthermore, in several cases my_string = my_string_2 and as I said, depending on the user input, the meaning of the value differs.
If what I need grep to do is not possible, I will try the awk instead.
With awk you can test each field, and you can then report which field it is,
but at the moment I still do not understand what you want from your description.
Show us your code and some input data (multiple lines),
so we have some context.
Here is an awk example anyway (not certain it fits with what you want/need):
Code:
awk -F';' -v string1="my_string" -v string2="my_string_2" '{
    # walk every semicolon-separated field and report exact matches
    for (i = 1; i <= NF; i++) {
        if ($i == string1) print "String1 found at field " i
        if ($i == string2) print "String2 found at field " i
    }
}' Input
Thank you for your help. I will try to see which proposed solution returns the desired result.
To tell you the truth I thought it would be easier to write instructions for returning the whole line if the searched string is found at the nth semicolon (which is used as a separator).
Firerat, the information I have in those files is written in such a way that "my_string = received data" and "my_string_2 = sent data", and this can be determined solely on where they are positioned inside the line, having the semicolons as separators for all the data strings.
Also note that my_string and my_string_2 are interchangeable.
All I want is to extend a script that I made in order to contain these prompts:
"Enter received data string or leave blank:"
"Enter sent data string or leave blank:"
As the searched string may be positioned at the "received data" location or at the "sent data" location (which is determined by the nth semicolon), I want the returned results to conform to the user's choices when using grep to search the files based on the above prompts.
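One way this could look in awk, as a sketch only. The field numbers (3 for received, 5 for sent), the sample lines, and the function name find_lines are assumptions for illustration; the real positions depend on the actual files. It uses -F';' so that every semicolon counts, and == for an exact field comparison rather than a regex match.

```shell
#!/bin/bash
# Hypothetical layout: received data in field 3, sent data in field 5.
sample=$(mktemp)
cat > "$sample" <<'EOF'
id1;x;alpha;y;beta
id2;x;beta;y;alpha
id3;x;alpha;y;alpha
EOF

# Print whole lines whose received/sent fields equal the given strings.
# An empty string means "do not filter on this field".
find_lines() {  # usage: find_lines RECEIVED SENT FILE...
    local rcv=$1 snt=$2
    shift 2
    awk -F';' -v rcv="$rcv" -v snt="$snt" -v rf=3 -v sf=5 '
        (rcv == "" || $rf == rcv) && (snt == "" || $sf == snt)
    ' "$@"
}

find_lines alpha "" "$sample"   # lines where "alpha" is the received data
```

With both arguments set, for example find_lines alpha alpha "$sample", only lines matching at both positions are printed.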
Now, the trick is to pass the string which is a user input variable into the awk command. This is where I'm currently stuck. I looked over Firerat's command, searched the web but for the life of me I cannot figure out how to pass the script variable into awk. I do not understand the syntax. Here is part of my script:
Code:
#!/bin/bash
cd /root
read -p "Enter received data or leave blank: " rcvdata
read -p "Enter sent data or leave blank: " sntdata
if [ -z $sntdata ]; then
grep $rcvdata testfile* | awk -F";+" '$13 ~ "$rcvdata" {print $0}'
fi
As you can see "rcvdata" and "sntdata" are user generated variables. Now, from what I understand I need to pass the script variable "rcvdata" to awk with -v (and here is the point where I get completely lost)
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...
POSIX options:                  GNU long options: (standard)
        -f progfile             --file=progfile
        -F fs                   --field-separator=fs
        -v var=val              --assign=var=val
Short options:                  GNU long options: (extensions)
        -b                      --characters-as-bytes
        -c                      --traditional
        -C                      --copyright
        -d[file]                --dump-variables[=file]
        -e 'program-text'       --source='program-text'
        -E file                 --exec=file
        -g                      --gen-pot
        -h                      --help
        -L [fatal]              --lint[=fatal]
        -n                      --non-decimal-data
        -N                      --use-lc-numeric
        -O                      --optimize
        -p[file]                --profile[=file]
        -P                      --posix
        -r                      --re-interval
        -S                      --sandbox
        -t                      --lint-old
        -V                      --version
man awk
Code:
......
-v var=val
--assign var=val
Assign the value val to the variable var, before execution of the program begins. Such variable values are available to the BEGIN block of an AWK program.
......
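A small sketch of what that man page excerpt means in practice. The variable name Foo follows the example later in this thread; the function name show_var and the sample value are made up here.

```shell
#!/bin/bash
rcvdata='received text'   # stands in for the value typed at the read prompt

# Inside single quotes the shell never touches the awk program, so awk needs
# its own copy of the value; -v creates the awk variable Foo before BEGIN runs.
show_var() { awk -v Foo="$1" 'BEGIN { print "awk sees: " Foo }'; }

show_var "$rcvdata"   # prints: awk sees: received text
```

Because the program consists only of a BEGIN block, awk does not wait for any input file here.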
Code:
#!/bin/bash
cd /root
read -p "Enter received data or leave blank: " rcvdata
read -p "Enter sent data or leave blank: " sntdata
if [ -z $sntdata ]; then
grep $rcvdata testfile* | awk -F";+" '$13 ~ "$rcvdata" {print $0}'
# You do not need the grep part; the awk part does the matching
fi
Code:
#!/bin/bash
cd /root
read -p "Enter received data or leave blank: " rcvdata
read -p "Enter sent data or leave blank: " sntdata
if [ -z "$sntdata" ]; then
awk -v Foo="$rcvdata" -F";+" '$13 ~ Foo {print $0}' testfile*
#or
# awk -F";+" '$13 ~ "'"$rcvdata"'" {print $0}' testfile*
# the text inside the single quotes is protected from shell expansion
# echo awk -F";+" '$13 ~ "$rcvdata" {print $0}' testfile*
# echo awk -F";+" '$13 ~ "'"$rcvdata"'" {print $0}' testfile*
# see the difference the '' makes
# don't think of them as being around "$rcvdata", think "$rcvdata" as being outside the ''
fi
Last edited by Firerat; 10-07-2013 at 03:45 PM.
Reason: switched to Foo=$rcvdata , I think better example than rcvdata=$rcvdata
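The quoting difference those echo lines point at can be seen without running awk at all. This is only an illustration: the value 'abc' and the variable names literal and expanded are made up.

```shell
#!/bin/bash
rcvdata='abc'   # hypothetical user input

# Entirely inside single quotes: the shell leaves $rcvdata alone,
# so awk would receive the literal text "$rcvdata".
literal='$13 ~ "$rcvdata" {print $0}'

# Single quotes closed, variable expanded inside double quotes, quotes reopened.
expanded='$13 ~ "'"$rcvdata"'" {print $0}'

printf '%s\n' "$literal" "$expanded"
# prints:
#   $13 ~ "$rcvdata" {print $0}
#   $13 ~ "abc" {print $0}
```

With the second form the user's text becomes part of the awk program itself, which is one reason the -v form is usually the tidier choice.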
So basically I want to grep for the 2nd "my_string" or "my_string_2". The only constant, non-changing marker I have in all this is the ";" character. So what I know for sure is that after the 7th ";" the 2nd "my_string" will always follow and after the 15th ";" "my_string_3" will always follow.
Is it possible to do the above with grep?
Thank you in advance.
If I'm understanding your requirements correctly then this grep string looks like it does what you're asking.
Oh, I see now, Firerat: I needed to define an awk variable to hold the variable already defined in the script. Either do this or use the ' ' to separate. The syntax is killing me since I am a total beginner.
I already knew that I could grab the data without grep, but I had the impression that using awk alone would slow down the search considerably. I didn't get the chance to test this in the working environment (a server with loads of data). So I temporarily thought of letting grep (or fgrep) grab the data and then passing the results to awk.
Thank you for your input, GazL. I have to say, awk looks cleaner at this point.
I will test grep/fgrep against awk on the production server to see which is the fastest and by what amount.
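A rough sketch of how such a comparison could be set up, assuming bash (for the time keyword). The generated test file, the field number 4, and the function names awk_only and grep_first are all made up; real timings will depend entirely on the actual data and server.

```shell
#!/bin/bash
# Hypothetical test data: 2000 lines with the interesting value in field 4.
data=$(mktemp)
i=1
while [ "$i" -le 2000 ]; do
    printf 'a;b;c;needle_%s;e\n' "$i"
    i=$((i + 1))
done > "$data"
needle='needle_1500'

# Variant 1: awk alone checks the field on every line.
awk_only() { awk -F';' -v s="$needle" '$4 == s' "$data"; }

# Variant 2: grep -F discards most lines first, awk then checks the position.
grep_first() { grep -F -- "$needle" "$data" | awk -F';' -v s="$needle" '$4 == s'; }

# Both variants should print the same line; only the elapsed time may differ.
time awk_only > /dev/null
time grep_first > /dev/null
```

On a file this small the difference is unlikely to be meaningful, so the same two functions would need to be pointed at the real files to learn anything.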
Thank you guys for your help. After further testing, if I don't get stuck somewhere, I will mark the thread as solved, as I understand it's a good thing to do.