grep for pattern following the nth occurrence of a character in a file
Hello everyone,
After days of searching articles, forums, etc., I still can't get grep to do what I want. I have some files that contain data in the following format, and I am interested in my_string and my_string_2 as shown below:
Code:
data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc
Things to consider:
- "data" may contain anything and its length may vary
- as clearly shown, the data strings are separated by ; or ;; or ;;;;
- sometimes I want grep to look for the 2nd "my_string", sometimes for "my_string_2", as they will represent user input in a script, something like: "Enter [my_string] or leave blank" and "Enter [my_string_2] or leave blank"
So basically I want to grep for the 2nd "my_string" or for "my_string_2". The only constant, non-changing marker I have in all this is the ";" character. What I know for sure is that the 2nd "my_string" always follows the 7th ";" and "my_string_2" always follows the 15th ";".
Is it possible to do the above with grep?
Thank you in advance.
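To make the counting rule concrete: when every single ";" is treated as a separator, empty fields still count, so the text following the nth ";" is field n+1. The sketch below (POSIX sh plus awk, using the sample line from the question) lists each field with its number; list_fields is a hypothetical helper name.

```shell
#!/bin/sh
# Print every ';'-separated field of the sample line with its field number.
# With a single ';' as the separator, empty fields still count, so the text
# after the nth ';' is always field n+1.
line='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

# hypothetical helper: list_fields LINE
list_fields() {
    # -F';' makes awk split on every single semicolon
    printf '%s\n' "$1" | awk -F';' '{ for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i }'
}

list_fields "$line"
```

For the sample line it reports field 8 as my_string and field 16 as my_string_2, which matches the "7th ;" and "15th ;" rule above.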
I really don't understand what you are trying to do
Why do you want the n'th match? Can you post your script so I can get an idea of what you want? I have a feeling you really want awk, but full(er) context will help.
Code:
awk -F\; '{printf "%s %s\n",$8,$16}' InputFile
grep is probably not the tool you want to use, as the regular expression matching is 'greedy'.
Another alternative is the 'cut' command.
Code:
bash-4.2$ echo 'data;data;data;data;data;;my_string;data;data;data;data;;;;my_string_2;;;;' | cut -d';' -f7,15
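The example above uses a slightly different line from the one in the question. Applied to the question's own sample line, the same cut approach needs fields 8 and 16. A minimal sketch, assuming each record is a single line and that the empty fields between consecutive semicolons should be counted:

```shell
#!/bin/sh
# cut -d';' treats every single ';' as a delimiter, so empty fields are kept
# and "the text after the nth ';'" is field n+1.
line='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

# Field 8 follows the 7th ';' and field 16 follows the 15th ';'.
received=$(printf '%s\n' "$line" | cut -d';' -f8)
sent=$(printf '%s\n' "$line" | cut -d';' -f16)

echo "received=$received sent=$sent"
```

cut only extracts fields, though; selecting whole lines by the value of a field needs grep or awk.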
Thank you for your replies.
I was hoping that grep could do what I want using a more complicated extended regexp, which I can't work out at this point. Firerat, the position of my_string changes its significance; this is why I want grep to match it at precisely that position. Furthermore, in several cases my_string = my_string_2 and, as I said, the meaning of the value differs depending on the user input. If what I need is not possible with grep, I will try awk instead.
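For what it's worth, a position-anchored extended regexp can do this in grep alone. The sketch below assumes the field layout from the sample line (field 8 for the 2nd my_string, field 16 for my_string_2) and that the searched value contains no regex metacharacters; grep_at_field is a hypothetical helper. `^([^;]*;){7}` consumes exactly seven fields because `[^;]*` can never run past a semicolon, which sidesteps the greediness concern.

```shell
#!/bin/sh
# Match a value only when it sits at a fixed field position, using grep -E.
# ^([^;]*;){N-1} consumes exactly N-1 fields ("anything but ';'" then ';'),
# so the value must start field N. The trailing (;|$) stops "my_string"
# from also matching "my_string_2".
# Assumption: VALUE contains no regex metacharacters (. * [ ] etc.).

# hypothetical helper: grep_at_field N VALUE [FILE...]
grep_at_field() {
    n=$1; value=$2; shift 2
    skip=$((n - 1))
    grep -E "^([^;]*;){$skip}$value(;|\$)" "$@"
}

l1='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'
l2='data;data;data;data;;data;my_string;other;data;data;data;data;;;;my_string;;;;etc;etc'

printf '%s\n' "$l1" "$l2" | grep_at_field 8 my_string     # only l1 matches
printf '%s\n' "$l1" "$l2" | grep_at_field 16 my_string_2  # only l1 matches
printf '%s\n' "$l1" "$l2" | grep_at_field 16 my_string    # only l2 matches
```

With file arguments instead of stdin, grep prints the whole matching lines as usual (prefixed by the file name when several files are given).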
With awk you can test each field, and you can then report which field it is. But at the moment I still do not understand what you want from your description. Show us your code and some input data (multiple lines) so we have some context. Meanwhile, here is an awk command (not certain it fits what you want/need):
Code:
awk -F\; -v string1="my_string" -v string2="my_string_2" '{for (i=1;i<=NF;i++)
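Since the command above is cut off, here is one possible way such a per-field loop could look. This is an illustrative sketch, not the original poster's full command: it compares every ';'-separated field with string1 and string2 and reports the line and field number of each exact match. report_fields is a hypothetical name.

```shell
#!/bin/sh
# Sketch: loop over every field and report where string1 or string2 appears
# as a whole field (not the original, truncated command).
l='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

# hypothetical helper: report_fields LINE
report_fields() {
    printf '%s\n' "$1" | awk -F';' -v string1="my_string" -v string2="my_string_2" '{
        for (i = 1; i <= NF; i++) {
            # == compares the whole field, so "my_string" does not match "my_string_2"
            if ($i == string1) printf "line %d: string1 in field %d\n", NR, i
            if ($i == string2) printf "line %d: string2 in field %d\n", NR, i
        }
    }'
}

report_fields "$l"
```

On the sample line this reports string1 in fields 7 and 8 and string2 in field 16, which is the kind of position information the question needs.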
I think Firerat is on the mark; my only addition would be to change the separator to match one or more semicolons:
Code:
awk -F";+" ...
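One thing to watch with -F";+": a run of semicolons becomes a single separator, so empty fields vanish and the field numbers change. On the sample line from the question, the 2nd my_string moves from $8 to $7 and my_string_2 from $16 to $12. A quick check, assuming the pattern of empty fields is the same on every line (if it varies, -F';' with the original numbering is the safer choice):

```shell
#!/bin/sh
# With -F';+' a run of semicolons counts as ONE separator, so empty fields
# disappear and the field numbers shift compared with -F';'.
l='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

single=$(printf '%s\n' "$l" | awk -F';'  '{ print $8, $16 }')
runs=$(printf '%s\n' "$l" | awk -F';+' '{ print $7, $12 }')

echo "-F';'  : $single"
echo "-F';+' : $runs"
```

Both lines print "my_string my_string_2"; note also that a line starting with ";" still gets an empty $1 under -F';+'.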
An alternative might be to put your data into an array, e.g.:
Code:
IFS=';' read -r -a MyArray < Input   # splits the first line of Input on every ';'
Code:
while read -r -d ';' Element || [[ -n $Element ]]; do MyArray+=("$Element"); done < Input
Code:
echo "Number of elements in MyArray= ${#MyArray[@]}"
http://www.tldp.org/LDP/abs/html/
http://mywiki.wooledge.org/BashGuide
http://www.gnu.org/software/bash/manual/bashref.html
specifically http://www.tldp.org/LDP/abs/html/arrays.html
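Once the line is in an array, picking a field by position is just an index lookup. A bash-only sketch using the question's sample line; note that bash arrays start at 0, so the text after the 7th ";" is ${MyArray[7]} (awk's $8). The here-string and the variable name line are illustrative choices.

```shell
#!/usr/bin/env bash
# Bash-only sketch: split one line on ';' into an array and read fields by
# position. Arrays are 0-based, so the text after the nth ';' is element n.
line='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

# IFS=';' splits on every single ';' (empty fields are kept because ';' is
# not whitespace); -r keeps backslashes literal; -a fills an indexed array.
IFS=';' read -r -a MyArray <<< "$line"

echo "Number of elements in MyArray= ${#MyArray[@]}"
echo "after 7th ';': ${MyArray[7]}"
echo "after 15th ';': ${MyArray[15]}"
```

This prints 21 elements, my_string and my_string_2.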
Thank you for your help. I will try to see which proposed solution returns the desired result.
To tell you the truth, I thought it would be easier to write instructions for returning the whole line if the searched string is found after the nth semicolon (which is used as a separator).
Firerat, the information in those files is written in such a way that "my_string = received data" and "my_string_2 = sent data", and this can be determined solely by where they are positioned in the line, since semicolons separate all the data strings. Also note that my_string and my_string_2 are interchangeable.
All I want is to extend a script I made so that it contains these prompts:
"Enter received data string or leave blank:"
"Enter sent data string or leave blank:"
As the searched string may be positioned at the "received data" location or at the "sent data" location (which is determined by the nth semicolon), I want the results returned by grep to match the user's choices at these prompts.
I hope this clarifies what I am aiming to do.
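One way to get the whole line back only when the value sits in the chosen position is to let awk compare specific fields, with a blank answer meaning "don't filter on this field". A sketch assuming the question's layout (received = field 8, sent = field 16 with -F';'); filter_lines is a hypothetical name and the field numbers need adjusting to the real files:

```shell
#!/bin/sh
# Print every whole line whose "received" field (field 8) and/or "sent"
# field (field 16) equals the user's input. Field numbers are assumptions
# taken from the sample line. A blank value means "don't filter on it".

# hypothetical helper: filter_lines RECEIVED SENT [FILE...]
filter_lines() {
    recv=$1; sent=$2; shift 2
    awk -F';' -v recv="$recv" -v sent="$sent" '
        (recv == "" || $8 == recv) && (sent == "" || $16 == sent)
    ' "$@"
}

# Interactive use (commented out so the sketch also runs non-interactively):
# printf 'Enter received data string or leave blank: '; read -r recv
# printf 'Enter sent data string or leave blank: ';     read -r sent
# filter_lines "$recv" "$sent" file*
```

A pattern without an action prints the whole line. == demands an exact field match, unlike ~, which matches substrings as a regex. With both answers blank every line is printed, and values passed with -v have backslash escapes (such as \t) processed by awk.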
I think either would work.
If you are still stuck, post a sample script with sample data, along with some user input to test it.
I finally found some time to investigate your solutions. I found the command string that I was looking for:
Code:
grep string file* | awk -F";+" '$13 ~ "string" {print $0}'
Now the trick is to pass the string, which is a user-input variable, into the awk command. This is where I'm currently stuck. I looked over Firerat's command and searched the web, but for the life of me I cannot figure out how to pass the script variable into awk. I do not understand the syntax. Here is part of my script:
Code:
#!/bin/bash
awk --help
Code:
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Code:
......
Code:
#!/bin/bash
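For reference, there are two common ways to get a shell variable into awk, and both come up in this thread. A minimal sketch comparing them on the question's sample line; string is a stand-in for the user's input, and $12 is used because that is where my_string_2 lands with -F';+' on this line (the $13 above refers to the poster's real files):

```shell
#!/bin/sh
# Two ways to pass a shell variable into awk.
string='my_string_2'   # stand-in for the user's input
line='data;data;data;data;;data;my_string;my_string;data;data;data;data;;;;my_string_2;;;;etc;etc'

# 1) -v: awk receives the value as its own variable s.
a=$(printf '%s\n' "$line" | awk -F';+' -v s="$string" '$12 ~ s { print $0 }')

# 2) Quote splicing: close the single quotes, insert "$string", reopen them.
#    The value becomes part of the awk program text itself.
b=$(printf '%s\n' "$line" | awk -F';+' '$12 ~ "'"$string"'" { print $0 }')

[ "$a" = "$b" ] && echo "both forms selected the same line"
```

-v is generally the safer choice: with quote splicing, a value containing a double quote or a backslash changes the awk program itself.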
Code:
gazl@ws1:/tmp$ cat testdata
Oh, I see now, Firerat: I needed to define an awk variable for the variable defined in the script :) Either do this, or use the ' ' quotes to separate it from the awk code. The syntax is killing me, since I am a total beginner.
I already knew that I could grab the data without grep, but I had the impression that using awk alone would slow down the search considerably. I haven't had the chance to test this in the working environment (a server with loads of data), so for now I thought of letting grep (or fgrep) grab the data and then passing the results to awk.
Thank you for your input, GazL. I have to say, awk looks cleaner at this point :) I will test grep/fgrep against awk on the production server to see which is faster and by how much.
Thank you guys for your help. After further testing, if I don't get stuck somewhere, I will mark the thread as solved, as I understand it's a good thing to do.
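If it helps with the benchmark, here is a sketch that first checks that the grep | awk pipeline and an awk-only version select the same lines. The sample data, field 13 and the needle string are made up for illustration; -h (supported by GNU and BSD grep) stops grep from prefixing file names when it reads several files, which would otherwise be glued onto $1.

```shell
#!/bin/sh
# Check that grep | awk and awk alone select the same lines.
# Sample data, field 13 and "needle" are illustrative assumptions.
dir=$(mktemp -d)
printf '%s\n' \
  'a;b;c;d;e;f;g;h;i;j;k;l;needle;n' \
  'a;b;c;needle;e;f;g;h;i;j;k;l;m;n' \
  'a;;b;c;d;e;f;g;h;i;j;k;l;needle' > "$dir/sample1"

string='needle'

# grep pre-filters the lines, awk checks the field position
via_grep=$(grep -h -- "$string" "$dir"/sample* | awk -F';+' -v s="$string" '$13 ~ s')

# awk alone reads the files and does both jobs
via_awk=$(awk -F';+' -v s="$string" '$13 ~ s' "$dir"/sample*)

[ "$via_grep" = "$via_awk" ] && echo "same result"
rm -r "$dir"
```

On the real server, prefixing each command with bash's time keyword (e.g. time awk -F';+' -v s="$string" '$13 ~ s' file* > /dev/null) gives a rough comparison; running each a few times helps separate disk caching from the tools themselves.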
I'd be interested to see the results of your benchmarking of awk v grep if you'd be kind enough to come back and let us know.
Sure thing! Once things settle down around here, I will begin testing and get back to you with my findings.
I am also wondering how much the speed of awk is affected by the complexity of the command. But I will begin with a plain string search.