01-06-2011, 12:14 AM
#1
LQ Newbie
Registered: Dec 2010
Posts: 12
printing a specific word out of a file.
Hello
So I have a file with the following contents:
file1.txt
ID1 age_11 dog_n3 parent_dog_n1
ID1 age_7 dog_n4 parent_dog_n3
ID1 dog_n5 age_4
ID1 dog_n6 age_4
ID1 age_7 dog_n7
ID1 age_11 dog_n1
ID1 dog_n2 age_3 parent_dog_n3
and I would like the output to be:
dog_n3
dog_n4
dog_n5
dog_n6
dog_n7
dog_n1
dog_n2
As you can see, I would like the output file to contain just the dogs, not the other information. But because the information is mixed up, how can I extract only the dogs? (I can't just do awk '{print }', because the dogs are found in column 2 or 3, or sometimes even 4.) And the sed command is confusing me!
Please help me!
P.S. I'm programming in bash.
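To try the answers in this thread, the sample file can be recreated with a here-document. This sketch assumes the fields in the question are separated by single spaces:

```shell
#!/bin/bash
# Recreate the question's sample data as file1.txt in the current directory.
# Quoting the 'EOF' delimiter stops the shell from expanding anything inside.
cat > file1.txt <<'EOF'
ID1 age_11 dog_n3 parent_dog_n1
ID1 age_7 dog_n4 parent_dog_n3
ID1 dog_n5 age_4
ID1 dog_n6 age_4
ID1 age_7 dog_n7
ID1 age_11 dog_n1
ID1 dog_n2 age_3 parent_dog_n3
EOF

# Sanity check: the file should contain seven lines.
wc -l < file1.txt
```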
01-06-2011, 01:24 AM
#2
Member
Registered: Nov 2010
Distribution: Debian Lenny
Posts: 136
Sounds like homework, so I'll just say that it can be done with grep, uniq and cut.
01-06-2011, 02:14 AM
#3
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
Or you could use awk and just loop over the fields looking for something starting with 'dog'.
01-06-2011, 05:05 PM
#4
LQ Newbie
Registered: Dec 2010
Posts: 12
Original Poster
Haha, thanks for the replies, guys.
But just to clarify, it is not homework; I'm just trying to teach myself some bash scripting for Linux (just for my own understanding of scripts).
In reality the file has nothing to do with dogs and cats. I just have some data from which I want to extract certain names and numbers, but that is too complicated to write up in this post, so I'm using dogs/cats as an example for simplicity.
So far I'm trying to understand how grep works. I'm aware that grep is the tool to use: grep dog file1.txt prints every line that contains dog... so how can I get it to print only the specific word that I want? Using grep -w "specific word" file1.txt doesn't seem to do much! Please help.
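The difference between the two flags shows up on a two-line excerpt of the sample (single-space separators are an assumption): -w only requires the match to be a whole word and still prints the entire line, while -o prints just the matched text. Combining them, as several of the replies do, also keeps the dog_n1 inside parent_dog_n1 from being reported, because the _ before it is a word character. A sketch, assuming GNU grep:

```shell
#!/bin/bash
# Two lines from the question's sample data.
cat > file1.txt <<'EOF'
ID1 age_11 dog_n3 parent_dog_n1
ID1 dog_n5 age_4
EOF

# -w: the match must be a whole word, but grep still prints the whole line.
grep -w 'dog_n3' file1.txt
#   ID1 age_11 dog_n3 parent_dog_n1

# -o: print only the matched text, one match per output line.
# Without -w this also reports the dog_n1 inside parent_dog_n1.
grep -o 'dog_n[0-9]*' file1.txt
#   dog_n3
#   dog_n1
#   dog_n5

# -o and -w together: only whole-word matches, and only the matched text.
grep -ow 'dog_n[0-9]*' file1.txt
#   dog_n3
#   dog_n5
```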
01-06-2011, 05:08 PM
#5
LQ Guru
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509
Code:
grep -o -w 'dog_..' file1.txt
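Each . in this pattern matches exactly one character, so dog_.. only finds names with exactly two characters after dog_. A quick check, assuming GNU grep and a hypothetical extra line containing a two-digit name, dog_n10, which is not in the original data:

```shell
#!/bin/bash
# Sample data; the last line (dog_n10) is a hypothetical addition used
# only to show the limitation of the two-dot pattern.
cat > file1.txt <<'EOF'
ID1 age_11 dog_n3 parent_dog_n1
ID1 dog_n5 age_4
ID1 age_2 dog_n10
EOF

# Exactly two characters after "dog_": dog_n10 is not printed, because the
# candidate match "dog_n1" is followed by "0" and so is not a whole word.
grep -o -w 'dog_..' file1.txt

# Any run of non-space characters after "dog_" also handles longer names.
grep -o -w 'dog_[^ ]*' file1.txt
```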
01-06-2011, 06:23 PM
#6
Member
Registered: Nov 2010
Distribution: Debian Lenny
Posts: 136
I did it like this: (note that the [1-9] only allows for a single digit following the _n)
Code:
04:20:38 /home/barrie/tmp $ > grep -on dog_n[1-9] ./file1.txt | uniq -w 2 | cut -c 3-
dog_n3
dog_n4
dog_n5
dog_n6
dog_n7
dog_n1
dog_n2
04:22:02 /home/barrie/tmp $ >
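The uniq and cut steps work around a quirk: without -w, grep -o also matches the dog_n1 inside parent_dog_n1, so some lines yield two matches. -n prefixes each match with its line number (1:dog_n3, 1:dog_n1), uniq -w 2 (a GNU extension) keeps the first match per line by comparing only the first two characters, and cut -c 3- strips the "N:" prefix. Both steps assume single-digit line numbers; from line 10 on, the output would keep a stray colon. A sketch of the same idea that splits on the colon instead, assuming GNU grep and any POSIX awk. Like the original, it keeps the first dog_n match on each line, so it assumes the wanted field precedes any parent_dog_ field, which holds for the sample data:

```shell
#!/bin/bash
cat > file1.txt <<'EOF'
ID1 age_11 dog_n3 parent_dog_n1
ID1 age_7 dog_n4 parent_dog_n3
ID1 dog_n5 age_4
ID1 dog_n6 age_4
ID1 age_7 dog_n7
ID1 age_11 dog_n1
ID1 dog_n2 age_3 parent_dog_n3
EOF

# grep -on prints "LINE:MATCH" for every match, e.g. "1:dog_n3" and
# "1:dog_n1" (the latter comes from parent_dog_n1).
# awk splits on ":" and prints the match only the first time each line
# number is seen, which works for line numbers of any length.
grep -on 'dog_n[0-9]*' file1.txt | awk -F: '!seen[$1]++ { print $2 }'
```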
01-06-2011, 07:00 PM
#7
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
If the "dog_" field is never at the start of the line,
Code:
sed -ne 's|^.*[\t\v\f ]\(dog_[^\t\v\f ]*\).*$|\1|p' file1.txt
otherwise
Code:
sed -ne 's|^\(dog_[^\t\v\f ]*\).*$|\1|p; s|^.*[\t\v\f ]\(dog_[^\t\v\f ]*\).*$|\1|p' file1.txt
Here's the breakdown of the first pattern:
Code:
s| This is a replacement command, with | as the separator.
^.* The line may start with anything (or nothing).
[\t\v\f ] Then there must be a tab, a vertical tab, a form feed, or a space character.
\(dog_[^\t\v\f ]*\) Then there must be "dog_", then any number of characters other than
tab, vertical tab, form feed, or space.
This matching bit is saved as "\1" for use in the replacement.
.*$ There may be anything or nothing up to the end of the line.
| Replacement follows:
\1 The marked bit.
| Options follow.
p If there was a match, print the line after the replacement.
Because the pattern matches an entire line, only the replacement is printed.
The second command is the same, except that it first tries to match "dog_" at the very start of the line, and then elsewhere.
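Both sed commands can be tried on a small file. This sketch assumes GNU sed, which recognizes \t, \v and \f inside bracket expressions (a strictly POSIX sed would treat them as literal characters). The dog_x line is hypothetical, added only to show why the second form exists:

```shell
#!/bin/bash
# Sample data plus one hypothetical line whose dog_ field starts the line.
cat > file1.txt <<'EOF'
ID1 age_11 dog_n3 parent_dog_n1
ID1 dog_n5 age_4
dog_x age_1
EOF

# First form: requires whitespace before dog_, so "dog_x" is skipped.
sed -ne 's|^.*[\t\v\f ]\(dog_[^\t\v\f ]*\).*$|\1|p' file1.txt

# Second form: also handles a dog_ field at the very start of the line.
sed -ne 's|^\(dog_[^\t\v\f ]*\).*$|\1|p; s|^.*[\t\v\f ]\(dog_[^\t\v\f ]*\).*$|\1|p' file1.txt
```

Because ^.* is greedy, a line holding two whitespace-preceded dog_ fields would yield the last one, not the first.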
I sometimes use a text editor or even pen and paper to build up a model (in my "own" markup -- basically doodling) of the desired pattern, then just write it as a regular expression. I've found that this really saves time, and makes it pretty easy to construct even complex regular expressions for sed, grep and friends.
Or, adapting from barriehie's solution,
Code:
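grep -ow 'dog_[^[:space:]]*' file1.txt
does the same thing. (Inside a grep bracket expression, \t, \v and \f are not escapes but literal characters, so the [:space:] class is the portable way to exclude whitespace there.)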
Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 01:44 AM.
01-06-2011, 07:00 PM
#8
Moderator
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Quote:
Originally Posted by barriehie
I did it like this: (note that the [1-9] only allows for a single digit following the _n)
Code:
04:20:38 /home/barrie/tmp $ > grep -on dog_n[1-9] ./file1.txt | uniq -w 2 | cut -c 3-
dog_n3
dog_n4
dog_n5
dog_n6
dog_n7
dog_n1
dog_n2
04:22:02 /home/barrie/tmp $ >
Heh ... how's that for obfuscation w/ commandline tools? =o)
01-06-2011, 08:24 PM
#9
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
So in awk it would be:
Code:
awk '{for(i=1;i<=NF;i++)if($i ~ /^dog/)print $i}' file
Edit: or even simpler
Code:
awk '/^dog/' RS="[ \n]" file
Last edited by grail; 01-06-2011 at 08:28 PM.
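Both awk versions can be compared on the sample data. The loop tests every field and prints those beginning with dog; parent_dog_n1 is skipped because ^ anchors the match to the start of the field. The shorter version makes every space or newline a record separator, so each word becomes its own record. POSIX leaves a multi-character RS unspecified; gawk and mawk treat it as a regular expression, so this sketch assumes one of those:

```shell
#!/bin/bash
cat > file1.txt <<'EOF'
ID1 age_11 dog_n3 parent_dog_n1
ID1 age_7 dog_n4 parent_dog_n3
ID1 dog_n5 age_4
ID1 dog_n6 age_4
ID1 age_7 dog_n7
ID1 age_11 dog_n1
ID1 dog_n2 age_3 parent_dog_n3
EOF

# Loop over every field on each line; print fields that start with "dog".
awk '{for(i=1;i<=NF;i++)if($i ~ /^dog/)print $i}' file1.txt

# Treat each space- or newline-separated word as its own record and print
# the records that start with "dog" (regex RS: gawk/mawk behaviour).
# awk turns the "\n" in a command-line assignment into a real newline.
awk '/^dog/' RS="[ \n]" file1.txt
```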
01-23-2011, 06:11 PM
#10
Senior Member
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix
Posts: 4,727
IMNRHO, grep -o is the only reasonable base to build this command on. It is direct & elegant.
The way the problem is stated, w/ a space being the field delimiter & "dog" the common portion, this is all that is necessary:
Code:
grep -ow 'dog[^ ]*' file1.txt
Although there was patently no sort in the example, & it was ambiguous as to whether entries are unique, sort -u could be added:
Code:
grep -ow 'dog[^ ]*' file1.txt | sort -u
Sometimes I might complicate it like this to expose the logic better:
Code:
cat file1.txt | grep -ow 'dog[^ ]*' | sort -u
Notes:
1. I built the file1.txt & tested all 3 of the above, including w/ an add'l line of only "dog_x".
2. Using -w to allow for the "dog_" field to be 1st on the line is superior to this alternative:
Code:
grep -o '[ ^]dog[^ ]*' file1.txt
Even though I love the symmetry, there's an unwanted leading space in the output.
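The difference described in note 2 can be checked with a file that includes a "dog_x" line like the one in note 1 (placing it first is an assumption). Inside a bracket expression, ^ is only special as the first character, so [ ^] means "a space or a literal ^" rather than "start of line"; the alternative therefore never matches a dog_ field at the start of a line. A sketch assuming GNU grep:

```shell
#!/bin/bash
# Sample data with the extra "dog_x" line from note 1 placed first.
cat > file1.txt <<'EOF'
dog_x
ID1 age_11 dog_n3 parent_dog_n1
ID1 dog_n5 age_4
EOF

# -w accepts a match at the start of the line: prints dog_x, dog_n3, dog_n5.
grep -ow 'dog[^ ]*' file1.txt

# [ ^] needs a space (or a literal caret) before "dog", so dog_x is missed
# and each printed match starts with the separating space.
grep -o '[ ^]dog[^ ]*' file1.txt
```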
All times are GMT -5.