LinuxQuestions.org
Old 07-26-2005, 09:21 PM   #1
jrdioko
Member
 
Registered: Oct 2002
Distribution: Debian 6.0.2 (squeeze)
Posts: 944

Rep: Reputation: 30
Question Sed and printing only part of a line


There've been a few times when I've wanted to do something like this, but all the pages I've read about sed still leave me confused about how it's supposed to be done. I have some output that I pipe through grep first to get the lines I want. I then want to only print the part of that line between two phrases I already know. For example, something prints the following:

This is the output of foo
Here is some useless information
Here is some important: DESIREDOUTPUT, information that you want

I can "grep important" to get the last line, but I want to use sed to tell it to print everything between "important: " and ", information". I'd like to be able to do this even if I don't know exactly what is before "important: " and after ", information" and I don't know how many words or what characters DESIREDOUTPUT contains.
 
Old 07-27-2005, 09:32 AM   #2
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,028
Blog Entries: 5

Rep: Reputation: 791
You're making it hard on yourself. If the file always has ":" after important and "," after DESIREDOUTPUT you can use awk instead:

grep important filename |awk -F: '{print $2}' | awk -F, '{print $1}'

The first awk says to set the delimiter to colon (instead of the default, which is white space). This splits the entry into two fields: everything up to and including "important" is the first and everything after the colon is the second.

The second awk says to split the remainder using the comma as delimiter at which point your DESIREDOUTPUT becomes the first field.

Alternatively if the lines always have the same number of words you can just do
grep important filename |awk '{print $5}' which would print the 5th word (as delimited by white space). If you need to get rid of the comma in DESIREDOUTPUT, you'd have to do the pipe shown above.
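As a rough sketch of the field splitting described above, the following feeds the sample text from the first post through the same pipeline. The variable names (sample, two_stage, one_call) are made up for illustration, and the single-awk variant is just one possible alternative, not something from the original post. Note that the two-stage version keeps the space that follows the colon.

```shell
# Sample input from the original question (stands in for "filename").
sample='This is the output of foo
Here is some useless information
Here is some important: DESIREDOUTPUT, information that you want'

# Two-stage split: first on ":", then on ",".
# The result keeps the space that follows the colon.
two_stage=$(printf '%s\n' "$sample" | grep important | awk -F: '{print $2}' | awk -F, '{print $1}')
printf '[%s]\n' "$two_stage"    # prints "[ DESIREDOUTPUT]"

# One awk call: select the line with a pattern and split on ": " or ",".
# A field separator longer than one character is treated as a regular expression.
one_call=$(printf '%s\n' "$sample" | awk -F': |,' '/important/ {print $2}')
printf '[%s]\n' "$one_call"     # prints "[DESIREDOUTPUT]"
```

Both versions still assume that neither ":" nor "," appears inside the text you want.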
 
Old 07-27-2005, 09:41 AM   #3
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 61
Quote:
Originally posted by jlightner You're making it hard on yourself. If the file always has ":" after important and "," after DESIREDOUTPUT you can use awk instead:...
I disagree. First, what he gave was obviously an example, so maybe it is not ':' and ',' that will actually precede and follow. Next, it is less robust: who knows, DESIREDOUTPUT may itself contain those characters. Finally, I actually find the sed solution to be simpler.
Here it is:
Code:
... | grep 'important' | sed 's/^.*important: \(.*\), information.*$/\1/'
Or even simpler (no grep):
Code:
... | sed -n 's/^.*important: \(.*\), information.*$/\1/p'
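A small sketch of how the capture group behaves, using made-up test lines (the variable names are illustrative only). It shows that the captured text may itself contain ':' or ',', and that with -n and the p flag a line that does not match prints nothing.

```shell
# The sample line from the first post.
line='Here is some important: DESIREDOUTPUT, information that you want'
extracted=$(printf '%s\n' "$line" | sed -n 's/^.*important: \(.*\), information.*$/\1/p')
printf '%s\n' "$extracted"          # prints "DESIREDOUTPUT"

# The wanted text contains ":" and ","; the two known phrases still delimit it.
tricky='x important: a:b, c, information y'
extracted_tricky=$(printf '%s\n' "$tricky" | sed -n 's/^.*important: \(.*\), information.*$/\1/p')
printf '%s\n' "$extracted_tricky"   # prints "a:b, c"

# A line without the two phrases produces no output at all.
no_match=$(printf 'Here is some useless information\n' | sed -n 's/^.*important: \(.*\), information.*$/\1/p')
printf '[%s]\n' "$no_match"         # prints "[]"
```

Because .* is greedy, if ", information" occurs more than once on a line the capture runs up to the last occurrence.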
Yves.
 
Old 07-27-2005, 09:31 PM   #4
jrdioko
Member
 
Registered: Oct 2002
Distribution: Debian 6.0.2 (squeeze)
Posts: 944

Original Poster
Rep: Reputation: 30
Thank you both. What is the difference between awk and sed anyway? I looked at both and it seems both work differently, are better for different things, but essentially accomplish the same goals. Is it important to know both well for dealing with things like that or can one handle most?

-- EDIT --
Also, does * match one character and .* match any number of characters?

Last edited by jrdioko; 07-27-2005 at 09:34 PM.
 
Old 07-28-2005, 03:24 AM   #5
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 61
In short, sed is a stream editor: lines are read one at a time (into the "pattern space"), and you do what you want using mostly regular expressions. sed has no variables, very few functions, and only one "buffer" for storing data (the "hold space").

awk is more of a programming language: it has some C-like functions and variables. It also has the built-in ability to split a line into fields, or output a line made of fields, using either separators of your choice or (in GNU awk) fixed-width fields. awk, like sed, reads lines one by one.
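To make the contrast concrete, here is a minimal sketch on invented data (the fruit list and variable names are assumptions for illustration). sed rewrites each line with a regular expression, while awk splits lines into fields and keeps a running total in a variable.

```shell
data='apple 3
pear 5
apple 4'

# sed: edit each line with a regular expression.
renamed=$(printf '%s\n' "$data" | sed 's/^apple/APPLE/')
printf '%s\n' "$renamed"

# awk: split each line into fields ($1, $2) and accumulate state across lines.
apple_total=$(printf '%s\n' "$data" | awk '$1 == "apple" { total += $2 } END { print total }')
printf '%s\n' "$apple_total"    # prints "7"
```

Text substitution like the first command is where sed tends to be the shorter tool; anything involving fields or arithmetic is usually easier in awk.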

Yves.
 
Old 07-28-2005, 08:47 AM   #6
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,028
Blog Entries: 5

Rep: Reputation: 791
Generally, in shell file name patterns, ? is a metacharacter matching exactly one character and * is a metacharacter matching any number of characters. Do man on egrep and awk and go to the section for "Regular Expressions" for more detail. Plain grep supports only basic regular expressions in most flavors of Unix/Linux, but egrep (grep -E) supports extended regular expressions.

Example of metacharacter usage for the two you listed - Say you have files named:

charlie
charles
charlene
harlan

ls charl?e would find only charlie

ls charl*e would find charlie and charlene (of course so would ls *e).

ls ?har* would find the first three entries but not the last.

ls *arl* would find all four. (If these were the only four in the directory so would ls *)
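The four patterns above can be tried safely in a scratch directory. This sketch assumes mktemp -d is available (it is on most Linux and BSD systems) and uses echo instead of ls so that only the expanded names are shown; the variable names are illustrative.

```shell
# Work in a throwaway directory so the four names are the only entries.
start_dir=$PWD
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch charlie charles charlene harlan

# The shell expands each pattern before echo runs; matches come back sorted.
one_char=$(echo charl?e)     # charlie
any_run=$(echo charl*e)      # charlene charlie
lead_char=$(echo ?har*)      # charlene charles charlie
all_four=$(echo *arl*)       # charlene charles charlie harlan

printf '%s\n' "$one_char" "$any_run" "$lead_char" "$all_four"

# Clean up.
cd "$start_dir" || exit 1
rm -rf "$demo_dir"
```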
 
Old 07-28-2005, 09:44 AM   #7
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 61
jlightner did a good summary of shell patterns.

Regular expressions are another thing, though. And unfortunately, there are different variants.

In short, what is common to all:

REPLACED ITEMS:
^ stands for the beginning of the line
$ stands for the end of the line
. stands for any character
[any] stands for letter 'a' or 'n' or 'y'
[^any] stands for any letter except 'a', 'n', or 'y'.
( and ) are used to group things, which can thereafter be referred to elsewhere
\1 to \9 are references to groups number 1 through 9 (in sed, & stands for the whole match; some tools also accept \0)
[:group:] indicates a class of possible characters among which any can be chosen; 'group' can be 'space', 'digit', 'blank', 'print', 'alnum', 'alpha', 'upper', 'lower'...; this notation is only possible inside [...] or [^...]

QUANTIFIERS:
? after something means that this thing is there 0 or 1 time.
* after something means that this thing is there any number of times, including 0
{n} after something means that this thing is there n times
{m,} after something means that this thing is there m times or more
{m,n} after something means that this thing is there between m and n times

If no quantifier is used then the "thing" is there exactly 1 time.
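The quantifiers listed above can be checked with a short sketch like the following. It uses grep -E (extended syntax, where ? and { } need no backslash) with -x to match whole lines and -c to count them; the word list and the helper function count_matches are made up for the example.

```shell
words='ac
abc
abbc
abbbc'

# Count the lines that match the given extended regular expression in full.
count_matches() {
    printf '%s\n' "$words" | grep -E -c -x "$1"
}

optional=$(count_matches 'ab?c')       # b zero or one time: ac, abc
any_number=$(count_matches 'ab*c')     # b any number of times: all four lines
exactly_two=$(count_matches 'ab{2}c')  # b exactly twice: abbc
two_or_more=$(count_matches 'ab{2,}c') # b twice or more: abbc, abbbc
one_to_two=$(count_matches 'ab{1,2}c') # b once or twice: abc, abbc

printf '%s %s %s %s %s\n' "$optional" "$any_number" "$exactly_two" "$two_or_more" "$one_to_two"
# prints "2 4 1 2 2"
```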

Now the differences

Some applications (let's say group A) need the grouping ( and ) to be escaped like that: \( and \); same for the { and } quantifier delimiters.
For those, (, ), {, and } are simply standard characters.

Other applications (let's say group B) don't need those escapes.
So for those, normal characters (, ), {, and } have to be escaped with a \.

Additionally, Perl-compatible regular expressions accept some useful shorthand notations. See the PHP manual for details: I find it well explained.


sed (by default) and grep are in group A. awk, egrep, Javascript and most text editors are in group B.
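A quick way to see the two escaping conventions side by side is to run the same substitution through sed in both syntaxes. This is only a sketch: the -E option is assumed to be available (it is in current GNU and BSD sed), and the input strings are invented.

```shell
input='key=value'

# Escaped style (sed's default, basic syntax): groups are written \( and \).
escaped_style=$(printf '%s\n' "$input" | sed 's/\(.*\)=\(.*\)/\2=\1/')

# Unescaped style (sed -E, extended syntax): groups are written ( and ).
unescaped_style=$(printf '%s\n' "$input" | sed -E 's/(.*)=(.*)/\2=\1/')

# In the default syntax an unescaped ( or ) is just a literal character.
literal_parens=$(printf 'f(x)\n' | sed 's/(x)/[x]/')

printf '%s\n' "$escaped_style" "$unescaped_style" "$literal_parens"
# prints "value=key", "value=key" and "f[x]"
```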

Yves.
 
Old 07-28-2005, 09:50 AM   #8
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,028
Blog Entries: 5

Rep: Reputation: 791
My post was intended to answer his questions regarding metacharacters. I mentioned regular expressions as they allow for more granular selections than the simple metacharacters do.

I was surprised to find that neither my Debian nor my RedHat have regexp man pages. Most Unix variants do.
 
Old 07-28-2005, 12:54 PM   #9
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 231
(grep), sed, awk

You could also buy a book, or 2.

One of the best buys I ever made was the 3rd edition (4th is the current) of Linux in a Nutshell from O'Reilly for US$4.98. It has good short chapters on both sed & awk. It also has the command reference (unlike the 2nd ed.) in one big chapter.

O'Reilly also publishes sed & awk & sed and awk Pocket Reference. (Ok, so that's 3 books.) I own the 1st 2 & can recommend them highly. I [sus|ex]pect the 3rd is equally good.

Last edited by archtoad6; 08-02-2005 at 08:52 AM.
 
Old 07-28-2005, 01:43 PM   #10
jrdioko
Member
 
Registered: Oct 2002
Distribution: Debian 6.0.2 (squeeze)
Posts: 944

Original Poster
Rep: Reputation: 30
Thanks again. That looks like just the book to go out and buy, but I'm going to have to hold off until later. Thanks for the general regexp explanation, though. I've tried to learn some basics online but I get overwhelmed with the details.
 
Old 05-17-2012, 10:47 AM   #11
Deedee393
LQ Newbie
 
Registered: May 2012
Posts: 5

Rep: Reputation: Disabled
Quote:
Originally Posted by MensaWater
You're making it hard on yourself. If the file always has ":" after important and "," after DESIREDOUTPUT you can use awk instead:

grep important filename |awk -F: '{print $2}' | awk -F, '{print $1}'

How would you go about saving the output produced?
 
Old 05-17-2012, 11:05 AM   #12
MensaWater
Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 6,028
Blog Entries: 5

Rep: Reputation: 791
Quote:
Originally Posted by Deedee393
Quote:
Originally Posted by MensaWater
You're making it hard on yourself. If the file always has ":" after important and "," after DESIREDOUTPUT you can use awk instead:

grep important filename |awk -F: '{print $2}' | awk -F, '{print $1}'

How would you go about saving the output produced?
You really shouldn't append to ancient threads (this was from 2005). The only people likely to see it are those who originally subscribed and that assumes they are still around. It is better to open a new thread and if desired post a link to the old thread.

Having said that:

The way to save output from most commands is with redirection.
grep important filename |awk -F: '{print $2}' | awk -F, '{print $1}' >outputfile

You can name outputfile anything you want.

You should investigate "file descriptors" and "redirection" but the most important information is that file descriptor 1 is standard output (a/k/a stdout) and file descriptor 2 is standard error (a/k/a stderr). The ">outputfile" is actually shorthand for "1>outputfile" to redirect stdout into the file. It is common in scripts to redirect stderr to stdout to ensure both output types go to the same place:

grep important filename |awk -F: '{print $2}' | awk -F, '{print $1}' >outputfile 2>&1
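The 2>&1 tells it to send stderr to the same location as stdout. Note that a redirection written at the end of a pipeline applies only to the last command in it (here, the second awk).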
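Here is a small sketch of how the two file descriptors behave. The function emit and the file names are invented for the demonstration, and mktemp -d is assumed to be available. The last command groups a whole pipeline in braces so that stderr from every stage is captured, not just from the final command.

```shell
# A stand-in command that writes one line to stdout and one to stderr.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

out_dir=$(mktemp -d)

# Only stdout goes to the file; stderr is discarded here to keep the demo quiet.
emit >"$out_dir/only_stdout" 2>/dev/null

# stdout goes to the file, then stderr is pointed at the same place.
emit >"$out_dir/both" 2>&1

# Grouping the pipeline captures stderr from all of its commands.
{ emit | cat; } >"$out_dir/pipeline" 2>&1

# Count the lines that ended up in each file.
only_stdout_lines=$(wc -l <"$out_dir/only_stdout" | tr -d ' ')
both_lines=$(wc -l <"$out_dir/both" | tr -d ' ')
pipeline_lines=$(wc -l <"$out_dir/pipeline" | tr -d ' ')
printf '%s %s %s\n' "$only_stdout_lines" "$both_lines" "$pipeline_lines"   # prints "1 2 2"

rm -rf "$out_dir"
```

The order matters: writing 2>&1 before >file would point stderr at the old stdout (usually the terminal) rather than at the file.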


You can also use the information as a variable within a program by assigning the output of the command to a variable. (If it is more than one line or more than one word you'd have to investigate arrays to get most use out of it).

VAR=$(grep important filename |awk -F: '{print $2}' | awk -F, '{print $1}')
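As a sketch of what happens when the command produces more than one line, the following captures the output in a variable and then walks through it line by line. It uses a plain read loop rather than arrays, which is one possible approach and works in any POSIX shell; the sample text and the names count and last_value are assumptions for illustration.

```shell
sample='a important: first, information
b useless line
c important: second value, information'

# Same pipeline as above, reading the sample text from standard input.
VAR=$(printf '%s\n' "$sample" | grep important | awk -F: '{print $2}' | awk -F, '{print $1}')

# Quote "$VAR" when using it so that line breaks and spaces are preserved.
count=0
while IFS= read -r value; do
    count=$((count + 1))
    last_value=$value
    printf 'value %d:%s\n' "$count" "$value"
done <<EOF
$VAR
EOF

printf 'lines captured: %d\n' "$count"    # prints "lines captured: 2"
```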

I'd suggest you do a web search for "shell scripting tutorial". There are many available and they will give you a good start on the basics.
 
Old 05-17-2012, 08:15 PM   #13
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.6, Centos 5.10
Posts: 16,324

Rep: Reputation: 2041
Seeing as this has been re-opened, the book on regex (imho) is here http://regex.info/
Also, the original question sounds like one where word boundary matches may help.
 
  


All times are GMT -5.
