awk to remove the end considering the last field?

patrick295767 · 07-28-2012, 12:54 AM

Hi,

I would like to use awk to remove everything after the last matching field:

Code:

echo "my documents here that are made (bla).doc" | awk ...

Here, the field separator for the example would be the space between "made" and "(bla)".

Desired output:

Code:

"my documents here that are made"

Any ideas would be very welcome!

Thanks

firstfire · 07-28-2012, 01:21 AM

Hi.

Code:

$ echo "my documents here that are made (bla).doc" | awk -F ' +[(]bla' '{print $1}'
my documents here that are made
$ echo "my documents here that are made (bla).doc" | sed 's/ *(bla.*//'
my documents here that are made

sed looks better suited to this problem.

grail · 07-28-2012, 02:51 AM

Not sure if the data at the end needs to be checked, but if not:

Code:

echo "my documents here that are made (bla).doc" | awk '$NF="\0"'

David the H. · 07-28-2012, 10:15 AM

grail gave you awk. Just set $NF to null and print.

firstfire gave you sed as well, although I'd make the regex a bit more generic: just strip everything from the last space to the end.

Code:

echo "my documents here that are made (bla).doc" | sed 's/ [^ ]*$//'

Or if the string is, or can be, stored in a shell variable first, then a simple parameter substitution can also be used.

Code:

$ text="my documents here that are made (bla).doc"
$ echo "${text% *}"
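
A minimal sketch of how that expansion behaves, using the example string from this thread. The variable names short and long are only for illustration. A single % removes the shortest suffix matching the pattern, while %% removes the longest one:

```shell
text="my documents here that are made (bla).doc"

# % removes the shortest suffix matching " *": the last space and what follows it.
short="${text% *}"

# %% removes the longest such suffix: everything from the first space onward.
long="${text%% *}"

printf '[%s]\n' "$short" "$long"
```

The brackets in the printf format are only there to make any leading or trailing whitespace visible.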

dru8274 · 07-28-2012, 11:16 PM

Quote:

Originally Posted by grail

Code:

echo "my documents here that are made (bla).doc" | awk '$NF="\0"'

I am fairly new to awk... could you explain why that works please? TIA.

amboxer21 · 07-28-2012, 11:48 PM

I do not like sed. It's way too ugly, IMO.

Why not use a field separator, print everything before it, and store it in a variable that you can manipulate later if needed?

Code:

var=$(echo "my documents here that are made (bla).doc" | awk -F"(" '{print $1}'); echo $var

EDIT:
If you want quotes, then:

Code:

echo "my documents here that are made (bla).doc" | awk -F"(" '{print "\""$1"\""}'

David the H. · 07-29-2012, 04:27 AM

Quote:

Originally Posted by amboxer21

I do not like sed. It's way too ugly, IMO.

Why not use a field separator, print everything before it, and store it in a variable that you can manipulate later if needed?

Code:

var=$(echo "my documents here that are made (bla).doc" | awk -F"(" '{print $1}'); echo $var

I can't see sed as "ugly". It's just a tool that applies regular expressions to lines of text. Quite simple and efficient, for the most part. I admit that sometimes the expressions it uses can get a bit complex, but that's a different thing, and awk can also be just as cryptic, depending on the job.

sed also has an advantage over awk in that awk's default field splitting doesn't preserve multiple whitespace characters.

The awk solution you gave does avoid that, since it uses a non-whitespace delimiter, but it now depends on there being a parenthesis in the line, which is not necessarily a given according to the OP's description. Indeed, he specifically stated that he wanted to remove the last space-delimited field.

Speaking of which, both yours and grail's solutions end up leaving that extra space tacked onto the end of the output. This is probably unwanted behavior.

Finally, if we're going to store the value in a variable anyway, just use the parameter substitution I gave earlier. It's even cleaner and much more efficient than either of the external tools.
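
The whitespace point above can be checked with a small sketch. The input line here is a made-up variant of the thread's example with two spaces after "my", and the variable names are only for illustration:

```shell
line="my  documents (bla).doc"   # hypothetical input: two spaces after "my"

# Assigning to a field makes awk rebuild the record with single spaces (OFS).
via_awk=$(printf '%s\n' "$line" | awk '{ $NF = ""; print }')

# sed only deletes the text it matched, so the double space survives.
via_sed=$(printf '%s\n' "$line" | sed 's/ [^ ]*$//')

printf '[%s]\n' "$via_awk" "$via_sed"
```

With this input the awk result has the inner spaces collapsed and a trailing space left behind, while the sed result keeps the original spacing and has no trailing space.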

grail · 07-29-2012, 10:50 AM

Quote:

Originally Posted by dru8274

I am fairly new to awk... could you explain why that works please?

As David has mentioned, the default FS in awk is white space, and the fields produced by splitting on it are referenced by number. Like FS, NF is another awk variable; it holds the number of fields created. Placing the $ sign in front of NF therefore references the last field in the list, which we then set to null.
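
To make that concrete, here is a hedged sketch that prints NF and $NF for the example line and then drops the last field with an explicit print. In the one-liner, the assignment doubles as the pattern, so the line is printed only when the assigned value counts as true. Spelling out the action avoids relying on that. The variable names are only for illustration:

```shell
line="my documents here that are made (bla).doc"

# NF is the number of fields; $NF is the content of the last field.
count=$(printf '%s\n' "$line" | awk '{ print NF }')
last=$(printf '%s\n' "$line" | awk '{ print $NF }')

# Empty the last field, trim the separator left behind, then print explicitly.
trimmed=$(printf '%s\n' "$line" | awk '{ $NF = ""; sub(/ +$/, ""); print }')

printf '%s\n' "$count" "$last" "[$trimmed]"
```

Note that assigning to a field still rebuilds the record, so runs of spaces inside the line would be collapsed to single spaces.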

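As an addition to what David has already mentioned: if the awk output is assigned to a variable and that variable is then expanded unquoted (as in echo $var), the pesky space at the end disappears as well, because the shell's word splitting discards it.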

If the data needs to be delivered without assignment, we could change it like so:

Code:

echo "my documents here that are made (bla).doc" | awk '$NF="\010"'

patrick295767 · 07-29-2012, 02:50 PM

Lots of attempts; it looks like it takes a fair amount of experience to do this with awk.

If I recall correctly, it is quite possible with awk.

| awk with NF is on the right track, and we need to add -F " " to define the space as the delimiter.

grail · 07-29-2012, 08:30 PM

Quote:

| awk with NF is on the right track, and we need to add -F " " to define the space as the delimiter.

Not sure what you are trying to get at here. The awk solutions do work, though to differing degrees, I will agree. Also, why would you need to reset the delimiter when it already defaults to white space?

amboxer21 · 07-29-2012, 08:43 PM

Grail is right

this:

Code:

awk -F" " '{ }'

is the same as:

Code:

awk '{ }'

The default delim for Awk is white space.

I have a question for David. Adding double quotation marks around the output with awk is trivial: '{ print "\""$1"\"" }'. But how would you add double quotation marks to the sed example you provided?

Code:

echo "my documents here that are made (bla).doc" | sed 's/ [^ ]*$//'

grail · 07-30-2012, 02:30 AM

If I may:

Code:

echo "my documents here that are made (bla).doc" | sed -r 's/(.*) [^ ]*$/"\1"/'

I would add that setting the delimiter in awk to exactly one space (which takes something like -F'[ ]', since -F" " is itself the default) will not always yield the same results. The default FS is unique in that it gobbles up runs of white space and also strips any from the start and end of the record, which is not what you would want if leading spaces are meant to indicate several missing fields. I do not see that issue occurring in the current example.
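
A small sketch of the difference, counting fields on a made-up record with two leading spaces. The bracket expression '[ ]' is used here as one way to ask for a literal single-space separator, and the variable names are only for illustration:

```shell
line="  a b"   # hypothetical record with two leading spaces

# Default splitting: leading blanks are skipped and runs of blanks act as one separator.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# -F' ' is the same special case as the default.
space_nf=$(printf '%s\n' "$line" | awk -F' ' '{ print NF }')

# A bracket expression forces a literal single space, so empty fields appear.
single_nf=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }')

printf '%s\n' "$default_nf" "$space_nf" "$single_nf"
```

The first two counts are 2, while the last is 4, because each leading space now delimits an empty field.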

David the H. · 07-30-2012, 12:10 PM

grail has answered the last one for me. Modifying the output now means extracting a substring and adding to it, rather than just stripping off the unwanted part and printing the rest. So we use a set of capturing parentheses and a numbered backreference (\1 here) to extract the desired part and print it with quote marks attached*.

The regex can also be made slightly simpler now, however. Because "*" is greedy, the first ".*" already extends to the last space in the line, so it's not necessary to use a negated character class or an anchor.

Code:

echo "my documents here that are made (bla).doc" | sed -r 's/(.*) .*/"\1"/'
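
As a quick check of that claim, the sketch below runs both forms on the example line and compares the results. It uses -E rather than -r for extended regular expressions, on the assumption that the installed sed accepts it (recent GNU and BSD versions do). The variable names are only for illustration:

```shell
line="my documents here that are made (bla).doc"

# Anchored form with a negated character class.
anchored=$(printf '%s\n' "$line" | sed -E 's/(.*) [^ ]*$/"\1"/')

# Simplified form that relies on the greedy first .* reaching the last space.
greedy=$(printf '%s\n' "$line" | sed -E 's/(.*) .*/"\1"/')

printf '%s\n' "$anchored" "$greedy"
```

Both commands should print the same quoted string, since in each case the captured group ends just before the last space.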

This is all standard regex stuff. You really should study up on it. You'll be glad you did. Learning how to use regular expressions effectively is, IMO, the single most useful thing I studied when learning scripting.

Here are a few regular expressions tutorials:
http://mywiki.wooledge.org/RegularExpression
http://www.grymoire.com/Unix/Regular.html
http://www.regular-expressions.info/

*You do need to be aware of how the shell processes quotes here too. The single quotes around the entire expression escape the double quotes inside them, so that the shell passes them on literally to sed.

http://mywiki.wooledge.org/Arguments
http://mywiki.wooledge.org/WordSplitting
http://mywiki.wooledge.org/Quotes

amboxer21 · 07-31-2012, 07:31 PM

Sorry for hijacking the thread, lol, but I haven't seen the OP ask any more questions. So, while I have two great members already here, I figured I'd ask a question. I have been reading tutorials on sed and this tool is crazy awesome! There is so much to take in and memorize! I have a good understanding of awk already, but sed seems so much more powerful! I could do more with less!

So, the question is: what would you recommend as a beginner sed project to reinforce the rules and tricks of the language/tool?

danielbmartin · 07-31-2012, 08:20 PM

Quote:

Originally Posted by amboxer21

I have a good understanding of awk ... What would you recommend as a beginner Sed project to reinforce the rules and tricks of the language/tool?

Suggestion: take any code you wrote which contains non-trivial awks and write a derivative version in which some of those awks are replaced with functionally equivalent seds. Then, make careful timings to determine which version runs faster. Compare the code to decide which version is more readable. Post your results on this forum.

Daniel B. Martin