LinuxQuestions.org
11-04-2012, 02:30 AM   #1
marky9074
Member
Registered: Nov 2003
Posts: 43
Script to remove dynamic array from file - sed or grep


Hi guys, I am trying to search a file for a set of matching lines and then delete all but one of them. All of these lines are 80 characters long (padded with spaces), so I need to retain the padding.

Code:
H0019 1   1 06466745.12N 00513800.45E                                           
H0019 1   2 06467464.46N 00512968.25E                                           
H0019 1   3 06467783.25N 00512599.43E                                           
H0019 1   4 06467963.08N 00512391.38E                                           
H0019 1   5 06468682.42N 00511559.18E                                           
H0019 1   6 06469001.21N 00511190.36E                                           
H0019 1   7 06469189.22N 00510972.86E                                           
H0019 1   8 06469499.84N 00510613.50E                                           
H0019 1   9 06470186.48N 00509819.12E                                           
H0019 1  10 06470505.27N 00509450.30E                                           
H0019 1  11 06470693.28N 00509232.80E                                           
H0019 1  12 06471012.07N 00508863.98E                                           
H0019 1  13 06471715.06N 00508050.69E                                           
H0019 1  14 06472033.85N 00507681.87E                                           
H0019 1  15 06472221.86N 00507464.37E                                           
H0019 1  16 06472924.85N 00506651.08E                                           
H0019 1  17 06473243.64N 00506282.26E                                           
H0019 1  18 06473423.47N 00506074.21E                                           
H0019 1  19 06474142.81N 00505242.01E                                           
H0019 1  20 06474461.60N 00504873.19E                                           
H0019 1  21 06474649.61N 00504655.69E                                           
H0019 1  22 06474960.23N 00504296.33E                                           
H0019 1  23 06475646.87N 00503501.95E                                           
H0019 1  24 06476333.24N 00502707.38E
The above would be the result if I used grep to look for H0019 in the file. I could use

Code:
grep -n "H0019" file | cut -f1 -d:
or

Code:
sed -n '/H0019/=' file
to return a list of row numbers. But how can I get this to loop around and delete (presumably using sed) all rows bar the last one? In addition I want to rename the last one in this example.

Code:
H0019 1  24
to
Code:
H0019 1   1
Any help would be much appreciated. I guess I should also handle multiple files, rather than assume I am going to process one file at a time.

Thanks,

Mark

Last edited by marky9074; 11-05-2012 at 05:18 PM.
 
11-04-2012, 05:37 AM   #2
pixellany
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802
To return only the last line (in a file or data stream):
Code:
sed -n '$p' filename
To also modify the last line:
Code:
sed -n '$s/old/new/p' filename
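Here is a quick illustration of both commands on a few made-up sample lines (the records and the ` 3 ` to ` 1 ` substitution are only for demonstration). Keep in mind that `$` addresses the last line of the whole input, so on the real file, which has more lines after the header, it only picks out the last H0019 record once the matching lines have been isolated first, e.g. with grep.

```shell
#!/bin/sh
# Illustrative sample: three H0019-style records (padding left off for brevity).
sample() {
    printf '%s\n' 'H0019 1   1 A' 'H0019 1   2 B' 'H0019 1   3 C'
}

# Print only the last line of the input.
sample | sed -n '$p'                  # -> H0019 1   3 C

# Substitute on the last line only, and print just that line.
sample | sed -n '$s/ 3 / 1 /p'        # -> H0019 1   1 C
```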
 
11-04-2012, 07:41 AM   #3
marky9074
Member
Registered: Nov 2003
Posts: 43
Original Poster
OK, so I cobbled together a few things to get something similar to what I require:

Code:
grep H0019 header.p2 | sed -n '$s/H0019/E0019/p' > temp
grep -v H0019 header.p2 > temp2
sed -i 's/E0019/H0019/g' temp
sed -i '/H0018/ r temp' temp2
But it still doesn't get around the fact that, in my example, I will get

Code:
H0019 1  24
Instead of

Code:
H0019 1   1
Plus, I have now ended up with another file rather than doing it all in the stream :/

Last edited by marky9074; 11-04-2012 at 05:08 PM.
 
11-04-2012, 07:50 AM   #4
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,692
Can we go back a step ... does the file in question only contain those 5 lines? Or are we likely to find the pattern amongst other lines and so need to preserve other data?
 
11-04-2012, 09:19 AM   #5
marky9074
Member
Registered: Nov 2003
Posts: 43
Original Poster
No, the files have lots of other lines in them; this is just the header of the file. But the record is unique, so we can work with just searching for H0019. It would be impossible for H0019 to appear in the data part of the file, so there is no need to worry about preserving other data.

Cheers,

Mark
 
11-04-2012, 10:38 AM   #6
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,692
Sorry to harp on, but is the position of the last entry in the file important, or only the fact that you end up with 'H0019 1' in the file? I.e., could you print the first and delete the rest?
 
11-04-2012, 11:24 AM   #7
marky9074
Member
Registered: Nov 2003
Posts: 43
Original Poster
I can't print the first and delete the rest, as the only line that is correct is the last one... but as the array length changes in every file, it is difficult to pin down. For example, say in one file H0019 is present in rows 10-20: I want to delete 10-19 and keep 20, but rename it as H0019 1 rather than H0019 10 (every time there is an H0019 it increments by one, so given that I only want one record it will always be H0019 1). In the next file the rows could be 10-30, and I would keep row 30, etc.

Hope that makes sense...

Last edited by marky9074; 11-04-2012 at 11:26 AM.
 
11-04-2012, 12:12 PM   #8
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,692
Well, I was originally trying to come up with a solution that would deliver your value in one read of the file, but as we do not know whether another entry will be found until we hit it, we may not be able to replace the line where it is in the file (is this a problem?). My idea would be to place the entry at the end of the file, i.e. as the last line ... is this any good?

Otherwise the multi-pass idea you are presently using would have to do.
Here is another idea on the multi-pass:
Code:
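# line number of the last line containing H0019
l=$(awk '/H0019/{x=NR}END{print x}' file)

# delete the earlier H0019 lines, then edit the kept one
sed -i "1,$((l-1)){/H0019/d;};${l},\${/H0019/s/.$/1/;}" file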
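Here is a sketch of the same two-pass idea adapted to the revised, fixed-width data later in the thread. It assumes (as the example data suggests) that the sequence number sits right-aligned in columns 9-11, so instead of `s/.$/1/` it rewrites exactly those three columns and leaves the 80-column padding alone. Rather than a `1,(l-1)` range it uses the sed `b` (branch) command on the kept line, so it also copes with the last H0019 record being on line 1. The function name `keep_last_h0019` is made up for this example, and it writes to stdout instead of using `-i`.

```shell
#!/bin/sh
# Two-pass sketch (hypothetical helper name): keep only the last H0019
# record and renumber it to 1. Assumes the sequence number is right-aligned
# in columns 9-11, e.g. "H0019 1  24 ..." becomes "H0019 1   1 ...".
keep_last_h0019() {
    # Pass 1: line number of the last line containing H0019 (empty if none).
    l=$(awk '/H0019/ { x = NR } END { print x }' "$1")

    if [ -z "$l" ]; then
        cat "$1"                      # no H0019 records: pass the file through
        return
    fi

    # Pass 2, on line $l: keep the first 8 columns, replace columns 9-11
    # with "  1", then branch to the end of the script so it is printed.
    # Every other H0019 line is deleted; the padding is untouched.
    sed -e "${l}s/^\(.\{8\}\).../\1  1/" \
        -e "${l}b" \
        -e '/H0019/d' "$1"
}

# Usage: keep_last_h0019 header.p2 > header.new
```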
 
11-04-2012, 01:20 PM   #9
marky9074
Member
Registered: Nov 2003
Posts: 43
Original Poster
Since awk returns the line we want to keep, and the starting row is always static, could we then get it in a single pass by deleting the unwanted rows before using sed to substitute the text?
 
11-04-2012, 01:46 PM   #10
David the H.
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823
Since sed's workflow is one-way, probably the easiest thing to do is start working from the end of the file.

Code:
tac file | sed '0,/H0019/! { /H0019/d } ; /H0019/ s/[0-9]\+$/1/' | tac
tac prints the lines of the file in reverse order. The first sed expression then ignores (!) everything from the start of the (reversed) file to the first desired entry, and deletes any found in the rest of it. The second expression modifies the one line remaining. Finally just re-reverse the file with another tac command.
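Two parts of that pipeline are GNU extensions: the `0,/re/` address and `\+` in a basic regular expression. Also, on the padded 80-column records each line ends in spaces, so `[0-9]\+$` has nothing to match; that is one likely reason for a line being kept but not renamed. The sketch below is a variant for the fixed-width data that uses `1,/H0019/` and the POSIX interval `\{8\}` instead. It assumes the sequence number is in columns 9-11 and that at least one non-H0019 line follows the H0019 block (otherwise the last record would be line 1 after `tac`, and `1,/H0019/` would keep a second record). `tac` itself is not part of POSIX, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: the tac approach, adapted to fixed-width records.
keep_last_h0019_tac() {
    # Reverse the file so the last H0019 record is the first one sed sees.
    # Expression 1: outside the range "line 1 .. first H0019 line",
    #   delete every H0019 line.
    # Expression 2: renumber the one remaining H0019 line by keeping
    #   columns 1-8 and replacing columns 9-11 with "  1".
    # Finally reverse back to the original order.
    tac "$1" |
        sed -e '1,/H0019/!{/H0019/d;}' \
            -e '/H0019/s/^\(.\{8\}\).../\1  1/' |
        tac
}

# Usage: keep_last_h0019_tac header.p2 > header.new
```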


Edit: Here's another option I came up with that uses ed. It's a bit clunky, but at least only a single command is required. There are probably other, better ways to do it.

Code:
printf '%s\n' '?H0019? s/[0-9]\+$/1#/' 'g/H0019.*[^#]$/ d' '/H0019.*#$/ s/#$//' '%p' | ed -s file
'?..?' is like '/../', except that it searches backwards through the file. Since ed starts with the last line as the working line, it means that it will match the last entry in the file. We then modify it to end with '1#'.

Next, we globally delete all lines that match the pattern, except the one with the '#' at the end.

Finally we remove the '#' from the remaining line and output the result. '%p' prints the entire file to stdout. Change it to 'w' to write the changes back to the original file.


How to use ed:
http://wiki.bash-hackers.org/howto/edit-ed
http://snap.nlc.dcccd.edu/learn/nlc/ed.html
(also read the info page)

Last edited by David the H.; 11-04-2012 at 02:15 PM. Reason: as stated
 
11-04-2012, 03:16 PM   #11
marky9074
Member
Registered: Nov 2003
Posts: 43
Original Poster
Interesting, the tac option just seemed to wipe out all the H0019 lines in my example...

Edit: If I change it to:

Code:
tac file | sed '1,/H0019/! { /H0019/d } ; /H0019/ s/[0-9]\+$/1/' | tac
With '1' in place of '0' after sed, it keeps the line I want, but doesn't rename it.

Ahh, I see my example was wrong initially. I've updated it in the original post.

I'm playing with the ed one now, but it complains at the end about file not found (I am using busybox/mobaxterm). What is the -s switch for?

Last edited by marky9074; 11-04-2012 at 05:10 PM.
 
11-05-2012, 12:04 PM   #12
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,692
hmmm ... so I am confused again, mainly by the example data (which has now changed).

Your example seems to imply that all lines that contain H0019 will be consecutive. Is this correct?

If we assume only 5 lines of your new example, could it perhaps look like the following:
Code:
blah blah
foo bar
H0019 1   1 06466745.12N 00513800.45E                                           
H0019 1   2 06467464.46N 00512968.25E                                           
H0019 1   3 06467783.25N 00512599.43E                                           
H0019 1   4 06467963.08N 00512391.38E                                           
H0019 1   5 06468682.42N 00511559.18E
more stuff here
and here
If the above is likely, then the following awk would create a new file with the relevant data:
Code:
awk '/^H0019/{x=1;$3=1;l=$0}x && !/^H0019/{print l;x=0}!x' old_file > new_file
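Assigning to `$3` makes awk rebuild the whole record from its fields joined by `OFS` (a single space by default), so runs of spaces, including the trailing padding, collapse. Below is a sketch that keeps the same single-pass logic but edits the record as a string with `substr()` instead, assuming (as in the example data) that the sequence number occupies columns 9-11 and that the H0019 lines are consecutive. It also adds an `END` block for the case where the H0019 block is at the very end of the file, which the one-liner would otherwise drop. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: single-pass awk that keeps the 80-column padding.
# Assumes consecutive H0019 lines and a sequence number in columns 9-11.
renumber_last_h0019() {
    awk '
        /^H0019/ {
            # Remember the latest record with columns 9-11 set to "  1";
            # substr() leaves the rest of the line (and its padding) intact.
            last = substr($0, 1, 8) "  1" substr($0, 12)
            held = 1
            next
        }
        # First line after the H0019 block: emit the remembered record.
        held { print last; held = 0 }
        { print }
        # The H0019 block was at the very end of the file.
        END { if (held) print last }
    ' "$1"
}

# Usage: renumber_last_h0019 old_file > new_file
```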
 
11-05-2012, 01:07 PM   #13
marky9074
Member
Registered: Nov 2003
Posts: 43
Original Poster
Hi there,

Yes, sorry about that. I didn't realise I had messed up my example, so I reposted it with a little more detail, but you're correct: there is other stuff above and below.

Will try awk now!

Edit: OK, that works, but the whole line is 80 characters (padded with spaces to the end), and the substituted line only has the data part (and a couple of characters at the start have shifted up). That said, the part after the E is always the same number of characters, so it should be easy to pad out?

Thanks,

Mark

Last edited by marky9074; 11-05-2012 at 05:17 PM.
 
11-06-2012, 10:29 AM   #14
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,692
Yes, awk has a printf statement, so you can format the output however you prefer.
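For example, the kept record could be rebuilt with `sprintf()` and padded back out to 80 columns with `printf "%-80s\n"`. This sketch keeps grail's one-liner structure and only changes how the kept line is built and printed. The five-field layout and the `%3d` width for the sequence number are read off the example data and are assumptions; like the original, it also assumes at least one line follows the H0019 block. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: grail's one-liner with explicit formatting.
# Assumes five whitespace-separated fields, e.g.
#   H0019 1  24 06476333.24N 00502707.38E
# and that the output line should be padded to 80 characters.
renumber_last_h0019_printf() {
    awk '
        /^H0019/ {
            x = 1
            # %3d right-aligns the new sequence number in a 3-wide field.
            l = sprintf("%s %s %3d %s %s", $1, $2, 1, $4, $5)
        }
        # First line after the block: print the kept record padded to 80.
        x && !/^H0019/ { printf "%-80s\n", l; x = 0 }
        !x
    ' "$1"
}

# Usage: renumber_last_h0019_printf old_file > new_file
```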
 
11-06-2012, 10:58 AM   #15
David the H.
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823
Quote:
Originally Posted by marky9074
Adding the '1' after sed, it keeps the line I want, but doesnt rename it..
This is where it becomes important to state the environment you're using, if it's non-standard in some way. The '0' address is a GNU addition to sed (it allows an address range to work even if the 2nd pattern appears on line one), and is likely not available in the busybox implementation, which generally strips its commands down to only the most basic features.
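The difference only shows up when line 1 itself matches the end pattern. With `1,/re/`, sed starts looking for the end of the range on line 2, so the range runs on to the next match; GNU sed's `0,/re/` also tests line 1, so the range can end right there. In the tac pipeline earlier in the thread this only matters when the very last line of the file is an H0019 record. A small demonstration on made-up input, assuming GNU sed for the first command:

```shell
#!/bin/sh
# Three input lines; line 1 already matches /a/.
demo() { printf '%s\n' a b a; }

# GNU only: the range 0,/a/ ends at line 1, so only line 1 is deleted.
demo | sed '0,/a/d'     # prints: b, a

# The range 1,/a/ looks for its end from line 2 on, so it runs to line 3
# and all three lines are deleted.
demo | sed '1,/a/d'     # prints nothing
```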

Quote:
I'm playing with the ed one now, but it is complaining at the end (I am using busybox/mobaxterm) about file not found.. what is the -s switch for?
-s is the "silent" option. It simply allows you to feed scripted commands into it without getting unnecessary feedback.

Again though, you'll need to check the busybox documentation to see what features its versions of the commands support.

After seeing your revised input data, here's an update to my ed command too.

Code:
printf '%s\n' '?H0019? s/\(.\{8\}\).../#@\1  1/' 'g/^H0019/d' '/^#@/ s///' '%p' | ed -s filename
Since the column positions now appear to be fixed, it becomes easier. I changed the first command so that it simply matches the first 11 columns of the line. Then it keeps the first 8 and replaces the last three with '  1' (two spaces and a 1). It also adds the unique string #@ to the beginning of the line this time.

The second expression can now simply globally delete all lines that match the pattern, except for the one with #@ on it, naturally. The third command again follows up by removing the extra string, only now it too can be greatly simplified.

Last edited by David the H.; 11-06-2012 at 10:59 AM. Reason: fixed tags
 