LinuxQuestions.org
06-21-2013, 06:34 PM   #1
BludGeonT
LQ Newbie
 
Registered: Feb 2013
Posts: 4

Rep: Reputation: 1
sed or awk to remove $8 in a | delimited file if non-numerical value exists


Hello,

I have a text file that is delimited with | between fields and has roughly half a million lines in it. I am looking for the fastest way to remove lines from this file, or to generate a new file without them, when field $8 contains a non-numerical value.

I could write a for loop with an if statement that does this with cut, but it would take forever to run, so I'm looking for a good awk or sed one-liner instead.

There are many ways to do this, either by editing the file directly with sed or by filtering it with awk and writing the result to another file, so I'm asking the community what they think is the most efficient approach.


Your help is much appreciated.
 
06-21-2013, 07:06 PM   #2
dayid
Member
 
Registered: Apr 2012
Location: Austin, TX
Posts: 44

Rep: Reputation: Disabled
So do you want to keep the entire line if $8 is numerical, or do you want to remove those lines?

Do you need it to handle large numbers or just one or two digits?

This will be far easier if you paste an example of how the data is now and how you wish it to end up: e.g.,

"I have this:"
Code:
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
a | b | c | d | e | f | g | h | i | j
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0
"I want this:"
Code:
a | b | c | d | e | f | g | h | i | j
a | b | c | d | e | f | g | h | i | j
a | b | c | d | e | f | g | h | i | j
a | b | c | d | e | f | g | h | i | j
a | b | c | d | e | f | g | h | i | j
a | b | c | d | e | f | g | h | i | j
a | b | c | d | e | f | g | h | i | j
a | b | c | d | e | f | g | h | i | j
a | b | c | d | e | f | g | h | i | j
 
06-21-2013, 07:13 PM   #3
BludGeonT
LQ Newbie
 
Registered: Feb 2013
Posts: 4

Original Poster
Rep: Reputation: 1
Dayid,

Actually, we just figured it out. The goal was to exclude a line entirely if field $8 contained anything other than digits (alphanumeric values included). Here is what we came up with, which did the trick:

Code:
# cat oldfile.txt | awk -F \| ' $8 ~ /^[0-9]*$/ ' > newfile.txt
Roughly 400,000 lines of text were processed in less than 8 seconds.

I wanted to share this with everyone in case someone else needs it.

I appreciate your help, Dayid, and in the future I'll pose my questions with examples like the ones you suggested.

Thanks again

Last edited by BludGeonT; 06-21-2013 at 07:15 PM.
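
For anyone who wants to try the filter before running it on a large file, here is a minimal sketch on a few made-up lines. The sample data and the variable names (sample, kept) are assumptions for illustration only and are not from the real file.

```shell
# Hypothetical sample data: the 8th field is numeric, alphanumeric,
# alphabetic, and numeric again.
sample=$(mktemp)
printf '%s\n' \
  'a|b|c|d|e|f|g|123|i|j' \
  'a|b|c|d|e|f|g|12x|i|j' \
  'a|b|c|d|e|f|g|abc|i|j' \
  'a|b|c|d|e|f|g|4567|i|j' > "$sample"

# Keep only the lines whose 8th field consists solely of digits.
# A pattern with no action prints every matching line.
kept=$(awk -F '|' '$8 ~ /^[0-9]*$/' "$sample")
printf '%s\n' "$kept"

rm -f "$sample"
```

Only the lines with 123 and 4567 in the eighth field should be printed; the 12x and abc lines are dropped.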
 
06-21-2013, 08:24 PM   #4
dayid
Member
 
Registered: Apr 2012
Location: Austin, TX
Posts: 44

Rep: Reputation: Disabled
Quote:
Originally Posted by BludGeonT
Dayid,

Actually, we just figured it out. The goal was to exclude a line entirely if field $8 contained anything other than digits (alphanumeric values included). Here is what we came up with, which did the trick:

Code:
# cat oldfile.txt | awk -F \| ' $8 ~ /^[0-9]*$/ ' > newfile.txt
Great solution, but just FWIW: piping "cat" into awk is a common mistake, since awk can read the file itself.

If you have to do it again, skip 'cat' altogether:
Code:
awk -F '|' ' $8 ~ /^[0-9]*$/ ' oldfile.txt > newfile.txt
Glad you got it to work.
 
06-21-2013, 10:22 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
It may not be important, but remember that your current solution also keeps lines where the eighth field is empty, i.e. ||
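
If empty fields should be rejected as well, one possible tweak is to swap the * quantifier for +, which requires at least one digit. The following is only a sketch on made-up data; the sample lines and the variable names (sample, with_star, with_plus) are assumptions for illustration.

```shell
# Hypothetical sample: the second line has an empty 8th field (||).
sample=$(mktemp)
printf '%s\n' \
  'a|b|c|d|e|f|g|42|i|j' \
  'a|b|c|d|e|f|g||i|j' \
  'a|b|c|d|e|f|g|4x|i|j' > "$sample"

# '*' allows zero digits, so the empty field passes the test.
with_star=$(awk -F '|' '$8 ~ /^[0-9]*$/' "$sample")

# '+' requires at least one digit, so the empty field is rejected.
with_plus=$(awk -F '|' '$8 ~ /^[0-9]+$/' "$sample")

printf 'star:\n%s\nplus:\n%s\n' "$with_star" "$with_plus"

rm -f "$sample"
```

With * both the 42 line and the empty-field line are kept; with + only the 42 line survives.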
 
06-21-2013, 11:57 PM   #6
BludGeonT
LQ Newbie
 
Registered: Feb 2013
Posts: 4

Original Poster
Rep: Reputation: 1
Thanks again, everyone, for your insight and recommendations. The Linux community never ceases to amaze me with its combined knowledge. Have a good one, folks.
 
  


All times are GMT -5.
