Old 09-27-2012, 09:24 AM   #1
white.horse
LQ Newbie
 
Registered: Sep 2012
Posts: 3

Rep: Reputation: Disabled
Delete a row based on a multiple field match


Good day,

I have an ASCII file with approximately 15,000 rows. It has 121 columns that are not delimited in any way, and I can't change that; the files are set up that way for a reason.
I need to delete all rows where columns 81-84 equal 8020.

Thanks in advance
 
Old 09-27-2012, 10:36 AM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack, CentOS
Posts: 18,811

Rep: Reputation: 4190
Quote:
Originally Posted by white.horse View Post
Good day,
I have an ASCII file with approximately 15,000 rows. It has 121 columns that are not delimited in any way, and I can't change that; the files are set up that way for a reason.

I need to delete all rows where columns 81-84 equal 8020.
Ok....but you don't give us an example of the data, nor what you've done/tried so far. A suggestion would be to look at the man page for the 'cut' command and write a bash script that pulls out columns 81-84, checks whether they equal 8020, and drops the line if they do. Or you could check Google or this very site for script examples:


http://www.linuxquestions.org/questi...exists-197539/
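
For what it's worth, a bare-bones sketch of that cut-based idea might look like the following; the file names are just placeholders, and a per-line loop like this will be slow on ~15,000 rows compared to a single grep or awk pass:

Code:
#!/bin/bash
# Sketch only: copy every line whose characters 81-84 are NOT "8020".
# "original.txt" and "filtered.txt" are placeholder file names.
while IFS= read -r line; do
    if [ "$(printf '%s\n' "$line" | cut -c 81-84)" != "8020" ]; then
        printf '%s\n' "$line"
    fi
done < original.txt > filtered.txt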
 
Old 09-27-2012, 10:50 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,496

Rep: Reputation: 2867
Quote:
It has 121 columns that are not delimited in any way
I find this to be a curious statement. How do you know there are 121 columns if there is no delimiter?
 
Old 09-27-2012, 11:09 AM   #4
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 671
I believe that the file has 121 characters per line rather than a record with 121 items per row.

Code:
grep -Ev '^.{80}8020' file > newfile
This command filters out (drops) every line whose 81st through 84th characters are 8020: the pattern skips the first 80 characters of the line and then requires 8020 to follow.

Last edited by jschiwal; 09-27-2012 at 11:10 AM.
 
Old 09-27-2012, 11:23 AM   #5
white.horse
LQ Newbie
 
Registered: Sep 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by white.horse View Post
Good day,

I have an ASCII file with approximately 15,000 rows. It has 121 columns that are not delimited in any way, and I can't change that; the files are set up that way for a reason.
I need to delete all rows where columns 81-84 equal 8020.

Thanks in advance

Sorry for not including the data. I have tried Google and this forum. The data looks like this, and 8020 can occur in other places throughout the file; I am only looking for 8020 in columns 81-84. I call these columns because there is a spec sheet associated with this data, and columns 81-84 represent a species.

I've tried several things. The closest I've come is using `cut -c 81-84` on each line, storing the result in a variable, and looping through the file testing that variable. That does pick out the data I'm looking for correctly. Now, when I find 8020, I need to move the whole line to a new file and delete it from the old file: I want the old file minus all the 8020 species, and writing the deleted lines to a new file will give me a log of which lines were removed.
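
For reference, a rough sketch of that kind of loop, with the 8020 lines going to a separate log file (the file names below are placeholders):

Code:
#!/bin/bash
# Sketch only: lines whose columns 81-84 are "8020" go to removed.log;
# every other line goes to kept.txt. File names are placeholders.
while IFS= read -r line; do
    if [ "$(printf '%s\n' "$line" | cut -c 81-84)" = "8020" ]; then
        printf '%s\n' "$line" >> removed.log
    else
        printf '%s\n' "$line" >> kept.txt
    fi
done < original.txt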

Data looks like [partial file]:

634417061120424017093PL33997465 19480110061067006540690 0464450156019600610432000003000000544502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600610444000009000025644502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600610453000001000000644502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600610478000025000014344502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600610853000004000000444502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600610890000009000008944502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600610892000008000060444502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600610969000002000001044502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600612040000000000008044502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600612165000006000018244502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600614545000002000000544502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600618020000000000012544502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600618379000020000006844502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600618447000004000000444502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600618448000012000000444502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600618479000001000001044502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600618497000001000001544502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600618553000000000000844502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600618914000204000012944502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600619983000000000000844502560100101
634417061120424017093PL33997465 19480110061067006540690 0464450156019600619987000000000000444502560100101
534417062120425013193PM32994464 00120150081009300930093 003451015537560061 45108553710101
634417062120425013193PM32994464 00120150081009300930093 0034510155375600610090000039000725845108553710101
 
Old 09-27-2012, 12:51 PM   #6
Fred-1.2.13
Member
 
Registered: Jan 2006
Location: Midwest USA
Distribution: Started with Slackware - 3.0 1995 Kernel 1.2.13 - Now Slackware 14.2 RC 2. Also some FreeBSD.
Posts: 116

Rep: Reputation: 52
I'm not sure if this is something you want to automate or something you need to do over and over.... but based on the supplied data this is trivial to do in Excel (I see you are running Windows).

The last two lines don't match up; I don't know if that is just an issue from when you posted the data, but if the data really varies like those last two lines it will make things trickier whichever way you try to do this. But again, it's simple enough in Excel.

If this were my data I would start by removing the spaces (or replacing them with a placeholder like a #), then open the text file in Excel as a fixed-width file. When prompted, I would put a break where the 8020 appears and finally save the whole thing as text (you can convert it back to numbers later). Now you will have two columns; sort by column B and you can remove the lines with 8020.

Anyway, there are other ways to do this in Excel and simple ways to concatenate the data back together, but I am already WAY off topic for a Linux forum!! :-)

EDIT: Ignore the above Excel rambling and see grail's post below and my confirmation below that!

Last edited by Fred-1.2.13; 09-27-2012 at 01:30 PM.
 
Old 09-27-2012, 01:10 PM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,496

Rep: Reputation: 2867
So I get that this is partial data; however, might it not have been a good idea to include at least one line that has 8020 in the correct spot?
Performing your cut on the example would suggest this is not the case.

But I would think an easy way could be:
Code:
awk -F "" '$81$82$83$84 == 8020{print > "log";next}1' original > new
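
A side note on portability: splitting a record into one character per field with -F "" is a gawk/mawk extension rather than something POSIX awk guarantees, so on other awks the same test can be written with substr(), for example:

Code:
awk 'substr($0, 81, 4) == "8020" {print > "log"; next} 1' original > new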

Last edited by grail; 09-27-2012 at 01:11 PM.
 
2 members found this post helpful.
Old 09-27-2012, 01:16 PM   #8
Fred-1.2.13
Member
 
Registered: Jan 2006
Location: Midwest USA
Distribution: Started with Slackware - 3.0 1995 Kernel 1.2.13 - Now Slackware 14.2 RC 2. Also some FreeBSD.
Posts: 116

Rep: Reputation: 52
Quote:
Originally Posted by grail View Post
So I get that this is partial data; however, might it not have been a good idea to include at least one line that has 8020 in the correct spot?
I found an 8020 in columns 75-78 and used it when I played with it...

EDIT:

I took out the spaces and the last two lines:


634417061120424017093PL33997465194801100610670065406900464450156019600610432000003000000544502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600610444000009000025644502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600610453000001000000644502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600610478000025000014344502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600610853000004000000444502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600610890000009000008944502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600610892000008000060444502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600610969000002000001044502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600612040000000000008044502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600612165000006000018244502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600614545000002000000544502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600618020000000000012544502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600618379000020000006844502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600618447000004000000444502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600618448000012000000444502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600618479000001000001044502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600618497000001000001544502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600618553000000000000844502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600618914000204000012944502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600619983000000000000844502560100 101
634417061120424017093PL33997465194801100610670065406900464450156019600619987000000000000444502560100 101

then ran:

Code:
awk -F "" '$73$74$75$76 == 8020{print > "log";next}1' original > new
And WOW but did that work sweet!!! I will have to add this awk script to my bag of tricks!! Thanks grail!! And thanks white.horse for asking the question!

Last edited by Fred-1.2.13; 09-27-2012 at 03:12 PM.
 
Old 09-27-2012, 03:33 PM   #9
white.horse
LQ Newbie
 
Registered: Sep 2012
Posts: 3

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by white.horse View Post
Good day,

I have an ASCII file with approximately 15,000 rows. It has 121 columns that are not delimited in any way, and I can't change that; the files are set up that way for a reason.
I need to delete all rows where columns 81-84 equal 8020.

Thanks in advance
Hi grail,

Thanks so much. Your code worked perfectly. Sorry about the 8020 not being in columns 81-84: I tried to capture some data with it in the proper columns and also in different columns elsewhere, and I must have copied too much. The last lines got scrambled and the line with 8020 in columns 81-84 didn't post.

Thanks again
 
  

