LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 01-12-2017, 07:24 AM   #1
azheruddin
Member
 
Registered: Dec 2011
Posts: 91
Blog Entries: 1

Rep: Reputation: Disabled
To extract certain columns with only header


Hi,
I have data which has many columns but i need to extract only 5 columns. three columns are direct extract but for two columns logic needs to be applied before getting output.
I have pasted sample content of the files.

1. I need to pick only those record under MOVEMENT CODE 26 which is in header section of each record other than MOVEMENT CODE 26 should not come in output.

2. only 2 digits needs to display from DOC NO.

3.To display DDMM colmmn also but with year in file records only date and month i.e 05 date and month 12 but year also needs be appended.
year logic is to read from the file name.
file name = R950CMA_01JAN15 so data is for the past month means dec14 so year is 14 in a same way go on..

so total 5 column needs to be extracted ( ORD NO,P A R T N U M B E R,DDMM,INV NO / SER NO,DOC NO).

Input file content:
Code:
1
     LIST 530                         Jeffron aliff system
     MASTERS                                       ISSUE    REPORT (MONTHLY)               01JAN15      PAGE      1
0    MOVEMENT CODE  26     
0     A/C NO   1
0    BCK     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM PGC / SER NO VENDOR GB     Q T Y   
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- 
 424009024545 GBGTGFRVKSNMSSSM         DMNMDB PE              490117701 110024 0112 8916            280819 58      20.0        
 424009020858 3GTUJ77                  D DJKND              490118901 110025 0112 HTV          280799 29       9.0      
 424009024756 DMDJKAYTTUUVNN           DNDMNDn             490126001 110026 0112 1412008         280477 01      10.0        
 
 1
     LIST 732                         Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      2
0    MOVEMENT CODE  27     
0    S A/C NO   140
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM PGC / SER NO VENDOR UM     Q T Y   
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- 
 440101000437 GBYI12                    FMNDMMD                421755201 397185 0512 3027490         D24170 15       2.0      
 440101000578 VGTYUMNBVYIIOPMMN         NMBDMNBd               421752701 361076 0512 15668           S22112 15       5.0      
 424009025224 JNDKNMNSDMHYTUBK          FMNBD                  490127301 110047 0512 111931          290040 01      20.0      
 
1
     LIST 673                        Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      3
0    MOVEMENT CODE  26     STORING OF ORDERED MATERIAL
0    S A/C NO   14
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM PGC / SER NO VENDOR UM     Q T Y   
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- 
 440006000061 DBNMMDNnDBJBD            DNBVNBDV              421766901 397205 1212 3029548         D24170 39       1.0       
 440002000198 DBNMDBMN                 DJHDMND               421766902 397206 1212 3029548         D24170 39       2.0       
 440001000246 DNMMNDMNBDNDDB           NMDBNMDN              421750101 381910 1212 D01341MY1       280911 15      50.0


OUTPUT:
Code:
P A R T  N U M B E R		ORD NO		DOC NO 		DDMM 	PGC / SER NO 
 GBGTGFRVKSNMSSSM		490117701	11	    011214		8916
 3GTUJ77			490118901		    011214		HTV
 DMDJKAYTTUUVNN			490126001	11	    011214		1412008
 DBNMMDNnDBJBD			421766901	39	    121214		3029548
 DBNMDBMN			421766902	39	    121214		3029548
 DNMMNDMNBDNDDB			421750101	38	    121214		3029548
all records which needs to be pick have standard 1 space in beginning.

Appreciate your assistance by AWK.

Last edited by azheruddin; 01-12-2017 at 07:27 AM.
 
Old 01-12-2017, 08:03 AM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 8,422
Blog Entries: 13

Rep: Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740
Hi azheruddin,

If your preference is to do this by awk, then you should have already made some attempt to do this.

Please post what you have so far for others to review and offer their suggestions towards.

As you know from your 6 years with LinuxQuestions.org, LQ is volunteer members who are here to help you learn the information you need to accomplish your goals versus an on-demand script provider.
 
Old 01-12-2017, 08:18 AM   #3
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,428
Blog Entries: 3

Rep: Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205
awk has some built-in variables like FNR and NR which can be used to change what the script does. So before a certain line number, you can print one set of data, after that you can print the five columns you want.
 
Old 01-12-2017, 10:42 AM   #4
azheruddin
Member
 
Registered: Dec 2011
Posts: 91

Original Poster
Blog Entries: 1

Rep: Reputation: Disabled
Hi,
awk '
/^1/ {SKIP = NR + 5 + HDFND
}
NR < SKIP {next
}
!HDFND {MX = split (COLUMNS, HD, ",")
for (i=1; i<=MX; i++) {match ($0, HD[i] " *")
P[i] = RSTART
L[i] = RLENGTH
}
HDFND = 2
}
{for (i=1; i<=MX; i++) printf "%s ", substr ($0, P[i], L[i])
printf RS
}
' COLUMNS=" ORD NO, P A R T N U M B E R,INV NO / SER NO" file

Got output as below.


Code:
ORD NO     P A R T  N U M B E R     PGC / SER NO  
---------  ------------------------  ---------------  
490117701  PEF0AM1MX2MX40MM          8916             
490118901  3M7447                    SM0883           
490126001  SAFETYC0NE30IN            1412008          
490105304  C0TT0NRAG                 13264            
490121901  ARINUS0940                D01100477        
421751301  8W550C3                   1557             
421755201  BR127                     3027490
now need to modify the code as per the additional requirement which as below.
1. records should be picked only for the movement code 26 as mentioned in top of every record header ,can be hard coded.
2. addition column only 2 digits needs to display from DOC NO so as per below record 39,36,11,11
3. To display DDMM colmmn also but with year in file records only date and month i.e 05 date and month 12 but year also needs be appended.
year logic is to read from the file name.
file name = R950CMA_01JAN15 so data is for the past month means dec14 so year is 14 in a same way go on..

expected output:

Code:
P A R T  N U M B E R		ORD NO		DOC NO 		DDMM 	PGC / SER NO 
 GBGTGFRVKSNMSSSM		490117701	11	    011214		8916
 3GTUJ77			490118901		    011214		HTV
 DMDJKAYTTUUVNN			490126001	11	    011214		1412008
 DBNMMDNnDBJBD			421766901	39	    121214		3029548
 DBNMDBMN			421766902	39	    121214		3029548
 DNMMNDMNBDNDDB			421750101	38	    121214		3029548
 
Old 01-12-2017, 11:40 AM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,769

Rep: Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052
Have to say, the last post made me laugh. You have gone to extraordinary lengths to put all your data in code tags, but when it came to your actual code you left it twisting in the breeze so
there is zero formatting :lol:

Looking at you further requirements:

1. Check line contains the words :- /MOVEMENT CODE 26/, if not found, continue until the next blank line and start searching again

2. Look at substr function to get just the digits required

3. You seem to tell us that a file is of the format R950CMA_01JAN15, but in your code example it is simply called "file". I will assume this was a typo and advise you can either use the date functions provided or design your own function to perhaps use an array to retrieve the previous month and year
 
Old 01-12-2017, 01:23 PM   #6
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 8,422
Blog Entries: 13

Rep: Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740Reputation: 3740
Quote:
Originally Posted by azheruddin View Post
H]
Code:
awk '
/^1/            {SKIP = NR + 5 + HDFND
                }
NR < SKIP       {next
                }
!HDFND          {MX = split (COLUMNS, HD, ",")
                 for (i=1; i<=MX; i++)  {match ($0, HD[i] " *")
                                         P[i] = RSTART
                                         L[i] = RLENGTH
                                        }
                 HDFND = 2
                }
                {for (i=1; i<=MX; i++)  printf "%s ", substr ($0, P[i], L[i])
                 printf RS
                }
' COLUMNS=" ORD NO, P A R T  N U M B E R,INV NO / SER NO" file
Awk is not really my strength, however can't you instead use some form of print and $<argument> within an awk statement to resequence fields you've grabbed from a line? That seems easier.
Quote:
Originally Posted by grail View Post
Have to say, the last post made me laugh. You have gone to extraordinary lengths to put all your data in code tags, but when it came to your actual code you left it twisting in the breeze so there is zero formatting.
Valid, and what I noticed was that the data is not the same between each example, some of which does and doesn't match the sample input from the first post.

I do realize it is just example data for the question, however the discontinuity confused me as to what was being sought.

Still feel that using awk to process a line and then print out the fields from the line in the sequence you desire is more desirable of a solution.

From something I googled:
Code:
NR==1
{
     a=$1
     b=$2
     c=$3
}
{
     print "Blah"
}
END
{
     print "First Login:", a, b, c  RS "Last Login:", $1, $2, $3
}
And then just resequence those instead once you do the assignment.
 
Old 01-12-2017, 02:03 PM   #7
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,428
Blog Entries: 3

Rep: Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205
[duplicate due to keybounce]

Last edited by Turbocapitalist; 01-12-2017 at 02:12 PM.
 
Old 01-12-2017, 02:12 PM   #8
Turbocapitalist
Senior Member
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 4,428
Blog Entries: 3

Rep: Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205Reputation: 2205
Ok. I see the pattern now.

One way is to use a pair of flags and toggle them on and off, only printing when the both are set.

The sets are separated by empty lines (or at least lines with only whitespace).
You're in the right set when you have "MOVEMENT CODE 26" or whatever.
The data starts after the line with many dashes.

When you are in the right set and on a data line, print. Otherwise go to the next line of input.

Maybe that's the 7th-grade approach but it should work with awk and can be don with a lot of nexts and ifs.


However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos? Or are the main fields separated by tabs, I hope?
 
Old 01-12-2017, 07:28 PM   #9
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 5,360

Rep: Reputation: 1974Reputation: 1974Reputation: 1974Reputation: 1974Reputation: 1974Reputation: 1974Reputation: 1974Reputation: 1974Reputation: 1974Reputation: 1974Reputation: 1974
Quote:
However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos?
If those are extraneous spaces, then perhaps this is starting to do what you want.
Code:
awk '/MOVEMENT.*26/ {getline;getline;getline;getline;while (NF==10) {print $2,$4,$5,$6,$7;getline}}' <Input file>
Obviously there is still work required to achieve desired formatting.
You can add a header to the output by adding a print command in a BEGIN rule.
Code:
BEGIN {OFS="\t";print "PART NUMBER","ORD NO","DOC NO","DDMM","PGC/SER NO"}
 
Old 01-13-2017, 02:20 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,769

Rep: Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052
I agree with rtmistler that the data needs to be clarified, ie how is it separated? and things like the 'JAN' field, will these ever contain data?

You may opt to use something like ruby/perl/python and then use the unpack function to set fixed widths for each field
 
Old 01-14-2017, 12:22 AM   #11
azheruddin
Member
 
Registered: Dec 2011
Posts: 91

Original Poster
Blog Entries: 1

Rep: Reputation: Disabled
Hello Guys,
sorry for the delayed feedback.

lot of queries raised, here are the answers

Quote:
You seem to tell us that a file is of the format R950CMA_01JAN15, but in your code example it is simply called "file". I will assume this was a typo and advise you can either use the date functions provided or design your own function to perhaps use an array to retrieve the previous month and yea
r
filename always as would be in format R950CMA_01JAN15 accordingly -1 month need to manipulated, in sample it was just given as a sample

Quote:
Valid, and what I noticed was that the data is not the same between each example, some of which does and doesn't match the sample input from the first post.
discrepancy in data yes since have masked it before posting , but data format/pattern remains same.

Quote:
However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos? Or are the main fields separated by tabs, I hope?
"DMNMDB PE" and "D DJKND" no spaces in data it isvjust typo . data is properly arranged with specific delimiters.

Quote:
You're in the right set when you have "MOVEMENT CODE 26" or whatever.
The data starts after the line with many dashes.

When you are in the right set and on a data line, print. Otherwise go to the next line of input.
you are exactly understood.

Now somebody can help?
 
Old 01-14-2017, 02:15 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,769

Rep: Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052
awk has a variable called FILENAME, so you can use that to get the data wanted. As previously advised you will need to take said data and manipulate it with either awk related functions or create your own.

Also the requirement for only part of one of the fields is done using substr.

As you have advised that bogus spaces will not be in real data, here is a quick knock up to get the data from current input:
Code:
#!/usr/bin/awk -f

BEGIN{
	mrs = " -[- ]+\n"
	rrs = "\n"

	RS = mrs
}

/CODE *26/{
	RS = rrs
	p  = 1
	next
}

!NF{
	RS = mrs
	p  = 0
}

p{
	print $2,$4,$5,$6,$7
}
This should at least get you started. I would also use allend's advise to simply put the header in the BEGIN
 
Old 01-14-2017, 12:04 PM   #13
azheruddin
Member
 
Registered: Dec 2011
Posts: 91

Original Poster
Blog Entries: 1

Rep: Reputation: Disabled
Hi ,
getting some issue to merge code with mine, can anyone send me the code to complete the complete requirement on top of my code or new one.
 
Old 01-14-2017, 01:49 PM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,769

Rep: Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052Reputation: 3052
Ultimately you are asking if someone will do the job for you, the short answer is no.

I am not following why it is so complicated. If using my example code, run it, check the output and slowly add in the pieces that are missing that you want.
You will never get any more proficient at this if someone else is doing the work for you and you will simply end up back here next time asking someone to do your next task.

At least present what you have done and advise where you are stuck???
 
1 members found this post helpful.
Old 01-14-2017, 04:42 PM   #15
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 22,308

Rep: Reputation: 6008Reputation: 6008Reputation: 6008Reputation: 6008Reputation: 6008Reputation: 6008Reputation: 6008Reputation: 6008Reputation: 6008Reputation: 6008Reputation: 6008
Quote:
Originally Posted by azheruddin View Post
Hi ,
getting some issue to merge code with mine, can anyone send me the code to complete the complete requirement on top of my code or new one.
Sounds familiar:
http://www.linuxquestions.org/questi...141/page2.html
http://www.linuxquestions.org/questi...rn-4175516737/
http://www.linuxquestions.org/questi...on-4175438073/
http://www.linuxquestions.org/questi...pt-4175440649/
http://www.linuxquestions.org/questi...pt-4175444590/
http://www.linuxquestions.org/questi...at-4175448079/
http://www.linuxquestions.org/questi...ng-4175509609/

After being here for six years now, and essentially being told many times that you have to write your own scripts, I'm with Grail here...show us your efforts FIRST, and tell us where you're stuck.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
build a header with columns prince0 Programming 3 05-30-2016 11:57 AM
merge columns from multiple files in a directory based on match of two columns prasanthi yanamala Linux - Newbie 2 11-12-2015 11:11 AM
extract middle of file - how to strip header/footer usao Linux - Newbie 2 05-05-2015 08:59 PM
How can I extract columns from a file without using awk or perl? KG425 Programming 13 06-06-2012 12:40 PM
BASH or AWK: extract columns in multiple files and combine to a single file cristalp Programming 2 03-15-2012 12:55 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie

All times are GMT -5. The time now is 07:09 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration