To extract certain columns with only header

azheruddin · 01-12-2017, 06:24 AM

Hi,
I have data which has many columns but i need to extract only 5 columns. three columns are direct extract but for two columns logic needs to be applied before getting output.
I have pasted sample content of the files.

1. I need to pick only those record under MOVEMENT CODE 26 which is in header section of each record other than MOVEMENT CODE 26 should not come in output.

2. only 2 digits needs to display from DOC NO.

3.To display DDMM colmmn also but with year in file records only date and month i.e 05 date and month 12 but year also needs be appended.
year logic is to read from the file name.
file name = R950CMA_01JAN15 so data is for the past month means dec14 so year is 14 in a same way go on..

so total 5 column needs to be extracted ( ORD NO,P A R T N U M B E R,DDMM,INV NO / SER NO,DOC NO).

Input file content:

Code:

1
     LIST 530                         Jeffron aliff system
     MASTERS                                       ISSUE    REPORT (MONTHLY)               01JAN15      PAGE      1
0    MOVEMENT CODE  26     
0     A/C NO   1
0    BCK     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM PGC / SER NO VENDOR GB     Q T Y   
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- 
 424009024545 GBGTGFRVKSNMSSSM         DMNMDB PE              490117701 110024 0112 8916            280819 58      20.0        
 424009020858 3GTUJ77                  D DJKND              490118901 110025 0112 HTV          280799 29       9.0      
 424009024756 DMDJKAYTTUUVNN           DNDMNDn             490126001 110026 0112 1412008         280477 01      10.0        
 
 1
     LIST 732                         Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      2
0    MOVEMENT CODE  27     
0    S A/C NO   140
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM PGC / SER NO VENDOR UM     Q T Y   
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- 
 440101000437 GBYI12                    FMNDMMD                421755201 397185 0512 3027490         D24170 15       2.0      
 440101000578 VGTYUMNBVYIIOPMMN         NMBDMNBd               421752701 361076 0512 15668           S22112 15       5.0      
 424009025224 JNDKNMNSDMHYTUBK          FMNBD                  490127301 110047 0512 111931          290040 01      20.0      
 
1
     LIST 673                        Jeffron aliff system
     MASTERS                                       ISSUE   ANALYSIS   REPORT (MONTHLY)               01JAN15      PAGE      3
0    MOVEMENT CODE  26     STORING OF ORDERED MATERIAL
0    S A/C NO   14
0    I I C     P A R T  N U M B E R    KEYWORD      J A N     ORD NO   DOC NO DDMM PGC / SER NO VENDOR UM     Q T Y   
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ---------- 
 440006000061 DBNMMDNnDBJBD            DNBVNBDV              421766901 397205 1212 3029548         D24170 39       1.0       
 440002000198 DBNMDBMN                 DJHDMND               421766902 397206 1212 3029548         D24170 39       2.0       
 440001000246 DNMMNDMNBDNDDB           NMDBNMDN              421750101 381910 1212 D01341MY1       280911 15      50.0

OUTPUT:

Code:

P A R T  N U M B E R		ORD NO		DOC NO 		DDMM 	PGC / SER NO 
 GBGTGFRVKSNMSSSM		490117701	11	    011214		8916
 3GTUJ77			490118901		    011214		HTV
 DMDJKAYTTUUVNN			490126001	11	    011214		1412008
 DBNMMDNnDBJBD			421766901	39	    121214		3029548
 DBNMDBMN			421766902	39	    121214		3029548
 DNMMNDMNBDNDDB			421750101	38	    121214		3029548

all records which needs to be pick have standard 1 space in beginning.

Appreciate your assistance by AWK.

rtmistler · 01-12-2017, 07:03 AM

Hi azheruddin,

If your preference is to do this by awk, then you should have already made some attempt to do this.

Please post what you have so far for others to review and offer their suggestions towards.

As you know from your 6 years with LinuxQuestions.org, LQ is volunteer members who are here to help you learn the information you need to accomplish your goals versus an on-demand script provider.

Turbocapitalist · 01-12-2017, 07:18 AM

awk has some built-in variables like FNR and NR which can be used to change what the script does. So before a certain line number, you can print one set of data, after that you can print the five columns you want.

azheruddin · 01-12-2017, 09:42 AM

Hi,
awk '
/^1/ {SKIP = NR + 5 + HDFND
}
NR < SKIP {next
}
!HDFND {MX = split (COLUMNS, HD, ",")
for (i=1; i<=MX; i++) {match ($0, HD[i] " *")
P[i] = RSTART
L[i] = RLENGTH
}
HDFND = 2
}
{for (i=1; i<=MX; i++) printf "%s ", substr ($0, P[i], L[i])
printf RS
}
' COLUMNS=" ORD NO, P A R T N U M B E R,INV NO / SER NO" file

Got output as below.

Code:

ORD NO     P A R T  N U M B E R     PGC / SER NO  
---------  ------------------------  ---------------  
490117701  PEF0AM1MX2MX40MM          8916             
490118901  3M7447                    SM0883           
490126001  SAFETYC0NE30IN            1412008          
490105304  C0TT0NRAG                 13264            
490121901  ARINUS0940                D01100477        
421751301  8W550C3                   1557             
421755201  BR127                     3027490

now need to modify the code as per the additional requirement which as below.
1. records should be picked only for the movement code 26 as mentioned in top of every record header ,can be hard coded.
2. addition column only 2 digits needs to display from DOC NO so as per below record 39,36,11,11
3. To display DDMM colmmn also but with year in file records only date and month i.e 05 date and month 12 but year also needs be appended.
year logic is to read from the file name.
file name = R950CMA_01JAN15 so data is for the past month means dec14 so year is 14 in a same way go on..

expected output:

Code:

P A R T  N U M B E R		ORD NO		DOC NO 		DDMM 	PGC / SER NO 
 GBGTGFRVKSNMSSSM		490117701	11	    011214		8916
 3GTUJ77			490118901		    011214		HTV
 DMDJKAYTTUUVNN			490126001	11	    011214		1412008
 DBNMMDNnDBJBD			421766901	39	    121214		3029548
 DBNMDBMN			421766902	39	    121214		3029548
 DNMMNDMNBDNDDB			421750101	38	    121214		3029548

grail · 01-12-2017, 10:40 AM

Have to say, the last post made me laugh. You have gone to extraordinary lengths to put all your data in code tags, but when it came to your actual code you left it twisting in the breeze so
there is zero formatting :lol:

Looking at you further requirements:

1. Check line contains the words :- /MOVEMENT CODE 26/, if not found, continue until the next blank line and start searching again

2. Look at substr function to get just the digits required

3. You seem to tell us that a file is of the format R950CMA_01JAN15, but in your code example it is simply called "file". I will assume this was a typo and advise you can either use the date functions provided or design your own function to perhaps use an array to retrieve the previous month and year

rtmistler · 01-12-2017, 12:23 PM

Quote:

Originally Posted by azheruddin

H]

Code:

awk '
/^1/            {SKIP = NR + 5 + HDFND
                }
NR < SKIP       {next
                }
!HDFND          {MX = split (COLUMNS, HD, ",")
                 for (i=1; i<=MX; i++)  {match ($0, HD[i] " *")
                                         P[i] = RSTART
                                         L[i] = RLENGTH
                                        }
                 HDFND = 2
                }
                {for (i=1; i<=MX; i++)  printf "%s ", substr ($0, P[i], L[i])
                 printf RS
                }
' COLUMNS=" ORD NO, P A R T  N U M B E R,INV NO / SER NO" file

Awk is not really my strength, however can't you instead use some form of print and $<argument> within an awk statement to resequence fields you've grabbed from a line? That seems easier.

Quote:

Originally Posted by grail

Have to say, the last post made me laugh. You have gone to extraordinary lengths to put all your data in code tags, but when it came to your actual code you left it twisting in the breeze so there is zero formatting.

Valid, and what I noticed was that the data is not the same between each example, some of which does and doesn't match the sample input from the first post.

I do realize it is just example data for the question, however the discontinuity confused me as to what was being sought.

Still feel that using awk to process a line and then print out the fields from the line in the sequence you desire is more desirable of a solution.

From something I googled:

Code:

NR==1
{
     a=$1
     b=$2
     c=$3
}
{
     print "Blah"
}
END
{
     print "First Login:", a, b, c  RS "Last Login:", $1, $2, $3
}

And then just resequence those instead once you do the assignment.

Turbocapitalist · 01-12-2017, 01:03 PM

[duplicate due to keybounce]

Turbocapitalist · 01-12-2017, 01:12 PM

Ok. I see the pattern now.

One way is to use a pair of flags and toggle them on and off, only printing when the both are set.

The sets are separated by empty lines (or at least lines with only whitespace).
You're in the right set when you have "MOVEMENT CODE 26" or whatever.
The data starts after the line with many dashes.

When you are in the right set and on a data line, print. Otherwise go to the next line of input.

Maybe that's the 7th-grade approach but it should work with awk and can be don with a lot of nexts and ifs.

However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos? Or are the main fields separated by tabs, I hope?

allend · 01-12-2017, 06:28 PM

Quote:

However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos?

If those are extraneous spaces, then perhaps this is starting to do what you want.

Code:

awk '/MOVEMENT.*26/ {getline;getline;getline;getline;while (NF==10) {print $2,$4,$5,$6,$7;getline}}' <Input file>

Obviously there is still work required to achieve desired formatting.
You can add a header to the output by adding a print command in a BEGIN rule.

Code:

BEGIN {OFS="\t";print "PART NUMBER","ORD NO","DOC NO","DDMM","PGC/SER NO"}

grail · 01-13-2017, 01:20 AM

I agree with rtmistler that the data needs to be clarified, ie how is it separated? and things like the 'JAN' field, will these ever contain data?

You may opt to use something like ruby/perl/python and then use the unpack function to set fixed widths for each field

azheruddin · 01-13-2017, 11:22 PM

Hello Guys,
sorry for the delayed feedback.

lot of queries raised, here are the answers

Quote:

You seem to tell us that a file is of the format R950CMA_01JAN15, but in your code example it is simply called "file". I will assume this was a typo and advise you can either use the date functions provided or design your own function to perhaps use an array to retrieve the previous month and yea

r
filename always as would be in format R950CMA_01JAN15 accordingly -1 month need to manipulated, in sample it was just given as a sample

Quote:

Valid, and what I noticed was that the data is not the same between each example, some of which does and doesn't match the sample input from the first post.

discrepancy in data yes since have masked it before posting , but data format/pattern remains same.

Quote:

However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos? Or are the main fields separated by tabs, I hope?

"DMNMDB PE" and "D DJKND" no spaces in data it isvjust typo . data is properly arranged with specific delimiters.

Quote:

You're in the right set when you have "MOVEMENT CODE 26" or whatever.
The data starts after the line with many dashes.

When you are in the right set and on a data line, print. Otherwise go to the next line of input.

you are exactly understood.

Now somebody can help?

grail · 01-14-2017, 01:15 AM

awk has a variable called FILENAME, so you can use that to get the data wanted. As previously advised you will need to take said data and manipulate it with either awk related functions or create your own.

Also the requirement for only part of one of the fields is done using substr.

As you have advised that bogus spaces will not be in real data, here is a quick knock up to get the data from current input:

Code:

#!/usr/bin/awk -f

BEGIN{
	mrs = " -[- ]+\n"
	rrs = "\n"

	RS = mrs
}

/CODE *26/{
	RS = rrs
	p  = 1
	next
}

!NF{
	RS = mrs
	p  = 0
}

p{
	print $2,$4,$5,$6,$7
}

This should at least get you started. I would also use allend's advise to simply put the header in the BEGIN

azheruddin · 01-14-2017, 11:04 AM

Hi ,
getting some issue to merge code with mine, can anyone send me the code to complete the complete requirement on top of my code or new one.

grail · 01-14-2017, 12:49 PM

Ultimately you are asking if someone will do the job for you, the short answer is no.

I am not following why it is so complicated. If using my example code, run it, check the output and slowly add in the pieces that are missing that you want.
You will never get any more proficient at this if someone else is doing the work for you and you will simply end up back here next time asking someone to do your next task.

At least present what you have done and advise where you are stuck???

TB0ne · 01-14-2017, 03:42 PM

Quote:

Originally Posted by azheruddin

Hi ,
getting some issue to merge code with mine, can anyone send me the code to complete the complete requirement on top of my code or new one.

Sounds familiar:
http://www.linuxquestions.org/questi...141/page2.html
http://www.linuxquestions.org/questi...rn-4175516737/
http://www.linuxquestions.org/questi...on-4175438073/
http://www.linuxquestions.org/questi...pt-4175440649/
http://www.linuxquestions.org/questi...pt-4175444590/
http://www.linuxquestions.org/questi...at-4175448079/
http://www.linuxquestions.org/questi...ng-4175509609/

After being here for six years now, and essentially being told many times that you have to write your own scripts, I'm with Grail here...show us your efforts FIRST, and tell us where you're stuck.