Hi,
I have data with many columns, but I need to extract only 5 of them. Three columns can be extracted directly, but two need some logic applied before they are output.
I have pasted sample content of the file below.
1. I need to pick only the records under MOVEMENT CODE 26, which appears in the header section of each page. Records under any other MOVEMENT CODE should not appear in the output.
2. Only the first 2 digits of DOC NO need to be displayed.
3. The DDMM column should also be displayed, but with the year appended. The records hold only the day and month (e.g. day 05 and month 12), so the year has to be added.
The year has to be read from the file name.
File name = R950CMA_01JAN15 means the data is for the previous month, i.e. Dec 14, so the year is 14, and so on in the same way.
So 5 columns in total need to be extracted (ORD NO, P A R T N U M B E R, DDMM, INV NO / SER NO, DOC NO).
Input file content:
Code:
1
LIST 530 Jeffron aliff system
MASTERS ISSUE REPORT (MONTHLY) 01JAN15 PAGE 1
0 MOVEMENT CODE 26
0 A/C NO 1
0 BCK P A R T N U M B E R KEYWORD J A N ORD NO DOC NO DDMM PGC / SER NO VENDOR GB Q T Y
------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ----------
424009024545 GBGTGFRVKSNMSSSM DMNMDB PE 490117701 110024 0112 8916 280819 58 20.0
424009020858 3GTUJ77 D DJKND 490118901 110025 0112 HTV 280799 29 9.0
424009024756 DMDJKAYTTUUVNN DNDMNDn 490126001 110026 0112 1412008 280477 01 10.0
1
LIST 732 Jeffron aliff system
MASTERS ISSUE ANALYSIS REPORT (MONTHLY) 01JAN15 PAGE 2
0 MOVEMENT CODE 27
0 S A/C NO 140
0 I I C P A R T N U M B E R KEYWORD J A N ORD NO DOC NO DDMM PGC / SER NO VENDOR UM Q T Y
------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ----------
440101000437 GBYI12 FMNDMMD 421755201 397185 0512 3027490 D24170 15 2.0
440101000578 VGTYUMNBVYIIOPMMN NMBDMNBd 421752701 361076 0512 15668 S22112 15 5.0
424009025224 JNDKNMNSDMHYTUBK FMNBD 490127301 110047 0512 111931 290040 01 20.0
1
LIST 673 Jeffron aliff system
MASTERS ISSUE ANALYSIS REPORT (MONTHLY) 01JAN15 PAGE 3
0 MOVEMENT CODE 26 STORING OF ORDERED MATERIAL
0 S A/C NO 14
0 I I C P A R T N U M B E R KEYWORD J A N ORD NO DOC NO DDMM PGC / SER NO VENDOR UM Q T Y
------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ----------
440006000061 DBNMMDNnDBJBD DNBVNBDV 421766901 397205 1212 3029548 D24170 39 1.0
440002000198 DBNMDBMN DJHDMND 421766902 397206 1212 3029548 D24170 39 2.0
440001000246 DNMMNDMNBDNDDB NMDBNMDN 421750101 381910 1212 D01341MY1 280911 15 50.0
OUTPUT:
Code:
P A R T N U M B E R  ORD NO     DOC NO  DDMM    PGC / SER NO
GBGTGFRVKSNMSSSM     490117701  11      011214  8916
3GTUJ77              490118901  11      011214  HTV
DMDJKAYTTUUVNN       490126001  11      011214  1412008
DBNMMDNnDBJBD        421766901  39      121214  3029548
DBNMDBMN             421766902  39      121214  3029548
DNMMNDMNBDNDDB       421750101  38      121214  D01341MY1
All records that need to be picked have a standard single space at the beginning.
I would appreciate your assistance with doing this in awk.
Last edited by azheruddin; 01-12-2017 at 06:27 AM.
If your preference is to do this by awk, then you should have already made some attempt to do this.
Please post what you have so far so that others can review it and offer suggestions.
As you know from your 6 years with LinuxQuestions.org, LQ is made up of volunteer members who are here to help you learn what you need to accomplish your goals, not an on-demand script provider.
awk has built-in variables such as NR (the number of records read so far across all input) and FNR (the number of records read so far in the current file), which can be used to change what the script does. So before a certain line number you can print one set of data, and after it you can print the five columns you want.
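As a minimal sketch of how those two counters behave (the function name show_counters and the input files are made up for illustration): NR keeps counting across every input file, while FNR starts again at 1 for each one.

```shell
# Hypothetical helper: print NR, FNR and the current file name for every line.
show_counters() {
    awk '{ print NR, FNR, FILENAME }' "$@"
}
```

A rule such as `FNR <= 5 { next }` placed in front of the print would then skip the first five lines of each file.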
Hi,
awk '
/^1/ {SKIP = NR + 5 + HDFND
}
NR < SKIP {next
}
!HDFND {MX = split (COLUMNS, HD, ",")
for (i=1; i<=MX; i++) {match ($0, HD[i] " *")
P[i] = RSTART
L[i] = RLENGTH
}
HDFND = 2
}
{for (i=1; i<=MX; i++) printf "%s ", substr ($0, P[i], L[i])
printf RS
}
' COLUMNS=" ORD NO, P A R T N U M B E R,INV NO / SER NO" file
I got the output below.
Code:
ORD NO P A R T N U M B E R PGC / SER NO
--------- ------------------------ ---------------
490117701 PEF0AM1MX2MX40MM 8916
490118901 3M7447 SM0883
490126001 SAFETYC0NE30IN 1412008
490105304 C0TT0NRAG 13264
490121901 ARINUS0940 D01100477
421751301 8W550C3 1557
421755201 BR127 3027490
Now I need to modify the code for the additional requirements below.
1. Records should be picked only for movement code 26, as given at the top of every page header; this can be hard-coded.
2. An additional column showing only the first 2 digits of DOC NO, so for the records below that would be 39, 36, 11, 11.
3. The DDMM column should also be displayed, but with the year appended. The records hold only the day and month (e.g. day 05 and month 12), so the year has to be added.
The year has to be read from the file name.
File name = R950CMA_01JAN15 means the data is for the previous month, i.e. Dec 14, so the year is 14, and so on in the same way.
Expected output:
Code:
P A R T N U M B E R  ORD NO     DOC NO  DDMM    PGC / SER NO
GBGTGFRVKSNMSSSM     490117701  11      011214  8916
3GTUJ77              490118901  11      011214  HTV
DMDJKAYTTUUVNN       490126001  11      011214  1412008
DBNMMDNnDBJBD        421766901  39      121214  3029548
DBNMDBMN             421766902  39      121214  3029548
DNMMNDMNBDNDDB       421750101  38      121214  D01341MY1
Have to say, the last post made me laugh. You have gone to extraordinary lengths to put all your data in code tags, but when it came to your actual code you left it twisting in the breeze, so there is zero formatting.
Looking at your further requirements:
1. Check whether the line contains /MOVEMENT CODE 26/; if it does not, continue until the next blank line and start searching again.
2. Look at the substr function to get just the digits required.
3. You seem to tell us that the file name is of the format R950CMA_01JAN15, but in your code example it is simply called "file". I will assume this was a typo. You can either use the date functions provided or design your own function, perhaps using an array, to retrieve the previous month and year.
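One possible way to handle the year logic from point 3, as a sketch only. It assumes the file name always ends in DDMONYY with an upper-case English month abbreviation (as in R950CMA_01JAN15); the helper name report_year is hypothetical. Since the data is for the previous month, only a January file needs its year rolled back by one, so this sketch gets by without a month array, which is a simplification of the suggestion above.

```shell
# Hypothetical helper: print the two-digit year of the month *before*
# the date embedded at the end of the file name given as $1.
report_year() {
    awk -v name="$1" 'BEGIN {
        # Assumes the name ends in DDMONYY, e.g. R950CMA_01JAN15.
        n   = length(name)
        mon = substr(name, n - 4, 3)      # three-letter month, e.g. JAN
        yy  = substr(name, n - 1, 2) + 0  # two-digit year as a number
        # Data is for the previous month, so only a January file rolls back a year.
        if (mon == "JAN") yy = (yy + 99) % 100
        printf "%02d\n", yy
    }'
}
```

Called as `report_year R950CMA_01JAN15`, this prints 14; for a February file of the same year it prints 15.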
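Code:
awk '
# A line starting with "1" begins a new page: skip its title lines
# (and, once the positions are known, the column header and dashes too).
/^1/      { SKIP = NR + 5 + HDFND }
NR < SKIP { next }

# First column header line: record where each wanted column starts and how wide it is.
!HDFND    { MX = split(COLUMNS, HD, ",")
            for (i = 1; i <= MX; i++) {
                match($0, HD[i] " *")
                P[i] = RSTART
                L[i] = RLENGTH
            }
            HDFND = 2
          }

# Print the wanted columns, cut out by character position.
          { for (i = 1; i <= MX; i++) printf "%s ", substr($0, P[i], L[i])
            printf RS
          }
' COLUMNS=" ORD NO, P A R T N U M B E R,INV NO / SER NO" file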
awk is not really my strength; however, can't you instead use print with $<field number> arguments within an awk statement to resequence the fields you've grabbed from a line? That seems easier.
Quote:
Originally Posted by grail
Have to say, the last post made me laugh. You have gone to extraordinary lengths to put all your data in code tags, but when it came to your actual code you left it twisting in the breeze so there is zero formatting.
Valid, and what I noticed was that the data is not the same between each example, some of which does and doesn't match the sample input from the first post.
I do realize it is just example data for the question, however the discontinuity confused me as to what was being sought.
I still feel that using awk to process each line and then print its fields in the sequence you want is the more desirable solution.
From something I googled:
Code:
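NR == 1 {   # remember the first three fields of the first line
    a = $1
    b = $2
    c = $3
}
{
    print "Blah"
}
END {       # POSIX awk keeps the fields of the last line available here
    print "First Login:", a, b, c RS "Last Login:", $1, $2, $3
}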
And then just resequence those instead once you do the assignment.
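A minimal sketch of that resequencing idea, assuming the real data lines split cleanly on whitespace into ten fields (part number in $2, ORD NO in $4, DOC NO in $5, DDMM in $6, SER NO in $7). Those positions are read off the sample in the first post and may not hold for the real file, and the function name pick_fields is made up.

```shell
# Hypothetical helper: print the wanted fields in the wanted order.
# It does no filtering, so feed it data lines only.
pick_fields() {
    awk '{ print $2, $4, $5, $6, $7 }' "$@"
}
```

This only reorders fields; choosing which lines to feed it (movement code 26, data lines only) is a separate step.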
One way is to use a pair of flags and toggle them on and off, only printing when both are set.
The sets are separated by empty lines (or at least lines with only whitespace).
You're in the right set when you have "MOVEMENT CODE 26" or whatever.
The data starts after the line with many dashes.
When you are in the right set and on a data line, print. Otherwise go to the next line of input.
Maybe that's the 7th-grade approach, but it should work with awk and can be done with a lot of nexts and ifs.
However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos? Or are the main fields separated by tabs, I hope?
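The two-flag idea could look roughly like the sketch below. It is an illustration under assumptions, not a tested solution for the real file: it treats both a blank line and a lone "1" (the page marker seen in the sample) as the end of a set, and the function name movement_records is made up. It prints whole lines, so the spacing question about KEYWORD does not matter at this stage.

```shell
# Hypothetical helper: movement_records CODE [FILE...]
# Prints only the data lines that belong to pages with the given movement code.
movement_records() {
    code=$1
    shift
    awk -v code="$code" '
        # A lone "1" (page start) or a blank line ends the current set.
        /^1[ \t]*$/ || /^[ \t]*$/        { in_set = 0; in_data = 0; next }
        # Header line such as "0 MOVEMENT CODE 26 ...": is this the wanted set?
        $2 == "MOVEMENT" && $3 == "CODE" { in_set = ($4 == code); in_data = 0; next }
        # The row of dashes marks the start of the data lines.
        /^ *-----/                       { in_data = 1; next }
        # Print only when both flags are set.
        in_set && in_data                { print }
    ' "$@"
}
```

For example, `movement_records 26 R950CMA_01JAN15` would print the data lines of the pages headed MOVEMENT CODE 26 and nothing else.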
Quote:
You seem to tell us that a file is of the format R950CMA_01JAN15, but in your code example it is simply called "file". I will assume this was a typo and advise you can either use the date functions provided or design your own function to perhaps use an array to retrieve the previous month and year
The file name will always be in the format R950CMA_01JAN15, and the date needs to be shifted back by one month accordingly. "file" in the code was just a placeholder for the sample.
Quote:
Valid, and what I noticed was that the data is not the same between each example, some of which does and doesn't match the sample input from the first post.
Yes, there is a discrepancy in the data because I masked it before posting, but the data format/pattern remains the same.
Quote:
However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos? Or are the main fields separated by tabs, I hope?
There are no spaces in the real data for "DMNMDB PE" and "D DJKND"; that is just a typo. The data is properly arranged with specific delimiters.
Quote:
You're in the right set when you have "MOVEMENT CODE 26" or whatever.
The data starts after the line with many dashes.
When you are in the right set and on a data line, print. Otherwise go to the next line of input.
awk has a built-in variable called FILENAME, so you can use that to get the date you want. As previously advised, you will need to take that data and manipulate it, either with awk's built-in functions or with a function of your own.
Also, the requirement for only part of one of the fields can be met with substr.
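To make those two hints concrete, here is a small sketch and not a full solution. It assumes whitespace-separated data lines with DOC NO in $5 and DDMM in $6 (positions read off the sample), a file name ending in DDMONYY, and that the input contains data lines only. The function name doc_and_date is made up. Note that FILENAME is only set while awk is reading a named file, so it is not usable in BEGIN or when reading standard input.

```shell
# Hypothetical helper: for each data line print the first two digits of
# DOC NO, then DDMM with the year taken from the file name appended.
doc_and_date() {
    awk '
        FNR == 1 {
            # Year from the end of the file name, e.g. ...01JAN15 gives 15.
            n  = length(FILENAME)
            yy = substr(FILENAME, n - 1, 2) + 0
            # The data is for the previous month: a January file means last year.
            if (substr(FILENAME, n - 4, 3) == "JAN") yy = (yy + 99) % 100
        }
        { printf "%s %s%02d\n", substr($5, 1, 2), $6, yy }
    ' "$@"
}
```

With a file named R950CMA_01JAN15 containing a line whose DOC NO is 397205 and DDMM is 1212, this prints `39 121214`.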
As you have advised that bogus spaces will not be in real data, here is a quick knock up to get the data from current input:
Ultimately you are asking whether someone will do the job for you; the short answer is no.
I am not following why it is so complicated. If you are using my example code, run it, check the output and slowly add in the missing pieces that you want.
You will never get any more proficient at this if someone else does the work for you, and you will simply end up back here next time asking someone to do your next task.
At least present what you have done and advise where you are stuck.
After being here for six years now, and essentially being told many times that you have to write your own scripts, I'm with Grail here...show us your efforts FIRST, and tell us where you're stuck.