LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   To extract certain columns with only header (https://www.linuxquestions.org/questions/linux-newbie-8/to-extract-certain-columns-with-only-header-4175597282/)

azheruddin 01-12-2017 06:24 AM

To extract certain columns with only header
 
Hi,
I have data which has many columns but i need to extract only 5 columns. three columns are direct extract but for two columns logic needs to be applied before getting output.
I have pasted sample content of the files.

1. I need to pick only those record under MOVEMENT CODE 26 which is in header section of each record other than MOVEMENT CODE 26 should not come in output.

2. only 2 digits needs to display from DOC NO.

3.To display DDMM colmmn also but with year in file records only date and month i.e 05 date and month 12 but year also needs be appended.
year logic is to read from the file name.
file name = R950CMA_01JAN15 so data is for the past month means dec14 so year is 14 in a same way go on..

so total 5 column needs to be extracted ( ORD NO,P A R T N U M B E R,DDMM,INV NO / SER NO,DOC NO).

Input file content:
Code:

1
    LIST 530                        Jeffron aliff system
    MASTERS                                      ISSUE    REPORT (MONTHLY)              01JAN15      PAGE      1
0    MOVEMENT CODE  26    
0    A/C NO  1
0    BCK    P A R T  N U M B E R    KEYWORD      J A N    ORD NO  DOC NO DDMM PGC / SER NO VENDOR GB    Q T Y 
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ----------
 424009024545 GBGTGFRVKSNMSSSM        DMNMDB PE              490117701 110024 0112 8916            280819 58      20.0       
 424009020858 3GTUJ77                  D DJKND              490118901 110025 0112 HTV          280799 29      9.0     
 424009024756 DMDJKAYTTUUVNN          DNDMNDn            490126001 110026 0112 1412008        280477 01      10.0       
 
 1
    LIST 732                        Jeffron aliff system
    MASTERS                                      ISSUE  ANALYSIS  REPORT (MONTHLY)              01JAN15      PAGE      2
0    MOVEMENT CODE  27   
0    S A/C NO  140
0    I I C    P A R T  N U M B E R    KEYWORD      J A N    ORD NO  DOC NO DDMM PGC / SER NO VENDOR UM    Q T Y 
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ----------
 440101000437 GBYI12                    FMNDMMD                421755201 397185 0512 3027490        D24170 15      2.0     
 440101000578 VGTYUMNBVYIIOPMMN        NMBDMNBd              421752701 361076 0512 15668          S22112 15      5.0     
 424009025224 JNDKNMNSDMHYTUBK          FMNBD                  490127301 110047 0512 111931          290040 01      20.0     
 
1
    LIST 673                        Jeffron aliff system
    MASTERS                                      ISSUE  ANALYSIS  REPORT (MONTHLY)              01JAN15      PAGE      3
0    MOVEMENT CODE  26    STORING OF ORDERED MATERIAL
0    S A/C NO  14
0    I I C    P A R T  N U M B E R    KEYWORD      J A N    ORD NO  DOC NO DDMM PGC / SER NO VENDOR UM    Q T Y 
 ------------ ------------------------ -------- ------------ --------- ------ ---- --------------- ------ -- ----------
 440006000061 DBNMMDNnDBJBD            DNBVNBDV              421766901 397205 1212 3029548        D24170 39      1.0     
 440002000198 DBNMDBMN                DJHDMND              421766902 397206 1212 3029548        D24170 39      2.0     
 440001000246 DNMMNDMNBDNDDB          NMDBNMDN              421750101 381910 1212 D01341MY1      280911 15      50.0



OUTPUT:
Code:

P A R T  N U M B E R                ORD NO                DOC NO                DDMM        PGC / SER NO
 GBGTGFRVKSNMSSSM                490117701        11            011214                8916
 3GTUJ77                        490118901                    011214                HTV
 DMDJKAYTTUUVNN                        490126001        11            011214                1412008
 DBNMMDNnDBJBD                        421766901        39            121214                3029548
 DBNMDBMN                        421766902        39            121214                3029548
 DNMMNDMNBDNDDB                        421750101        38            121214                3029548

all records which needs to be pick have standard 1 space in beginning.

Appreciate your assistance by AWK.

rtmistler 01-12-2017 07:03 AM

Hi azheruddin,

If your preference is to do this by awk, then you should have already made some attempt to do this.

Please post what you have so far for others to review and offer their suggestions towards.

As you know from your 6 years with LinuxQuestions.org, LQ is volunteer members who are here to help you learn the information you need to accomplish your goals versus an on-demand script provider.

Turbocapitalist 01-12-2017 07:18 AM

awk has some built-in variables like FNR and NR which can be used to change what the script does. So before a certain line number, you can print one set of data, after that you can print the five columns you want.

azheruddin 01-12-2017 09:42 AM

Hi,
awk '
/^1/ {SKIP = NR + 5 + HDFND
}
NR < SKIP {next
}
!HDFND {MX = split (COLUMNS, HD, ",")
for (i=1; i<=MX; i++) {match ($0, HD[i] " *")
P[i] = RSTART
L[i] = RLENGTH
}
HDFND = 2
}
{for (i=1; i<=MX; i++) printf "%s ", substr ($0, P[i], L[i])
printf RS
}
' COLUMNS=" ORD NO, P A R T N U M B E R,INV NO / SER NO" file

Got output as below.


Code:

ORD NO    P A R T  N U M B E R    PGC / SER NO 
---------  ------------------------  --------------- 
490117701  PEF0AM1MX2MX40MM          8916           
490118901  3M7447                    SM0883         
490126001  SAFETYC0NE30IN            1412008         
490105304  C0TT0NRAG                13264           
490121901  ARINUS0940                D01100477       
421751301  8W550C3                  1557           
421755201  BR127                    3027490

now need to modify the code as per the additional requirement which as below.
1. records should be picked only for the movement code 26 as mentioned in top of every record header ,can be hard coded.
2. addition column only 2 digits needs to display from DOC NO so as per below record 39,36,11,11
3. To display DDMM colmmn also but with year in file records only date and month i.e 05 date and month 12 but year also needs be appended.
year logic is to read from the file name.
file name = R950CMA_01JAN15 so data is for the past month means dec14 so year is 14 in a same way go on..

expected output:

Code:

P A R T  N U M B E R                ORD NO                DOC NO                DDMM        PGC / SER NO
 GBGTGFRVKSNMSSSM                490117701        11            011214                8916
 3GTUJ77                        490118901                    011214                HTV
 DMDJKAYTTUUVNN                        490126001        11            011214                1412008
 DBNMMDNnDBJBD                        421766901        39            121214                3029548
 DBNMDBMN                        421766902        39            121214                3029548
 DNMMNDMNBDNDDB                        421750101        38            121214                3029548


grail 01-12-2017 10:40 AM

Have to say, the last post made me laugh. You have gone to extraordinary lengths to put all your data in code tags, but when it came to your actual code you left it twisting in the breeze so
there is zero formatting :lol:

Looking at you further requirements:

1. Check line contains the words :- /MOVEMENT CODE 26/, if not found, continue until the next blank line and start searching again

2. Look at substr function to get just the digits required

3. You seem to tell us that a file is of the format R950CMA_01JAN15, but in your code example it is simply called "file". I will assume this was a typo and advise you can either use the date functions provided or design your own function to perhaps use an array to retrieve the previous month and year

rtmistler 01-12-2017 12:23 PM

Quote:

Originally Posted by azheruddin (Post 5653855)
H]
Code:

awk '
/^1/            {SKIP = NR + 5 + HDFND
                }
NR < SKIP      {next
                }
!HDFND          {MX = split (COLUMNS, HD, ",")
                for (i=1; i<=MX; i++)  {match ($0, HD[i] " *")
                                        P[i] = RSTART
                                        L[i] = RLENGTH
                                        }
                HDFND = 2
                }
                {for (i=1; i<=MX; i++)  printf "%s ", substr ($0, P[i], L[i])
                printf RS
                }
' COLUMNS=" ORD NO, P A R T  N U M B E R,INV NO / SER NO" file


Awk is not really my strength, however can't you instead use some form of print and $<argument> within an awk statement to resequence fields you've grabbed from a line? That seems easier.
Quote:

Originally Posted by grail (Post 5653893)
Have to say, the last post made me laugh. You have gone to extraordinary lengths to put all your data in code tags, but when it came to your actual code you left it twisting in the breeze so there is zero formatting.

Valid, and what I noticed was that the data is not the same between each example, some of which does and doesn't match the sample input from the first post.

I do realize it is just example data for the question, however the discontinuity confused me as to what was being sought.

Still feel that using awk to process a line and then print out the fields from the line in the sequence you desire is more desirable of a solution.

From something I googled:
Code:

NR==1
{
    a=$1
    b=$2
    c=$3
}
{
    print "Blah"
}
END
{
    print "First Login:", a, b, c  RS "Last Login:", $1, $2, $3
}

And then just resequence those instead once you do the assignment.

Turbocapitalist 01-12-2017 01:03 PM

[duplicate due to keybounce]

Turbocapitalist 01-12-2017 01:12 PM

Ok. I see the pattern now.

One way is to use a pair of flags and toggle them on and off, only printing when the both are set.

The sets are separated by empty lines (or at least lines with only whitespace).
You're in the right set when you have "MOVEMENT CODE 26" or whatever.
The data starts after the line with many dashes.

When you are in the right set and on a data line, print. Otherwise go to the next line of input.

Maybe that's the 7th-grade approach but it should work with awk and can be don with a lot of nexts and ifs.


However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos? Or are the main fields separated by tabs, I hope?

allend 01-12-2017 06:28 PM

Quote:

However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos?
If those are extraneous spaces, then perhaps this is starting to do what you want.
Code:

awk '/MOVEMENT.*26/ {getline;getline;getline;getline;while (NF==10) {print $2,$4,$5,$6,$7;getline}}' <Input file>
Obviously there is still work required to achieve desired formatting.
You can add a header to the output by adding a print command in a BEGIN rule.
Code:

BEGIN {OFS="\t";print "PART NUMBER","ORD NO","DOC NO","DDMM","PGC/SER NO"}

grail 01-13-2017 01:20 AM

I agree with rtmistler that the data needs to be clarified, ie how is it separated? and things like the 'JAN' field, will these ever contain data?

You may opt to use something like ruby/perl/python and then use the unpack function to set fixed widths for each field

azheruddin 01-13-2017 11:22 PM

Hello Guys,
sorry for the delayed feedback.

lot of queries raised, here are the answers

Quote:

You seem to tell us that a file is of the format R950CMA_01JAN15, but in your code example it is simply called "file". I will assume this was a typo and advise you can either use the date functions provided or design your own function to perhaps use an array to retrieve the previous month and yea
r
filename always as would be in format R950CMA_01JAN15 accordingly -1 month need to manipulated, in sample it was just given as a sample

Quote:

Valid, and what I noticed was that the data is not the same between each example, some of which does and doesn't match the sample input from the first post.
discrepancy in data yes since have masked it before posting , but data format/pattern remains same.

Quote:

However, the field KEYWORD will cause awk to choke with data like "DMNMDB PE" and "D DJKND" because of the spaces. Are those typos? Or are the main fields separated by tabs, I hope?
"DMNMDB PE" and "D DJKND" no spaces in data it isvjust typo . data is properly arranged with specific delimiters.

Quote:

You're in the right set when you have "MOVEMENT CODE 26" or whatever.
The data starts after the line with many dashes.

When you are in the right set and on a data line, print. Otherwise go to the next line of input.
you are exactly understood.

Now somebody can help?

grail 01-14-2017 01:15 AM

awk has a variable called FILENAME, so you can use that to get the data wanted. As previously advised you will need to take said data and manipulate it with either awk related functions or create your own.

Also the requirement for only part of one of the fields is done using substr.

As you have advised that bogus spaces will not be in real data, here is a quick knock up to get the data from current input:
Code:

#!/usr/bin/awk -f

BEGIN{
        mrs = " -[- ]+\n"
        rrs = "\n"

        RS = mrs
}

/CODE *26/{
        RS = rrs
        p  = 1
        next
}

!NF{
        RS = mrs
        p  = 0
}

p{
        print $2,$4,$5,$6,$7
}

This should at least get you started. I would also use allend's advise to simply put the header in the BEGIN

azheruddin 01-14-2017 11:04 AM

Hi ,
getting some issue to merge code with mine, can anyone send me the code to complete the complete requirement on top of my code or new one.

grail 01-14-2017 12:49 PM

Ultimately you are asking if someone will do the job for you, the short answer is no.

I am not following why it is so complicated. If using my example code, run it, check the output and slowly add in the pieces that are missing that you want.
You will never get any more proficient at this if someone else is doing the work for you and you will simply end up back here next time asking someone to do your next task.

At least present what you have done and advise where you are stuck???

TB0ne 01-14-2017 03:42 PM

Quote:

Originally Posted by azheruddin (Post 5654894)
Hi ,
getting some issue to merge code with mine, can anyone send me the code to complete the complete requirement on top of my code or new one.

Sounds familiar:
http://www.linuxquestions.org/questi...141/page2.html
http://www.linuxquestions.org/questi...rn-4175516737/
http://www.linuxquestions.org/questi...on-4175438073/
http://www.linuxquestions.org/questi...pt-4175440649/
http://www.linuxquestions.org/questi...pt-4175444590/
http://www.linuxquestions.org/questi...at-4175448079/
http://www.linuxquestions.org/questi...ng-4175509609/

After being here for six years now, and essentially being told many times that you have to write your own scripts, I'm with Grail here...show us your efforts FIRST, and tell us where you're stuck.


All times are GMT -5. The time now is 02:39 PM.