LinuxQuestions.org
Old 01-04-2008, 12:23 PM   #1
khairil
LQ Newbie
 
Registered: May 2005
Distribution: gentoo
Posts: 23

Rep: Reputation: 15
bash/sed/awk fill each line in text file with space to fixed length


hello all,

I need to reformat a text file so that every line is exactly 1968 bytes long, padding with spaces any line shorter than 1968 bytes. How can I accomplish this using sed/awk/bash?

My sample data is as follows:
Code:
...
000403627451770129138730        01000071203223309000113000008     0    2MKPO7OKEC7I  02200        0
000003627472830197779630        01000071203223419000003000005     0    SCDT7O KEC7I  02200        0
000001635817860342943071        01000071203223403000019000008     0    TAG1C7O6MSAI7I02100        0
000001228111130362732497        01000071203223331000051000003     1    KEC7O  2MHTI7I02100        0      0132177636
000001639462060361874069        01000071203223307000115000006     0    GBKC7O 6MSEI7I02100        0
000003625806300122719555        01000071203223418000004000026     0    2MKPO7OBAT4C7I02100        0
000003602113380122449036        13000071203222535000848000010     0    2MKPO7OBBDC7I 02100        0
000403627368870193678506        01000071203222655000729000005     0  
...
Currently I'm using dd (with conv=block) in a loop, one call per record, but it is really slow on a big file (around 20,000 lines). This assumes my sample data has no CR/LF characters.

Code:
...
# one dd call per 123-byte record: pad it to 1968 bytes and append it to the output
for (( i = 0; i <= 19999; i++ ))
do
        dd if="${FILENAME}" bs=123 cbs=1968 conv=block,sync count=1 skip="${i}" >> "${OUTPUTFILE}"
done
...
tqvm!

Last edited by khairil; 01-04-2008 at 12:28 PM.
 
Old 01-04-2008, 12:35 PM   #2
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
Code:
awk  'length <= 1968 { printf "%-1968s\n",$0 }'  "file"
 
Old 01-04-2008, 01:03 PM   #3
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Or:

Code:
awk '{printf "%-1968s\n",$0}' data
Code:
(IFS=$'\n';printf "%-1968s\n" $(<data))
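
A minimal sketch of the same printf padding with the width passed as a parameter, shown at a width of 8 so the effect is visible. The name pad_lines is made up for illustration. Note that "%-Ns" only pads: a line longer than the width passes through unchanged here, whereas a script guarded by "length <= 1968" drops such lines from the output.

```shell
# pad_lines WIDTH: read lines on stdin and left-justify each one in WIDTH
# columns, padding with spaces. (pad_lines is a hypothetical helper name.)
pad_lines() {
    # "\n" in the -v value is turned into a newline by awk itself
    awk -v fmt="%-${1}s\n" '{ printf fmt, $0 }'
}

# Demo with width 8 instead of 1968; sed appends "|" to mark each line end.
printf 'abc\nabcdefgh\nabcdefghij\n' | pad_lines 8 | sed 's/$/|/'
```

For the real data the call would be something like: pad_lines 1968 < data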

Last edited by radoulov; 01-04-2008 at 01:06 PM.
 
Old 01-04-2008, 08:11 PM   #4
khairil
LQ Newbie
 
Registered: May 2005
Distribution: gentoo
Posts: 23

Original Poster
Rep: Reputation: 15
Thanks a lot, ghostdog74 and radoulov, for the solutions. All of them work for files that have CR/LF.

But my situation is like this:

The 'data' file is currently a single-line file (as on magnetic tape) with no CR/LF characters, but I know each record is 123 bytes long, so I can get the 'data' shown above if I run:
Code:
fold -w 123 data > DataTemp
and then I run your awk/bash commands on DataTemp to pad each line with spaces to 1968 bytes:
Code:
awk  'length <= 1968 { printf "%-1968s",$0 }'  DataTemp
or
Code:
(IFS=$'\n';printf "%-1968s" $(<DataTemp))
But these only accept DataTemp, which has CR/LF.

Code:
Visually, I want to convert this kind of data (no CR/LF in either input or output):

|---123bytes---|---123bytes---|---123bytes---|

to become

|-----1968bytes-----|-----1968bytes-----|-----1968bytes-----|
currently

Code:
the awk script converts this
|---123bytes---|
|---123bytes---|
|---123bytes---|

to 
|-----1968bytes-----|-----1968bytes-----|-----1968bytes-----|
tq.
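
One way to avoid the DataTemp file is to pipe fold straight into awk and leave the newline out of the printf format, so the line breaks exist only inside the pipe. This is a sketch under the assumption that the data is plain ASCII; pad_stream is a made-up name, and the demo uses widths 3 and 5 in place of 123 and 1968.

```shell
# pad_stream IN_WIDTH OUT_WIDTH: stdin is one unbroken run of IN_WIDTH-byte
# records; stdout is the same records, each space-padded to OUT_WIDTH bytes,
# again without any line breaks. (pad_stream is a hypothetical helper name.)
pad_stream() {
    # fold -b counts bytes; the newlines it inserts never reach the output
    # because the awk format has no "\n"
    fold -b -w "$1" | awk -v fmt="%-${2}s" '{ printf fmt, $0 }'
}

# Demo: three 3-byte records padded to 5 bytes; tr makes the padding visible.
printf 'AAABBBCCC' | pad_stream 3 5 | tr ' ' '.'
echo
```

For the real file the call would be something like: pad_stream 123 1968 < data > output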

Last edited by khairil; 01-04-2008 at 08:14 PM.
 
Old 01-05-2008, 04:44 AM   #5
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Like this?

GNU Awk:

Code:
awk --re-interval '{printf "%-1968s",RT}' RS=".{123}" data
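
In the one-liner above, RS is used as a regular expression (a GNU Awk extension) that matches any 123 characters, so every "separator" is itself one record, and RT is the GNU Awk variable holding the text that RS just matched, which is why RT is padded rather than $0. For an awk without those extensions, a rough equivalent is to slice the single long line with substr. This is only a sketch: it assumes ASCII data and an awk that can hold the whole line in memory, and pad_records is a made-up name.

```shell
# pad_records IN_WIDTH OUT_WIDTH: cut the input line into IN_WIDTH-character
# slices and print each slice space-padded to OUT_WIDTH, with no line breaks.
# (pad_records is a hypothetical helper name.)
pad_records() {
    awk -v w="$1" -v fmt="%-${2}s" '{
        n = length($0)
        # step through the line one record at a time
        for (pos = 1; pos <= n; pos += w)
            printf fmt, substr($0, pos, w)
    }'
}

# Demo with widths 3 and 5 in place of 123 and 1968.
printf '111222333' | pad_records 3 5 | tr ' ' '_'
echo
```

A short final slice is padded like any other record, so a truncated input still yields output that is a multiple of the output width.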
 
Old 01-05-2008, 05:04 AM   #6
khairil
LQ Newbie
 
Registered: May 2005
Distribution: gentoo
Posts: 23

Original Poster
Rep: Reputation: 15
That's it! Bingo! Thanks a lot, people!

Can you explain the command (the RT and RS parameters) and the notation?

Wow! How did you guys master sed/awk? Are there any good tutorials?

tqvm.

Last edited by khairil; 01-05-2008 at 05:07 AM.
 
Old 01-05-2008, 05:42 AM   #7
khairil
LQ Newbie
 
Registered: May 2005
Distribution: gentoo
Posts: 23

Original Poster
Rep: Reputation: 15
Another scenario (still related to the topic):

I have a text file (it's actually a CDR, Call Data Record) in this format, but with no CR/LF characters:


|---VOL 80bytes---|
|---HDR1 80bytes---|
|---HDR2 80bytes---|
|-----Data_1 123bytes-----|
|-----Data_2 123bytes-----|
...
|-----Data_n 123bytes-----|
|---EOF1 80bytes---|
|---EOF2 80bytes---|
|---HDR1 80bytes---|
|---HDR2 80bytes---|
|-----Data_1 123bytes-----|
|-----Data_2 123bytes-----|
...
|-----Data_n 123bytes-----|
|---EOF1 80bytes---|
|---EOF2 80bytes---|
|---HDR1 80bytes---|
|---HDR2 80bytes---|
|-----Data_1 123bytes-----|
|-----Data_2 123bytes-----|
...
|-----Data_n 123bytes-----|
|---EOF1 80bytes---|
|---EOF2 80bytes---|


My task is to split this file into individual files, each running from HDR1 to EOF2, so in this example there will be 3 individual files. I'm trying to use only sed/awk/dd/bash for this task. Can anyone out there help me?

In addition, I also want each block (VOL, HDR, data, EOF) padded to 1968 bytes (filled with spaces, as in the earlier posts in this thread).

tqvm in advance.
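
The layout above can also be walked record by record, which does not depend on how many data blocks a section holds. The sketch below is not from the thread and rests on several assumptions: label records (VOL, HDR1, HDR2, EOF1, EOF2) are 80 bytes and begin with their tag, every other record is a 123-byte data block that never begins with one of those tags, the VOL record belongs to no section and is skipped, and the input is a single line of ASCII. split_sections is a made-up name.

```shell
# split_sections PAD PREFIX: read the tape image on stdin and write each
# HDR1..EOF2 section to PREFIX_1, PREFIX_2, ... with every record
# space-padded to PAD bytes. (split_sections is a hypothetical helper name.)
split_sections() {
    awk -v fmt="%-${1}s" -v prefix="$2" '{
        n = length($0)
        pos = 1
        while (pos <= n) {
            tag = substr($0, pos, 4)
            # assumption: only label records start with one of these tags
            islabel = (tag == "HDR1" || tag == "HDR2" ||
                       tag == "EOF1" || tag == "EOF2" ||
                       substr(tag, 1, 3) == "VOL")
            w = islabel ? 80 : 123
            if (tag == "HDR1")              # a new section starts here
                out = prefix "_" (++c)
            if (out != "")                  # VOL falls outside any section
                printf fmt, substr($0, pos, w) > out
            if (tag == "EOF2") {            # section complete
                close(out)
                out = ""
            }
            pos += w
        }
    }'
}

# Usage (names are placeholders): split_sections 1968 section < data
```

Closing each file at EOF2 keeps the number of open files at one, which matters if a tape holds many sections.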

Last edited by khairil; 01-05-2008 at 07:26 PM.
 
Old 01-05-2008, 07:21 AM   #8
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
If you post a sample from the real input data
and an example (with real data) of the desired output,
it would be easier.
 
Old 01-05-2008, 11:56 AM   #9
khairil
LQ Newbie
 
Registered: May 2005
Distribution: gentoo
Posts: 23

Original Poster
Rep: Reputation: 15
ok radoulov,

here is my sample data;

sample_data (2885bytes)
Code:
VOL148M005                           48BATDT                                   1HDR1TTFILE00         48M00500010001814300 07338 00000 000000                    HDR2F0196801968                                   00                            0000036253218576268899          01000071203222436000414000001     0    KLJDT7OBAT4C7I02200        0                        000001225488200342969982        01000071203222837000014000005     0    TAG2C7O2MKPI7I02100        0                        000403620157880122845822        01000071203222807000044000006     0    2MKPO7ODTAC7I 02100        0                        000001462181140340251050        01000071203222843000009000008     0    SET4C7O6MSAI7I02100        0                        000001623519100380704228        01000071203222833000020000009     0    BGIDT7O6MSAI7I02100        0                        EOF1TTFILE00         48M00500010001814300 07338 00000 002000                    EOF2F0196801968                                   00                            HDR1TTFILE00         48M00500010001814300 07338 00000 000000                    HDR2F0196801968                                   00                            0000036253218576268899          01000071203222436000414000001     0    KLJDT7OBAT4C7I02200        0                        000001225488200342969982        01000071203222837000014000005     0    TAG2C7O2MKPI7I02100        0                        000403620157880122845822        01000071203222807000044000006     0    2MKPO7ODTAC7I 02100        0                        000001462181140340251050        01000071203222843000009000008     0    SET4C7O6MSAI7I02100        0                        000001623519100380704228        01000071203222833000020000009     0    BGIDT7O6MSAI7I02100        0                        EOF1TTFILE00         48M00500010001814300 07338 00000 002000                    EOF2F0196801968                                   00                            HDR1TTFILE00         48M00500010001814300 07338 
00000 000000                    HDR2F0196801968                                   00                            0000036253218576268899          01000071203222436000414000001     0    KLJDT7OBAT4C7I02200        0                        000001225488200342969982        01000071203222837000014000005     0    TAG2C7O2MKPI7I02100        0                        000403620157880122845822        01000071203222807000044000006     0    2MKPO7ODTAC7I 02100        0                        000001462181140340251050        01000071203222843000009000008     0    SET4C7O6MSAI7I02100        0                        000001623519100380704228        01000071203222833000020000009     0    BGIDT7O6MSAI7I02100        0                        EOF1TTFILE00         48M00500010001814300 07338 00000 002000                    EOF2F0196801968                                   00                            
the sample_data is in this format;
|--VOL 80bytes--|--HDR1 80bytes--|--HDR2 80bytes--|--Data_1 123bytes--|--Data_2 123bytes--|--Data_3 123bytes--|--Data_4 123bytes--|--Data_5 123bytes--|--EOF1 80bytes--|--EOF2 80bytes--|--HDR1 80bytes--|--HDR2 80bytes--|--Data_1 123bytes--|--Data_2 123bytes--|--Data_3 123bytes--|--Data_4 123bytes--|--Data_5 123bytes--|--EOF1 80bytes--|--EOF2 80bytes--|--HDR1 80bytes--|--HDR2 80bytes--|--Data_1 123bytes--|--Data_2 123bytes--|--Data_3 123bytes--|--Data_4 123bytes--|--Data_5 123bytes--|--EOF1 80bytes--|--EOF2 80bytes--|

sample_out (5488bytes)
Code:
VOL148M005                           48BATDT                                   1                                                                                                                    HDR1TTFILE00         48M00500010001814300 07338 00000 000000                                                                                                                                        HDR2F0196801968                                   00                                                                                                                                                0000036253218576268899          01000071203222436000414000001     0    KLJDT7OBAT4C7I02200        0                                                                                                 000001225488200342969982        01000071203222837000014000005     0    TAG2C7O2MKPI7I02100        0                                                                                                 000403620157880122845822        01000071203222807000044000006     0    2MKPO7ODTAC7I 02100        0                                                                                                 000001462181140340251050        01000071203222843000009000008     0    SET4C7O6MSAI7I02100        0                                                                                                 000001623519100380704228        01000071203222833000020000009     0    BGIDT7O6MSAI7I02100        0                                                                                                 EOF1TTFILE00         48M00500010001814300 07338 00000 002000                                                                                                                                        EOF2F0196801968                                   00                                                                                                                                                HDR1TTFILE00         
48M00500010001814300 07338 00000 000000                                                                                                                                        HDR2F0196801968                                   00                                                                                                                                                0000036253218576268899          01000071203222436000414000001     0    KLJDT7OBAT4C7I02200        0                                                                                                 000001225488200342969982        01000071203222837000014000005     0    TAG2C7O2MKPI7I02100        0                                                                                                 000403620157880122845822        01000071203222807000044000006     0    2MKPO7ODTAC7I 02100        0                                                                                                 000001462181140340251050        01000071203222843000009000008     0    SET4C7O6MSAI7I02100        0                                                                                                 000001623519100380704228        01000071203222833000020000009     0    BGIDT7O6MSAI7I02100        0                                                                                                 EOF1TTFILE00         48M00500010001814300 07338 00000 002000                                                                                                                                        EOF2F0196801968                                   00                                                                                                                                                HDR1TTFILE00         48M00500010001814300 07338 00000 000000                                                                                                                                        HDR2F0196801968                                   00         
                                                                                                                                       0000036253218576268899          01000071203222436000414000001     0    KLJDT7OBAT4C7I02200        0                                                                                                 000001225488200342969982        01000071203222837000014000005     0    TAG2C7O2MKPI7I02100        0                                                                                                 000403620157880122845822        01000071203222807000044000006     0    2MKPO7ODTAC7I 02100        0                                                                                                 000001462181140340251050        01000071203222843000009000008     0    SET4C7O6MSAI7I02100        0                                                                                                 000001623519100380704228        01000071203222833000020000009     0    BGIDT7O6MSAI7I02100        0                                                                                                 EOF1TTFILE00         48M00500010001814300 07338 00000 002000                                                                                                                                        EOF2F0196801968                                   00                                                                                                                                                
the sample_out is in this format;
|---VOL 196bytes---|---HDR1 196bytes---|---HDR2 196bytes---|---Data_1 196bytes---|---Data_2 196bytes---|---Data_3 196bytes---|---Data_4 196bytes---|---Data_5 196bytes---|---EOF1 196bytes---|---EOF2 196bytes---|---HDR1 196bytes---|---HDR2 196bytes---|---Data_1 196bytes---|---Data_2 196bytes---|---Data_3 196bytes---|---Data_4 196bytes---|---Data_5 196bytes---|---EOF1 196bytes---|---EOF2 196bytes---|---HDR1 196bytes---|---HDR2 196bytes---|---Data_1 196bytes---|---Data_2 196bytes---|---Data_3 196bytes---|---Data_4 196bytes---|---Data_5 196bytes---|---EOF1 196bytes---|---EOF2 196bytes---|

Because of the post-size limit in this forum, the sample output pads each record to 196 bytes instead of the 1968 bytes discussed before.
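
The two sizes quoted for the samples can be checked from the layout with a little shell arithmetic. The variable names below are only for illustration.

```shell
# Record sizes from the layout: 80-byte labels, 123-byte data, 196-byte padded.
label=80; data=123; padded=196
sections=3; data_per_section=5

# One VOL record, then per section: HDR1, HDR2, the data records, EOF1, EOF2.
records=$(( 1 + sections * (4 + data_per_section) ))
in_bytes=$(( label + sections * (4 * label + data_per_section * data) ))
out_bytes=$(( records * padded ))

echo "records=$records in=$in_bytes out=$out_bytes"
# prints: records=28 in=2885 out=5488
```

Both totals agree with the sizes given for sample_data and sample_out.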

tq.

Last edited by khairil; 01-05-2008 at 07:38 PM.
 
Old 01-05-2008, 02:27 PM   #10
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
You may need to adjust the fieldwidths:

Code:
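awk --re-interval '/HDR/ {
	# pad each fixed-width field, then the EOF2 record held in RT
	for (i=1; i<=NF; i++)
		$i = sprintf("%-1968s",$i)
	RT = sprintf("%-1968s",RT)
	# OFS is empty so that rebuilding $0 adds no space between fields
	printf "%s",$0 RT > (FILENAME"_"++c)
}' OFS= FIELDWIDTHS="80 80 123 123 123 123 123 80" RS="VOL.{77}|EOF2.{76}" data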

Last edited by radoulov; 01-06-2008 at 03:08 AM.
 
Old 01-09-2008, 02:24 AM   #11
khairil
LQ Newbie
 
Registered: May 2005
Distribution: gentoo
Posts: 23

Original Poster
Rep: Reputation: 15
Thanks a lot, radoulov. Your solution works for the sample file, but what if the number of data blocks is variable rather than fixed at five per section (from HDR1 to EOF2)? I may have more or fewer than five data blocks in each section. In the real file (about 7 MB) I have almost 32,000 data blocks in each section.

tq.
 
Old 01-09-2008, 05:28 AM   #12
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
I understand.
If the number of 80-byte records is fixed
(i.e. always four: HDR1, HDR2, EOF1 and EOF2)
and the number of 123-byte records varies,
the number of 123-byte entries in FIELDWIDTHS could be calculated: (length-320)/123,
or in my example: (length-240)/123, as the EOF2 section
is part of the RS/RT.
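
A small sketch of that calculation, building the FIELDWIDTHS string from the length of one section. It assumes the fields are listed in the order the records appear in the section (HDR1, HDR2, the data records, EOF1) with EOF2 left in RT; fieldwidths is a made-up name.

```shell
# fieldwidths LENGTH: print a FIELDWIDTHS string for one section whose record
# (HDR1 + HDR2 + data records + EOF1, EOF2 being in RT) is LENGTH bytes long.
# (fieldwidths is a hypothetical helper name.)
fieldwidths() {
    n=$(( ($1 - 240) / 123 ))     # number of 123-byte data records
    fw="80 80"                    # HDR1 and HDR2
    i=0
    while [ "$i" -lt "$n" ]; do
        fw="$fw 123"
        i=$(( i + 1 ))
    done
    printf '%s 80\n' "$fw"        # trailing 80 is EOF1
}

# The sample sections are 3*80 + 5*123 = 855 bytes long.
fieldwidths 855
# prints: 80 80 123 123 123 123 123 80
```

With GNU Awk the same string could presumably be assigned to FIELDWIDTHS inside the script, followed by $0 = $0 to re-split the current record, but that variant is untested here.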
 
  

