LinuxQuestions.org
Old 10-13-2015, 01:21 PM   #16
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369

As I suspected, with Python this was fairly straightforward, and since we already have a solution up there, here is mine.

With this infile (siranjeevi.txt):
Code:
13F1SomeTxt someother: 78
12DR1:RANDOMTXT:
12FR:OTHERRANDOMTXT:
08GR         123                 997.2084586228524981.8281353167449.0005412762294gutt 01d 5h 5s
09FT          256                 1007.257457877084992.1472768690449.0261941388321sat 02d 6h 8s
And this Python (reg16.py):
Code:
#!/usr/bin/env python3

"""
http://www.linuxquestions.org/questions/showthread.php?p=5433986#post5433986
"""
import re
import sys

# File to open via arg 1
theFile = sys.argv[1]
# Regex start of line
lineStart = re.compile('[0-9]{2}[A-Z]{2} +')
# Regex to split
toSplit = re.compile('  +')

with open(theFile) as infile:
    for line in infile:
        if re.match(lineStart, line):
            listLine = toSplit.split(line)
            listLine[2] = listLine[2][:16] + "," + listLine[2][16:]
            newLine = ",".join(listLine)
            print(newLine, end="")

sys.exit(0)
I get this result:
Code:
./reg16.py siranjeevi.txt
08GR,123,997.208458622852,4981.8281353167449.0005412762294gutt 01d 5h 5s
09FT,256,1007.25745787708,4992.1472768690449.0261941388321sat 02d 6h 8s

Last edited by HMW; 10-13-2015 at 01:28 PM. Reason: Bad wording, blank lines...
 
1 members found this post helpful.
Old 10-13-2015, 02:44 PM   #17
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Nice work HMW, but you still need one more comma:
Code:
08GR,123,997.208458622852,4981.8281353167449.0005412762294gutt,01d 5h 5s
09FT,256,1007.25745787708,4992.1472768690449.0261941388321sat,02d 6h 8s
 
1 members found this post helpful.
Old 10-13-2015, 07:52 PM   #18
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140

Rep: Reputation: 4122
What am I missing here? A single sed should be able to accomplish everything requested: define the required fields and use back-references.
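For anyone who prefers to stay in Python, the same "define the fields, then use back-references" idea can be sketched with re.sub. This is only a sketch: the field layout (a two-digit, two-letter code, a second field, then a 16-character value) is an assumption read off the sample file, and add_commas is a hypothetical helper name.

```python
import re

# Assumed layout: code, second field, one 16-character value, then the rest.
FIELDS = re.compile(r"^([0-9]{2}[A-Z]{2}) +([^ ]+) +(.{16})(.*)$")


def add_commas(line):
    """Return the line with commas between the captured fields, or None if it does not match."""
    line = line.rstrip("\n")
    if not FIELDS.match(line):
        return None
    # \1..\4 are back-references to the four capture groups above.
    return FIELDS.sub(r"\1,\2,\3,\4", line)


sample = ("08GR         123                 "
          "997.2084586228524981.8281353167449.0005412762294gutt 01d 5h 5s")
print(add_commas(sample))
```

Lines such as 12FR:OTHERRANDOMTXT: do not fit the layout, so the helper returns None for them instead of printing anything.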
 
Old 10-14-2015, 01:09 AM   #19
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by grail
Nice work HMW, but you still need one more comma
Ugh! Yes, you're right, I missed that somehow <irony>despite the very clear and obvious specs</irony>. Thanks for pointing this out.

So, anyway... Here we go. Including the final comma:

Code:
#!/usr/bin/env python3

"""
http://www.linuxquestions.org/questions/showthread.php?p=5433986#post5433986
"""
import re
import sys 

# File to open via arg 1
theFile = sys.argv[1]
# Regex start of line
lineStart = re.compile('[0-9]{2}[A-Z]{2} +')
# Regex to split
toSplit = re.compile('  +')

with open(theFile) as infile:
    for line in infile:
        if re.match(lineStart, line):
            listLine = toSplit.split(line)
            listLine[2] = listLine[2][:16] + "," + listLine[2][16:]
            # Find the first run of two or more lowercase letters (e.g. "gutt")
            lastCommaPos = re.search('[a-z]{2,}', listLine[2])
            lastComma = lastCommaPos.end()
            # Insert a comma right after that run
            listLine[2] = listLine[2][:lastComma] + "," + listLine[2][lastComma:]
            newLine = ",".join(listLine)
            # Finally, remove the first space, which now follows the new comma
            newLine = newLine.replace(" ", "", 1)
            print(newLine, end="")

sys.exit(0)
Produces...
Code:
./reg16v2.py siranjeevi.txt 
08GR,123,997.208458622852,4981.8281353167449.0005412762294gutt,01d 5h 5s
09FT,256,1007.25745787708,4992.1472768690449.0261941388321sat,02d 6h 8s
Best regards,
HMW

Last edited by HMW; 10-14-2015 at 01:35 AM. Reason: Fixed script. Removed too many spaces!
 
1 members found this post helpful.
Old 10-14-2015, 01:10 AM   #20
siranjeevi
Member
 
Registered: May 2010
Location: India
Posts: 79

Original Poster
Rep: Reputation: 7
Hey all,

Thanks for your help; we are almost there.

The final output should also put a comma after every 16 characters of the 3rd column (counting the period, and regardless of whether the characters are digits or letters). So the final output should be:

Code:
08GR,123,997.208458622852,4981.82813531674,49.0005412762294,gutt 01d 5h 5s
09FT,256,1007.25745787708,4992.14727686904,49.0261941388321,sat 02d 6h 8s
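The requirement above (a comma after every 16 characters of the 3rd column) can be sketched in Python with plain slicing. This is a sketch under assumptions: split_record is a hypothetical name, and the width of 16 and the count of three values are taken from the expected output shown above.

```python
def split_record(line, width=16, chunks=3):
    """Split a data line into comma-separated fields with fixed-width values."""
    # Split off the first two whitespace-separated fields; keep the rest intact.
    code, count, rest = line.rstrip("\n").split(None, 2)
    # Cut the first `chunks` values of `width` characters each.
    values = [rest[i * width:(i + 1) * width] for i in range(chunks)]
    # Whatever follows the fixed-width values is kept as one final field.
    tail = rest[chunks * width:]
    return ",".join([code, count, *values, tail])


line = ("09FT          256                 "
        "1007.257457877084992.1472768690449.0261941388321sat 02d 6h 8s")
print(split_record(line))
```

Slicing counts every character, including the period, which is exactly what the 16-character rule asks for.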
 
Old 10-14-2015, 01:22 AM   #21
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by siranjeevi View Post
Hey all,

Thanks for your help; we are almost there.

The final output should also put a comma after every 16 characters of the 3rd column (counting the period, and regardless of whether the characters are digits or letters). So the final output should be:

Code:
08GR,123,997.208458622852,4981.82813531674,49.0005412762294,gutt 01d 5h 5s
09FT,256,1007.25745787708,4992.14727686904,49.0261941388321,sat 02d 6h 8s
Using what has already been given in this thread, this should be a walk in the park for you now.

Good luck!
HMW
 
Old 10-14-2015, 04:56 AM   #22
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
And of course now the OP has taken that freshly created comma you put in, out of his last post

And as per syg00's suggestion (with no fun whatsoever ... lol):
Code:
sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' file
And some ruby, which of course could have just used the referencing as well:
Code:
ruby -ane 'if /\d{2}\w{2} /;1.upto($F[2].size / 16){|n| $F[2].insert(n * 16 + n -1, ",")};puts $F[0..2] * "," + " " + $F[3..-1] * " ";end' file
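The index arithmetic in the ruby one-liner (n * 16 + n - 1) is easier to see spelled out. Below is a rough Python rendering of just that loop; insert_commas is a hypothetical name, and like the ruby it inserts one comma per full 16-character chunk.

```python
def insert_commas(field, width=16):
    """Insert a comma after each full `width`-character chunk of `field`."""
    chars = list(field)
    # The loop bound uses the original length, as the ruby version does.
    for n in range(1, len(field) // width + 1):
        # The nth comma follows n * width original characters, and n - 1
        # commas have already been inserted, hence index n * width + n - 1.
        chars.insert(n * width + n - 1, ",")
    return "".join(chars)


print(insert_commas("997.2084586228524981.8281353167449.0005412762294gutt"))
```

For the 52-character sample field the commas land at indexes 16, 33 and 50, which gives the three 16-character values followed by gutt.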

Last edited by grail; 10-14-2015 at 05:43 AM.
 
2 members found this post helpful.
Old 10-14-2015, 06:00 AM   #23
siranjeevi
Member
 
Registered: May 2010
Location: India
Posts: 79

Original Poster
Rep: Reputation: 7
Thanks, grail.

grail, thank you so much; that is exactly what I wanted.

And thank you HMW, Firstfire, and everyone else who helped me and anyone else looking to do a similar task.
 
Old 10-14-2015, 06:57 AM   #24
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by grail View Post
And of course now the OP has taken that freshly created comma you put in, out of his last post


Quote:
Originally Posted by grail View Post
And as per syg00's suggestion (with no fun whatsoever ... lol):
Code:
sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' file
And some ruby, which of course could have just used the referencing as well:
Code:
ruby -ane 'if /\d{2}\w{2} /;1.upto($F[2].size / 16){|n| $F[2].insert(n * 16 + n -1, ",")};puts $F[0..2] * "," + " " + $F[3..-1] * " ";end' file
Nice work. Haven't tried them, but I'm sure they do the job. I lean more towards this approach myself, but your sed and ruby are certainly impressive!

All the best!
HMW
 
Old 10-14-2015, 07:15 AM   #25
siranjeevi
Member
 
Registered: May 2010
Location: India
Posts: 79

Original Poster
Rep: Reputation: 7
Brilliant, grail!

A million thanks to the contributors to this thread. Here is the complete script that I used; it might be useful for someone else.

HMW, I didn't use Python because I know nothing about it, so I used bash. I still used grep '08GR\|08TR\|08AC\|09FT\|09F1\|08JA\|08TS\|08RX' because using sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' alone prints lines starting with 12FR:, which I didn't want.

The following script appends the file name to the end of each line and saves all the matching lines to a file named output.

Code:
#!/bin/bash
for f in *.txt
do
    sed -i 's/$/ '",$f"'/' "$f"
    cat $f | grep '08GR\|08TR\|08AC\|09FT\|09F1\|08JA\|08TS\|08RX' | sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' >> output
done
 
Old 10-14-2015, 07:22 AM   #26
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,140

Rep: Reputation: 4122
Quote:
Originally Posted by siranjeevi View Post
prints lines starting with 12FR: which I didn't want.
It shouldn't. Did you cut-and-paste grail's solution?
Better if you had.
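A quick way to check this is to run the address pattern against the sample lines. The sketch below uses Python's re module rather than sed itself, on the assumption that this simple pattern behaves the same in both; matches_address is a hypothetical helper name.

```python
import re

# Same pattern as the sed address, including the trailing space.
ADDRESS = re.compile(r"[0-9]{2}[A-Z]{2} ")


def matches_address(line):
    """Return True if the pattern occurs anywhere in the line, as a sed address would."""
    return ADDRESS.search(line) is not None


samples = [
    "13F1SomeTxt someother: 78",
    "12DR1:RANDOMTXT:",
    "12FR:OTHERRANDOMTXT:",
    "08GR         123                 997.2084586228524981.8281353167449.0005412762294gutt 01d 5h 5s",
]
for text in samples:
    print(matches_address(text), text[:20])
```

Only the 08GR line matches; in 12FR: the two letters are followed by a colon, not a space.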
 
Old 10-14-2015, 07:35 AM   #27
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Also, a few points:

1. Do not use cat when all the commands you are using can already read files: Useless use of cat

2. If you are going to use grep, then the following piece of sed should be removed: /[0-9]{2}[A-Z]{2} /

3. It is not possible for the sed command to print the 12FR: line, because /[0-9]{2}[A-Z]{2} / has a space at the end before the closing /, so a string ending in a colon would not match

4. There is no need for a separate sed to append the file name; simply place your variable in the main sed

5. In addition to (4) above, you also do not need to go crazy with the opening and closing quotes:
Code:
sed -i "s/$/ ,$f/" "$f"
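For comparison, the points above can be folded into one pass in Python: no cat, one filter, and the file name added in the same step without editing the input files. This is a sketch, not a drop-in replacement: format_line and process_files are hypothetical names, the prefixes come from the grep earlier in the thread, and matching them only at the start of the line is an assumption (grep matched them anywhere).

```python
import re

# Record prefixes taken from the grep in the thread (start-of-line match assumed).
PREFIXES = ("08GR", "08TR", "08AC", "09FT", "09F1", "08JA", "08TS", "08RX")
# Same field layout as the sed substitution: two space-separated fields,
# three 16-character values, then the remainder of the line.
LAYOUT = re.compile(r"^([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)$")


def format_line(line, filename):
    """Return the comma-separated record with the file name appended, or None."""
    line = line.rstrip("\n")
    if not line.startswith(PREFIXES):
        return None
    match = LAYOUT.match(line)
    if match is None:
        return None
    return ",".join(match.groups()) + " ," + filename


def process_files(paths, out):
    """Write every formatted record from the given files to the open stream `out`."""
    for path in paths:
        with open(path) as infile:
            for line in infile:
                record = format_line(line, path)
                if record is not None:
                    out.write(record + "\n")
```

Something like process_files(sorted(glob.glob("*.txt")), sys.stdout), after importing glob and sys, would mirror the loop over *.txt; redirecting the result to a file named output reproduces the rest of the script.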
 
  

