[SOLVED] Script to print only lines that start with two digits and two letters

HMW · 10-13-2015, 01:21 PM

As I suspected, with Python this was fairly straightforward, and since we already have a solution up there, here is mine.

With this infile (siranjeevi.txt):

Code:

13F1SomeTxt someother: 78
12DR1:RANDOMTXT:
12FR:OTHERRANDOMTXT:
08GR         123                 997.2084586228524981.8281353167449.0005412762294gutt 01d 5h 5s
09FT          256                 1007.257457877084992.1472768690449.0261941388321sat 02d 6h 8s

And this Python (reg16.py):

Code:

#!/usr/bin/env python3

"""
http://www.linuxquestions.org/questions/showthread.php?p=5433986#post5433986
"""
import re
import sys

# File to open via arg 1
theFile = sys.argv[1]
# Regex start of line
lineStart = re.compile('[0-9]{2}[A-Z]{2} +')
# Regex to split
toSplit = re.compile('  +')

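with open(theFile) as infile:
    for line in infile:
        # Keep only lines that start with two digits and two capital letters
        if lineStart.match(line):
            # Fields are separated by runs of two or more spaces
            listLine = toSplit.split(line)
            # Insert a comma after the first 16 characters of the third field
            listLine[2] = listLine[2][:16] + "," + listLine[2][16:]
            newLine = ",".join(listLine)
            # The line still carries its own newline, so suppress print's
            print(newLine, end="")

sys.exit(0)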

I get this result:

Code:

./reg16.py siranjeevi.txt
08GR,123,997.208458622852,4981.8281353167449.0005412762294gutt 01d 5h 5s
09FT,256,1007.25745787708,4992.1472768690449.0261941388321sat 02d 6h 8s

grail · 10-13-2015, 02:44 PM

Nice work HMW, but you still need one more comma:

Code:

08GR,123,997.208458622852,4981.8281353167449.0005412762294gutt,01d 5h 5s
09FT,256,1007.25745787708,4992.1472768690449.0261941388321sat,02d 6h 8s

syg00 · 10-13-2015, 07:52 PM

What am I missing here? A single sed should be able to accomplish everything requested: define the required fields and use back-references.
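
For anyone following along in Python rather than sed, the same back-reference idea might look roughly like the sketch below. It is only an illustration, not anything posted in the thread: the names PATTERN and add_commas are made up, and the field layout (two space-separated fields, then a 16-character number, then the rest up to the last block) is an assumption read off the sample lines and the output grail asked for.

```python
import re

# Hypothetical pattern: the capture groups play the role of sed's \1 ... \5.
PATTERN = re.compile(r"^(\d{2}[A-Z]{2}) +(\S+) +(.{16})(\S*) (.*)$")


def add_commas(line):
    """Return the comma-separated form of a matching line, else None."""
    match = PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    # Match.expand substitutes the back-references into the template.
    return match.expand(r"\1,\2,\3,\4,\5")


if __name__ == "__main__":
    sample = ("08GR         123                 "
              "997.2084586228524981.8281353167449.0005412762294gutt 01d 5h 5s")
    print(add_commas(sample))
```

Lines such as 12FR:OTHERRANDOMTXT: do not match the pattern, so the function returns None for them and they can simply be skipped.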

HMW · 10-14-2015, 01:09 AM

Quote:

Originally Posted by grail

Nice work HMW, but you still need one more comma

Ugh! Yes, you're right, I missed that somehow <irony>despite the very clear and obvious specs</irony>. Thanks for pointing this out.

So, anyway... Here we go. Including the final comma:

Code:

#!/usr/bin/env python3

"""
http://www.linuxquestions.org/questions/showthread.php?p=5433986#post5433986
"""
import re
import sys 

# File to open via arg 1
theFile = sys.argv[1]
# Regex start of line
lineStart = re.compile('[0-9]{2}[A-Z]{2} +')
# Regex to split
toSplit = re.compile('  +')

with open(theFile) as infile:
    for line in infile:
        if lineStart.match(line):
            listLine = toSplit.split(line)
            listLine[2] = listLine[2][:16] + "," + listLine[2][16:]
            # Find the first run of two or more lowercase letters (e.g. "gutt")
            lastCommaPos = re.search('[a-z]{2,}', listLine[2])
            lastComma = lastCommaPos.end()
            # Insert the final comma right after that run
            listLine[2] = listLine[2][:lastComma] + "," + listLine[2][lastComma:]
            newLine = ",".join(listLine)
            # Remove the first space in the line, which is the one right after the new comma
            newLine = newLine.replace(" ", "", 1)
            print(newLine, end="")

sys.exit(0)

Produces...

Code:

./reg16v2.py siranjeevi.txt 
08GR,123,997.208458622852,4981.8281353167449.0005412762294gutt,01d 5h 5s
09FT,256,1007.25745787708,4992.1472768690449.0261941388321sat,02d 6h 8s

Best regards,
HMW

siranjeevi · 10-14-2015, 01:10 AM

hey all,

Thanks for your help, we are almost there,

The 3rd column should be separated by a comma after every 16 characters (counting the period, and regardless of whether the characters are digits or letters). So the final output should be:

Code:

08GR,123,997.208458622852,4981.82813531674,49.0005412762294,gutt 01d 5h 5s
09FT,256,1007.25745787708,4992.14727686904,49.0261941388321,sat 02d 6h 8s
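
Read that way, the requirement is plain fixed-width slicing of the third column. A minimal Python sketch, assuming exactly three 16-character numbers followed by free text (as in the two sample lines); split_fixed and to_csv are hypothetical names used only for illustration:

```python
def split_fixed(text, width=16, count=3):
    """Cut `count` chunks of `width` characters off the front of `text`.

    Returns the chunks followed by whatever text is left over.
    """
    chunks = [text[i * width:(i + 1) * width] for i in range(count)]
    chunks.append(text[count * width:])
    return chunks


def to_csv(line):
    """Join the first two fields and the fixed-width chunks with commas."""
    # Split on runs of whitespace, but only twice, so the rest stays intact.
    first, second, rest = line.split(None, 2)
    return ",".join([first, second] + split_fixed(rest.rstrip("\n")))


if __name__ == "__main__":
    sample = ("09FT          256                 "
              "1007.257457877084992.1472768690449.0261941388321sat 02d 6h 8s")
    print(to_csv(sample))
```

The slicing does not care whether a character is a digit, a period or a letter, which is what the description above asks for.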

HMW · 10-14-2015, 01:22 AM

Using what has already been given in this thread, this should be a walk in the park for you now.

Good luck!
HMW

grail · 10-14-2015, 04:56 AM

And of course the OP has now taken that freshly created comma you put in back out in his last post.

And as per syg00's suggestion (with no fun whatsoever ... lol):

Code:

sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' file

And some Ruby, which of course could have just used back-references as well:

Code:

ruby -ane 'if /\d{2}\w{2} /;1.upto($F[2].size / 16){|n| $F[2].insert(n * 16 + n -1, ",")};puts $F[0..2] * "," + " " + $F[3..-1] * " ";end' file
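
The index arithmetic in the Ruby one-liner (n * 16 + n - 1) may look odd at first: every comma already inserted shifts the following characters one place to the right, so the n-th comma lands at 16, 33, 50 and so on. A small Python sketch of the same idea, with the hypothetical name insert_commas and a list standing in for Ruby's mutable string:

```python
def insert_commas(field, width=16):
    """Insert a comma after every `width` characters of `field`."""
    chars = list(field)
    # The number of commas is fixed up front from the original length.
    for n in range(1, len(field) // width + 1):
        # n * width characters precede the n-th comma, plus the n - 1
        # commas that were inserted before it.
        chars.insert(n * width + n - 1, ",")
    return "".join(chars)


if __name__ == "__main__":
    print(insert_commas("997.2084586228524981.8281353167449.0005412762294gutt"))
```

Note that in this sketch a field whose length is an exact multiple of the width ends up with a trailing comma.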

siranjeevi · 10-14-2015, 06:00 AM

grail, thank you so much, that is exactly what I wanted.

And thank you HMW, Firstfire and everyone else who helped me, on behalf of myself and anyone else who is looking at a similar task.

HMW · 10-14-2015, 06:57 AM

Nice work. Haven't tried them, but I'm sure they do the job. I lean more towards this approach myself, but your sed and ruby are certainly impressive!

All the best!
HMW

siranjeevi · 10-14-2015, 07:15 AM

Brilliant, grail!

A million thanks to the contributors to this thread. Here is the complete script that I used; maybe it will be useful for someone else.

HMW, I didn't use Python because I have zero experience with it, so I used bash. I still used grep '08GR\|08TR\|08AC\|09FT\|09F1\|08JA\|08TS\|08RX' because using sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' alone prints lines starting with 12FR:, which I didn't want.

The following script appends the file name to the end of each line and saves all the matching lines to a file named output.

Code:

#!/bin/bash
for f in *.txt
do
 sed -i 's/$/ '",$f"'/' "$f"
cat $f | grep '08GR\|08TR\|08AC\|09FT\|09F1\|08JA\|08TS\|08RX' | sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' >> output
done

syg00 · 10-14-2015, 07:22 AM

Quote:

Originally Posted by siranjeevi

prints lines starting with 12FR: which I didn't want.

It shouldn't. Did you cut-and-paste grail's solution?
Better if you had.

grail · 10-14-2015, 07:35 AM

Also, a few points:

1. Do not use cat when the commands you are using can already read files themselves (useless use of cat).

2. If you are going to use grep, then the following piece of the sed should be removed: /[0-9]{2}[A-Z]{2} /

3. It is not possible for the sed to display 12FR:, because /[0-9]{2}[A-Z]{2} / has a space before the closing /, so a string ending in a colon would not match.

4. There is no need for the separate sed that appends the file name; simply place your variable in the main sed.

5. In addition to (4), you also do not need to go crazy with the opening and closing quotes:

Code:

sed -i "s/$/ ,$f/" "$f"
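
For comparison, the points above can be sketched in Python as one pass per file with no cat, no separate grep and no in-place edit. This is only a sketch under a few assumptions that go beyond the thread: the names convert_line and convert_files are hypothetical, the prefix test is anchored at the start of the line (the sed address is not), and the output file is overwritten rather than appended to.

```python
import re
from pathlib import Path

# Same expressions as in the sed command; .match() anchors them at line start.
LINE_START = re.compile(r"[0-9]{2}[A-Z]{2} ")
FIELDS = re.compile(r"([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)")


def convert_line(line, file_name):
    """Return the converted line with ' ,file_name' appended, or None."""
    line = line.rstrip("\n")
    # The trailing space in LINE_START is what rejects "12FR:..." lines.
    if not LINE_START.match(line):
        return None
    fields = FIELDS.match(line)
    if fields is None:
        return None
    return ",".join(fields.groups()) + f" ,{file_name}"


def convert_files(directory, output_path):
    """Convert every *.txt file in `directory` into one output file."""
    with open(output_path, "w") as out:
        for path in sorted(Path(directory).glob("*.txt")):
            with open(path) as infile:
                for line in infile:
                    converted = convert_line(line, path.name)
                    if converted is not None:
                        out.write(converted + "\n")


if __name__ == "__main__":
    convert_files(".", "output")
```

Because the file name is added while writing the output, the input files are left untouched and can be processed again without picking up a second file name.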