[SOLVED] script to print only lines that are starting with two numbers and two alphabets
Code:
#!/usr/bin/env python3
"""
http://www.linuxquestions.org/questions/showthread.php?p=5433986#post5433986
"""
import re
import sys

# File to open via arg 1
theFile = sys.argv[1]
# Regex start of line
lineStart = re.compile('[0-9]{2}[A-Z]{2} +')
# Regex to split
toSplit = re.compile(' +')

with open(theFile) as infile:
    for line in infile:
        if re.match(lineStart, line):
            listLine = toSplit.split(line)
            listLine[2] = listLine[2][:16] + "," + listLine[2][16:]
            newLine = ",".join(listLine)
            print(newLine, end="")
exit(0)
Ugh! Yes, you're right, I missed that somehow <irony>despite the very clear and obvious specs</irony>. Thanks for pointing this out.
So, anyway... Here we go. Including the final comma:
Code:
#!/usr/bin/env python3
"""
http://www.linuxquestions.org/questions/showthread.php?p=5433986#post5433986
"""
import re
import sys

# File to open via arg 1
theFile = sys.argv[1]
# Regex start of line
lineStart = re.compile('[0-9]{2}[A-Z]{2} +')
# Regex to split
toSplit = re.compile(' +')

with open(theFile) as infile:
    for line in infile:
        if re.match(lineStart, line):
            listLine = toSplit.split(line)
            listLine[2] = listLine[2][:16] + "," + listLine[2][16:]
            # Get the position of final comma: the end of the first run
            # of two or more lowercase letters in the 3rd column
            lastCommaPos = re.search('[a-z]{2,}', listLine[2])
            # Skip the final comma if no such run exists (avoids a crash on None)
            if lastCommaPos:
                lastComma = lastCommaPos.end()
                # Insert comma at lastComma
                listLine[2] = listLine[2][:lastComma] + "," + listLine[2][lastComma:]
            newLine = ",".join(listLine)
            # Finally, remove the one remaining unwanted space
            # (replace with count 1 removes the first space found)
            newLine = newLine.replace(" ", "", 1)
            print(newLine, end="")
exit(0)
The final output should insert a comma before the 17th character of the 3rd column (counting periods, and regardless of whether the characters are numbers or letters). So the final output should be:
A million thanks to the contributors to this thread. Here is the complete script that I used; maybe it will be useful for someone else.
HVM, I didn't use Python because I know nothing about it, so I used bash. I still used grep '08GR\|08TR\|08AC\|09FT\|09F1\|08JA\|08TS\|08RX' because using sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' alone prints lines starting with 12FR:, which I didn't want.
The following script appends the file name to the end of each line and saves all the matching lines to a file named output.
Code:
#!/bin/bash
for f in *.txt
do
    sed -i 's/$/ '",$f"'/' "$f"
    cat $f | grep '08GR\|08TR\|08AC\|09FT\|09F1\|08JA\|08TS\|08RX' | sed -rn '/[0-9]{2}[A-Z]{2} /s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6/p' >> output
done
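On the 12FR: lines mentioned above: one possible explanation (an assumption, since the real input is not shown in this thread) is that the sed address /[0-9]{2}[A-Z]{2} / is not anchored, so it matches two digits, two capitals and a space anywhere in a line, not only at the start. The Python version does not behave this way because re.match only matches at the beginning of the string. A small sketch with two made-up sample lines; the function names are only for illustration:

```shell
#!/bin/bash
# Two made-up sample lines; the thread does not show the real input.
sample_lines() {
    printf '%s\n' \
        '08GR AAA first line' \
        '12FR: BBB mentions 08GR later'
}

# Unanchored address, as used in the thread: "08GR " may sit anywhere in the line.
unanchored() {
    sample_lines | sed -rn '/[0-9]{2}[A-Z]{2} /p'
}

# Anchored with ^: only lines that begin with two digits, two capitals and a space.
anchored() {
    sample_lines | sed -rn '/^[0-9]{2}[A-Z]{2} /p'
}

echo "unanchored:"
unanchored
echo "anchored:"
anchored
```

If this is what happened with the real data, adding ^ to the address might be enough to drop the unwanted lines without listing every prefix in grep.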
1. Do not use cat when all the commands you are using can already read files: see "Useless use of cat".
2. If you are going to use grep, then the following piece of the sed command should be removed: /[0-9]{2}[A-Z]{2} /
3. It is not possible for the sed structure to display 12FR:, as /[0-9]{2}[A-Z]{2} / has a space at the end before the closing /, so a string ending in a colon would not match.
4. There is no need for the separate sed that appends the file name; simply place your variable in the main sed.
5. In addition to (4) above, you also do not need to go crazy with the opening and closing quotes:
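A rough sketch of how points 1 to 5 might fit together; this is not the code originally posted here. It assumes GNU grep and sed, and file names without / or & (either would break the sed replacement). The loop is wrapped in a function, convert_files, which is a made-up name so that the loop can be reused and tested. Unlike the script above, it does not modify the input files, and it appends ,filename without the leading space.

```shell
#!/bin/bash
# Sketch only: convert_files is a hypothetical helper, not part of the thread.
convert_files() {
    local f
    for f in "$@"
    do
        # Point 1: grep reads the file itself, so no cat is needed.
        # Point 2: grep already selects the lines, so sed has no address.
        # Points 4 and 5: one double-quoted sed script lets the shell expand $f
        # inside the replacement, so the file name is appended in the same pass.
        grep '08GR\|08TR\|08AC\|09FT\|09F1\|08JA\|08TS\|08RX' "$f" |
            sed -rn "s/([^ ]*) *([^ ]*) *(.{16})(.{16})(.{16})(.*)/\1,\2,\3,\4,\5,\6,$f/p"
    done
}

# Usage, mirroring the original loop:
#   convert_files *.txt >> output
```

Inside double quotes the shell leaves \1 to \6 untouched, because a backslash before a digit is not special there, so only $f is expanded.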