Convert fields in one column to single line using common field separator in the column

say_hi_ravi · 08-08-2017, 12:37 PM

I have a file

Code:

---
_id:abc
config:qa
---
_id:lmn
config:dev
---
_id:xyz
config:xyz
---

I want the following output:

Code:

--- _id:abc config:qa
--- _id:lmn config:dev
--- _id:xyz config:xyz
---

I tried to do it with awk, with no luck:

Code:

awk '{BEGIN OFS = "---"; ORS =" " } {print}'

MensaWater · 08-08-2017, 01:04 PM

Code:

awk '/---/ {printf "%s ", $0}
/_id/ {printf "%s ", $0}
/config/ {printf "%s\n", $0}' <filename>
echo ""

You can have awk look for multiple lines based on separate search patterns and produce different output for each match. Because printf, unlike print, does not append a newline automatically, the three lines come out combined on one line.

So in the above we search for the line that contains the literal "---" and print it followed by a space. We then look for the line that contains the literal "_id" and print it followed by a space. Finally we look for the line that contains the literal "config" and print it followed by a newline.
At the end, the echo supplies the newline that the final "---" line is missing.

Note that on the third line you put the name of your file where I wrote <filename>.

The above of course assumes that the lines are all as you show in your sample. You'd have to play with regex for other differences or similarities.
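The same line-by-line idea can be sketched in Python, for comparison. This is only an illustration under the assumption that every record starts with a line that is exactly "---"; the function name join_records and the inline sample are made up for the example and are not part of the awk answer above.

```python
import io

def join_records(lines, marker="---"):
    """Yield one joined line per record; a record starts at each marker line."""
    current = []
    for raw in lines:
        line = raw.rstrip("\n")
        # A new marker closes the record collected so far.
        if line == marker and current:
            yield " ".join(current)
            current = []
        current.append(line)
    # Emit whatever is left (here: the trailing lone marker).
    if current:
        yield " ".join(current)

# Sample data copied from the question.
sample = (
    "---\n_id:abc\nconfig:qa\n"
    "---\n_id:lmn\nconfig:dev\n"
    "---\n_id:xyz\nconfig:xyz\n"
    "---\n"
)

for row in join_records(io.StringIO(sample)):
    print(row)
```

Unlike the pattern-per-line awk version, this does not care what the lines between two markers contain, so it should also cope with records that have more or fewer than two fields.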

Sefyir · 08-08-2017, 05:24 PM

This is a good use of positive lookahead in regular expressions.
The pattern starts at ---, then continues lazily until the next characters are ---, and repeats. Interestingly, this does not read line by line: the whole input is read as a single string, and the matches are then produced one at a time by an iterator. Note that this means the entire file is held in memory while it is processed.

Code:

$ ./converter.py -d'---' -c'\n' myfile # Default action
--- _id:abc config:qa 
--- _id:lmn config:dev 
--- _id:xyz config:xyz

Code:

#!/usr/bin/env python3

import re
import sys
import argparse
import codecs

def unescaped_str(arg_str):
    return codecs.decode(str(arg_str), 'unicode_escape')

parser = argparse.ArgumentParser(description='Convert delimiter to singlelines')
parser.add_argument('-d', '--row-delimiter',
        type=unescaped_str,
        default='---')
parser.add_argument('-c', '--column-delimiter',
        type=unescaped_str,
        action='store',
        default='\n')
args, other_args = parser.parse_known_args()

# re.escape keeps delimiters containing regex metacharacters (e.g. '***') literal
dash_column_regex = re.compile('{d}.*?(?={d})'.format(d=re.escape(args.row_delimiter)), re.DOTALL)
data = (other_args if other_args else sys.stdin)

def column_to_rows(regex_object, file_object):
    file_object = file_object.read()
    return (column.group().replace(args.column_delimiter, ' ')
            for column in regex_object.finditer(file_object))

if data is sys.stdin:
    results = [column_to_rows(dash_column_regex, data)]
else:
    results = list()
    for _file in data:
        with open(_file) as f:
            results.append(column_to_rows(dash_column_regex, f))

for _file in results:
    for line in _file:
        print(line)
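To see the lookahead on its own, here is a stripped-down sketch of just the regex step, run against the sample from the question. The names lookahead_rows and sample are invented for this illustration; the rstrip() is an addition that drops the trailing space left by the last newline.

```python
import re

def lookahead_rows(text, row_delimiter="---", column_delimiter="\n"):
    """Return one row per delimiter-to-delimiter span, using a lookahead."""
    d = re.escape(row_delimiter)
    # Lazy match from one delimiter up to (not including) the next one.
    pattern = re.compile("{d}.*?(?={d})".format(d=d), re.DOTALL)
    return [m.group().replace(column_delimiter, " ").rstrip()
            for m in pattern.finditer(text)]

# Sample data copied from the question.
sample = (
    "---\n_id:abc\nconfig:qa\n"
    "---\n_id:lmn\nconfig:dev\n"
    "---\n_id:xyz\nconfig:xyz\n"
    "---\n"
)

for row in lookahead_rows(sample):
    print(row)
```

Because the lookahead requires another delimiter further on, the final lone "---" never matches, which is why only three rows are produced.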

pan64 · 08-09-2017, 06:56 AM

Code:

awk 'BEGIN{RS="---\n"; FS="\n"; ORS="\n"; OFS=" "} { print "---",$1,$2,$3 }'
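For comparison, the record-separator idea can be sketched in Python. As far as I know, a multi-character RS is treated as a regular expression by gawk, while POSIX leaves the behaviour unspecified for other awks, so the one-liner above may depend on which awk you have. The function split_records and the sample below are made up for this illustration; unlike the awk command, the sketch skips empty records and empty fields.

```python
def split_records(text, rs="---\n", fs="\n", ofs=" "):
    """Split text into records on rs, fields on fs, and rejoin with ofs."""
    rows = []
    for record in text.split(rs):
        # Drop the empty field left by the newline that ends each record.
        fields = [f for f in record.split(fs) if f]
        if fields:
            rows.append(ofs.join(["---"] + fields))
    return rows

# Sample data copied from the question.
sample = (
    "---\n_id:abc\nconfig:qa\n"
    "---\n_id:lmn\nconfig:dev\n"
    "---\n_id:xyz\nconfig:xyz\n"
    "---\n"
)

for row in split_records(sample):
    print(row)
```

Note that this version does not print the closing "---" line; if that line is needed, it would have to be printed separately.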