LinuxQuestions.org
Old 03-09-2018, 05:54 PM   #1
shriankur
LQ Newbie
 
Registered: Mar 2018
Posts: 8

Rep: Reputation: Disabled
How to merge lines based on number of columns


Hi - I have a pipe delimited file like below

2018-01-04 16:08:00|STOP|123|True
2018-01-04 15:50:00|STOP
|456|True


I want to merge and make it like

2018-01-04 16:08:00|STOP|123|True
2018-01-04 15:50:00|STOP|456|True

How can I do this? Please note that the number of pipes varies from file to file.

I also want to remove special/junk characters from the file. How can I do that?
Please help.

Last edited by shriankur; 03-09-2018 at 05:57 PM.
 
Old 03-09-2018, 06:37 PM   #2
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,230

Rep: Reputation: 724
From your example data, here is a sed example to get you started
Code:
sed ':a;N;s/\n|/|/;ba' file
Code:
:a       => create a label named a
 N       => append the next input line
s/\n|/|/ => replace a newline followed by a '|' with a '|', ie delete the newline before
ba       => branch to a, ie goto label a
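
If you want to try the loop without touching your real file, here is a minimal sketch that feeds it the sample from post #1. Note that this is a slight variant, not the exact command above: it uses $!N and $!ba so it does not depend on what a particular sed does when N runs out of input (POSIX lets sed quit there without printing, while GNU sed prints the pattern space first). The function name join_leading_pipe is made up for the illustration.

```shell
# join_leading_pipe: hypothetical helper, reads stdin and writes stdout.
# Joins any line that starts with '|' onto the line before it.
join_leading_pipe() {
    # :a    label
    # $!N   append the next line unless we are already on the last one
    # s///  drop a newline that is directly followed by '|'
    # $!ba  loop back to :a until the last line has been read
    sed -e ':a' -e '$!N' -e 's/\n|/|/' -e '$!ba'
}

# Sample data from post #1.
printf '%s\n' \
    '2018-01-04 16:08:00|STOP|123|True' \
    '2018-01-04 15:50:00|STOP' \
    '|456|True' | join_leading_pipe
```

Keep in mind that a loop like this collects the whole file in the pattern space before printing anything, so it is best suited to files that fit comfortably in memory.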
 
1 members found this post helpful.
Old 03-09-2018, 06:45 PM   #3
shriankur
LQ Newbie
 
Registered: Mar 2018
Posts: 8

Original Poster
Rep: Reputation: Disabled
Thanks. Will it work for any number of columns? This file should have 4 columns, but the 2nd row contains a stray newline character, hence the need to merge.
I will test and get back to you.

Thanks for your help.

Last edited by shriankur; 03-09-2018 at 06:46 PM.
 
Old 03-09-2018, 11:21 PM   #4
shriankur
LQ Newbie
 
Registered: Mar 2018
Posts: 8

Original Poster
Rep: Reputation: Disabled
Hi Keefaz - I just tested. It is working :-)
You rock sir.

I also want to remove junk characters (any unreadable characters) from the file.
Could you please help, keefaz?
 
Old 03-10-2018, 06:17 AM   #5
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,230

Rep: Reputation: 724
Just tested another sed command without branching
Code:
sed 'N;s/\n|/|/;P;D' file
For non-readable characters, maybe try removing any character that does not belong to the [:print:] character class.
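
One way to act on the [:print:] hint is with tr. This is only a minimal sketch, assuming the file is plain ASCII; strip_nonprintable is a made-up name. LC_ALL=C is set so that the class means printable ASCII. Note that this also removes tabs and any non-ASCII bytes (for example UTF-8 accented letters), so check that this is what you want before running it on real data.

```shell
# strip_nonprintable: hypothetical helper, reads stdin and writes stdout.
strip_nonprintable() {
    # -c complements the set, -d deletes: remove every byte that is
    # neither printable ASCII nor a newline.
    LC_ALL=C tr -cd '[:print:]\n'
}

# Example: a control character (octal 001) hidden in a data line.
printf '2018-01-04 15:50:00|ST\001OP|456|True\n' | strip_nonprintable
```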
 
2 members found this post helpful.
Old 03-10-2018, 02:05 PM   #6
shriankur
LQ Newbie
 
Registered: Mar 2018
Posts: 8

Original Poster
Rep: Reputation: Disabled
Thanks.
keefaz, sorry, I need your help again. Your code does not work for the case below:

2018-01-04 16:08:00|STOP|123|True
2018-01-04 15:50:00|STOP
abc|456|True

My output should be :-

2018-01-04 16:08:00|STOP|123|True
2018-01-04 15:50:00|STOP abc|456|True

Similarly for below condition

2018-01-04 16:08:00|STOP|123|True
2018-01-04 15:50:00|STOP
abc
|456|True

Output should be :-

2018-01-04 16:08:00|STOP|123|True
2018-01-04 15:50:00|STOP abc|456|True

Basically, every row of the file should have the same number of columns (4 here, but configurable per file).
Could you please help me?

Last edited by shriankur; 03-10-2018 at 02:14 PM.
 
Old 03-10-2018, 02:46 PM   #7
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 21,954

Rep: Reputation: 5814
Keefaz *HAS* helped you a lot...but you also need to read the "Question Guidelines" link in my posting signature. At some point, *YOU* are going to have to actually do/try something of your own. keefaz has given you a HUGE head-start, but now is the time for you to apply what you were told, and accomplish your own work...or at least TRY to.

Come back and post what you've done/tried/written on your own, and what results you get.
 
2 members found this post helpful.
Old 03-10-2018, 02:53 PM   #8
shriankur
LQ Newbie
 
Registered: Mar 2018
Posts: 8

Original Poster
Rep: Reputation: Disabled
Sure, TB0ne. I appreciate that keefaz has really helped me. Let me work on this further. Thanks for your reply, TB0ne :-)
Sorry, I am a newbie on the forum; I just joined yesterday.
 
Old 03-10-2018, 03:02 PM   #9
shriankur
LQ Newbie
 
Registered: Mar 2018
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by keefaz
Just tested another sed command without branching
Code:
sed 'N;s/\n|/|/;P;D' file
For non readable characters, maybe try to remove any char that does not belong to [:print:] character class
Thanks, this is working. I was able to remove the junk characters on my own using tr -d.
I am working on merging the file below and will get back to you.

2018-01-04 16:08:00|STOP|123|True
2018-01-04 15:50:00|STOP
abc|456|True

Last edited by shriankur; 03-10-2018 at 03:05 PM. Reason: Adding further points
 
Old 03-10-2018, 04:22 PM   #10
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,230

Rep: Reputation: 724
shriankur, TB0ne reminded you of a forum rule: you have to post some code of your own to get help with it.
 
1 members found this post helpful.
Old 03-10-2018, 04:54 PM   #11
shriankur
LQ Newbie
 
Registered: Mar 2018
Posts: 8

Original Poster
Rep: Reputation: Disabled
OK, fair enough, keefaz.

Input file is

123|APAC|Australia|Sydney
456|APAC|New
Zealand|Aukland


I tried the code below to turn it into a 4-column delimited file.

awk -F"|" '{if (pre!="") {print pre $0; pre=""} else if (NF>=4) print; else pre=$0}' filename

The output I got, which is what I need:

123|APAC|Australia|Sydney
456|APAC|NewZealand|Aukland

The problem is that I need to pass 4 as an argument, since the number of columns varies from file to file. Is there a generic solution?
 
Old 03-10-2018, 05:29 PM   #12
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 18,134

Rep: Reputation: 2927
awk has command line options, one of which will allow you to set a variable. Have you tested that code on your own data as specified in post #6? Specifically, on input where a record spans more than 2 lines.
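
A rough sketch of both points, for illustration only: -v passes the column count in, and the script keeps appending lines until the record has that many fields, so a record may span any number of lines. The name merge_to_cols and the joining rule (a space between pieces, except before a piece that starts with '|') are assumptions taken from the expected output in post #6, not a tested answer for every file.

```shell
# merge_to_cols N: hypothetical helper, reads stdin and writes stdout.
# Joins physical lines until each record has at least N pipe-separated fields.
merge_to_cols() {
    awk -v cols="$1" '
        {
            # Start a new record, or append this piece to the pending one.
            if (buf == "")
                buf = $0
            else if (substr($0, 1, 1) == "|")
                buf = buf $0          # piece starts with a delimiter: no space
            else
                buf = buf " " $0      # piece continues a field: join with a space

            # Print once the accumulated record has enough fields.
            if (split(buf, parts, "|") >= cols) {
                print buf
                buf = ""
            }
        }
        END { if (buf != "") print buf }   # flush an incomplete trailing record
    '
}

# Second sample from post #6: one record spread over three lines.
printf '%s\n' \
    '2018-01-04 16:08:00|STOP|123|True' \
    '2018-01-04 15:50:00|STOP' \
    'abc' \
    '|456|True' | merge_to_cols 4
```

If your data needs the pieces glued together without a space (as in "NewZealand"), change the " " in the last branch to "".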
 
1 members found this post helpful.
Old 03-10-2018, 06:06 PM   #13
shriankur
LQ Newbie
 
Registered: Mar 2018
Posts: 8

Original Poster
Rep: Reputation: Disabled
Yes, syg00, you are right.
I ended up using awk -v:

awk -F"|" -v var1="$VARIABLE" '{if (pre!="") {print pre $0; pre=""} else if (NF>=var1) print; else pre=$0}' filename

I will pass VARIABLE=4 as an argument from my shell.
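
If you would rather not hard-code the 4, one possible approach is to read the column count from the first line of the file. This is only a sketch and assumes the first line is complete (not itself broken by a stray newline); count_cols is a hypothetical helper name.

```shell
# count_cols FILE: hypothetical helper.
# Prints the number of pipe-separated fields on the first line of FILE.
count_cols() {
    head -n 1 "$1" | awk -F'|' '{ print NF }'
}

# Usage (filename is a placeholder for your own file):
# VARIABLE=$(count_cols filename)
```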

Thank you syg00 for your reply.
Special thanks to keefaz for the initial heads up.
I am marking this thread as closed and solved.
 
Old 03-10-2018, 07:16 PM   #14
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,230

Rep: Reputation: 724
Glad you got it working. Nice of you to use the forum's SOLVED thread feature; not all new users do this.

I had a perl solution, but your awk solution seems cool too
Code:
perl -slne '$s.=$_; (split /\|/, $s)==$col and print($s), $s=""' -- -col=4 filename
 
Old 03-12-2018, 01:34 AM   #15
BudiKusasi
Member
 
Registered: Apr 2017
Posts: 54

Rep: Reputation: Disabled
Code:
sed -r ':x;/true$/I!{N;s/\n//; bx}' file
 
  

