LinuxQuestions.org

debumail186 10-15-2014 06:13 PM

Remove Word wrap from a file in Unix
 
Hi friends,

I am trying to remove word wrap from a file in Unix.

The contents of the file look like this (just an example):

Entries:

ENTRY TIME SUMMARY
entry-000 2014-10-13 15:49:06 oracle_agent, ep01client01.comfin.ge.com, User
logged in via CLI
entry-001 2014-10-13 15:49:07 oracle_agent, ep01client01.comfin.ge.com, User
logged out of CLI
entry-002 2014-10-13 15:49:08 oracle_agent, ep01client01.comfin.ge.com, User
logged out of CLI
entry-003 2014-10-13 16:06 oracle_agent, ep01client01.comfin.ge.com, User
logged in via CLI
entry-004 2014-10-13 16:06:02 oracle_agent, ep01client01.comfin.ge.com, User
logged out of CLI
entry-005 2014-10-13 16:09:05 oracle_agent, ep01client01.comfin.ge.com, User
logged in via CLI
entry-006 2014-10-13 16:09:07 oracle_agent, ep01client01.comfin.ge.com, User
logged in via CLI
entry-007 2014-10-13 16:09:08 oracle_agent, ep01client01.comfin.ge.com, User
logged out of CLI

As you may note, strings such as 'logged out of CLI' go onto the next line, since the text exceeds the screen width of 80 chars.

I want all of the content for each record starting with entry-xxx to be on a single line, without spilling onto the next line.

Please advise what would be the best way to do it.

Thanks in advance.

Best
Dev

frankbell 10-15-2014 07:58 PM

What program are you using to view the file?

debumail186 10-15-2014 08:14 PM

Hello Frank,

Thanks for your reply.

The file is a normal text file which is generated on another server. I want to remove the word wrap and filter a few other things out of the file.

Basically, I want to generate a converted file with no word wrap (so that all those 'logged in via CLI' or 'logged out of CLI' lines are joined onto the previous line) and then filter out some other content which is not useful to me.

I use vi to open the file in Linux.

Kind Regards & Thanks
Dev

evo2 10-15-2014 08:24 PM

Hi,

I think frankbell's point is that word wrap like this is normally an artifact of the program you are using to view the file, not of the file itself. He is looking for some sort of confirmation that the file really does contain those newlines.

Evo2.

debumail186 10-15-2014 08:34 PM

Hi Evo,

Yes, the file does contain newlines.

Attaching a sample log file.

Kind Regards & Thanks
Debasish

frankbell 10-15-2014 08:48 PM

I think evo2 explained what I was thinking better than I did. I also should have asked on what OS the file was created.

I see in the screenshot references to Oracle. Was that Oracle on Windows or Linux?

Windows and *nix handle new-lines differently. This link explains it nicely: http://www.cs.toronto.edu/~krueger/c...e-endings.html
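
One way to check which convention a file uses is to count the line-ending bytes directly. The following is only a rough Python sketch, not something posted in this thread; the function name and the sample bytes are made up for illustration.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), Unix (LF) and lone CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,   # LF not preceded by CR
        "cr": data.count(b"\r") - crlf,   # CR not followed by LF
    }


# Made-up sample resembling one wrapped record with Windows line endings.
sample = b"entry-000 2014-10-13 15:49:06 oracle_agent, host, User\r\n  logged in via CLI\r\n"
print(count_line_endings(sample))  # {'crlf': 2, 'lf': 0, 'cr': 0}
```

For a real file, read it in binary mode first, e.g. `count_line_endings(open("audit.log", "rb").read())`, where `audit.log` is a placeholder name.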

jpollard 10-15-2014 10:01 PM

That isn't word wrap... That is the formatting, and the second line is deliberately indented.

You can try some perl:

Code:

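#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    chomp;                      # strip the trailing newline only
    if (/^\s+(.*)/) {
        print " ", $1;          # indented line: append to the current record
    } else {
        print "\n", $_;         # anything else starts a new output line
    }
}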

Note: this prints an empty line first, and it leaves the last line without a newline terminator unless the input ends with an empty line.
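
For anyone who would rather use Python, the same idea (append every indented line to the line before it) might look roughly like the sketch below. It is not from the thread. It assumes, as the Perl does, that continuation lines start with whitespace, and the sample lines are typed in by hand with that indentation. Unlike the Perl, it collects whole records in a list, so there is no leading empty line and every record ends with a newline when printed.

```python
import re


def join_indented(lines):
    """Append each whitespace-indented line to the record before it."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        match = re.match(r"\s+(.*)", line)
        if match and records:
            records[-1] += " " + match.group(1)   # continuation line
        else:
            records.append(line)                  # start of a new record
    return records


# Hand-typed sample; the four-space indentation is an assumption.
sample = [
    "entry-000 2014-10-13 15:49:06 oracle_agent, ep01client01.comfin.ge.com, User\n",
    "    logged in via CLI\n",
    "entry-001 2014-10-13 15:49:07 oracle_agent, ep01client01.comfin.ge.com, User\n",
    "    logged out of CLI\n",
]
for record in join_indented(sample):
    print(record)
```

To run it on a file, pass the open file object instead of the list, since iterating over a file yields its lines.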

debumail186 10-20-2014 06:47 PM

Thanks jpollard, that works really well.

Cheers
Dev

debumail186 10-21-2014 05:22 PM

Hi jpollard,

I made a small change to my initial code (used diff and awk), which resulted in a file formatted as below:

entry-110832 2014-10-21 21:21:05 zfsauditlogger, ed01client01.comfin.ge.com,
User logged in via CLI
entry-110833 2014-10-21 21:21:05 zfsauditlogger, ed01client01.comfin.ge.com,
User logged out of CLI
entry-110834 2014-10-21 21:21:29 zfsauditlogger, ed01client01.comfin.ge.com,
User logged out of CLI
entry-110835 2014-10-21 21:22:43 zfsauditlogger, ed01client01.comfin.ge.com,
User logged out of CLI
entry-110836 2014-10-21 21:24:58 zfsauditlogger, ed01client01.comfin.ge.com,
User logged in via CLI

I need some advice on how to change the Perl code below so that it removes the line wrap from text formatted as above.
...........
#!/usr/bin/perl

while (<>) {
    chomp;
    if (/^\s+(.*)/) {
        print " ", $1;
    } else {
        print "\n", $_;
    }
}
...........

Thanks in advance for your help
Debasish

jpollard 10-21-2014 07:56 PM

That last format is harder because there is no simple way to identify the continuation line.

The following relies on an assumption: that the start of a record is always "entry-" followed by 6 digits, and that every record has a continuation line. If the second line is in the same format as the first, it will be joined onto the first.

Code:

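#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    chomp;                      # strip the trailing newline only
    if (!/^entry-\d{6}/) {
        print " ", $_, "\n";    # continuation: append it and end the record
    } else {
        print $_;               # record start: leave the line open
    }
}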

This is an assumption because the second line (the one being merged) may not contain any unique string to match on. The indented format worked because the indentation is always blank. So the key has to be the first line, and the line after it is assumed to be a continuation.
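
To see what the `^entry-\d{6}` pattern does and does not treat as a record start, a quick check along these lines may help. This is a Python sketch rather than part of the original reply; the helper name is invented, and the seven-digit sample is hypothetical.

```python
import re

# Same pattern as in the Perl: "entry-" at the start of the line, then six digits.
RECORD_START = re.compile(r"^entry-\d{6}")


def is_record_start(line):
    """Return True when the line begins a new record according to the pattern."""
    return RECORD_START.match(line) is not None


samples = [
    "entry-110832 2014-10-21 21:21:05 zfsauditlogger, ed01client01.comfin.ge.com,",
    "User logged in via CLI",
    "entry-000 2014-10-13 15:49:06 oracle_agent, ep01client01.comfin.ge.com, User",
    "entry-1108321 (hypothetical seven-digit entry number)",
]
for line in samples:
    print(is_record_start(line), line)
```

Two things stand out: the three-digit numbers from the first sample in this thread (entry-000) would not count as record starts, and a number with more than six digits still matches because nothing is required after the sixth digit.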

debumail186 10-21-2014 08:18 PM

Hey jpollard,

I did some testing with that.

It worked quite well. The only thing that seems to be off is lines like the ones below in the output:

entry-110896 2014-10-21 23:45:16 oracle_agent, 3.154.219.140, timed out
entry-110897 2014-10-21 23:46:31 oracle_agent, 3.154.219.140,entry-110898 2014-10-21 23:46:33 oracle_agent, 3.154.219.140,entry-110899 2014-10-21 23:46:52 oracle_agent, 3.154.219.140,entry-110900 2014-10-21 23:49:34 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110901 2014-10-21 23:49:39 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110902 2014-10-21 23:50:05 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110903 2014-10-21 23:50:08 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110904 2014-10-21 23:50:22 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110905 2014-10-21 23:50:24 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110906 2014-10-22 00:02:17 oracle_agent, 3.154.219.140, timed out
entry-110907 2014-10-22 00:02:17 oracle_agent, 3.154.219.140, timed out
entry-110908 2014-10-22 00:06:17 oracle_agent, 3.154.219.140, timed out
entry-110909 2014-10-22 00:06:31 oracle_agent, 3.154.219.140,entry-110910 2014-10-22 00:09:34 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110911 2014-10-22 00:09:39 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110912 2014-10-22 00:10:04 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110913 2014-10-22 00:10:08 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110914 2014-10-22 00:10:22 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110915 2014-10-22 00:10:24 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110916 2014-10-22 00:29:34 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110917 2014-10-22 00:29:39 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110918 2014-10-22 00:30:04 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110919 2014-10-22 00:30:08 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110920 2014-10-22 00:30:22 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110921 2014-10-22 00:30:24 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110922 2014-10-22 00:45:17 oracle_agent, 3.154.219.140, timed out
entry-110923 2014-10-22 00:46:31 oracle_agent, 3.154.219.140,entry-110924 2014-10-22 00:46:33 oracle_agent, 3.154.219.140,entry-110925 2014-10-22 00:46:52 oracle_agent, 3.154.219.140,entry-110926 2014-10-22 00:49:34 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110927 2014-10-22 00:49:39 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110928 2014-10-22 00:50:04 oracle_agent, ep01client02.comfin.ge.com, logged in via CLI
entry-110929 2014-10-22 00:50:08 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI

Some of the entry-* lines (e.g. entry-110924, entry-110925) are moved up onto the previous line, specifically when the previous line looks like this:

entry-110923 2014-10-22 00:46:31 oracle_agent, 3.154.219.140, (note that there is nothing after the comma (,) at the end)

as opposed to the general pattern below:

entry-110921 2014-10-22 00:30:24 oracle_agent, ep01client02.comfin.ge.com, logged out of CLI
entry-110922 2014-10-22 00:45:17 oracle_agent, 3.154.219.140, timed out

Please advise.

Kind Regards & Thanks
Debasish

jpollard 10-21-2014 09:00 PM

Yup, that would be due to the exceptions to the format.

Try this one. It adds a bit more by detecting when a newline may be needed before the "entry-" line.

Code:

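#!/usr/bin/perl
use strict;
use warnings;

my $nl = 0;                     # assume newline not needed yet
while (<>) {
    chomp;                      # strip the trailing newline only
    if (!/^entry-\d{6}/) {
        print " ", $_, "\n";    # continuation: append it and end the record
        $nl = 0;                # newline not needed
    } else {
        print "\n" if $nl;      # previous record is still open: close it
        $nl = 1;                # need a newline in the future
        print $_;
    }
}
print "\n" if $nl;              # close the last record if it is still open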
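
A rough Python equivalent of this approach, for comparison, is sketched below. It is not from the thread, and it differs from the Perl in two ways that are my own assumptions: it joins any number of continuation lines onto a record (the Perl expects at most one), and it skips blank lines. The sample input is reconstructed by hand from the output posted above.

```python
import re

# Same record-start pattern as in the Perl script.
RECORD_START = re.compile(r"^entry-\d{6}")


def join_records(lines):
    """Return one string per record, joining continuation lines with a space."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue                      # assumption: blank lines carry no data
        if RECORD_START.match(line) or not records:
            records.append(line)          # a new record begins here
        else:
            records[-1] += " " + line     # continuation of the previous record
    return records


# Hand-made input: one record with a continuation, two without.
sample = [
    "entry-110896 2014-10-21 23:45:16 oracle_agent, 3.154.219.140,\n",
    "timed out\n",
    "entry-110897 2014-10-21 23:46:31 oracle_agent, 3.154.219.140,\n",
    "entry-110898 2014-10-21 23:46:33 oracle_agent, 3.154.219.140,\n",
]
for record in join_records(sample):
    print(record)
```

Collecting records in a list before printing avoids the need for a flag like `$nl`, at the cost of holding the whole output in memory.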


debumail186 11-13-2014 06:52 PM

Thanks jpollard. That works great.

The Linux community rocks!! :)

Dev


All times are GMT -5.