LinuxQuestions.org


kaprasanna1 01-14-2011 03:31 AM

Sed adds a space instead of tab at end of line
 
The objective is to read a file line by line, add a tab at the end of each line, and add a value (a number) after the tab.

My script:

Code:

i=0
val=44
while read line
do
  #Ignore empty lines
  case "$line" in
        "") echo >> Report.tmp.tsv; continue;;
  esac
  case "$i" in
    0)
      dt=`date  +%d-%m-%y`
      echo $line | sed s/$/'\t'$dt/ >> Report.tmp.tsv;;
    1) echo $line | sed s/$/'\t'$val/ >> Report.tmp.tsv;;

    2) echo $line | sed s/$/'\t'$val/ >> Report.tmp.tsv;;
    3) echo $line | sed s/$/'\t'$val/ >> Report.tmp.tsv;;
   
  esac
  i=$(($i+1))
  val=$(($val+1))
done < Report.tsv

rm Report.tsv
mv Report.tmp.tsv Report.tsv

Report.tsv before running the script for the first time:
Code:

Test Line 1
This is Line number two
Line number is three
Fourth line of original report

At the end of the first run, each line of Report.tsv has a space appended instead of a tab.
From the second run onwards, however, each line of Report.tsv has a tab appended.

I noticed this when Report.tsv was imported into an OpenOffice spreadsheet.
The first set of appended values gets merged into the original column (strings), while subsequent appended values fall into distinct columns.

What is wrong here? Please guide.
Thanks in advance.

jschiwal 01-14-2011 03:59 AM

Sed is a line editor. A line will end with a new-line. Using "s/$/\t/" will only add a tab before the end of the line.
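
To see exactly where the appended text lands, the result can be piped through od -c. The following is a minimal sketch rather than anything from the thread: the sample text `abc`, the value `42`, and the `tab` and `res` variables are made up for illustration. A literal tab is handed to sed because, as far as I know, `\t` in the replacement is a GNU sed extension and is not guaranteed elsewhere.

```shell
# Hold a literal tab so the example does not rely on sed understanding \t.
tab=$(printf '\t')

# Append tab + value at the end of the line; \$ keeps the shell from touching the $.
res=$(printf 'abc\n' | sed "s/\$/${tab}42/")

# Dump the bytes: the tab and the value sit before the trailing newline.
printf '%s\n' "$res" | od -c
```

In the od output, `\t  4  2` should appear immediately before the final `\n`, so the newline stays at the very end of the line.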

You can build up lines and then change all of the newlines (except the last) to tabs. This is usually done by adding a line to the Hold buffer; recalling the hold buffer; and performing a global replace "s/\n/\t/".

Here is an example, extracting one of the records of the lspci output, and replacing the newlines with tabs:
Code:

/sbin/lspci -v | sed -n '/Network controller/,/^$/{ /^$/!H
                                                    /^$/{H;g;s/\n/\t/gp}}'

Note the use of braces to group commands together when you want to perform more than one sed command inside a subrange. The `g' flag is needed at the end of the substitute command `s' to substitute all of the newlines in the line.

If all you want to do is replace all the newlines in a file with tabs, you could use the `tr' program instead:

tr '\n' '\t' <original_file >newfile

kaprasanna1 01-14-2011 04:58 AM

jschiwal,

Really appreciate your quick reply.
I still don't quite understand why sed adds a space (not a newline) at the end of a line when it has been asked to add a tab.
When I run my script a third time, the first two appended columns are separated by spaces (they were separated by tabs earlier) and the third by a proper tab.
This is extremely confusing.
Also, the same sed command works like a charm if I ask it to add a comma instead of \t at the end of each line.
The sed hold buffer example you presented didn't work for me. I get a "sed: -e expression #1, char 25: extra characters after command" error. I am researching sed and the hold buffer further.
And [tr '\n' '\t'] replaces every single \n with a \t, so it isn't really helpful for me.

Again, thanks much for the reply.

crts 01-14-2011 05:15 AM

Hi,

are you sure that it is not a setting in OpenOffice that malforms the file?
Can you post the output of the following command
Code:

od -c Report.tmp.tsv
directly after the sed commands have been applied?

Kenhelm 01-14-2011 05:22 AM

Tabs need to be quoted to survive being echoed
Code:

line=abc$'\t'123    #  $'\t' is a tab character in bash
echo $line
abc 123            # tab has changed to a space

echo "$line"
abc    123          # tab has been preserved
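
The reason behind this is word splitting: an unquoted expansion is split on whitespace (spaces, tabs, newlines), and echo then joins its arguments with single spaces. Here is a small sketch that makes the difference visible; the variable names `tab`, `unquoted`, and `quoted` are only for illustration, and the tab is created with printf so the snippet does not depend on bash's $'\t' syntax.

```shell
tab=$(printf '\t')          # a literal tab character
line="abc${tab}123"

# Unquoted: the shell splits $line at the tab, so echo receives two
# arguments ("abc" and "123") and prints them joined by one space.
unquoted=$(echo $line)

# Quoted: echo receives a single argument with the tab still inside it.
quoted=$(echo "$line")

printf '%s\n' "$unquoted" | od -c   # a space between abc and 123
printf '%s\n' "$quoted"   | od -c   # \t between abc and 123
```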


jschiwal 01-14-2011 05:36 AM

I cut and pasted my posted example. It didn't have an error.

If your file is highly structured, consider using awk instead of sed. However, using tabs as record separators instead of field separators is very odd. Normally, tabs separate fields in a record, and newlines separate records.

Also, put your sed commands in double quotes if you use bash variables. An alternative is to enclose fixed text in single quotes, and variables in double quotes. You need to do the latter if you use `$' in a sed command meaning end of line.

Code:

head kmenu.trace | sed "s/^/$Date\t/"
14-01-11        execve("/usr/bin/kmenuedit", ["/usr/bin/kmenuedit"], [/* 92 vars */]) = 0
14-01-11        brk(0)                                  = 0x602000
14-01-11        mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0049ead000
14-01-11        access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
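
For the end-of-line case, the two quoting styles described above might look like the sketch below. The input `row`, the value 44, and the variables `tab`, `a`, and `b` are assumptions chosen for illustration. Note that a value containing `/` or `&` would need escaping before being placed in the replacement.

```shell
tab=$(printf '\t')   # literal tab, so no \t support is needed in sed
val=44

# Style 1: whole sed script in double quotes; the end-of-line $ is escaped.
a=$(printf 'row\n' | sed "s/\$/${tab}${val}/")

# Style 2: fixed text in single quotes, shell variables in double quotes.
b=$(printf 'row\n' | sed 's/$/'"${tab}${val}"'/')

# Both forms produce the same line.
printf '%s\n' "$a" "$b" | od -c
```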


kaprasanna1 01-14-2011 05:47 AM

Quote:

Originally Posted by crts (Post 4223933)
Hi,

are you sure that it is not a setting in OpenOffice that malforms the file?
Can you post the output of the following command
Code:

od -c Report.tmp.tsv
directly after the sed commands have been applied?

In fact, OpenOffice asks me beforehand which delimiter to use, and I select tab.

Output of
Code:

od -c Report.tmp.tsv
after 1st run:

Code:

0000000  T  e  s  t      L  i  n  e      o  n  e  \t  1  4
0000020  -  0  1  -  1  1  \n  T  h  i  s      i  s      L
0000040  i  n  e      n  u  m  b  e  r      t  w  o  \t  4
0000060  5  \n  L  i  n  e      n  u  m  b  e  r      i  s
0000100      t  h  r  e  e  \t  4  6  \n  F  o  u  r  t  h
0000120      l  i  n  e      o  f      o  r  i  g  i  n  a
0000140  l      r  e  p  o  r  t  \t  4  7  \n
0000154

The above file (generated after the first round of sed commands) opens successfully in OpenOffice with tab as the delimiter.

The following is the output of
Code:

od -c Report.tmp.tsv
after second run:


Code:

0000000  T  e  s  t      L  i  n  e      o  n  e      1  4
0000020  -  0  1  -  1  1  \t  1  4  -  0  1  -  1  1  \n
0000040  T  h  i  s      i  s      L  i  n  e      n  u  m
0000060  b  e  r      t  w  o      4  5  \t  4  5  \n  L  i
0000100  n  e      n  u  m  b  e  r      i  s      t  h  r
0000120  e  e      4  6  \t  4  6  \n  F  o  u  r  t  h   
0000140  l  i  n  e      o  f      o  r  i  g  i  n  a  l
0000160      r  e  p  o  r  t      4  7  \t  4  7  \n
0000176

When this one is opened in OpenOffice with tab as the delimiter, the first set of appended values gets merged with the row headers.

Thanks.

kaprasanna1 01-14-2011 06:34 AM

Quote:

Originally Posted by Kenhelm (Post 4223938)
Tabs need to be quoted to survive being echoed
Code:

line=abc$'\t'123    #  $'\t' is a tab character in bash
echo $line
abc 123            # tab has changed to a space

echo "$line"
abc    123          # tab has been preserved


I understand.
I'm indeed on bash.
Wrote this script to test:

Code:

line="Temporary line number one"
echo $line
val=55
some=`echo $line | sed s/$/'\t'$val/`
echo "$some"

some=`echo $some | sed s/$/'\t'$val/`
echo "$some"

output:

Code:

Temporary line number one
Temporary line number one        55
Temporary line number one 55        55

It looks like each time a new tab is added, all previous tabs are lost in the bash shell.
If I wrap $some in more than one set of double quotes ("), all the tabs are lost, including the last one.
I wonder how I can solve this easily.

Thanks.

kaprasanna1 01-14-2011 06:41 AM

jschiwal

I completely agree that tabs are normally used to separate fields and new-lines to separate records.
This particular report is to be updated on a nightly basis, and it should be easy to compare values for subsequent days. Hence the upside-down design!

Thanks for the pointer to awk and the importance of quotes in sed.

crts 01-14-2011 06:42 AM

After your post #7 I realize that you are running the script twice on the same file. I initially assumed that you were processing several different files with your script. In this case, double-quoting like
Code:

echo "$line" ...
should take care of the issue, as suggested by kenhelm.
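
Applied to the two-pass situation, this might look like the following sketch. It mirrors the small test from the earlier post, with the tab passed to sed as a literal character; the `tab` variable is an assumption added for portability, not part of the original test.

```shell
tab=$(printf '\t')
val=55
some="Temporary line number one"

# Two passes over the same text, as when the script is run twice on one file.
# Quoting "$some" keeps the tab from the first pass intact in the second.
some=$(echo "$some" | sed "s/\$/${tab}${val}/")
some=$(echo "$some" | sed "s/\$/${tab}${val}/")

# Both appended values should now be preceded by \t in the dump.
printf '%s\n' "$some" | od -c
```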

kaprasanna1 01-14-2011 06:45 AM

The following script seems to have done the trick:
Code:

i=0
val=44
while read line
do
    #Ignore empty lines
    case "$line" in
        "") echo >> Report.tmp.tsv; continue;;
        esac
    case "$i" in
        0)
            dt=`date  +%d-%m-%y`
            some=`echo "$line" | sed s/$/'\t'$dt/`
            echo "$some" >> Report.tmp.tsv;;
        1) some=`echo "$line" | sed s/$/'\t'$val/`
            echo "$some" >> Report.tmp.tsv;;
        2) some=`echo "$line" | sed s/$/'\t'$val/`
            echo "$some" >> Report.tmp.tsv;;
        3) some=`echo "$line" | sed s/$/'\t'$val/`
            echo "$some" >> Report.tmp.tsv;;
        esac
  i=$(($i+1))
  val=$(($val+1))
done < Report.tsv
rm Report.tsv
mv Report.tmp.tsv Report.tsv

Output of od -c Report.tsv:

Code:

0000000  T  e  s  t      L  i  n  e      o  n  e  \t  1  4
0000020  -  0  1  -  1  1  \t  1  4  -  0  1  -  1  1  \n
0000040  T  h  i  s      i  s      L  i  n  e      n  u  m
0000060  b  e  r      t  w  o  \t  4  5  \t  4  5  \n  L  i
0000100  n  e      n  u  m  b  e  r      i  s      t  h  r
0000120  e  e  \t  4  6  \t  4  6  \n  F  o  u  r  t  h   
0000140  l  i  n  e      o  f      o  r  i  g  i  n  a  l
0000160      r  e  p  o  r  t  \t  4  7  \t  4  7  \n
0000176

Thanks.

crts 01-14-2011 06:54 AM

Quote:

Originally Posted by kaprasanna1 (Post 4223990)
If I wrap $some by more than one set of quotes (") all the tabs are lost including the last one.
Wonder how can I solve this easily.

Not sure why you want to use 'multiple' enclosing quotes. However, if you use an even number of pairs of double quotes, you are not actually enclosing your variable.
Example:
Code:

echo "" $line ""
              ^^ opening and closing quote.
    ^^ opening and closing quote.

As you can see, the quotes are not interpreted as outer and inner quotes. Do you want to echo the quotes themselves? As in
Code:

echo "$line"
"content of line"

Then you will have to escape the double-quotes when you assign the value to line:
Code:

line="\"content of line\""
Please elaborate a bit more on the situation that triggers this issue. I am not sure if I fully understand what you are trying to do.
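
The point about an even number of quote pairs can be checked directly. In this sketch (the variable names and the `""$line""` form without spaces are assumptions about what "more than one set of quotes" looked like), the doubled quotes are just empty strings glued onto an unquoted expansion, so the tab is still lost.

```shell
tab=$(printf '\t')
line="abc${tab}123"

# ""$line"" is: empty string, unquoted $line, empty string.
# The expansion is therefore still split at the tab.
doubled=$(echo ""$line"")

# A single pair of quotes really encloses the variable.
single=$(echo "$line")

printf '%s\n' "$doubled" | od -c   # space between abc and 123
printf '%s\n' "$single"  | od -c   # \t between abc and 123
```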

grail 01-14-2011 09:24 AM

Maybe this can give you some ideas:
Code:

i=0
val=44

while read line
do
    #Ignore empty lines
    [[ -z $line ]] && continue

    case $((i++)) in
        0)  dt=`date  +%d-%m-%y`;;
    [1-3])  (( dt = val++ ));;
    esac

    echo "$line" | sed "s/$/\t$dt/" >> Report.tmp.tsv
done < Report.tsv
rm Report.tsv
mv Report.tmp.tsv Report.tsv
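
Following the earlier pointer to awk, the whole loop could also be sketched as a single awk call, which avoids echo and shell word splitting altogether. This is only one possible approach, not the method settled on in the thread. The function name `append_column` is hypothetical. The numbering follows kaprasanna1's working script (date on the first non-empty line, counter advancing on every non-empty line), but unlike that script it does not stop after four lines and it only treats truly empty lines as empty.

```shell
# Hypothetical helper: append a tab and a value to every non-empty input line.
#   $1 = date string for the first non-empty line
#   $2 = starting counter value
append_column() {
    awk -v dt="$1" -v val="$2" '
        /^$/     { print; next }                    # pass empty lines through
        n++ == 0 { print $0 "\t" dt; val++; next }  # first non-empty line gets the date
                 { print $0 "\t" val; val++ }       # later lines get the counter
    '
}

# Possible usage (file names as in the thread):
#   append_column "$(date +%d-%m-%y)" 44 < Report.tsv > Report.tmp.tsv &&
#       mv Report.tmp.tsv Report.tsv
```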



All times are GMT -5.