Substituting some characters in a text file (shell script).

stf92 · 06-30-2012, 10:09 AM

Hi:

I have a text file from which I want to eliminate all newlines save in the case where two consecutive newlines are present. That is, one possible algorithm would be the following.

Code:

1. p=p+1           # advance character pointer
2. if char at position p = \n
       if char at position p+1 = \n
           p=p+1
       else
           replace char at p with ' '
3. goto step 1

If I want to implement it as a Bash script, then
(a) I would begin by making use of a while loop.
(b) I must treat p as a numeric variable.
(c) Is the read builtin able to read char by char?
(d) And what would be the command to write to a file?
(e) Would it not be easier to have two files: one input file and one output file?

Could you give me some hints covering these points?

pixellany · 06-30-2012, 11:49 AM

Code:

sed '/^$/d' oldfile > newfile

Not exactly what you requested: this one deletes all empty lines and leaves every other newline in place. I suspect that sed can also be used for the problem as stated.
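
To see the difference on a small sample, here is a quick sketch. The sample text and the function name drop_blank_lines are made up for illustration.

```shell
#!/bin/bash
# drop_blank_lines: hypothetical wrapper around the sed command above.
# It deletes empty lines but does not join the remaining lines.
drop_blank_lines() {
    sed '/^$/d'
}

# Two paragraphs separated by one empty line.
printf 'a\nb\n\nc\nd\n' | drop_blank_lines
# Prints a, b, c and d on four separate lines: the empty line is gone,
# but the newlines inside each paragraph are still there.
```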

stf92 · 06-30-2012, 01:24 PM

I consider myself able to write the script, if only I knew how to write one character at a time, in the style of C's fputc, fput, putc and putchar. But from reading the bash manual, the only builtin command that does output is printf. For input, by contrast, there is the read builtin command.

ntubski · 06-30-2012, 02:35 PM

Code:

$ help read
read: read [-ers] [-u fd] [-t timeout] [-p prompt] [-a array] [-n nchars] [-d delim] [name ...]
...
If -n is supplied with a non-zero NCHARS argument, read returns after
NCHARS characters have been read.
$ help echo
echo: echo [-neE] [arg ...]
     Output the ARGs.  If -n is specified, the trailing newline is
    suppressed. ...

Quote:

If only I knew how to write one character at a time, in the style of C's fputc, fput, putc and putchar. But by reading the bash manual the only builtin command that does output is printf.

You can use printf or echo for output:

Code:

$ echo -n x ; echo -n y ; echo -n z ; echo
xyz
$ printf x ; printf y ; printf z ; printf '\n'
xyz
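
If the character comes from a variable, printf with a '%s' format is probably the safer choice, because the data is then never interpreted as a format string. A minimal sketch of a putchar-like helper follows; the name put_char is my own invention, not a bash builtin.

```shell
#!/bin/bash
# put_char: hypothetical helper, loosely analogous to C's putchar.
# '%s' prints the argument literally, so a '%' or '\' in the data is
# not treated as a format directive or an escape sequence.
put_char() {
    printf '%s' "$1"
}

put_char 'x'; put_char ' '; put_char 'y'; put_char $'\n'
# prints: x y
```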

stf92 · 06-30-2012, 02:43 PM

Thanks ntubski. 'read -u fd' reads input from file descriptor fd. But then there must be a write command that writes to a given file descriptor!

ntubski · 06-30-2012, 03:24 PM

You can use output redirection (this works for any command, not just echo):

Code:

echo -n x >file  # write the character 'x' to "file"
echo -n x >&2    # write the character 'x' to standard error

Actually you could use input redirection instead of -u for read as well.
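
If you do want explicit descriptors, as with read -u, bash can open them with exec. A minimal sketch, assuming descriptors 3 and 4 are free; the function name copy_first_line and the temporary files are only there for illustration.

```shell
#!/bin/bash
# copy_first_line: hypothetical demo. Reads one line from the file named
# by $1 through descriptor 3 and writes it to the file named by $2
# through descriptor 4.
copy_first_line() {
    local line
    exec 3<"$1" 4>"$2"          # open fd 3 for reading, fd 4 for writing
    IFS= read -r -u 3 line      # read one line from fd 3
    printf '%s\n' "$line" >&4   # write it to fd 4
    exec 3<&- 4>&-              # close both descriptors
}

# Demo on temporary files.
tmpdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$tmpdir/in"
copy_first_line "$tmpdir/in" "$tmpdir/out"
cat "$tmpdir/out"               # prints: first line
rm -rf "$tmpdir"
```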

stf92 · 06-30-2012, 03:56 PM

Yes,

while ...
do
  ...
done < "$infile"

The thing is that I must either modify the file in place or else take the file as input and write the output to another file. But I think I now have enough material to begin thinking about how I'll do it. Thanks a lot.

pixellany · 06-30-2012, 05:19 PM

Your original problem statement:

Quote:

I have a text file from which I want to eliminate all newlines save in the case where two consecutive newlines are present.

A solution:

Code:

sed -n 'h; :1 n; /./{H; b1}; /^$/p; x; s/\n/ /g; p' oldfile > newfile

I've done a few tests of this, but I won't advertise it as bulletproof.

stf92 · 06-30-2012, 07:50 PM

It seems there is no choice for me but getting familiar with sed. Thanks a lot, pixellany. The command seems to work fine.

ntubski · 06-30-2012, 10:04 PM

awk version; a bit longer but more readable, I think.

Code:

awk '{if (length($0)) printf("%s ",$0); else print""} END{print""}' oldfile > newfile
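
Note that the one-liner above leaves a trailing space on each output line and ends each paragraph with a single newline. If the empty line between paragraphs should survive, one possible variant uses awk's paragraph mode (RS set to the empty string). This is only a sketch, and the function name join_paragraphs is made up.

```shell
#!/bin/bash
# join_paragraphs: hypothetical name. RS="" puts awk in paragraph mode,
# where each record is a block of lines delimited by empty lines.
join_paragraphs() {
    awk 'BEGIN { RS = "" }
         NR > 1 { print "" }            # empty line between paragraphs
         { gsub(/\n/, " "); print }     # join the lines of one paragraph
        ' "$@"
}

printf 'a\nb\n\nc\nd\n' | join_paragraphs
# prints "a b", then an empty line, then "c d"
```

With files it would be called as join_paragraphs oldfile > newfile. In paragraph mode a run of several empty lines counts as one separator, so runs are collapsed to a single empty line.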

stf92 · 06-30-2012, 10:31 PM

Thank you ntubski. Some day I'll get quite familiar with sed and awk. In the meantime, I'll keep your examples in order to study them in the future. I think that using the given file as input and letting the modified file be another file (output file) I can do it within a while loop and input/output redirection. For instance, for output, I'll use something like

echo -n $var1 >>outfile.

stf92 · 06-30-2012, 11:50 PM

Code:

semoi@darkstar:~/script/el_mio$ cat f1
#!/bin/bash

# 1. Read a char from infile
# 2. Output it to stdout
# 3. Goto step 1

while read -n 1 car1  # -n 1 reads only one char
do
  echo -n $car1       # -n: do not output \n  
done < infile   
exit
semoi@darkstar:~/script/el_mio$ cat infile
To be or not to be. That is the question.
Whether 'tis nobler in the mind,
semoi@darkstar:~/script/el_mio$ ./f1
Tobeornottobe.Thatisthequestion.Whether'tisnoblerinthemind,semoi@darkstar:~/script/el_mio$

As you can see, either read or echo eats the spaces, and the line terminators as well. Why?

firstfire · 07-01-2012, 02:47 AM

Hi.

From `help read':

Quote:

Read a line from the standard input and split it into fields.

Reads a single line from the standard input, or from file descriptor FD
if the -u option is supplied. The line is split into fields as with word
splitting, and the first word is assigned to the first NAME, the second
word to the second NAME, and so on, with any leftover words assigned to
the last NAME. Only the characters found in $IFS are recognized as word
delimiters.

So to preserve spaces try this:

Code:

IFS="" read -n1 x && echo "[$x]"

i.e. set IFS to the empty string.
To preserve newlines as well, add -d "":

Code:

while IFS="" read -r -n 1 -d "" car1  # -r keeps backslashes; -n 1 reads only one char
do
	echo -n "$car1"       # -n: do not output \n
done
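
Building on that loop, the algorithm from the first post could be sketched roughly as follows. This is one possible reading of it, not a tested production script; the function name join_lines_by_char and the pending flag are my own.

```shell
#!/bin/bash
# join_lines_by_char: hypothetical name. Copies stdin to stdout one
# character at a time; a lone newline becomes a space, while a pair of
# consecutive newlines is copied unchanged.
join_lines_by_char() {
    local c pending=0             # pending=1: a newline was read but not yet written
    while IFS= read -r -n 1 -d '' c; do
        if [[ $c == $'\n' ]]; then
            if (( pending )); then
                printf '\n\n'     # second newline of a pair: keep both
                pending=0
            else
                pending=1         # decide once the next character is known
            fi
        else
            (( pending )) && printf ' '   # lone newline: write a space instead
            pending=0
            printf '%s' "$c"
        fi
    done
    (( pending )) && printf '\n'  # keep the newline that ends the input
    return 0
}

printf 'To be\nor not\n\nto be\n' | join_lines_by_char
# prints "To be or not", then an empty line, then "to be"
```

As in the original pseudocode, three newlines in a row are treated as a pair followed by a lone newline, so the third one becomes a space.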

catkin · 07-01-2012, 03:09 AM

Quote:

Originally Posted by stf92

Thank you ntubski. Some day I'll get quite familiar with sed and awk. In the meantime, I'll keep your examples in order to study them in the future. I think that using the given file as input and letting the modified file be another file (output file) I can do it within a while loop and input/output redirection. For instance, for output, I'll use something like

echo -n $var1 >>outfile.

That will work but will be gruesomely slow compared to an awk or sed solution. No matter for input files of a few hundred characters but for bigger files processed regularly ...
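
If it has to stay in bash, a middle ground might be to read whole lines and redirect the loop as a whole, so that the output file is opened once instead of once per character. I would expect this to be noticeably faster than a per-character loop, though still slower than awk or sed. A rough sketch, with a made-up function name:

```shell
#!/bin/bash
# join_lines_by_line: hypothetical name. Joins the lines of each
# paragraph with spaces and copies empty lines through unchanged.
join_lines_by_line() {
    local line sep=''             # sep is ' ' while inside a paragraph
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ -z $line ]]; then
            [[ -n $sep ]] && printf '\n'   # end the current paragraph
            printf '\n'                    # the empty line itself
            sep=''
        else
            printf '%s%s' "$sep" "$line"
            sep=' '
        fi
    done
    [[ -n $sep ]] && printf '\n'  # finish the last paragraph
    return 0
}

printf 'a\nb\n\nc\nd\n' | join_lines_by_line
# prints "a b", then an empty line, then "c d"
```

With files it would be called as join_lines_by_line < oldfile > newfile, which opens each file exactly once.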

pixellany · 07-01-2012, 07:10 AM

Quote:

Originally Posted by ntubski

awk version; a bit longer but more readable.......

Awwwww, that takes all the fun out of it..

Seriously, I am jealous of the awk wizards---someday, I'll learn it.