LinuxQuestions.org - Substituting some characters in a text file (shell script).

Page 1 of 2

stf92

06-30-2012 10:09 AM

Substituting some characters in a text file (shell script).

Hi:

I have a text file from which I want to eliminate all newlines save in the case where two consecutive newlines are present. That is, one possible algorithm would be the following.

Code:

1. p = p + 1          # advance character pointer
2. if char at position p = \n
      if char at position p+1 = \n
          p = p + 1
      else
          replace char at p with ' '
3. goto step 1

If I want to implement it in the Bash scripting language, then
(a) I would begin by making use of a while loop.
(b) I must treat p as a numeric variable.
(c) Is readline able to read char by char?
(d) And what would be a command or statement to write to a file?
(e) Would it not be easier to have two files: one input file and one output file?

Could you give me some hints covering these points?

pixellany

06-30-2012 11:49 AM

Code:

sed '/^$/d' oldfile > newfile

Not exactly what you requested: this one deletes all empty lines. I suspect that sed can also be used for the problem as stated.
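To see how this differs from the request, here is a small check on made-up sample input (the text is invented for illustration, not taken from the poster's file). The empty line disappears, but every other line break is kept:

```bash
# Sample input invented for illustration: two groups separated by a blank line.
printf 'one\ntwo\n\nthree\n' | sed '/^$/d'
# prints:
# one
# two
# three
```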

stf92

06-30-2012 01:24 PM

I consider myself able to write the script, if only I knew how to write one character at a time, in the style of C's fputc, fput, putc and putchar. But from reading the bash manual, the only builtin command that does output is printf. For input, by contrast, there is the read builtin command.

ntubski

06-30-2012 02:35 PM

Code:

$ help read
read: read [-ers] [-u fd] [-t timeout] [-p prompt] [-a array] [-n nchars] [-d delim] [name ...]
...
If -n is supplied with a non-zero NCHARS argument, read returns after
NCHARS characters have been read.

$ help echo
echo: echo [-neE] [arg ...]
    Output the ARGs.  If -n is specified, the trailing newline is
    suppressed. ...

Quote:

If only I knew how to write one character at a time, in the style of C's fputc, fput, putc and putchar. But by reading the bash manual the only builtin command that does output is printf.

You can use printf or echo for output:

Code:

$ echo -n x ; echo -n y ; echo -n z ; echo
xyz
$ printf x ; printf y ; printf z ; printf '\n'
xyz
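When the character to write comes from a variable, it is probably safer to pass it as an argument to a fixed format string than to use it as the format itself. A minimal sketch, where put_char is a hypothetical helper name and not a bash builtin:

```bash
#!/bin/bash
# put_char: hypothetical helper, roughly analogous to C's putchar.
# With '%s' as the format, a '%' or a backslash in the data is printed
# literally instead of being interpreted by printf.
put_char() {
    printf '%s' "$1"
}

put_char x; put_char '%'; put_char '\'; put_char ' '; put_char y
printf '\n'
# prints: x%\ y
```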

stf92

06-30-2012 02:43 PM

Thanks ntubski. 'read -u fd' reads input from file descriptor fd. But then there must be a write command that writes to a given file descriptor!

ntubski

06-30-2012 03:24 PM

You can use output redirection (this works for any command, not just echo):

Code:

echo -n x >file  # write the character 'x' to "file"
echo -n x >&2    # write the character 'x' to standard error

Actually you could use input redirection instead of -u for read as well.
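As an illustration, both ways of reading from a file descriptor can be sketched like this. The function name fd_demo, the choice of descriptor 3 and the temporary file are assumptions made up for the example:

```bash
#!/bin/bash
# fd_demo: hypothetical demonstration of reading via -u and via redirection.
fd_demo() {
    local tmp line1 line2
    tmp=$(mktemp) || return 1
    printf 'first\nsecond\n' > "$tmp"

    exec 3< "$tmp"              # open the file for reading on descriptor 3
    IFS= read -r -u 3 line1     # read a line with -u fd
    IFS= read -r line2 <&3      # read the next line with input redirection
    exec 3<&-                   # close descriptor 3
    rm -f "$tmp"

    printf '%s\n' "$line1" >&2  # write to standard error
    printf '%s\n' "$line2"      # write to standard output
}

fd_demo
```

Both reads share the same descriptor, so the second one continues where the first stopped.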

stf92

06-30-2012 03:56 PM

Yes, as in

while ...
do
    ...
done < "$infile"

The thing is that I must either modify the file in place or else take the file to modify as input and write the output to another file. But I think I now have enough material to start working out how to do it. Thanks a lot.
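The two-file approach can be sketched by putting both redirections on the loop itself, so that each file is opened only once. This is a minimal sketch; copy_lines and its two parameters are hypothetical names, and the loop body simply copies each line unchanged:

```bash
#!/bin/bash
# copy_lines: hypothetical helper that reads INFILE line by line and
# writes each line to OUTFILE. A real script would transform the line
# inside the loop instead of copying it.
copy_lines() {
    local infile=$1 outfile=$2 line
    while IFS= read -r line      # IFS= and -r keep spaces and backslashes intact
    do
        printf '%s\n' "$line"
    done < "$infile" > "$outfile"
}
```

It could be called as copy_lines oldfile newfile. Note that a final line without a terminating newline is not copied by this simple loop.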

pixellany

06-30-2012 05:19 PM

Your original problem statement:

Quote:

I have a text file from which I want to eliminate all newlines save in the case where two consecutive newlines are present.

A solution:

Code:

sed -n 'h; :1 n; /./{H; b1}; /^$/p; x; s/\n/ /g; p' oldfile > newfile

I've done a few tests of this, but I won't advertise it as bulletproof.

stf92

06-30-2012 07:50 PM

It seems there is no choice for me but getting familiar with sed. Thanks a lot, pixellany. The command seems to work fine.

ntubski

06-30-2012 10:04 PM

awk version; a bit longer but more readable, I think.

Code:

awk '{if (length($0)) printf("%s ",$0); else print""} END{print""}' oldfile > newfile
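For anyone studying the one-liner, here is the same awk program wrapped in a shell function and run on made-up input. The name join_awk and the sample text are assumptions for illustration only:

```bash
#!/bin/bash
# join_awk: hypothetical wrapper around the awk program above.
# Non-empty lines are printed followed by a space instead of a newline;
# an empty line (and the END block) emits a newline.
join_awk() {
    awk '{ if (length($0)) printf("%s ", $0); else print "" } END { print "" }' "$@"
}

printf 'a\nb\n\nc\nd\n' | join_awk
# prints two lines, each ending in a space:
# a b 
# c d 
```

Each paragraph ends up on a line of its own, with a trailing space and with the blank separator line reduced to a single line break.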

stf92

06-30-2012 10:31 PM

Thank you ntubski. Some day I'll get quite familiar with sed and awk. In the meantime, I'll keep your examples to study them later. I think that, by using the given file as input and writing the modified text to another file (the output file), I can do it with a while loop and input/output redirection. For instance, for output, I'll use something like

echo -n $var1 >>outfile

stf92

06-30-2012 11:50 PM

Code:

semoi@darkstar:~/script/el_mio$ cat f1
#!/bin/bash

# 1. Read a char from infile
# 2. Output it to stdout
# 3. Goto step 1

while read -n 1 car1  # -n 1 reads only one char
do
  echo -n $car1      # -n: do not output \n
done < infile
exit
semoi@darkstar:~/script/el_mio$ cat infile
To be or not to be. That is the question.
Whether 'tis nobler in the mind,
semoi@darkstar:~/script/el_mio$ ./f1
Tobeornottobe.Thatisthequestion.Whether'tisnoblerinthemind,semoi@darkstar:~/script/el_mio$

As you can see, either read or echo eats the spaces, and the line terminators as well. Why?

firstfire

07-01-2012 02:47 AM

Hi.

From `help read':

Quote:

Read a line from the standard input and split it into fields.

Reads a single line from the standard input, or from file descriptor FD
if the -u option is supplied. The line is split into fields as with word
splitting, and the first word is assigned to the first NAME, the second
word to the second NAME, and so on, with any leftover words assigned to
the last NAME. Only the characters found in $IFS are recognized as word
delimiters.

So to preserve spaces try this:

Code:

IFS="" read -n1 x && echo "[$x]"

i.e. set IFS to the empty string, so that read does not strip whitespace.
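The difference can be checked with a short sketch like the following, where first_char is a hypothetical helper and the input is a made-up two-character string starting with a space:

```bash
#!/bin/bash
# first_char: hypothetical helper that reads one character from stdin
# and prints it between brackets. Pass "raw" to clear IFS for the read.
first_char() {
    local c
    if [ "$1" = raw ]; then
        IFS= read -r -n 1 c
    else
        read -r -n 1 c
    fi
    printf '[%s]\n' "$c"
}

printf ' x' | first_char        # default IFS: the space is stripped, prints []
printf ' x' | first_char raw    # empty IFS: the space survives, prints [ ]
```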
To preserve newlines as well, add -d "":

Code:

while IFS="" read -n 1 -d "" car1  # -n 1 reads only one char
do
    echo -n "$car1"      # -n: do not output \n
done
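Building on this loop, the algorithm from the first post might be sketched in pure bash as follows. This is only a sketch under a few assumptions: join_lines and the pending flag are names invented here, -r is added so that backslashes are not treated as escapes, and a newline at the very end of the input is kept rather than turned into a space. With -d "" the delimiter becomes the NUL character, so a newline reaches the loop as an ordinary character.

```bash
#!/bin/bash
# join_lines: hypothetical filter. A single newline becomes a space;
# two consecutive newlines are passed through unchanged.
join_lines() {
    local c pending=0            # pending=1: a newline was read, not yet written
    while IFS= read -r -n 1 -d '' c
    do
        if [[ $c == $'\n' ]]; then
            if (( pending )); then
                printf '\n\n'    # two newlines in a row: keep both
                pending=0
            else
                pending=1        # decide once the next character is known
            fi
        else
            if (( pending )); then
                printf ' '       # lone newline: replace it with a space
                pending=0
            fi
            printf '%s' "$c"
        fi
    done
    if (( pending )); then
        printf '\n'              # keep the final line terminator
    fi
}

printf 'To be\nor not\n\nto be.\n' | join_lines
# prints:
# To be or not
#
# to be.
```

It reads standard input and writes standard output, so it could be used as join_lines < oldfile > newfile.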

catkin

07-01-2012 03:09 AM

Quote:

Originally Posted by stf92 (Post 4716093)

That will work but will be gruesomely slow compared to an awk or sed solution. No matter for input files of a few hundred characters but for bigger files processed regularly ...

pixellany

07-01-2012 07:10 AM

Quote:

Originally Posted by ntubski (Post 4716084)

awk version; a bit longer but more readable.......

Awwwww, that takes all the fun out of it..;)

Seriously, I am jealous of the awk wizards---someday, I'll learn it.

All times are GMT -5. The time now is 05:25 PM.
