LinuxQuestions.org - Some techniques for text file editing

Some techniques for text file editing

I normally use C and bash scripting when I need to deal with text files. I've hit a situation where I got stuck, and helping me means helping many others, since needing to program this type of application is hardly a "rare situation". Enough crap.

Suppose I have a text file, having a format like the following (but can be generic):

line1: field1.........field2....field3..................field5
line2: ..field6.....field7..field8..................field9

(I've replaced the white space with a period '.' in this post, since multiple white spaces get boiled down to one white space, which would lose the point of my query)

The spacings between the fields are not the same on purpose! Is there a way to maintain the same structure of the text file (in terms of spacing, indentation, and delimitation) while modifying/replacing specific fields using C or bash? I did it in bash by assigning each field to a variable, as in:

var1=field1
var2=field2
... and so on

then:

line1=$var1............$var2....$var3..................$var5
(maintaining the same indentation)

and then redirected it to a text file:

echo $line1 > textfile

this resulted in loss of correct indentation:

output of text file-> field1 field2 field3 field5

Thanks in advance!!

(PS: The problem is coincidentally similar to when you write on this post and many white spaces are boiled to one space!!!)

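i know that in perl if you add a line to a file like this:

print FILE '.............thing';

it will keep the white spaces. strictly speaking

print FILE "............thing";

keeps them too; in perl the double quotes only add variable interpolation. it's in the shell that leaving the quotes off collapses the white space.

not too sure how relevant that is to your problem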

thanks ocularbob, that did it. Instead of print, I've used:

echo "$line1" > textfile

In this way, white spaces are preserved.
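For anyone landing here later, a minimal sketch of why the quotes matter: the shell splits an unquoted expansion into words on white space, and echo then joins those words with single spaces. The field values and spacing below are made up for illustration.

```sh
#!/bin/sh
# Hypothetical line with runs of spaces between the fields.
line1='field1         field2    field3'

# Unquoted: the shell splits $line1 into three words and echo joins them
# with one space each, so the original spacing is gone.
unquoted=$(echo $line1)

# Quoted: the whole string reaches the command as a single argument.
# printf is used here only because it behaves the same in every shell.
quoted=$(printf '%s\n' "$line1")

printf '[%s]\n' "$unquoted"
printf '[%s]\n' "$quoted"
```

The same rule applies to every expansion of the variable, not just the one in front of the redirection.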

this might be ugly, but the first thing that came to mind
with me was not to carry the spaces with the variables,
but make the spaces separate variables, or really
constants.

var1=field1
var1s=" "
var2=field2
var2s=" "

line1="$var1$var1s$var2$var2s"

i don't know. probably a stupid idea.
i haven't programmed anything in 8 years.

edit: i guess i was slow on the draw.
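A runnable sketch of the separators-as-constants idea, for what it's worth. The field values and the widths (9 and 4 spaces) are assumptions picked for the example, and the printf variant at the end is an alternative technique, not something from the post above.

```sh
#!/bin/sh
# Field values (made up for this sketch).
var1=field1
var2=field2
var3=field3

# Separators as their own constants. printf builds runs of 9 and 4 spaces
# so the widths are explicit; the widths themselves are assumptions.
sep1=$(printf '%9s' '')
sep2=$(printf '%4s' '')

# Quote the right-hand side so the spaces survive the assignment.
line1="$var1$sep1$var2$sep2$var3"

# Alternative: let printf pad each field to a fixed column width.
padded=$(printf '%-15s%-10s%s' "$var1" "$var2" "$var3")

# Quote again when writing the lines out.
printf '%s\n' "$line1"
printf '%s\n' "$padded"
```

Both lines come out identical here, because each six-character field plus its separator adds up to the column width given to printf.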

I must say there is nothing better (IMO) than perl for text processing. For some it seems strange to start with, but it is so, so, so much easier than using bash, C, etc. for text processing/scripting. Just get the hang of perl's regular expressions and you are armed with the best weapon for these types of tasks.

Give it a go, you'll thank yourself for it!

I've done a lot of scripting with sh, ksh, tcsh, bash and once I discovered perl, I nearly never use anything else... (for scripting that is)...

Cheers.

thanks mr_segfault, I'll take your advice. Do you think that with perl I could edit only particular fields of a file, instead of faking it by regenerating the whole text file from scratch (with the new fields, of course), which isn't generic?

even awk is good at that.

ganninu, You sure can.

If you post your original file inside [KODE] your file text here[/KODE] (replacing the K's with C's; I don't know how to write that without it turning my text into tags :) then I'll show you the perl script to do what you're trying to do...

And as whansard said, even awk is good for that, although I prefer perl since it is a little more like structured languages (C type) than awk.. I'm no awk guru :)

Cheers..

In actual fact I'm fully aware that awk is very powerful - I simply adore it and encourage many newbies to use it (although I'm not an awk guru). Yet I use it for snatching and filtering purposes - I've never used it to "replace" a field inside a text file. Anyway, my file looks like this (again, I'm gonna replace the white space with periods, since LinuxQuestions.org posts collapse runs of white space into one):

name....surname....................telephone....id
.......hair color.........height.............................

(Yes it is split in two lines)

i don't know how far you want to go with this thing..
but it looks pretty much like a little database. with perl you could store all your info in a file with each item separated by a "|" and all the info for a given person on one line. then you can reformat it with the script into a more readable form.

this code will take a file formatted like:

name|surname|telephone|id|haircolor|height
name1|surname1|telephone1|id1|haircolor1|height1

and pass each item into a var

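$DATFILE = "/your/database/file";

# read every record into memory, one line per person
open(DATFILE, "<", $DATFILE) or die "Cannot open $DATFILE: $!";
@DATA = <DATFILE>;
close(DATFILE);
foreach $LINE (@DATA) {
    chomp $LINE;   # drop the trailing newline so it doesn't stick to $height
    ($name,$surname,$phone,$id,$hairclr,$height) = split(/\|/, $LINE);
    print "$name $surname $phone $id\n";
    print "$hairclr $height\n";
}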

you could then print all your info into HTML using tables to keep the alignment the way you like it and never have to deal with whitespace.

maybe that just confuses the issue but i love tables for nice formatting

Ok here is the code for file: doSubs.pl

Code:

#!/usr/bin/perl

# The new values come from the command line.
$newName    = $ARGV[0];
$newSurname = $ARGV[1];
$newHeight  = $ARGV[2];
$newAge     = $ARGV[3];

open IFILE, "<&STDIN" or die "Unable to open stdin";

while(<IFILE>)  #loops reading 1 line from file each time
{
  $line = $_; # $_ is a line read from file, I take a copy of it, probably not needed, you could work on $_ itself.

  # Swap each placeholder, putting back the white space captured on both sides.
  $line =~ s/(\s+)name(\s+)/$1$newName$2/g;
  $line =~ s/(\s+)surname(\s+)/$1$newSurname$2/g;
  $line =~ s/(\s+)height(\s+)/$1$newHeight$2/g;
  $line =~ s/(\s+)age(\s+)/$1$newAge$2/g;

  print $line;
}

Run like:

cat templateFile.txt | doSubs.pl persons_name persons_surname persons_height persons_age

Ok to explain what is going on.

I use the args from the command line as the items that are going to be substituted.

Then the while statement will take a line from the file at a time and put it into $_ and will stop at EOF.

Then I copy the line (not needed but what the heck! : )

Then the substitution lines:

$line =~ s/(\s+)name(\s+)/$1$newName$2/g;

this says make $line = the result of the following operation on that line.

the s means substitute.

the expression is /<what to match>/<what to replace with>/ and the g means global (replace every match, not just the first one; again this is not needed in this example).

now the bits in between, first the what to match.

(\s+) says match 1 or more (the +) white space characters (the \s) and store the match in $1, then match the token 'name' then again match 1 or more white space characters and store the match in $2.

Now the substitution bit:

the $1 says to put here what was captured by the first group (\s+), then put the contents of $newName, then the contents of $2 (from the second (\s+)). etc

so this example works only if you have white space separating your text. if you were to use say '.' as in your example you would replace the (\s+) with (\.+) (that is an escaped . and a + in brackets)..
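For those staying in the shell, the same capture-and-put-back idea can be sketched with sed. This is a rough equivalent and not the script from this post: replace_token is a hypothetical helper name, the sample values are made up, and it assumes a sed that accepts -E (GNU and BSD sed both do), a token without regex metacharacters, and a value without '/', '&' or '\'.

```sh
#!/bin/sh
# replace_token TOKEN VALUE: filter stdin to stdout, swapping TOKEN for VALUE
# wherever TOKEN stands alone between white space (or at the start or end of
# a line). \1 and \2 put the captured white space back untouched.
replace_token() {
    sed -E "s/(^|[[:space:]]+)$1([[:space:]]+|\$)/\1$2\2/g"
}

# Template line in the style of the test input; the values are made up.
printf '    surname    name    height\n' |
    replace_token name Jane |
    replace_token surname Doe |
    replace_token height 1.75
```

The runs of white space are kept exactly as they were, so a value that is longer or shorter than the token it replaces still moves the columns that come after it.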

my test input file looked like:

Code:

    surname    name    height
        age    junk

I hope this is what you were after..

Let me know if i've totally missed the task :)

The code is tested but hand typed into this post since I couldn't cut and paste from my vmware linux window (just set it up), so there could be typos. If it doesn't work, look for a simple typo..

There is probably a simpler way to do this, but this is the first method that came to mind :)

Cheers

It's not really a database. It's more like a buffer... It's a bit complex to explain how/why I am using it. But for our purposes, I am saving only one entry of:

name....surname....................telephone....id
.......hair color.........height.............................

in the same text file. So I can initially have:

Linus....Torvaldis....................123456....8373
.......brown.........1.75.............................

, in the textfile.

Then I have another process which reads a new variable, say, id, and puts the new id in place of 8373. Each time this process is run, a new id is read and replaces the older one, while keeping the same structure and the other field values. I need to keep the same spaces and everything, as this data is passed to a serial terminal which will get confused if even a single character is shifted by 1.

So the new entry becomes:

Linus....Torvaldis....................123456....9999 (<-- new id)
.......brown.........1.75.............................
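One possible way to do that update generically, as a rough sketch only: capture everything up to the 4th field on the first line, put it back unchanged, and write the new id padded to the old id's width so the line keeps its length. The names set_id and re are hypothetical, the position of the id (4th field, line 1) is taken from the sample record, and the new id is assumed to contain no '/', '&' or '\' since those are special to sed. It also assumes a sed that accepts -E.

```sh
#!/bin/sh
# Group 1: the first three fields with all their spacing, up to the 4th field.
# Group 3: the 4th field itself (the id in the sample record).
re='^(([[:space:]]*[^[:space:]]+){3}[[:space:]]+)([^[:space:]]+)'

# set_id NEWID: filter stdin to stdout, replacing the 4th field of line 1.
# Note: plain sh has no local variables, so record/old/padded are global.
set_id() {
    record=$(cat)                          # the whole record, both lines
    old=$(printf '%s\n' "$record" | sed -n -E "1s/$re.*/\3/p")
    if [ -z "$old" ] || [ "${#1}" -gt "${#old}" ]; then
        echo "set_id: no 4th field, or new value too long" >&2
        return 1
    fi
    # Left-justify NEWID in the old field's width so nothing shifts.
    padded=$(printf "%-${#old}s" "$1")
    printf '%s\n' "$record" | sed -E "1s/$re/\1$padded/"
}

# Example with a record shaped like the one above (real spaces, not periods).
printf '%s\n' 'Linus    Torvaldis    123456    8373' '       brown    1.75' |
    set_id 9999
```

Refusing a value that is too long is a design choice for this sketch; whether the terminal would rather get a truncated id is something only the real setup can answer.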

Anyway, for now I've kept it simple and have used the rudimentary echo technique. It works, but it's not generic. I'll try to find a better solution. Thanks ocularbob. We can close the case :)

think you now have two parts that might go rather nicely together. :)
im about to go use some of what mr_seg posted in a couple of my own scripts.
there's a lot of learning going on here....

Well guys it was really nice discussing these techniques with you. I'm sure many other readers are appreciating it. Thanks again guys.