[SOLVED] Inserting text at specific line X column coordinates

udiubu · 08-03-2012, 09:13 AM

Dear all,

I have a txt file with signal change values containing for example 20 lines of different column-length (ranging between 40 and 50):

0.345644 0.453233 0.567872 ...
0.354234 0.452223 0.589872 ...
0.323445 0.451111 0.567822 ...

Each value has the same number of digits.

Each line belongs to a different subject, and because of some coding problems some NaN values are missing. I know where these NaN should be inserted, and the inserting positions change according to subject and column. As I have lots of files, but the missing values are always in the same position, I was wondering whether there is a way to automatize my insertions. I should literally do what "insert cell" in Excel does: shifting columns and inserting values.

The output should look for example like the one below:

0.345644 0.453233 0.567872 ...
0.354234 NaN 0.589872 ...
0.323445 0.451111 NaN ...

Can CAT and SED help me in this?
I wouldn't mind writing a long series of insert actions per line, as long as I can make the lines independent of each other, so that inserting a column in one line doesn't affect the rest.

One very space-consuming idea would be to make each line independent of the others (i.e. copy the line to a third file), add a column to it, and append the lines back together. Anything faster?

Any help would be highly appreciated!

I thank you very much for your help.

Sincerely,

Udiubu

sycamorex · 08-03-2012, 09:21 AM

I'm quite sure you could accomplish whatever you want to accomplish in either sed or awk but without more information it's really hard to say anything more. Just a set of my assumptions:

Quote:

different column-length

Quote:

Each value has the same number of digits.

These two pieces of info are contradictory. I think you mean a differing number of columns per line with a fixed number of digits per column.

Quote:

I know where these NaN should be inserted, and the inserting positions change according to subject and column.

but we don't know it.

Quote:

I was wondering whether there is a way to automatize my insertions.

I have no idea unless I know the criteria for insertions.

udiubu · 08-03-2012, 09:27 AM

All right sycamorex, and sorry for having been misleading.
I indeed mean "differing number of columns per line with a fixed number of digits per column".
I have a long list of coordinates, but we could use just two of them as a test, as in the second array I proposed:

0.345644 0.453233 0.567872 ...
0.354234 NaN 0.589872 ...
0.323445 0.451111 NaN ...

so in this case I would like to put:

a NaN in the second line, column 2
a NaN in the third line, column 3

I hope this helps, and thanks again for your prompt response, Sycamorex.

Best,

Udiubu

sycamorex · 08-03-2012, 09:33 AM

Thanks for the clarification. A few more questions:

1. Is there a pattern like:

2nd line, column 2
3rd line, column 3
4th line, column 4

2. If there's no pattern like that, how would you want to specify the line number and column?
3. It looks like you want to replace the old value with NaN. Is that correct?

udiubu · 08-03-2012, 09:37 AM

1. No, there is no such pattern.
2. And indeed I was wrong with the array: I don't want to substitute, but to insert NaN and shift cells to the right. The previous array is wrong; this one is the correct one:

0.345644 0.453233 0.567872 ...
0.354234 NaN 0.452223 0.589872 ...
0.323445 0.451111 0.567822 NaN ...

I am terribly sorry for this mistake.

Udiubu

sycamorex · 08-03-2012, 10:47 AM

As I've been learning Python for the last few days, I wrote the following script for you. The first command-line argument is the line number and the second is the column number. For example:

Code:

cat columns.txt 
0.345644 0.453233 0.567872 0.432543
0.354234 0.452223 0.589872 0.233123
0.323445 0.451111 0.567822 0.452345 0.345234
~/data/projects/python/misc % ./columns.py 2 1  # 2nd line, 1st column
0.345644 0.453233 0.567872 0.432543
NaN      0.354234 0.452223 0.589872 0.233123
0.323445 0.451111 0.567822 0.452345 0.345234
~/data/projects/python/misc % ./columns.py 3 2  # 3rd line, 2nd column
0.345644 0.453233 0.567872 0.432543
0.354234 0.452223 0.589872 0.233123
0.323445 NaN      0.451111 0.567822 0.452345 0.345234

Code:

#!/usr/bin/python
import sys

# The first argument is the line number, starting with 1.
# The second argument is the column number, starting with 1.

def main():
    if len(sys.argv) != 3:
        print('{0} takes exactly 2 args!'.format(sys.argv[0]))
        sys.exit(1)
    # "NaN" padded with spaces to the width of one data column
    pattern = "NaN     "
    line_number = int(sys.argv[1]) - 1
    column_number = int(sys.argv[2]) - 1
    # Read every line of the input, closing the file when done
    with open('columns.txt') as infile:
        lines = [line.rstrip() for line in infile]
    # Split the target line into fields and insert the pattern
    current_line = lines[line_number].split(" ")
    current_line.insert(column_number, pattern)
    lines[line_number] = ' '.join(current_line)
    print('\n'.join(lines))

if __name__ == '__main__':
    main()

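The only validation it does is checking the number of command-line arguments (there must be exactly 2). An out-of-range line number makes it fail with an IndexError. An out-of-range column number is not reported, because list.insert then simply appends NaN at the end of the line.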
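A minimal sketch of how the same insertion could check its inputs, assuming 1-based line and column numbers as in the script above. The function name `insert_nan` and its error messages are hypothetical and not part of the original script.

```python
def insert_nan(lines, line_no, col_no, value="NaN"):
    """Return a copy of lines with value inserted at 1-based (line_no, col_no)."""
    if not 1 <= line_no <= len(lines):
        raise ValueError(
            "line {0} is out of range 1..{1}".format(line_no, len(lines)))
    fields = lines[line_no - 1].split()
    # col_no == len(fields) + 1 is allowed: it appends after the last column.
    if not 1 <= col_no <= len(fields) + 1:
        raise ValueError(
            "column {0} is out of range 1..{1}".format(col_no, len(fields) + 1))
    fields.insert(col_no - 1, value)
    result = list(lines)  # leave the caller's list untouched
    result[line_no - 1] = " ".join(fields)
    return result


rows = ["0.345644 0.453233", "0.354234 0.452223"]
print("\n".join(insert_nan(rows, 2, 1)))
```

Raising ValueError for both cases is just one possible choice; it keeps the two kinds of bad input distinguishable by their messages.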

theNbomr · 08-05-2012, 10:30 AM

This isn't so much a solution as a bit of methodology that I use to solve such problems. I try to identify the key aspects of the problem, and then match those aspects to programming languages and/or tools that I know. In this case, I see that the problem involves text files that are row/column oriented. Immediately, this suggests a tool such as awk or for me, Perl. When I see 'insert', I think of the splice function in Perl. Having selected the tool and a basic operation to perform, I can develop and test on a single line of input data, and once that works, wrap it up in the file-reading and iteration control.
This kind of problem decomposition can be helpful not only when you are trying to solve a problem yourself, but also when trying to describe the parameters of your problem to others, such as on forums like this one.
--- rod.
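The develop-on-one-line-then-wrap approach described above could look like this in Python, where slice assignment plays roughly the role of Perl's splice. This is only a sketch; `insert_field` and `process` are hypothetical names.

```python
import io


def insert_field(line, col, value="NaN"):
    """Insert value before 1-based field col of one whitespace-separated line."""
    fields = line.split()
    fields[col - 1:col - 1] = [value]  # empty slice: insert, replace nothing
    return " ".join(fields)


def process(stream, row, col):
    """Wrap the single-line operation in file reading and iteration."""
    for number, line in enumerate(stream, start=1):
        line = line.rstrip("\n")
        yield insert_field(line, col) if number == row else line


# Step 1: try the core operation on a single line.
print(insert_field("0.354234 0.452223 0.589872", 2))
# Step 2: run it over a whole (here in-memory) file.
sample = io.StringIO("0.345644 0.453233\n0.354234 0.452223\n")
print("\n".join(process(sample, 2, 2)))
```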

danielbmartin · 08-05-2012, 11:45 AM

The solution offered by grail (below) is superior. Therefore, I have withdrawn mine.

Daniel B. Martin

grail · 08-05-2012, 12:32 PM

Maybe something simple like:

Code:

awk -vrow=2 -vcol=3 'NR == row{$col = "NaN "$col}1' file

The downside here is that you have to redirect the output to a new file and then rename it once completed.
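The write-to-a-new-file-then-rename step can be wrapped in a small helper. A minimal Python sketch, assuming whitespace-separated fields; `insert_in_place` is a hypothetical name. Unlike the awk one-liner, it rewrites every line with single spaces, and it does not preserve the original file's permissions.

```python
import os
import tempfile


def insert_in_place(path, row, col, value="NaN"):
    """Insert value before 1-based field col on 1-based line row of path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that the final
    # rename stays on the same filesystem.
    with open(path) as src, tempfile.NamedTemporaryFile(
            "w", dir=directory, delete=False) as tmp:
        for number, line in enumerate(src, start=1):
            fields = line.split()
            if number == row:
                fields.insert(col - 1, value)
            tmp.write(" ".join(fields) + "\n")
    # Swap the new file in only after it has been written completely.
    os.replace(tmp.name, path)
```

A call such as `insert_in_place("columns.txt", 2, 3)` would then correspond to the `row=2`, `col=3` example above.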

If we assume you have, say, 100 insertions to make, you could also place each row, column, and new value on its own line in a file and read them in to drive the changes, like so:

Code:

$ cat columns.txt 
0.345644 0.453233 0.567872 0.432543
0.354234 0.452223 0.589872 0.233123
0.323445 0.451111 0.567822 0.452345 0.345234
$ cat changes.txt
2 3 Nan
3 1 0.123456
$ awk 'FNR == NR{col[$1]=$2;value[$1]=$3;next}FNR in col{$(col[FNR]) = value[FNR]" "$(col[FNR])}1' changes.txt columns.txt
0.345644 0.453233 0.567872 0.432543
0.354234 0.452223 Nan 0.589872 0.233123
0.123456 0.323445 0.451111 0.567822 0.452345 0.345234
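Note that `col[$1]` in the awk command holds a single column per row, so a second line in changes.txt for the same row overwrites the first. A hedged Python sketch that accepts several insertions per row, assuming the same `row column value` layout for changes.txt and assuming that all column numbers refer to positions in the original line (that convention is my assumption, not something stated in the thread). `apply_changes` and `parse_changes` are hypothetical names.

```python
from collections import defaultdict


def parse_changes(text):
    """Parse 'row col value' lines, as in the changes.txt shown above."""
    changes = []
    for line in text.splitlines():
        if line.strip():
            row, col, value = line.split()
            changes.append((int(row), int(col), value))
    return changes


def apply_changes(lines, changes):
    """Insert values into lines; changes is an iterable of (row, col, value).

    Rows and columns are 1-based and refer to positions in the original data.
    """
    by_row = defaultdict(list)
    for row, col, value in changes:
        by_row[row].append((col, value))
    result = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Work right to left so an insertion never shifts a pending one.
        for col, value in sorted(by_row.get(number, []), reverse=True):
            fields.insert(col - 1, value)
        result.append(" ".join(fields))
    return result


data = ["0.354234 0.452223 0.589872 0.233123"]
print(apply_changes(data, parse_changes("1 2 NaN\n1 4 NaN\n")))
```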

udiubu · 08-06-2012, 01:14 AM

Grail this works perfectly!

Thanks a lot.

Best,

Udiubu

grail · 08-06-2012, 02:32 AM

@daniel - please don't withdraw an answer as it may assist others in solving not only this problem but to also see alternate ways to go about it. Remember that yours may not be the shortest or fastest in this instance but may be better suited to a more perplexing problem. As you have outlined in previous posts, sometimes mine may be a little too complex or advanced for others to follow, and hence an alternative is always appreciated.

danielbmartin · 08-07-2012, 09:11 AM

Quote:

Originally Posted by grail

@daniel - please don't withdraw an answer as it may assist others in solving not only this problem but to also see alternate ways to go about it. Remember that yours may not be the shortest or fastest in this instance but may be better suited to a more perplexing problem. ...

Okay, here is a different solution.

Input file 1...

Code:

0.345644 0.453233 0.567872 0.432543
0.354234 0.452223 0.589872 0.233123
0.323445 0.451111 0.567822 0.452345 0.345234

Input file 2...

Code:

2 2
3 4

The desired output file has "NaN" inserted before line 2, field 2, and also before line 3, field 4.

Desired output file...

Code:

0.345644 0.453233 0.567872 0.432543
0.354234 NaN 0.452223 0.589872 0.233123
0.323445 0.451111 0.567822 NaN 0.452345 0.345234

I like to develop code stepwise, writing work files along the way. Examination of these work files verifies that the code is working as intended. They also help others understand the method.

The development-level code is this ...

Code:

sed -r 's|^|s/L|; s| |F|; s|$|/NaN/|' < $InFile2 > $Work1
awk '{for (i = 1; i <= NF; i++)
  printf("%s", "L"NR "F" i " " $i " ")}
   {printf("%s","\n")}' $InFile1 \
|tee $Work2                      \
|sed "-f" $Work1 -               \
|tee $Work3                      \
|sed -r 's/L[0-9]+F[0-9]+ ?//g'  \
> $OutFile

Begin by converting InFile2 into a series of instructions to be performed by sed.
This...

Code:

sed -r 's|^|s/L|; s| |F|; s|$|/NaN/|' < $InFile2 > $Work1

... creates Work1 containing this:

Code:

s/L2F2/NaN/
s/L3F4/NaN/

Note that "2 2" meaning "line 2, field 2" has been converted to a single word "L2F2". Same for all other lines in InFile2.
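To see what each of the three substitutions contributes, the same steps can be mirrored one at a time in Python. This is only an illustration of the conversion, not a replacement for the sed command; `to_sed_command` is a hypothetical name.

```python
import re


def to_sed_command(position_line):
    """Mirror the three sed substitutions on one 'line field' entry."""
    # s|^|s/L|      "2 2"    -> "s/L2 2"
    text = re.sub(r"^", "s/L", position_line, count=1)
    # s| |F|        "s/L2 2" -> "s/L2F2"
    text = re.sub(r" ", "F", text, count=1)
    # s|$|/NaN/|    "s/L2F2" -> "s/L2F2/NaN/"
    text = re.sub(r"$", "/NaN/", text, count=1)
    return text


for entry in ["2 2", "3 4"]:
    print(to_sed_command(entry))
```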

Now, turn our attention to InFile1.
This...

Code:

awk '{for (i = 1; i <= NF; i++)
  printf("%s", "L"NR "F" i " " $i " ")}
   {printf("%s","\n")}' $InFile1 \

... creates Work2 containing this:

Code:

L1F1 0.345644 L1F2 0.453233 L1F3 0.567872 L1F4 0.432543
L2F1 0.354234 L2F2 0.452223 L2F3 0.589872 L2F4 0.233123
L3F1 0.323445 L3F2 0.451111 L3F3 0.567822 L3F4 0.452345 L3F5 0.345234

No data has been lost. A Line-and-Field designation has been inserted ahead of each data item. These will serve as "targets" for the substitutions done by sed.

This...

Code:

|sed "-f" $Work1 -               \

... performs those substitutions, creating Work3:

Code:

L1F1 0.345644 L1F2 0.453233 L1F3 0.567872 L1F4 0.432543
L2F1 0.354234 NaN 0.452223 L2F3 0.589872 L2F4 0.233123
L3F1 0.323445 L3F2 0.451111 L3F3 0.567822 NaN 0.452345 L3F5 0.345234

Observe that the "targets" L2F2 and L3F4 have been replaced with the character string "NaN". Now the remaining (unused) targets must be removed.

This blows away those unused targets ...

Code:

|sed -r 's/L[0-9]+F[0-9]+ ?//g'  \

... to create the desired output file.
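For comparison, the tag, substitute, and strip stages can be sketched in a few lines of Python. This illustrates the same idea under the assumption that fields are separated by whitespace; `tag_fields`, `substitute`, and `strip_tags` are hypothetical names, and the sketch does not reproduce the trailing space that the awk stage leaves on each line.

```python
import re


def tag_fields(lines):
    """Prefix every field with an LnFm tag, like the awk stage (Work2)."""
    return [" ".join("L{0}F{1} {2}".format(n, i, field)
                     for i, field in enumerate(line.split(), start=1))
            for n, line in enumerate(lines, start=1)]


def substitute(tagged, positions, value="NaN"):
    """Replace the tags named in positions with value (Work3)."""
    targets = {"L{0}F{1}".format(row, col) for row, col in positions}
    return [" ".join(value if token in targets else token
                     for token in line.split())
            for line in tagged]


def strip_tags(lines):
    """Remove the tags that were not used as insertion targets."""
    return [re.sub(r"L[0-9]+F[0-9]+ ?", "", line) for line in lines]


data = ["0.345644 0.453233", "0.354234 0.452223"]
for line in strip_tags(substitute(tag_fields(data), [(2, 2)])):
    print(line)
```

Keeping the three stages as separate functions preserves the original's benefit of being able to inspect each intermediate result.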

When satisfied that the code is working properly, the tees which create the workfiles may be removed, and the finished product is ...

Code:

sed -r 's|^|s/L|; s| |F|; s|$|/NaN/|' < $InFile2 > $Work1
awk '{for (i = 1; i <= NF; i++)
  printf("%s", "L"NR "F" i " " $i " ")}
   {printf("%s","\n")}' $InFile1 \
|sed "-f" $Work1 -               \
|sed -r 's/L[0-9]+F[0-9]+ ?//g'  \
> $OutFile

Daniel B. Martin