A really tough (at least for me) column data file missing-value problem
Good morning guys,
I’m having a really hard time (in fact I’m completely stuck) trying to process a space-separated data file, because one of the columns is sometimes filled with a space where it should have a 0. The problem happens in the 39th column. Most of the time the 39th column has a number, but sometimes (maybe 20% of the time), where the 39th column should have a 0, it has a space. In a space-separated data file that shifts all the data from the 40th column onward one column to the left, so everything after the 39th column is screwed up.

What I think will help is that the 39th column should always contain an integer, and that integer is always aligned to the right side of the column. Plus, the 38th and 40th columns always have 6 digits after their decimal point, and there are always 9 “steps” from the last digit in the 38th column to the last digit in the 39th column. Or you could count that there are always 15 “steps” from the 38th column’s decimal point to the last digit of the 39th column’s integer.

Here’s what it looks like. So you can see the problem more easily, I will replace spaces with dashes and write the decimal part of the 38th and 40th columns as “123456”, but you know that’s not how the data file really is; it’s just for ease of understanding.
Code:
column38 column39 column40 column41

In pseudo-code, I think, but I don't know:
1. Count over to the 38th column.
2. Count over 9 “steps” from the last number in the 38th column.
3. If the 9th “step” is not a number, make it a 0.
4. Else continue down the rows looking for the problem.

Step 1 is easy. I have no idea how to do step 2. Step 3 is maybe a simple “if” statement? And step 4 will happen just because it’s an awk/sed/or bash script, whatever, yes?

Thanks for helping me,
Tabitha |
Is it maybe some sort of byte counting after the 38th column? I don't know how to do that.
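The counting idea from the pseudo-code can be sketched in Python roughly as follows. This is only a sketch under assumptions: the function name `fix_line` and its parameters are made up for illustration, the first 38 columns are assumed never to be blank, and the check is for a literal space at the target position rather than for any non-digit.

```python
import re

def fix_line(line, field_no=38, steps=9):
    """Put a '0' at the position `steps` characters after the last
    character of field number `field_no`, if that position holds a space."""
    # Locate every run of non-whitespace characters (the visible fields).
    fields = list(re.finditer(r"\S+", line))
    if len(fields) < field_no:
        return line  # line too short to contain the field; leave untouched
    # Index of the last character of the chosen field (step 1).
    last_char = fields[field_no - 1].end() - 1
    # Count over `steps` characters from there (step 2).
    target = last_char + steps
    # If that position is blank, the integer is missing: make it a 0 (step 3).
    if target < len(line) and line[target] == " ":
        return line[:target] + "0" + line[target + 1:]
    return line  # otherwise keep the line as it is (step 4)
```

To process a whole file, each line could be passed through `fix_line(line.rstrip("\n"))` and printed, so that unchanged rows are written out as well as the fixed ones.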
|
I am a little confused about what you want to do.
Is your file in the format you mentioned (yes or no)? What do you wish to do with the data, i.e. put it in another file? What have you tried in the way of solving your problem, outside of pseudo-code? |
Code:
awk 'substr($0,26,1) == " " {print}' test.lst |
With Perl
Code:
#!/usr/bin/perl
Code:
markus@samsung:~/Programmierung/perl$ cat text.txt
It searches for lines where FALSE/TRUE is in the wrong column and inserts a 0 in the second column; you will have to change the column numbers. Here it prints only the changed lines.
Markus |
Yep, the file is whitespace delimited.
The fixed file needs to look like this: Code:
column38 column39 column40 column41 |
markush, that might just work! But I will need it to print out the whole file, not just the fixed rows.
A couple of questions. Does the 3 in ar[3] correspond to the FALSE|TRUE column because it is 0-based? Does the 2 in the line splice @ar, 2, 0, "0" ; mean go back 2 columns? Does the "0" in that line mean pad with a 0? And what does the first 0 mean in that line?
Thank you sooooo much for your help!!!
Tabby |
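Perl's splice function takes these arguments: the array, the offset (position), the length, and the list of values to insert. The first 0 is the length, i.e. the number of elements to remove starting at the offset. With a length of 0 nothing is removed, so the "0" is simply inserted at index 2 (counting from 0).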
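For readers who do not know Perl, the same four arguments can be mimicked in Python with slice assignment. This is only a rough analogue, not Perl's actual implementation: the helper name `splice_insert` is made up, the sample tokens are placeholders, and only non-negative offsets are handled.

```python
def splice_insert(items, offset, length, replacement):
    """Rough counterpart of Perl's splice(@items, offset, length, list):
    remove `length` elements starting at `offset`, put `replacement`
    in their place, and return the removed elements."""
    removed = items[offset:offset + length]
    items[offset:offset + length] = replacement  # modifies the list in place
    return removed

# Placeholder row with the third field missing (made-up tokens).
ar = ["a", "b", "c", "d"]
removed = splice_insert(ar, 2, 0, ["0"])  # length 0: nothing is removed
print(ar)       # ['a', 'b', '0', 'c', 'd']
print(removed)  # []
```

With a length of 1 instead of 0, the element at index 2 would be replaced rather than pushed to the right.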
You should read Code:
perldoc -f splice Code:
#!/usr/bin/perl |
sounds great, I'll give it a try.....
thanks again, Tabby |
For the formatted output you should take a look at Perl's write function.
Code:
perldoc -f write Code:
perldoc -f format Markus |
it gave me the error
Code:
Can't locate feature.pm in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at ./script.pl line 5

P.S. Trying to help myself, I tried to do a perldoc on 'feature', but it said there is no documentation for perl function 'feature'. Then I looked on Google and it said I have to have Perl 5.10. Changing to 5.10 is probably not going to happen, but I'll ask :( Please say it's fixable another way :) |
You can substitute
Code:
say "@ar" ;
with
Code:
print "@ar\n" ;
and remove the use feature line that the error points to.
Markus |
Actually, I'd say the Perl unpack function is ideal for this: http://linux.die.net/man/1/perlpacktut
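The idea behind unpack is to cut each record by fixed character widths instead of splitting on whitespace, so a blank field stays in its own slot. A minimal Python sketch of that idea follows. The widths in `WIDTHS` are a hypothetical four-column layout, not the real file's, and `unpack_fixed` and `parse` are made-up names.

```python
def unpack_fixed(line, widths):
    """Cut `line` into consecutive fields of the given character widths
    and strip the padding from each one."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width].strip())
        pos += width
    return fields

# Hypothetical layout: the real widths must be measured from the data file.
WIDTHS = [10, 9, 12, 6]

def parse(line):
    """Split a fixed-width record and fill a blank second field with '0'."""
    fields = unpack_fixed(line, WIDTHS)
    if fields[1] == "":
        fields[1] = "0"
    return fields
```

Because the cut points never move, a blank integer column shows up as an empty string and can be replaced directly, with no shifting of the later columns.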
|
Given the text.txt file in post #5, this sed command (which matches the first 24 characters of a line followed by a space, then changes that space to a zero)
Code:
sed 's:\(^.\{24\}\) :\10:g' text.txt
produces
Code:
723.123456 1321 9462.123456 FALSE
etc.
The above assumes that your file has fixed-width columns. |
Well as an alternative:
Code:
ruby -ane '$F.insert(1,0) if $F.length == 4;puts $F.join("\t")' file |
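The same field-counting idea can be written in Python. This is a sketch only: `pad_missing` is a made-up name, and the expected field count and insert position are parameters that would need adjusting for the real file (where the blank column is the 39th). It only works if that one column is the only one that can ever be blank.

```python
def pad_missing(line, expected=5, index=1):
    """If the line has exactly one field fewer than expected, insert a '0'
    at position `index` (0-based); return the fields joined by tabs."""
    fields = line.split()  # split on any run of whitespace
    if len(fields) == expected - 1:
        fields.insert(index, "0")
    return "\t".join(fields)
```

Unlike the fixed-position approaches, this does not preserve the original column alignment, since the output is re-joined with tabs.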