LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Text file - needs 'fixing' (https://www.linuxquestions.org/questions/linux-newbie-8/text-file-needs-fixing-805857/)

JGuillou 05-04-2010 10:12 AM

Text file - needs 'fixing'
 
We are currently transferring data from a proprietary DB to Postgresql. All of the data is in text files, each approx 125,000 lines long.
I have a problem with 2 of the files.

One of the columns contains a numeric which should be in the format 00000.00 or be 0.00 - unfortunately approx 25% of the records contain a null.

I can use cut to 'slice' the file into 3 sections but my problem is : how do I keep the values and only replace the nulls with 0.00

Thanks for any help

grail 05-04-2010 10:15 AM

Might be helpful to see a portion of the file in question??(also maybe a small snippet of what the file will look like after being processed)

Also, if you have used cut, show us your code and where you are having issues with it?

pixellany 05-04-2010 10:19 AM

In case a sample of the file does not make it clear, please explain what you mean by a "null". For example, do you mean the ascii "NUL" character (0x00)?

JGuillou 05-04-2010 11:15 AM

Line from file :

INSERT INTO pltrans (pt_key,pt_spkey_sp_lkey_pll_cref,pt_spkey_sp_lkey_pll_subs,pt_spkey_sp_code,pt_batch_hdr,pt_actual_ date,pt_effect_date,pt_tran_order_no,pt_trans_type,pt_text_ctx_clre,pt_text_ctx_seqn,pt_amount,pt_va t_cash_ind,pt_authorised,pt_alloc_ref,pt_bal_remain,pt_status,pt_day_terms,pt_discount,pt_cash_disco unt,pt_pay_mark,pt_supp_ref,pt_department,pt_conta,pt_total_amount,pt_ytd_turnover,pt_posting_type,p t_pb_batch_no,pt_vat_amount,pt_close,pt_due_date,pt_transfer_ref,pt_com_no,pt_cmp_seq_no) VALUES (' 270440',' 8632',' 1','DIGI01 ',' 38761',' 30/04/2004',' 15/04/2004','15925 ',' 401',' 8632',' 12012',' -150.00','N ','Y ',' 270440',' 0.00','I ',' 30',' 0.00',' -0.00','0 ','RS 0222 ',' 0','DIGI01 ',' -172.50',' ','S ',' 219',' -22.50','C ',' 15/05/2004',' ',' ',' ');

Problem column is the one just after -172.50, and before the S - which should be 0.00 and not blank.

I plan to use 'cut' to take the file to just before the field, and just after the field.

For the field I need to either, keep the value if not blank, or enter 0.00

I will then use paste to put the file back together again

My problem is the substitution either the value or 0.00 in the column

pixellany 05-04-2010 11:39 AM

OK

What is it that uniquely defines the field you want to change? e.g. is it the position? Or maybe is it related to the actual data?

If, for example, you want to replace any space, then that would be easy with SED. If it is the position, then AWK will be better.

Here's one example using SED:
Code:

sed "s/',' ','/','0.01','/g"
This replaces all instances of:
',' ','
with:
','0.01','

JGuillou 05-04-2010 11:59 AM

What defines the column is it's location (unfortunately the file in my post has lost the spaces when copying to my post)

The problem is that the column must be : equal to original value if not blank OR 0.00 if blank

Tinkster 05-04-2010 12:54 PM

Something like
Code:

awk -F, '{if($59~/ /){$59="\0470.0\047"};print $0}' insert.sql
may do the trick ...



Cheers,
Tink

JGuillou 05-04-2010 01:46 PM

Thanks, but I'm a little lost.

$59 = the correct field but I don't understand the 0470.0\047

Tinkster 05-04-2010 01:51 PM

The "problem" is that awk uses the same "single quotes" your fields
are enclosed in to protect its commands from the shell .... the \047
is just the octal value representing a ' (single quote). Writing the
octal value I find slightly more readable that escaping htem - and it's
"safer" if you ever change quoting, or embed it in a script

In other words \0470.0\047 is the same as '0.0'

Better?


Cheers,
Tink

grail 05-04-2010 10:45 PM

Assuming you want this for every line in the file either of these seem to work ok:
Code:

awk 'gsub(/\047 \047/,"\0470.00\047")' file

sed "s/' '/'0.00'/g" file


JGuillou 05-05-2010 03:26 AM

Hi

Sorted it using my initial process :

cut -c1-963 import_data.sql > jpg1
cut -c964-977 import_data.sql > jpg2
cut -c977-1104 import_data.sql > jpg3

I vi'ed jpg2 with the following :
:%s/ / 0.00/

I then glued it all back with :

paste -d '' jpg1 jpg2 jpg3 > import_data-a.sql

Probably not pretty, but hey it works - thanks for the help

grail 05-05-2010 09:06 AM

Please mark as SOLVED if you have your solution


All times are GMT -5. The time now is 04:12 PM.