LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 05-04-2010, 11:12 AM   #1
JGuillou
LQ Newbie
 
Registered: Aug 2008
Location: Mauritius
Distribution: Mandriva 2008 Powerpack
Posts: 12

Rep: Reputation: 0
Text file - needs 'fixing'


We are currently transferring data from a proprietary DB to Postgresql. All of the data is in text files, each approx 125,000 lines long.
I have a problem with 2 of the files.

One of the columns contains a numeric which should be in the format 00000.00 or be 0.00 - unfortunately approx 25% of the records contain a null.

I can use cut to 'slice' the file into 3 sections but my problem is : how do I keep the values and only replace the nulls with 0.00

Thanks for any help
 
Old 05-04-2010, 11:15 AM   #2
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,247

Rep: Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684
Might be helpful to see a portion of the file in question??(also maybe a small snippet of what the file will look like after being processed)

Also, if you have used cut, show us your code and where you are having issues with it?
 
Old 05-04-2010, 11:19 AM   #3
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738
In case a sample of the file does not make it clear, please explain what you mean by a "null". For example, do you mean the ascii "NUL" character (0x00)?
 
Old 05-04-2010, 12:15 PM   #4
JGuillou
LQ Newbie
 
Registered: Aug 2008
Location: Mauritius
Distribution: Mandriva 2008 Powerpack
Posts: 12

Original Poster
Rep: Reputation: 0
Line from file :

INSERT INTO pltrans (pt_key,pt_spkey_sp_lkey_pll_cref,pt_spkey_sp_lkey_pll_subs,pt_spkey_sp_code,pt_batch_hdr,pt_actual_ date,pt_effect_date,pt_tran_order_no,pt_trans_type,pt_text_ctx_clre,pt_text_ctx_seqn,pt_amount,pt_va t_cash_ind,pt_authorised,pt_alloc_ref,pt_bal_remain,pt_status,pt_day_terms,pt_discount,pt_cash_disco unt,pt_pay_mark,pt_supp_ref,pt_department,pt_conta,pt_total_amount,pt_ytd_turnover,pt_posting_type,p t_pb_batch_no,pt_vat_amount,pt_close,pt_due_date,pt_transfer_ref,pt_com_no,pt_cmp_seq_no) VALUES (' 270440',' 8632',' 1','DIGI01 ',' 38761',' 30/04/2004',' 15/04/2004','15925 ',' 401',' 8632',' 12012',' -150.00','N ','Y ',' 270440',' 0.00','I ',' 30',' 0.00',' -0.00','0 ','RS 0222 ',' 0','DIGI01 ',' -172.50',' ','S ',' 219',' -22.50','C ',' 15/05/2004',' ',' ',' ');

Problem column is the one just after -172.50, and before the S - which should be 0.00 and not blank.

I plan to use 'cut' to take the file to just before the field, and just after the field.

For the field I need to either, keep the value if not blank, or enter 0.00

I will then use paste to put the file back together again

My problem is the substitution either the value or 0.00 in the column
 
Old 05-04-2010, 12:39 PM   #5
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738
OK

What is it that uniquely defines the field you want to change? e.g. is it the position? Or maybe is it related to the actual data?

If, for example, you want to replace any space, then that would be easy with SED. If it is the position, then AWK will be better.

Here's one example using SED:
Code:
sed "s/',' ','/','0.01','/g"
This replaces all instances of:
',' ','
with:
','0.01','
 
Old 05-04-2010, 12:59 PM   #6
JGuillou
LQ Newbie
 
Registered: Aug 2008
Location: Mauritius
Distribution: Mandriva 2008 Powerpack
Posts: 12

Original Poster
Rep: Reputation: 0
What defines the column is it's location (unfortunately the file in my post has lost the spaces when copying to my post)

The problem is that the column must be : equal to original value if not blank OR 0.00 if blank
 
Old 05-04-2010, 01:54 PM   #7
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,066
Blog Entries: 11

Rep: Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910
Something like
Code:
awk -F, '{if($59~/ /){$59="\0470.0\047"};print $0}' insert.sql
may do the trick ...



Cheers,
Tink
 
Old 05-04-2010, 02:46 PM   #8
JGuillou
LQ Newbie
 
Registered: Aug 2008
Location: Mauritius
Distribution: Mandriva 2008 Powerpack
Posts: 12

Original Poster
Rep: Reputation: 0
Thanks, but I'm a little lost.

$59 = the correct field but I don't understand the 0470.0\047
 
Old 05-04-2010, 02:51 PM   #9
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,066
Blog Entries: 11

Rep: Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910
The "problem" is that awk uses the same "single quotes" your fields
are enclosed in to protect its commands from the shell .... the \047
is just the octal value representing a ' (single quote). Writing the
octal value I find slightly more readable that escaping htem - and it's
"safer" if you ever change quoting, or embed it in a script

In other words \0470.0\047 is the same as '0.0'

Better?


Cheers,
Tink
 
Old 05-04-2010, 11:45 PM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,247

Rep: Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684
Assuming you want this for every line in the file either of these seem to work ok:
Code:
awk 'gsub(/\047 \047/,"\0470.00\047")' file

sed "s/' '/'0.00'/g" file
 
Old 05-05-2010, 04:26 AM   #11
JGuillou
LQ Newbie
 
Registered: Aug 2008
Location: Mauritius
Distribution: Mandriva 2008 Powerpack
Posts: 12

Original Poster
Rep: Reputation: 0
Hi

Sorted it using my initial process :

cut -c1-963 import_data.sql > jpg1
cut -c964-977 import_data.sql > jpg2
cut -c977-1104 import_data.sql > jpg3

I vi'ed jpg2 with the following :
:%s/ / 0.00/

I then glued it all back with :

paste -d '' jpg1 jpg2 jpg3 > import_data-a.sql

Probably not pretty, but hey it works - thanks for the help
 
Old 05-05-2010, 10:06 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,247

Rep: Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684Reputation: 2684
Please mark as SOLVED if you have your solution
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Remove lines in a text file based on another text file asiandude Programming 10 01-29-2009 11:59 AM
Linux Text File convert to Windows/Notepad Text File = Wrapped? backroger Linux - Software 4 01-18-2009 06:54 AM
text match pipe to file then delete from original text file create new dir automatic tr1px Linux - Newbie 6 09-10-2008 10:40 PM
How to parse text file to a set text column width and output to new text file? jsstevenson Programming 12 04-23-2008 03:36 PM
fixing a corrupted xinitrc file Castanea_d. Linux - Newbie 2 05-30-2007 09:17 AM


All times are GMT -5. The time now is 02:30 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration