LinuxQuestions.org
Old 10-25-2007, 12:44 PM   #1
zomane
Member
 
Registered: Sep 2005
Location: Austria
Distribution: Debian, CentOS, OpenBSD, FreeBSD
Posts: 52

Rep: Reputation: 16
Replacing part (lines) of a file ( bash or perl )


Hello all,
I have two files.
File orig.txt (the bigger one, almost 7000 lines):

Code:
12345 $secondfield $thirdfield $nfield SDFFF
23456 $secondfield $thirdfield $nfield DFEDFRGFF
34567 8090988 33435 655646 SFFEFEFKKLKL
90783 5433543 54532543 5454 HJHJHGGH
76576 435345 534 5453566767 WEQRTQ
and so on ....
File repl.txt (the smaller one, 1000 lines):

Code:
34567 1111 3354 566  SFFEFEFKKLKL
90783 324324 255 54435 HJHJHGGH
76576 3232 4545 4554 WEQRTQ
Field names (to avoid confusion), in the order they appear in the example files:

USERNUMBER AMOUNT1 AMOUNT2 AMOUNT3 NAME

orig.txt contains every USERNUMBER found in repl.txt, but the AMOUNT[1-3] values differ.
I want to replace each such line in orig.txt with the matching line from repl.txt.
Can someone give me an idea how to do that?
First I tried this:
Code:
cat repl.txt | awk '{print $1}' > USERNUMBERS.txt 
for i in $(awk '{print $1}' < USERNUMBERS.txt ) ; do  grep -v $i orig.txt > removed_wrong_lines.txt ; done
I expected removed_wrong_lines.txt to contain only the correct lines, after which I could simply do
Code:
cat removed_wrong_lines.txt repl.txt > corrected.txt
but my experiment was unsuccessful.
I would be thankful for any suggestions on how to solve this.
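
A likely reason the loop above falls short: every pass of the for loop reruns grep against the untouched orig.txt and overwrites removed_wrong_lines.txt, so only the last USERNUMBER ends up filtered out. grep -v $i also matches the number anywhere on a line, not just in the first field. Below is a minimal sketch of the same two-step plan done in one awk pass, assuming whitespace-separated fields. The sample rows are shortened from the post (the amounts in the first row are invented), and the array name skip is made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data in the layout USERNUMBER AMOUNT1 AMOUNT2 AMOUNT3 NAME.
printf '%s\n' \
  '12345 10 20 30 SDFFF' \
  '34567 8090988 33435 655646 SFFEFEFKKLKL' \
  '90783 5433543 54532543 5454 HJHJHGGH' > orig.txt
printf '%s\n' \
  '34567 1111 3354 566 SFFEFEFKKLKL' \
  '90783 324324 255 54435 HJHJHGGH' > repl.txt

# First file (repl.txt): remember every USERNUMBER that has a replacement.
# Second file (orig.txt): keep only lines whose USERNUMBER was not remembered.
awk 'FNR == NR { skip[$1]; next } !($1 in skip)' repl.txt orig.txt > removed_wrong_lines.txt

# Append the replacement lines, as in the original plan.
cat removed_wrong_lines.txt repl.txt > corrected.txt
cat corrected.txt
```

Note that with this approach the replaced lines end up at the bottom of corrected.txt rather than in their original positions.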

Last edited by zomane; 10-25-2007 at 12:48 PM. Reason: spell&formating
 
Old 10-25-2007, 07:41 PM   #2
cfaj
Member
 
Registered: Dec 2003
Location: Toronto, Canada
Distribution: Mint, Mandriva
Posts: 221

Rep: Reputation: 31

Is this what you want?

Code:
awk '
FNR == NR { x[$1] = $0; next }
{ print (x[$1]) ? x[$1] : $0 }
' repl.txt orig.txt
 
Old 10-26-2007, 01:19 AM   #3
zomane
Member
 
Registered: Sep 2005
Location: Austria
Distribution: Debian, CentOS, OpenBSD, FreeBSD
Posts: 52

Original Poster
Rep: Reputation: 16
Thanks,
I have one question about your awk construct.
The order of the USERNUMBERs in the two files is not important, am I right?
I mean, if USERNUMBER_xxx is on line 3520 in orig.txt and on line 542 in repl.txt, this will not cause confusion in the replacement.
If I have understood all of the above correctly, it works for me, but if the answer to my question is "NO", then my first post was not formatted correctly.
 
Old 10-26-2007, 04:10 PM   #4
PAix
Member
 
Registered: Jul 2007
Location: United Kingdom, W Mids
Distribution: SUSE 11.0 as of Nov 2008
Posts: 195

Rep: Reputation: 40
Hi Zomane,
Cfaj's bit of code is not dependent on the ordering, as I will explain. I suspect that you asked about the ordering because you didn't fully understand how the code works. It took me a moment too, so for the benefit of others I will describe it.
Quote:
The files repl.txt and orig.txt are specified in that particular order for good reason. The files are read one after the other.
NR is the number of input records read so far since the start of input, counted across all files.
FNR is the record number within the current input file; it restarts at 1 for each new file.
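
To watch the two counters move, here is a small sketch that prints both for every line of two throwaway files. The file names first.txt, second.txt and counters.txt and their contents are made up for illustration; FILENAME is awk's built-in name of the file currently being read.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'a1' 'a2' > first.txt
printf '%s\n' 'b1' 'b2' 'b3' > second.txt

# Print the current file name and both record counters for every input line.
awk '{ print FILENAME, "NR=" NR, "FNR=" FNR }' first.txt second.txt > counters.txt
cat counters.txt
```

NR keeps climbing from 1 to 5, while FNR drops back to 1 on the first line of second.txt. That is the only moment FNR == NR stops being true.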
Code:
FNR==NR { x[$1] = $0; next }
This reads the first file (the shorter repl.txt). While it is being read, FNR and NR are equal, so the complete file is stored line by line in the array x[ ], indexed by $1, the contents of the first column, USERNUMBER. This is not an index number: x is an associative array, whose key is a unique string or number associated with the record.
USERNUMBER is assumed to be unique within the repl.txt file; otherwise subsequent occurrences will overwrite earlier ones during this first phase, while the array is being populated. Clue: if your array ends up with fewer records than the repl.txt file has lines, then one or more duplicates have been found, but that's for you to worry about elsewhere if necessary. I just thought you should know about it.
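
If you want to check for duplicates before trusting the result, something along these lines should do. It is only a sketch: the third sample row is an invented duplicate, and the names seen, keys, duplicates.txt and summary.txt are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# repl.txt with USERNUMBER 34567 deliberately repeated.
printf '%s\n' \
  '34567 1111 3354 566 SFFEFEFKKLKL' \
  '90783 324324 255 54435 HJHJHGGH' \
  '34567 9 9 9 SFFEFEFKKLKL' > repl.txt

# Print a USERNUMBER each time it shows up again after its first appearance.
awk 'seen[$1]++ { print $1 }' repl.txt > duplicates.txt

# Compare the number of lines with the number of distinct USERNUMBERs.
awk '{ keys[$1] }
END { n = 0; for (k in keys) n++; print NR " lines, " n " distinct USERNUMBERs" }' repl.txt > summary.txt

cat duplicates.txt summary.txt
```

On this sample it reports 34567 as a duplicate, followed by "3 lines, 2 distinct USERNUMBERs". If the two numbers in the summary differ, the later line for a repeated USERNUMBER is the one that ends up in the array.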
So now our replacement array is full of replacement text.
At this point FNR and NR are still synchronised; going by what you say, let us assume both stand at 1000.
The first record read from the second file, orig.txt, will see NR become 1001 while FNR is reset to 1. Plainly the action on the first line of code will no longer be executed, because its pattern no longer matches. Note that while FNR and NR matched and the array was being populated, the 'next' statement caused the next line of the file to be read in without proceeding to the next statement in the code.
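
To see what 'next' buys you, here is a small sketch that runs the first rule with and without it. The second rule is replaced by a plain print so the effect is easy to spot; the output file names without_next.txt and with_next.txt and the "rule 2:" label are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' '34567 1111 3354 566 SFFEFEFKKLKL' > repl.txt
printf '%s\n' \
  '34567 8090988 33435 655646 SFFEFEFKKLKL' \
  '12345 10 20 30 SDFFF' > orig.txt

# Without next: lines of repl.txt fall through to the second rule as well.
awk 'FNR == NR { x[$1] = $0 }       { print "rule 2:", $0 }' repl.txt orig.txt > without_next.txt

# With next: the second rule only ever sees lines of orig.txt.
awk 'FNR == NR { x[$1] = $0; next } { print "rule 2:", $0 }' repl.txt orig.txt > with_next.txt

wc -l without_next.txt with_next.txt
```

Without 'next' the second rule fires three times (once for the repl.txt line, twice for orig.txt); with it, only twice.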
Code:
{ print (x[$1]) ? x[$1] : $0 }
No pattern here means that this line of code is executed for each and every line read in. Execution has been prevented so far because of the tight loop mentioned. Now, however, the pattern in the first line of code no longer matches, so no more Mr Tight Loop. Instead, welcome to the ternary operator:
Code:
selector ? if-true-exp : if-false-exp
In our code the brackets around the piece of code preceding the ? are intended to force evaluation of the array lookup, using the value of USERNUMBER in the current record. If a (non-empty) replacement exists, print the value from the array. If it doesn't exist (no replacement), print the current (original) line.
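
One detail worth knowing: testing x[$1] directly treats an empty or zero value as "no replacement", and merely looking up x[$1] creates an empty array entry for every USERNUMBER that is only in orig.txt. A hedged variant (not the code posted above) uses awk's in operator, which tests for the key without creating it. The sample data is shortened from the first post, the amounts in the 12345 row are invented, repl.txt is deliberately in a different order from orig.txt, and entries.txt is a made-up file name used only to count the array entries.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
  '12345 10 20 30 SDFFF' \
  '34567 8090988 33435 655646 SFFEFEFKKLKL' \
  '90783 5433543 54532543 5454 HJHJHGGH' > orig.txt
# Replacement lines listed in the opposite order to orig.txt.
printf '%s\n' \
  '90783 324324 255 54435 HJHJHGGH' \
  '34567 1111 3354 566 SFFEFEFKKLKL' > repl.txt

awk '
# First file: store each replacement line under its USERNUMBER.
FNR == NR { x[$1] = $0; next }
# Second file: print the replacement if the key exists, else the original line.
{ print (($1 in x) ? x[$1] : $0) }
# Count the array entries to show that no extra keys were created.
END { n = 0; for (k in x) n++; print n > "entries.txt" }
' repl.txt orig.txt > corrected.txt

cat corrected.txt
```

The lines come out in the order of orig.txt, with 34567 and 90783 carrying the new amounts, and entries.txt holds 2, the number of lines in repl.txt.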
From this point the next record is read in and the code iterates naturally as described until the end of file.
The input files appear at the end of the code and are read in almost as if they were one, except for the behaviour of the record counters NR and FNR.

Oh, were I able to make the description as short as Cfaj's super couple of lines of code!


So it does everything that it says on the tin and as long as you understand the nature of duplicates you are thoroughly home and dry.

PAix

Last edited by PAix; 10-26-2007 at 04:12 PM.
 
  

