LinuxQuestions.org
Old 01-14-2009, 04:38 PM   #1
darcman
LQ Newbie
 
Registered: Jan 2009
Posts: 1

Rep: Reputation: 0
How would you combine files excluding the lines that are different?


I've got an application (trac, if you must know) that has an option for centralized configuration. To use this feature I must take many separate configuration files and move the redundant configuration settings to a "global" file, while each "local" configuration file (one for each instance of trac we run) retains its site-specific data. Below are the steps I think would need to take place:
  1. compare lines in all the existing configuration files
  2. output the lines that are the same in all the files to one new file
  3. create a backup copy of each local configuration file
  4. create new local configuration files that only include the lines that are not in the global configuration file

How would you create a script to do this?

Thanks All!
 
Old 01-14-2009, 11:58 PM   #2
clowenstein
LQ Newbie
 
Registered: Feb 2006
Posts: 4

Rep: Reputation: 0
Excluding different lines from files.

Are the lines in these files in the same order, just some of them have different content? Maybe a few extra lines in one file or another?

If so, you can use comm(1) to find the lines common to a pair of files. Then you can use comm(1) again to find the lines common to this output and a third file. Continuing on, you get the set of lines common to all files.

$ comm -12 File1 File2 > Comm12 # print lines common to both files
$ comm -12 Comm12 File3 > Comm123 # lines common to files 1,2,3
. . .
continue until you have CommN # lines common to all files

$ comm -13 CommN File1 > Diff1 # lines only in File1, not in CommN
$ comm -13 CommN File2 > Diff2 # lines only in File2, not in CommN

This will only work if the common lines in the files are all in the same order. But that may very well be the case for these configuration files.

If the common lines are not all in the same order, you probably have to sort the files first to put everything in order. But that may upset the eventual use, I don't know.

Naturally you want to practise this to make sure it works the way you want before committing to any permanent changes.

This is not yet a script, but is the outline of a method that could be turned into a script.
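
One way the outline above might be turned into a script is sketched here. It assumes line order does not matter, because comm compares sorted input, so every file is sorted first. The function name common_lines and the temporary directory are illustrative choices, not something from the original outline.

```shell
#!/bin/sh
# common_lines FILE...: print the lines present in every FILE (sorted output).
# Hypothetical helper; comm needs sorted input, so each file is sorted first.
common_lines() {
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort -u "$1" > "$tmp/acc"
    shift
    for f in "$@"; do
        LC_ALL=C sort -u "$f" > "$tmp/next"
        # -12 suppresses columns 1 and 2, leaving only lines found in both inputs
        LC_ALL=C comm -12 "$tmp/acc" "$tmp/next" > "$tmp/out"
        mv "$tmp/out" "$tmp/acc"
    done
    cat "$tmp/acc"
    rm -rf "$tmp"
}
```

Called as `common_lines File1 File2 File3 > CommN`, it would replace the manual chain of comm commands, at the cost of losing the original line order.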

carl
 
Old 01-14-2009, 11:58 PM   #3
LaughingBoy
Member
 
Registered: May 2006
Location: Adelaide, South Australia
Distribution: Fedora 6-17 x64 / Ubuntu 10.x x64
Posts: 95

Rep: Reputation: 16
I'm sure it would involve grep, diff, and some regular expressions.

How many files are you talking about?
 
Old 01-15-2009, 06:29 AM   #4
lkraemer
Member
 
Registered: Aug 2008
Posts: 113

Rep: Reputation: 10
Diff

I have used DIFF25 to do the same type of compare (in XP) and piped the output to a diff.txt file. You should be able to do the same, and the source file (GLOBAL) can be built by removing the lines that are contained in the diff.txt file.

Have a look at the docs for DIFF.

I am sure grep can also be used with diff in a script for batch processing.

lkraemer

Last edited by lkraemer; 01-15-2009 at 06:30 AM.
 
Old 01-15-2009, 08:26 AM   #5
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 681
comm needs the input files to be sorted.

After composing a post, I checked what trac was and looked in the configuration section of the documentation. The config file is in the form used by Apache and Windows 3.1 .ini files.
[section]
config entry
config entry

[section 2]
config entry
config entry

How many configuration files do you need to setup?
IMHO, it may be better to manually go through which configuration entries are global by the nature of what they are. Then create a template config file to use to configure each machine with items not in the config file.

Here is a fragment of the wine.ini from picasa:
Code:
[Strings]
MciExtStr="Software\Microsoft\Windows NT\CurrentVersion\MCI Extensions"
Mci32Str="Software\Microsoft\Windows NT\CurrentVersion\MCI32"
Desktop="Control Panel\Desktop"
Metrics="Control Panel\Desktop\WindowMetrics"
CurrentVersion="Software\Microsoft\Windows\CurrentVersion"
CurrentVersionNT="Software\Microsoft\Windows NT\CurrentVersion"
FontSubStr="Software\Microsoft\Windows NT\CurrentVersion\FontSubstitutes"
Control="System\CurrentControlSet\Control"

[Classes]
HKCR,.avi,"Content Type",2,"video/avi"
HKCR,.dll,"Content Type",2,"application/x-msdownload"
HKCR,.exe,,2,"exefile"
HKCR,.exe,"Content Type",2,"application/x-msdownload"
HKCR,.htm,,2,"htmlfile"
HKCR,.htm,"Content Type",2,"text/html"
HKCR,.html,,2,"htmlfile"
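You could do something like:
sed -n '/^\[Strings\]/,/^$/{ /^\[Strings\]/!p }' *.ini >strings.merged
This will extract all of the values from the [Strings] sections. (The brackets are escaped because an unescaped [Strings] is a character class, and -n suppresses sed's default output so only the selected lines are printed.)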

Next you could do something like:
grep -cF 'MciExtStr="Software\Microsoft\Windows NT\CurrentVersion\MCI Extensions"' strings.merged

This will return the number of matches. The -F option makes grep treat the pattern as a fixed string, so the backslashes are taken literally.

grep -cF 'MciExtStr="Software\Microsoft\Windows NT\CurrentVersion\MCI Extensions"' configfiles/*.ini
will do the same without cutting out the [Strings] section, but it prints a separate count for each file rather than one total. A name="value" pair would have to be unique to a section to avoid a miscount.

If the number of matches equals the number of config files, then that entry can go into a global config file.
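
The "matches equals number of config files" test could be sketched as a small shell function. This is only an illustration: the name is_global is made up, and it counts files rather than lines so that a line repeated within one file is not counted twice.

```shell
#!/bin/sh
# is_global LINE FILE...: succeed if LINE appears verbatim in every FILE.
# Hypothetical helper illustrating the "matches == number of files" test.
is_global() {
    line=$1
    shift
    total=$#
    # -F: fixed string (backslashes are literal), -x: whole-line match,
    # -l: list each matching file once, so wc -l counts files, not lines.
    # "--" keeps a LINE that starts with "-" from being read as an option.
    matches=$(grep -lFx -- "$line" "$@" | wc -l)
    [ "$matches" -eq "$total" ]
}
```

For example, `is_global 'Desktop="Control Panel\Desktop"' configfiles/*.ini && echo global` would report the entry only if every file contains exactly that line. It does not look at which section the line is in.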

---

It may be better to read all of the files in a Perl script. You could use the "name" part as the index to a hash.
Read in a single file to store the values, and have another field or array so you can track whether each item has the same value everywhere. Initialize that field to "yes" on the first config file.
Next read in the other config files and whenever a value differs, change the field from "yes" to "no".

Finally use the hash to print out only the items still marked "yes".
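
The same hash idea can be sketched with awk's associative arrays instead of Perl, to stay with the shell tools used elsewhere in this thread. It assumes INI-style input with "[section]" headers and "name=value" entries, and that a name appears at most once per section in each file. The function name shared_settings and the output layout are made up for illustration.

```shell
#!/bin/sh
# shared_settings FILE...: print "[section] name=value" for every setting whose
# value is identical in all FILEs. Hypothetical helper; assumes INI-style input
# and at most one occurrence of a name per section in each file.
shared_settings() {
    awk -v nfiles="$#" '
        FNR == 1    { section = "" }          # new file: reset current section
        /^\[.*\]$/  { section = $0; next }    # remember the current section
        /^[^#;=]+=/ {
            eq  = index($0, "=")
            key = section " " substr($0, 1, eq - 1)   # hash index: section + name
            val = substr($0, eq + 1)
            if (!(key in value)) { value[key] = val; same[key] = "yes"; order[++n] = key }
            else if (value[key] != val) same[key] = "no"
            seen[key]++                       # number of files defining this key
        }
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                if (same[k] == "yes" && seen[k] == nfiles)
                    print k "=" value[k]
            }
        }
    ' "$@"
}
```

Because the key includes the section, the same name in two different sections is tracked separately, and the order of sections within a file does not matter.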
 
Old 01-15-2009, 08:37 AM   #6
utoddl
LQ Newbie
 
Registered: Oct 2005
Location: Sanford, NC
Distribution: Fedora, Ubuntu
Posts: 5

Rep: Reputation: 2
You need diff's "--unchanged-line-format" parameter

You're looking for diff's "--unchanged-line-format" parameter. diff normally shows you differences, but it can show you samenesses as well. If you specify --unchanged-line-format and set --old-line-format and --new-line-format to empty strings, then diff (GNU diff, at least) will only show the unchanged lines. If you leave the other two formats out, they default to printing the line, and you get every line from both files.

For example, suppose you have old.conf and new.conf, and you want to create common.conf that just has the lines the two files have in common. This will do it:

diff --unchanged-line-format='%l
' --old-line-format='' --new-line-format='' old.conf new.conf > common.conf


That's not a typo; there is a new-line in the single quoted string right after the "%l". "%l" stands for the line in question, but without the line termination character. So you have to put a literal new-line character in the string. (If you don't need that, "%L" includes the line terminator.) Seems weird, but it lets you do things like adding a comment to the end of each line, like this:

diff --unchanged-line-format='%l # common to old.conf and new.conf
' --old-line-format='' --new-line-format='' old.conf new.conf > common.conf

diff --old-line-format='%l # unique to old.conf
' --unchanged-line-format='' --new-line-format='' old.conf new.conf > old-unique.conf

diff --new-line-format='%l # unique to new.conf
' --unchanged-line-format='' --old-line-format='' old.conf new.conf > new-unique.conf
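
For more than two files, the pairwise diff could be chained, feeding each result into the next comparison. The sketch below assumes GNU diff (the line-format options are GNU extensions) and uses a made-up function name, diff_common_all. The other two line formats are set to empty strings explicitly so that only unchanged lines are printed.

```shell
#!/bin/sh
# diff_common_all FILE...: print the lines diff reports as unchanged across all
# FILEs, keeping their original order. Hypothetical helper; needs GNU diff.
diff_common_all() {
    tmp=$(mktemp -d) || return 1
    cp "$1" "$tmp/acc"
    shift
    for f in "$@"; do
        # %L is the line including its terminator; empty formats drop the rest
        diff --unchanged-line-format='%L' --old-line-format='' \
             --new-line-format='' "$tmp/acc" "$f" > "$tmp/out"
        # diff exits 1 when the files differ, which is expected; 2 means trouble
        [ $? -le 1 ] || { rm -rf "$tmp"; return 2; }
        mv "$tmp/out" "$tmp/acc"
    done
    cat "$tmp/acc"
    rm -rf "$tmp"
}
```

Unlike the sorted approaches, this keeps line order, but it only finds lines that appear in the same relative order in every file.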


Cheers,
--
utoddl@email.unc.edu

Last edited by utoddl; 01-15-2009 at 08:41 AM. Reason: typo in example code
 
Old 01-15-2009, 08:43 AM   #7
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 681
Why didn't I think of diff? I missed the obvious because I started thinking of "comm -12 <(sort file1) <(sort file2) >common".
Thanks utoddl.
 
Old 01-15-2009, 12:26 PM   #8
babel17
LQ Newbie
 
Registered: Nov 2007
Posts: 9

Rep: Reputation: 1
uniq

As long as line order is unimportant, this is trivial using uniq.

Code:
sort <filelist> | uniq -d > commonfile
will give you a commonfile that contains only the lines that are common to all files in <filelist>

Code:
for file in <filelist>; do
    sort "$file" commonfile | uniq -u > "$file.uniq"
done
will give you only the non-common lines from each original file.

The big problem is if line order matters it's not going to work.

Hmm. if line order does matter, you do the common file the same and then do:

Code:
perl -n -i.bak -e 'BEGIN {open(COMMON, "<", "commonfile") or die "commonfile: $!"; while(<COMMON>) {$cmn{$_}=1;} close(COMMON);} print unless $cmn{$_};' <filelist>
If I wrote this right it should give you your original input files back with the common lines removed but line order preserved and it saves a copy of the originals with the .bak extension.

Of course I haven't actually tested it to see if it works
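
A shell-only take on the same order-preserving step is sketched below, using grep to filter out the common lines. The function name strip_common is made up. Section headers are treated like any other line, so a [section] header that ends up in the common file is stripped from the local files too, and a blank line in the common file would strip blank lines as well.

```shell
#!/bin/sh
# strip_common COMMONFILE FILE...: back up each FILE to FILE.bak, then rewrite
# FILE without any line that also appears in COMMONFILE, keeping line order.
# Hypothetical helper; section headers are matched like any other line.
strip_common() {
    common=$1
    shift
    for f in "$@"; do
        cp -p "$f" "$f.bak" || return 1
        # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
        # grep exits 1 when nothing is left to print, which is not an error here.
        grep -vFxf "$common" "$f.bak" > "$f"
        [ $? -le 1 ] || return 1
    done
}
```

Running it on copies first, for example `strip_common commonfile copy-of-trac.ini`, is a cheap way to check the result before touching the real files.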

Last edited by babel17; 01-15-2009 at 12:38 PM.
 
Old 01-16-2009, 07:20 PM   #9
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 681
Quote:
Originally Posted by babel17 View Post
As long as line order is unimportant, this is trivial using uniq.

Code:
sort <filelist> | uniq -d > commonfile
will give you a commonfile that contains only the lines that are common to all files in <filelist>
Actually no. "sort <filelist> | uniq -d" will report duplicates, but it returns any entry that is in more than one file, such as one found in only two of fifty files.
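
One possible way around this, sketched under the assumption that line order is unimportant: remove repeats within each file first, then keep only the lines whose count equals the number of files. The function name in_all_files is made up for illustration.

```shell
#!/bin/sh
# in_all_files FILE...: print lines that occur in every FILE (sorted output).
# Hypothetical helper; counts how many files contain each line.
in_all_files() {
    n=$#
    # sort -u drops repeats inside one file, so each file counts at most once.
    # uniq -c prefixes each line with its count; sed keeps the lines whose
    # count is exactly n and strips the count prefix.
    for f in "$@"; do
        LC_ALL=C sort -u "$f"
    done | LC_ALL=C sort | LC_ALL=C uniq -c | sed -n "s/^ *$n //p"
}
```

With fifty files, a line found in only two of them gets a count of 2 and is dropped, where plain uniq -d would have kept it.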
 
Old 01-19-2009, 02:49 PM   #10
babel17
LQ Newbie
 
Registered: Nov 2007
Posts: 9

Rep: Reputation: 1
Quote:
Originally Posted by jschiwal View Post
Actually no. "sort <filelist> | uniq -d" will report duplicates, but it returns any entry that is in more than one file, such as one found in only two of fifty files.
Yeah, you're right. Another issue is what happens if the same line occurs more than once. See disc^H^H^H^Hcopout about not actually having tested any of it.

Even diff won't work if the file is in the format previously described, as identical sections may occur in a different order between two files.

Only way to do it right is with a script that understands the file format.
 
Old 01-19-2009, 07:23 PM   #11
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,309

Rep: Reputation: 2744
Yeah. If you use Perl, you can store the config in a hash, which means that all the records with the same key (same name) will end up in one hash you can output to a file. It'll be able to understand the file format.
It's a good language for this kind of problem.
Here's a couple of links:
http://perldoc.perl.org/
http://www.perlmonks.org/?node=Tutorials
 
  



Tags
diff, parsing, text

