LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

04-21-2020, 02:01 PM   #1
2byt3s
LQ Newbie
 
Registered: Apr 2020
Posts: 2

Rep: Reputation: Disabled
2 files diff help


I need some help with a bash script. I can do simple things in bash, but this one is beyond me.

I have two files. The first contains only domains and URLs, such as:
www.google.com
www.fakeurl.com/xmltcs/test.php

The second contains another list. Some of its URLs match those in the first file, but they are not in the same order, and they carry a # comment marking them as not to be processed:

www.google.com # comment not to be processed
www.wikihow.com/index.html # whatever goes here

I need to parse the two files and merge them, removing any duplicates but keeping the comments where there are any.

Any ideas?
 
04-21-2020, 02:15 PM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,309
Blog Entries: 3

Rep: Reputation: 3721
I would use join if you're not going to escalate to perl. The input files would need to be sorted first for that. See the options -j and -a in "man join".

A more advanced approach would be to use process substitution to sort the files on their way into join, without changing anything on disk.
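
For illustration, the join approach could be sketched roughly as below. This is only a sketch under assumptions: the helper name merge_sorted and the file names in the usage line are made up, it needs bash for the <( ) syntax, and it assumes the URLs themselves contain no whitespace. Note that join writes its output fields separated by single spaces, so runs of spaces inside a comment get collapsed.

```bash
#!/usr/bin/env bash
# merge_sorted FILE1 FILE2 (hypothetical helper name)
# Prints every URL found in either file once, in sorted order, keeping
# the "# comment" part when one of the files has it.
merge_sorted() {
    # join needs both inputs sorted on the join field with the same
    # collation, so the C locale is forced for sort and join alike.
    # sort -u -k1,1 also drops repeated URLs within a single file.
    # -a 1 -a 2 additionally prints the lines that have no match.
    LC_ALL=C join -a 1 -a 2 \
        <(LC_ALL=C sort -u -k1,1 -- "$1") \
        <(LC_ALL=C sort -u -k1,1 -- "$2")
}
```

It could be called as "merge_sorted urls.txt commented.txt > merged.txt". Neither input file is modified, because the sorting happens only in the streams fed to join.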
 
1 members found this post helpful.
04-21-2020, 06:45 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Might as well remove the "not to be processed" lines with something like "grep -v"; that saves some post-processing.
 
04-22-2020, 01:59 PM   #4
2byt3s
LQ Newbie
 
Registered: Apr 2020
Posts: 2

Original Poster
Rep: Reputation: Disabled
I wish I could remove them; however, it is a requirement to meet a certain standard that was adopted.
 
04-22-2020, 02:08 PM   #5
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
bash:
  • read the file in line by line
  • remove comments like so: line="${line%%#*}"
edit:
probably best if you show us what you wrote so far.
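
As a rough sketch of the read-line-by-line idea, something like the following might work. It is not a finished solution and rests on assumptions: the helper name merge_lists is made up, it needs bash 4 or later for associative arrays, and it assumes no leading whitespace before the URLs. The ${line%%#*} expansion is used here only to find the URL for duplicate detection; the original line, comment included, is what gets printed.

```bash
#!/usr/bin/env bash
# merge_lists FILE... (hypothetical helper name)
# Prints each URL once, in first-seen order. If any occurrence of a URL
# carries a "# comment", the commented line is the one that is kept.
merge_lists() {
    local -A seen=()   # URL -> line to print (bash 4+)
    local -a order=()  # URLs in the order they first appear
    local file line url
    for file in "$@"; do
        # "|| [[ -n $line ]]" also handles a last line without a newline
        while IFS= read -r line || [[ -n $line ]]; do
            url="${line%%#*}"                     # drop the comment part
            url="${url%"${url##*[![:space:]]}"}"  # trim trailing whitespace
            [[ -z $url ]] && continue             # skip blank or comment-only lines
            if [[ -z ${seen[$url]+x} ]]; then
                order+=("$url")                   # first time this URL shows up
                seen[$url]=$line
            elif [[ $line == *'#'* && ${seen[$url]} != *'#'* ]]; then
                seen[$url]=$line                  # prefer the commented version
            fi
        done < "$file"
    done
    for url in "${order[@]}"; do
        printf '%s\n' "${seen[$url]}"
    done
}
```

Unlike a sort-based approach, this keeps the original order of the lines, at the cost of holding every URL in memory.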
 
1 members found this post helpful.
04-22-2020, 06:29 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,128

Rep: Reputation: 4121
Quote:
Originally Posted by 2byt3s View Post
I wish I could remove them; however, it is a requirement to meet a certain standard that was adopted.
I was suggesting the removal be done along with the sort (piping) in the process substitution suggested above. That only affects the data your script sees, not the actual source data.
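
A minimal sketch of that idea, with assumptions stated up front: the helper name active_sorted is made up, and the pattern '#' is a guess at what marks a "not to be processed" line, so it would need adjusting to the real data. The point is only that the filtering and sorting happen in the pipeline, while the file on disk is left alone.

```bash
#!/usr/bin/env bash
# active_sorted FILE (hypothetical helper name)
# Writes the lines of FILE that contain no '#' to stdout, sorted and
# de-duplicated. FILE itself is only read, never modified.
active_sorted() {
    grep -v '#' -- "$1" | LC_ALL=C sort -u
}

# Possible use with the join suggested earlier in the thread
# (file1 and file2 are placeholders):
#   LC_ALL=C join -a 1 -a 2 <(active_sorted file1) <(active_sorted file2)
```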
 
04-22-2020, 08:05 PM   #7
scasey
LQ Veteran
 
Registered: Feb 2013
Location: Tucson, AZ, USA
Distribution: CentOS 7.9.2009
Posts: 5,727

Rep: Reputation: 2211
Quote:
Originally Posted by ondoho View Post
probably best if you show us what you wrote so far.
Agreed. The OP needs to show us what they’ve tried.
 
1 members found this post helpful.
  


All times are GMT -5. The time now is 04:55 PM.
