Remove duplicates from file
I need help with the following.
I have a file that lists both directories and file names:
Code:
cat /tmp/fileList.tmp
A/B/C
A/B/C/D
A/B/C/D/E.txt
A/B/C/D/E.txt
A/B/C1
A/B/C1/D/E1.txt
A/B/C1/D/E2.txt
A/B/C1/D/E2.txt
Using:
Code:
awk '!_[$1]++' /tmp/fileList.tmp
A/B/C
A/B/C/D
A/B/C/D/E.txt
A/B/C1
A/B/C1/D/E1.txt
A/B/C1/D/E2.txt
the duplicates are gone, but the directory paths are still listed. I need to remove the directory paths and keep only the file paths. |
I don't know how you produced the file list, but if you can reproduce it by walking the directory structure, you could use a find command such as:
Code:
find . \! -type d
or:
Code:
find . -type f
With the list you have now, how do you, as a human, recognize a non-directory? Do all files have extensions? If so, you could use a pattern such as:
Code:
(.+)\.(.+)$ |
Quote:
extension of some sort ... Code:
awk '/\./ && !_[$1]++' dupes
Cheers, Tink |
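To see what a filter of this kind does with the sample list from the first post, here is a minimal sketch. It assumes, as discussed above, that every file name contains a dot and no directory name does. The function name `files_with_dot` and the array name `seen` are only illustrative, and the sketch keys on the whole line (`$0`) rather than `$1`, so that paths containing spaces would still be compared in full.

```shell
#!/bin/sh
# Hypothetical helper: keep only lines that contain a dot, and print each
# such line the first time it is seen.
files_with_dot() {
    awk '/\./ && !seen[$0]++'
}

# Sample list taken from the first post.
result=$(printf '%s\n' \
    A/B/C A/B/C/D A/B/C/D/E.txt A/B/C/D/E.txt \
    A/B/C1 A/B/C1/D/E1.txt A/B/C1/D/E2.txt A/B/C1/D/E2.txt |
    files_with_dot)

printf '%s\n' "$result"
```

On that sample this should leave only the three `.txt` paths. A directory with a dot in its name, or a file without an extension, would break the assumption.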
Is using awk mandatory?
See the sed one-liners page: http://sed.sourceforge.net/sed1line.txt |
As has already been stated, you would need to provide information about how to tell the difference between a file and a directory.
|
Quote:
Code:
branches/upgrade
What I'm trying to accomplish here is to write an automated script that merges the SVN changes from one branch to another by referring to a JIRA ticket. |
Unless you can find a way to differentiate between files and directories, you will be stuck with only removing the duplicates.
It could even be as simple as the directories all having a trailing slash. |
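If the list could be regenerated so that every directory ends in a slash (an assumption; the lists shown in this thread do not have that), the filter becomes trivial. A rough sketch, with `drop_dirs` as a made-up helper name and sample paths borrowed from the first post with slashes added:

```shell
#!/bin/sh
# Hypothetical helper: skip lines ending in "/" (assumed to be directories)
# and print each remaining line only once.
drop_dirs() {
    awk '!/\/$/ && !seen[$0]++'
}

result=$(printf '%s\n' \
    A/B/C/ A/B/C/D/ A/B/C/D/E.txt A/B/C/D/E.txt |
    drop_dirs)

printf '%s\n' "$result"
```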
Quote:
Can we use some string filtering? For example, given
branches/upgrade/Build/scripts
branches/upgrade/Build/scripts/svnscripts/svnsbupdater
we can remove the line "branches/upgrade/Build/scripts", because the file path will always contain the directory path in it. |
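The string-filtering idea above could be sketched roughly as follows. This is not the script that was eventually posted, just one possible way to do it: the list is read twice, the first pass records every ancestor of every path, and the second pass prints only paths that are nobody's ancestor. The helper name `strip_parent_dirs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each listed path once, skipping any path that
# is a parent directory of another listed path.
# $1 is the list file; awk reads it twice.
strip_parent_dirs() {
    awk '
        NR == FNR {
            # First pass: record every ancestor of every path.
            p = $0
            while (sub("/[^/]*$", "", p) && p != "")
                parent[p] = 1
            next
        }
        # Second pass: keep first occurrences that are not an ancestor.
        !($0 in parent) && !seen[$0]++
    ' "$1" "$1"
}

list=$(mktemp)
printf '%s\n' \
    branches/upgrade \
    branches/upgrade/Build/scripts \
    branches/upgrade/Build/scripts/svnscripts/svnsbupdater \
    branches/upgrade/Build/scripts/svnscripts/svnsbupdater > "$list"

result=$(strip_parent_dirs "$list")
rm -f "$list"

printf '%s\n' "$result"
```

This only knows that a path is a directory when something beneath it is also listed, so a directory with no listed children would still be printed.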
Thanks to Anuradha, I got this solved. Posting the answer for others.
Code:
#!/usr/bin/perl |
Well I must say I am curious how this script has met any of your requirements??
When run on the data from post #9 I get:
Code:
branches/upgrade/Build/scripts/svnscripts/svnsbupdater
Quote:
Your own example is flawed in that only a visual look at the data can tell you what is a file and what is a directory:
Quote:
|
Quote:
Code:
[san@san1 tmp]$ cat t.txt |
Well, I think it is important to note, for people who might search and find this solution, that it relies on an assumption that does not always hold: that the longest match for the same path will end in a file name. An easy example, if we assume that directory blah is as follows:
Code:
branches/upgrade/Build/scripts/compile |
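If the listed paths can be checked against a real working copy on disk (an assumption; the thread only shows a text list), the ambiguity goes away, because each path can be tested directly instead of guessed from its shape. A minimal sketch, where the helper `only_real_files` and the names `proj`, `emptydir` and `tool/run.sh` are all made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read paths on stdin and print, once each, those that
# are regular files below the directory given as $1.
only_real_files() {
    while IFS= read -r path; do
        if [ -f "$1/$path" ]; then
            printf '%s\n' "$path"
        fi
    done | awk '!seen[$0]++'
}

# Build a throwaway tree: one empty leaf directory and one real file.
root=$(mktemp -d)
mkdir -p "$root/proj/emptydir" "$root/proj/tool"
: > "$root/proj/tool/run.sh"

result=$(printf '%s\n' \
    proj proj/emptydir proj/tool proj/tool/run.sh proj/tool/run.sh |
    only_real_files "$root")
rm -rf "$root"

printf '%s\n' "$result"
```

Here `proj/emptydir` is the longest entry on its branch, yet it is dropped because it is not a regular file. A filter based only on the longest path would have kept it.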