Remove duplicates from file
I need help with the following.
I have a file that lists file names mixed with directory paths:
Code:
cat /tmp/fileList.tmp
A/B/C
A/B/C/D
A/B/C/D/E.txt
A/B/C/D/E.txt
A/B/C1
A/B/C1/D/E1.txt
A/B/C1/D/E2.txt
A/B/C1/D/E2.txt
Using:
Code:
awk '!_[$1]++' /tmp/fileList.tmp
A/B/C
A/B/C/D
A/B/C/D/E.txt
A/B/C1
A/B/C1/D/E1.txt
A/B/C1/D/E2.txt
the directory paths are still listed. I need to remove the directory paths and keep only the file paths. |
I don't know how you produced the file list, but if you can reproduce it by walking a directory structure, you could use a find command such as this:
Code:
find . \! -type d
Code:
find . -type f
With the list you have now, as a human, how do you recognize a non-directory? Do all files have extensions? If so, you could use a pattern such as:
Code:
(.+)\.(.+)$ |
Quote:
extension of some sort ...
Code:
awk '/\./ && !_[$1]++' dupes
Cheers, Tink |
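For anyone reading along, here is how that one-liner behaves on the sample list from the first post: `/\./` keeps only lines that contain a dot, and the array test is true only the first time a line is seen. This is just a sketch. It assumes every file has an extension and no directory name contains a dot, and the function name `files_with_dot` is made up for illustration. It also keys on `$0` instead of `$1` so that a path containing spaces is compared as a whole line.

```shell
#!/bin/sh
# files_with_dot is a hypothetical helper name, not something from the thread.
# Prints the first occurrence of every input line that contains a dot.
# Assumption: files always have an extension, directories never contain a dot.
files_with_dot() {
    awk '/\./ && !seen[$0]++' "$@"
}

# Sample data copied from the first post.
printf '%s\n' \
    A/B/C A/B/C/D A/B/C/D/E.txt A/B/C/D/E.txt \
    A/B/C1 A/B/C1/D/E1.txt A/B/C1/D/E2.txt A/B/C1/D/E2.txt |
    files_with_dot
```

On that input only the three `.txt` paths remain. A file without an extension would be dropped, and a directory such as `conf.d` would be kept.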
Is using awk mandatory?
See the sed one-liners page: http://sed.sourceforge.net/sed1line.txt |
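As a sketch of the sed route, this is the uniq-style one-liner in the form I remember it from that page, wrapped in a function whose name (`drop_adjacent_dupes`) is made up here. It only removes duplicates that sit on consecutive lines, and like the awk command in the first post it cannot tell a file from a directory.

```shell
#!/bin/sh
# drop_adjacent_dupes is a hypothetical helper name.
# Emulates uniq: of each run of identical consecutive lines, keep only the first.
drop_adjacent_dupes() {
    sed '$!N; /^\(.*\)\n\1$/!P; D' "$@"
}

# First four lines of the sample list; the duplicated E.txt lines are adjacent.
printf '%s\n' A/B/C A/B/C/D A/B/C/D/E.txt A/B/C/D/E.txt | drop_adjacent_dupes
```

If the duplicates are not adjacent, sort the list first or stay with the awk approach.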
As has already been stated, you would need to provide information about how to tell the difference between a file and a directory.
|
Quote:
Code:
branches/upgrade
What I'm trying to accomplish here is to write an automated script that merges the SVN changes from one branch to another by referring to a JIRA ticket. |
Unless you can find a way to differentiate between files and directories, you will be stuck with only removing the duplicates.
It could even be as simple as the directories all having a trailing slash. |
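A minimal sketch of the trailing-slash case, assuming the list was produced by something that marks directories with a final `/` (for example `ls -p`). The sample lines below are invented for illustration, since the list in this thread has no trailing slashes, and `files_without_slash` is a made-up name.

```shell
#!/bin/sh
# files_without_slash is a hypothetical helper name.
# Drops every line ending in "/" (assumed to be a directory) and
# prints the first occurrence of each remaining line.
files_without_slash() {
    awk '!/\/$/ && !seen[$0]++' "$@"
}

# Invented sample: directories carry a trailing slash, files do not.
printf '%s\n' A/B/C/ A/B/C/D/ A/B/C/D/E.txt A/B/C/D/E.txt | files_without_slash
```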
Quote:
Can we use some string filtering? For example, given
branches/upgrade/Build/scripts
branches/upgrade/Build/scripts/svnscripts/svnsbupdater
we can remove the line "branches/upgrade/Build/scripts", because a file path will always contain its directory path in it. |
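One way to sketch that string-filtering idea in awk: mark every proper ancestor of every listed path as a directory, then print each remaining path once. The function name `drop_listed_parents` is made up, and this is not necessarily how any script posted in this thread works. The big assumption is that a path with nothing listed underneath it is a file, which is not guaranteed: an empty or leaf directory in the list would be printed as if it were a file.

```shell
#!/bin/sh
# drop_listed_parents is a hypothetical helper name.
# Pass 1: remember every line and mark each proper ancestor path as a directory.
# Pass 2 (END): print each line once unless it was marked as a directory.
# Assumption: a path with no listed descendants is a file.
drop_listed_parents() {
    awk '
    {
        lines[NR] = $0
        path = $0
        # Strip the last "/component" repeatedly; each result is an ancestor.
        while (match(path, /\/[^\/]*$/)) {
            path = substr(path, 1, RSTART - 1)
            isdir[path] = 1
        }
    }
    END {
        for (i = 1; i <= NR; i++)
            if (!(lines[i] in isdir) && !seen[lines[i]]++)
                print lines[i]
    }' "$@"
}

# Sample data copied from the first post.
printf '%s\n' \
    A/B/C A/B/C/D A/B/C/D/E.txt A/B/C/D/E.txt \
    A/B/C1 A/B/C1/D/E1.txt A/B/C1/D/E2.txt A/B/C1/D/E2.txt |
    drop_listed_parents
```

Because it does not depend on the order of the input, it also works when a directory line appears after the files inside it.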
Thanks to Anuradha, I got this solved; posting the answer for others.
Code:
#!/usr/bin/perl |
Well I must say I am curious how this script has met any of your requirements??
When run on the data from post #9 I get:
Code:
branches/upgrade/Build/scripts/svnscripts/svnsbupdater
Quote:
Your own example is flawed in that only a visual look at the data can tell you what is a file and what is a directory:
Quote:
|
Quote:
Code:
[san@san1 tmp]$ cat t.txt |
Well, I think it is important to note, for people who might search and find this solution, that it works on the incorrect assumption that the longest match for the same path will end in a file name. An easy example, if we assume that directory blah is as follows:
Code:
branches/upgrade/Build/scripts/compile |
There is no programmatic way to do this based on the output alone. The only solution
would be to do the reporting on the file system and generate the report from there. ... |
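If the listed paths exist in a local checkout, asking the file system is straightforward. A minimal sketch, assuming the paths are relative to the current directory; the function name `keep_regular_files` and the throwaway demo tree are made up for illustration.

```shell
#!/bin/sh
# keep_regular_files is a hypothetical helper name.
# Reads paths on stdin, removes duplicates (keeping input order), and
# prints only those that exist as regular files relative to the current directory.
keep_regular_files() {
    awk '!seen[$0]++' | while IFS= read -r path; do
        if [ -f "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Demo on a throwaway tree so the check has something real to look at.
demo=$(mktemp -d)
mkdir -p "$demo/A/B/C/D" "$demo/A/B/C1/D"
touch "$demo/A/B/C/D/E.txt" "$demo/A/B/C1/D/E1.txt"
(
    cd "$demo" &&
    printf '%s\n' A/B/C A/B/C/D A/B/C/D/E.txt A/B/C/D/E.txt A/B/C1 A/B/C1/D/E1.txt |
        keep_regular_files
)
rm -rf "$demo"
```

Unlike the string-based filters, this keeps extensionless files and drops empty directories, but it only works where the paths can actually be checked.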
Rather than limiting the program to working just within the list of files and directories obtained from the JIRA ticket, I'd go the extra mile to be sure I had it right. I'd compare the list against the SVN repository. That would enable you to differentiate between files and directories. For example, comparing a line (or a portion of a line) from the ticket against the output from something like:
Code:
svn list --depth files ... |