Linux - Software
This forum is for Software issues. Having a problem installing a new program? Want to know which application is best for the job? Post your question in this forum.
Hi everyone,
I have an Ubuntu 14.04.1 system. I searched a Windows partition for viruses with clamtk and moved some files to quarantine. I have all the files with their information, like this:
What AWK actually does here is take one line at a time from 'tmpa1.lis' and remove the file size and the hashes, including the separating commas, leaving only the filename path, which it writes to the output file 'tmpa1files.lis'.
The reason I have done it that way is that the filename paths may contain commas (','), which I want to preserve.
So, 'start of line' + 'some alphanumerics' + ',' + 'some alphanumerics' + ',' + 'some alphanumerics' + ',' is removed, and the rest is output, one line at a time.
I don't have the time now to adapt the concept to your case, but using AWK may be the simplest way to do what you ask for.
It is also possible to add quotes ('"') around the filename paths in the output file, merely by changing the "print $0" part of the AWK script.
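A minimal sketch of that idea, assuming the 'size,hash,hash,filenamepath' layout described above (the sample line's field contents are made up):

```shell
# Sketch: strip three leading comma-delimited alphanumeric fields and
# print the remaining filename path wrapped in double quotes.
# The sample line is invented; the real 'tmpa1.lis' layout is assumed
# to be size,hash,hash,filenamepath as described in the post.
printf '%s\n' '1a2b,deadbeef,cafef00d,/path/with, a comma.txt' |
awk '{ gsub(/^[[:alnum:]]*,[[:alnum:]]*,[[:alnum:]]*,/, "", $0)
       print "\"" $0 "\"" }'
# prints: "/path/with, a comma.txt"
```

Note that the comma inside the filename survives, because the pattern is anchored to the start of the line and only eats the first three fields.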
Read my post again, thoroughly, and you will find it does exactly what it is supposed to.
I advised you to use AWK to filter out what you wanted from your input file.
I told you to do the actual work of adapting the AWK command for your own purpose.
Nobody will do the work for you, since this is not a "pro bono sweatshop for getting the hard work done".
I charge €100 per hour for doing such work, but you can probably find someone doing it for much less.
In fact, people being lazy with school work, or even worse, at the job, tend to take their difficult tasks to a forum like this one. They ask for "help" in doing their homework or their job task without ever paying. If you let someone else do your homework for you, you will never learn anything.
That sort of behavior is rather disliked in this forum. It is the lowest form of lazy: sloth, one of the seven deadly sins.
We will gladly help you. But we will not do your work for you.
The first thing you will have to do is change the AWK script to look for a colon, ':', as the delimiter instead of a comma.
If you look at one line of your example input file:
The line starts with an alphanumeric field, "410893564e40afe3ef3f9bda36e058bf", containing letters and numbers. You match it with "[[:alnum:]]*". The '*' says to match zero or more alphanumerics ("[[:alnum:]]").
Actually, the regular expression starts with a caret sign, '^', which is the code for "start of line".
So "^[[:alnum:]]*" means: start of line, then a run of alphanumerics.
Next, my script expects to find a comma. So "^[[:alnum:]]*," means: start of line, then letters and numbers, then a comma.
It continues to search for two more instances of "[[:alnum:]]*,".
So, when it sees a line starting with something like:
Code:
a1b2c3,111xxx,0987Ff,
...it will remove that from the line. Actually, in this case it will replace it with the empty string "", as you can see in the AWK script.
If the full line was:
Code:
a1b2c3,111xxx,0987Ff,/yes/i/know.dat
...it will remove the first three "[[:alnum:]]*," on the line.
Thus leaving the following output:
Code:
/yes/i/know.dat
But you have colons as delimiters instead of commas.
Try looking for "^[[:alnum:]]*:" instead of "^[[:alnum:]]*,[[:alnum:]]*,[[:alnum:]]*,".
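Put together, the colon-adapted version might look like this (the hash in the sample line is taken from the example above, the path is made up, and the exact list format is an assumption):

```shell
# Colon-adapted version of the same idea: remove the leading
# '<hash>:' field and keep the filename path.
echo '410893564e40afe3ef3f9bda36e058bf:/media/whynot/some/file.exe' |
awk '{ gsub(/^[[:alnum:]]*:/, "", $0); print $0 }'
# prints: /media/whynot/some/file.exe
```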
Did that help any?
The command used in the AWK script is gsub().
This is the command:
Code:
gsub(/^[[:alnum:]]*:/,"",$0);
What happens if you add another gsub command right after the first one that looks like this:
Code:
gsub(/:[[:alnum:]]*$/,"",$0);
?
It starts looking for a colon, then some letters and/or numbers. Then we have the dollar sign, '$', last in the pattern. The dollar sign means "end of line". So the pattern looks for a colon, then some alphanumerics, right at the end of the line. It will replace this with an empty string, "", effectively removing it.
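A sketch with both gsub() calls in one script, assuming a '<hash>:<path>:<something>' line layout (the trailing ':rw' field here is invented):

```shell
# Both gsub() calls combined: strip the leading hash field and the
# trailing field, leaving only the path in the middle.
echo '410893564e40afe3ef3f9bda36e058bf:/media/whynot/file.exe:rw' |
awk '{ gsub(/^[[:alnum:]]*:/, "", $0)
       gsub(/:[[:alnum:]]*$/, "", $0)
       print $0 }'
# prints: /media/whynot/file.exe
```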
These gsub() commands are all working on a string named $0 which for AWK means "one full line".
AWK can also divide a line into fields, or columns, separated by a character like a comma, a colon, or whatever you like. The strings for each column, from left to right, are $1, $2, $3, and so on.
So why do we not use that, instead of searching for complex patterns?
It is because the field we are interested in, which in my application is comma-separated from the other fields, can itself contain one or several commas in the file name. Simply splitting into fields with a comma separator would truncate a filename that contains a fully legitimate ','.
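A quick demonstration of that truncation problem (the sample line is made up):

```shell
# The fourth comma-separated field is a filename that itself contains
# a comma, so naive field splitting silently truncates it.
echo 'size,hash1,hash2,/home/me/report, final.doc' |
awk -F, '{ print $4 }'
# prints only: /home/me/report
```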
Are you with me so far?
A good source of info on how to use AWK (and many other powerful Unix/Linux commands) is The Grymoire, by Bruce Barnett.
Sir, you are absolutely correct. I apologize for my bad attitude. I would like to follow you on YouTube if you have a channel about these subjects. You are a tremendous teacher. I have great respect.
I found this link: https://regex101.com/#pcre . I think I had better try some regular expressions there. Are they the same as AWK expressions?
What about the cut command?
Code:
cat restore | cut -d : -f 2 > test
or
cut -d : -f 2 restore
Quote:
Originally Posted by Whynot
Sir, you are absolutely correct. I apologize for my bad attitude. I would like to follow you on YouTube if you have a channel about these subjects. You are a tremendous teacher. I have great respect.
Thank you.
Quote:
Originally Posted by Whynot
I found this link: https://regex101.com/#pcre . I think I had better try some regular expressions there. Are they the same as AWK expressions?
There are several variants of regular expressions. One must verify that the application uses the language variant the regular expression was written for.
Always read the manual, and make sure the installed application will do what is expected. Is the particular OS using 'nawk' or the plain vanilla, ancient, limited 'awk'? Should I use 'grep', 'egrep' or 'fgrep'? Will 'grep' do the same on Solaris or BSD as on Linux?
Better make sure.
Quote:
Originally Posted by Whynot
What about the cut command?
Code:
cat restore | cut -d : -f 2 > test
or
cut -d : -f 2 restore
Great!
You have found, on your own, an even simpler solution to your problem than what I proposed.
Looking at your file lists, I see that you handle files and directories on a Windows disk.
In Windows/NTFS a filename may not contain the following characters:
Code:
\ / : * ? " < > |
So your filenames will not contain any colons. Therefore you can use cut with a colon as the separator.
Be aware that in Unix/Linux it is allowed to have a colon in a filename.
The following is legal:
Code:
# touch abdcef\:0123
# ls
abdcef:0123
As long as you only handle filenames on an NTFS filesystem you are fine with using cut.
Otherwise you may need a more complex solution, like the one I proposed.
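A small demonstration of why cut is unsafe once colons can appear in names (the quarantine-list line below is made up):

```shell
# A list line where the filename itself contains a colon, which is
# legal on Linux filesystems: cut silently drops part of the name.
echo 'deadbeef:/home/me/notes:today.txt' | cut -d : -f 2
# prints only: /home/me/notes   (the ':today.txt' part of the name is lost)
```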
If these files are infected, what's the point restoring them?
I suspect he scanned with PUA enabled, got alarmed, and quarantined them.
They may not be infected.
virustotal.com would be my next recourse for suspect files.
Surely VideoLAN (VLC) for Windows is not an infection?
No, that file (VLC) is not infected; I just checked with virustotal. But I think most of the other files are infected.
Quote:
Originally Posted by Whynot
Originally Posted by Habitual View Post
Open clamtk and hit Ctrl+M
and restore the files from clamtk.
With clamtk there are about 3,000 files and I would have to restore them one by one.
Now I have all the files with their paths. I need to move these files from the local drive, but I have to keep the file names with their file extensions.
I can't get any farther.
test file content
What about using tar?
Example (run as root):
Code:
cd ~/.clamtk/viruses
tar --one-file-system --atime-preserve -cv -T ~/list-of-cleanfiles.lis -f ~/noviruses.tar
cd /media/whynot/New\ Volume
tar -x -f ~/noviruses.tar
This will make a tar file with only the files listed in the file "~/list-of-cleanfiles.lis".
It will not store any directories in the archive unless a directory is explicitly listed on its own line in the list file. This is good, since you do not want to clobber the directories you restore to.
It will preserve the path in the file list.
The file list must use relative paths, so the tar archive is created with relative paths.
Your file list has absolute paths. Maybe you can use 'cut' to remove the leading "/media/whynot/New Volume/" from every line in the list?
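One possible way to strip that fixed prefix, sketched here with sed rather than cut since the prefix is a known fixed string (the sample path is made up):

```shell
# Strip the fixed mount-point prefix from each absolute path, leaving
# a relative path suitable for the tar -T list.
printf '%s\n' '/media/whynot/New Volume/dir/file.exe' |
sed 's|^/media/whynot/New Volume/||'
# prints: dir/file.exe
```

Using '|' as the sed delimiter avoids having to escape all the slashes in the path.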
When you 'cd' to the root directory of the volume you want to restore to, and issue the tar restore command, the files will be restored relative to that directory, "/media/whynot/New\ Volume".
Tar will create the necessary directories in the path of the files restored if they do not exist.
If the directories exist, tar will just add the files into the existing directories.
Directories created by tar (if necessary) will inherit owner/privileges from their parent directory when created.
You can list the contents of the tar file before restoring using:
Code:
tar -t -f ~/noviruses.tar
Please note that this method will preserve privileges/ownership of the files.
I mention this because the original virus list had a privileges column to the right.
I guess it doesn't matter much, since ownership will probably be clobbered by the way you have mounted the NTFS filesystem in Linux. This can be tricky to get right. Have you made a backup of the volume you will restore to?
You will probably have to check the privileges/ownership of these restored files when running the Windows volume again.
It is of course possible to just copy over the tar file to the NTFS, and then boot the Windows system up, and use WinRAR to restore the files from the tar file to the correct destinations.
Using tar to copy/(move) the files like the above will also at the same time give you an archive file of the restored files.
Quote:
Originally Posted by Whynot
I stuck here.
Well, maybe not anymore.
Quote:
Originally Posted by Whynot
Code:
whynot@whynot-System-Product-Name:~/.clamtk/viruses$ while read line; do mv "$line"; done < test
It has to be something like this.
mv file "$line"
I would greatly appreciate any help
Thanks.
One alternative way to do the copy/move would be to use sed or AWK to reformat every line of the file list into a copy/move command, like "cp <sourceFilePath> <destinationFilePath>", with a nifty print command, thus producing a bash script file that you can source.
Now you're getting somewhere, but you need both the source file location (as in your first post, but the full pathname) as well as your destination.
Looking at where you're going with this (trimming the path, escaping the blanks, then constructing a command), I'd be using something like AWK. That way you can do the lot on each line without having to read the file multiple times or use more than one tool. Using your initial file list, try this: it'll generate stdout you can redirect to a file. Have a look and make sure it is OK, then simply "source" it to run the mv's.
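The script referred to here was not preserved in the thread; the following is only a hypothetical sketch of the approach, with the sample path and the assumption that the quarantined copies sit flat in the current directory both mine:

```shell
# Hypothetical sketch: rewrite each destination path in the list into
# a quoted 'mv' command that takes the quarantined copy from the
# current directory by its base name ($NF when splitting on '/').
printf '%s\n' '/media/whynot/New Volume/Program Files/app.exe' |
awk -F/ '{ printf "mv \"%s\" \"%s\"\n", $NF, $0 }'
# prints: mv "app.exe" "/media/whynot/New Volume/Program Files/app.exe"
```

Redirect that stdout to a file, inspect it, and then source it from inside ~/.clamtk/viruses. The double quotes around both paths keep blanks in names like "Program Files" intact.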