I have 1000+ .pdf files that ended up with wrong names after I undeleted them. I looked around and found a batch .pdf to .txt converter, so now I also have 1000+ .txt files, each with the correct file name in the first five characters of its second line. I would like to batch rename each .pdf file to the first five characters of the second line of its text.
Example:
1. BATCH INPUT: a folder containing both 1000+ pdf and 1000+ .txt files like:
Undeleted001.pdf & Undeleted001.txt
Undeleted002.pdf & Undeleted002.txt
2. BATCH NEEDS TO EXTRACT either from .pdf or .txt files the first five characters of the second line for each file like:
UNDELETED001.PDF (.TXT) 1st five characters, 2nd Line:"12345"
UNDELETED002.PDF (.TXT) 1st five characters, 2nd Line:"ABCDE"
3. BATCH WILL RENAME ALL PDF FILES:
Undeleted001.pdf ---------->12345.pdf
Undeleted002.pdf ---------->ABCDE.pdf
I've seen some text-extraction script examples using the cat and sed commands, but they apply to only one file and you have to type in the file name.
I have some Basic, VBA and Matlab programming knowledge, I am currently reading up on C/C++ and bash, and I am willing to learn Perl or Python or Lisp, or all of them. I started traveling a one-way road that has forked tremendously, but I am determined to get there. Some people do crosswords, I like programming.
Can you please provide a road map of the commands to do this in the programming language of your choice? Like:
Using Language "Language"
Step1: Batch Input: commands "this/that" options "this/that"
Step2: Text Extraction: commands "this/that" options "this/that"
Step3: File Rename: commands "this/that" options "this/that"
Feel free to provide the complete code, but if you do so I won't have to read on how to use those commands and what those options do.
#!/usr/bin/perl
# Usage: ./prog.pl *.txt > 1.sh
while (<ARGV>) {
    if ($. == 2) {                          # line nr 2
        ($nm) = m/(.{5})/;                  # rip the name out: first five characters
        ($pdf = $ARGV) =~ s/\.txt$/.pdf/;   # pdf that belongs to this txt file
        print STDOUT "cp $pdf $nm.pdf\n";   # make a command
        close ARGV;                         # resets $. and effectively opens the next file
    }
}
$ ./prog.pl *.txt > 1.sh   # save to a text file and check it's ok
$ cat 1.sh
cp 1.pdf fruit.pdf
cp 2.pdf fibre.pdf
cp 3.pdf ringo.pdf
$ sh 1.sh   # run it as a script, bonus is it leaves a record too
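As a small addition to the "check it's ok" step above, a generated script can also be syntax-checked before it is run. This is only a sketch: `check_script` is a made-up helper name, and it relies on the standard `-n` option of `sh`, which reads commands without executing them. It catches broken shell syntax only, not wrong file names, so reading the output of `cat 1.sh` is still worthwhile.

```shell
#!/bin/sh
# check_script FILE: hypothetical helper, not from the thread.
# "sh -n" parses the file and reports syntax errors without executing anything.
check_script() {
    if sh -n "$1"; then
        echo "syntax ok: $1"
    else
        echo "syntax error: $1"
        return 1
    fi
}
```

One possible way to use it is `check_script 1.sh && sh 1.sh`, so the script runs only if it parses cleanly.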
Thank you for your fast response. You did post the complete script (I think) and more because of the record keeping; still I have to go and find out how this works. Thanks this will keep me busy for a few days.
I hope that you get to read this. After seeing your solution, reading some of O'Reilly's "Programming Perl" and stumbling over some stones, I managed to create the scripts below.
Thanks again
Lixbie
1st Stone: Some files were *.pdf and others *.PDF
Code:
Quote:
#!/usr/bin/perl
# Change PDF to pdf in file names. Use as: ./ren1ch.pl *.PDF > ren.sh
print STDOUT "#!/bin/sh\n";
while (<ARGV>)                              # While ($_ = <ARGV>)
{
    ($name) = $ARGV;                        # $name is the name of the current *.PDF file
    $name =~ s/\.PDF$/.pdf/;                # Change PDF to pdf in $name (extension only)
    print STDOUT "mv $ARGV $name\n\n";      # move old file to new file
    close ARGV;                             # back to top, with the next file
}
2nd Stone: The name of the file wasn't always on the 5th line, but the line holding it always started with the same 12-character-long text.
3rd: Large text needed to be abbreviated.
4th: Spaces had to be changed to _.
5th: Found some non-word characters in the text.
6th: Some files were split and had similar names, so one was overwriting the other and I was getting only 90% of the original files. Added the counter and OK.
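The 4th to 6th stones (spaces, non-word characters, colliding names) can also be handled on the shell side. The following is only a sketch under assumptions: `clean_name` and `unique_target` are hypothetical helper names, and unlike the counter in the script that follows, which is appended to every file, this variation adds a suffix only when the target name is already taken.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not from the thread.

# clean_name TEXT: turn each run of non-word characters into one underscore,
# collapse repeated underscores and trim them from both ends.
clean_name() {
    printf '%s\n' "$1" |
        sed -e 's/[^A-Za-z0-9_][^A-Za-z0-9_]*/_/g' \
            -e 's/__*/_/g' -e 's/^_//' -e 's/_$//'
}

# unique_target DIR BASE: print BASE.pdf, or BASE_F2.pdf, BASE_F3.pdf, ...
# when that name already exists in DIR, so no file overwrites another.
unique_target() {
    ut_dir=$1
    ut_base=$2
    ut_n=1
    ut_candidate="$ut_base.pdf"
    while [ -e "$ut_dir/$ut_candidate" ]; do
        ut_n=$((ut_n + 1))
        ut_candidate="${ut_base}_F${ut_n}.pdf"
    done
    printf '%s\n' "$ut_candidate"
}
```

A rename step could then look like `mv "$pdf" "$(unique_target . "$(clean_name "$title")")"`, where `$pdf` and `$title` are placeholders for the current file and the text extracted from it.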
Code:
Quote:
#!/usr/bin/perl
# Rename all pdf files to $name according to a line in the *.txt file
# Use it as ./ren2ren *.txt > ren.sh
while (<ARGV>)                                  # While ($_ = <ARGV>)
{
    if (/texttext:/)                            # line contains my text:
    {
        ($name) = substr($_, 12);
        if ($name =~ /\s(.*)\b/)
        {
            $name = $1;
            $name =~ s/[some large text]/\0/g;  # Abbreviate
            $name =~ s/\W+/_/g;                 # non-word characters (spaces too) to _
            $name =~ s/_+/_/g;
            ($ARGV2) = $ARGV;                   # Keep ARGV... still testing
            $ARGV2 =~ s/\.txt$/.pdf/;           # matching pdf file
            $AI++;                              # counter keeps similar names apart
            print STDOUT "cp $ARGV2 $name\_F$AI.pdf\n\n";
            print STDOUT "mv $ARGV\t\ttxt/$ARGV\n";
            print STDOUT "mv $ARGV2\t\tpdf/$ARGV2\n\n";
            close ARGV;                         # back to top
        }
    }
}
for PDF in *.pdf
do
    txt="${PDF%.pdf}.txt"
    if [ -e "${txt}" ]; then
        fivechar=$(awk 'NR==2{print substr($0,1,5)}' "${txt}")
        mv "$PDF" "${fivechar}.pdf"
        # add code to remove remaining txt files
    fi
done
Quote:
Feel free to provide the complete code, but if you do so I won't have to read on how to use those commands and what those options do.
You don't have to read up on how to use them? That's not the way to learn.
Last edited by ghostdog74; 06-30-2008 at 10:12 PM.
Ghost: Is that bash shell? I don't understand clearly the purpose of the script but I guess that it changes the uppercase PDF for lowercase pdf? Right?
Chris: Thanks for the link, very good one. One thing I haven't been able to find is how to break long lines in Perl without trouble. I would like to indent subroutines to the right, keep all the text on the screen, and keep all my comments in one neat column, but when the following line is indented three times
$name =~ s/[some very very very very large text]/\0/g; # Abreviate
the comment runs off the right edge of the screen.
Is there a way to break this line in Perl like this?
$name =~ s/[some very very very
very large text]/\0/g; # Abreviate
Quote:
Ghost: Is that bash shell? I don't understand clearly the purpose of the script but I guess that it changes the uppercase PDF for lowercase pdf? Right?
Yes, it's shell (bash), and no, it doesn't change uppercase PDF to lowercase. It goes through all your pdf files, checks for a corresponding .txt file of the same name, gets the first 5 characters of the second line of that text file, and uses them to rename the pdf file. If you are interested, see my sig for a bash link.
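To see what the loop described above would do before any file is touched, a dry-run variant can print the planned renames instead of calling mv. This is a sketch, not the script as posted: `plan_renames` is a hypothetical name, and the check that skips files whose second line is empty is an added assumption.

```shell
#!/bin/sh
# plan_renames DIR: hypothetical dry-run helper; it prints "old -> new"
# for every pdf in DIR that has a matching txt file and renames nothing.
plan_renames() {
    for pdf in "$1"/*.pdf; do
        [ -e "$pdf" ] || continue        # the glob matched nothing
        txt="${pdf%.pdf}.txt"
        [ -e "$txt" ] || continue        # no matching text file
        # first five characters of the second line of the text file
        five=$(awk 'NR==2 { print substr($0, 1, 5); exit }' "$txt")
        [ -n "$five" ] || continue       # second line missing or empty
        printf '%s -> %s\n' "${pdf##*/}" "$five.pdf"
    done
}
```

Running `plan_renames .` in the folder lists one line per pdf that would be renamed. Once the list looks right, the printf line can be swapped for the mv command.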
$str= "asdfgh".
"zxcvb";
# same as
$str="asdfghzxcvb";
PS always start your progs with
#!/usr/bin/perl -w
use strict;
Those two settings (-w = warnings, plus use strict) enforce proper coding, e.g. variable declarations, and warn you of dodgy/broken syntax.
To syntax test a perl prog use
>perl -wc prog.pl
which will do a test compile without running it.