C programming question

Dogs · 09-19-2010, 11:36 PM

So I've got a lot of archived (rar) files with par2 recovery files associated with them, and my goal is to write a program that will do its work in the present working directory.

The par2 files are named either filename.volume.PAR2, filename.volume.par2, filename.par2, or filename.PAR2.

The first run should check the entire contents of the directory to see which format has been used for the file extension, and then dump its results into an array of strings, or a file, or something for use later.

The second run then needs to check the entire directory for the format of the rar files (the first archive is labeled filename.r00, filename.R00, or filename.part01.rar) and dump the results to an array of strings or something for use later.

With the formats for the extension collected, the next step should reduce the results to PAR2, par2, or both.

Code:

      if(par2 && PAR2 are present)
            then execute par2 v *.par2
            then execute par2 v *.PAR2
      else if(par2 && !PAR2)
            then execute par2 v *.par2
      else if(!par2 && PAR2)
            then execute par2 v *.PAR2

Very near the end of the output of par2 v *.whatever, a status message is printed which states whether or not recovery is required. That needs to be captured and used in a test to determine whether or not par2 needs to be executed with the r flag.

When repair is necessary, the status message is printed 6 lines from the end; when repair is unnecessary, the message is the very last line of the output. Because the position varies, the best way seems to me to be an on-the-fly evaluation of the output from the par2 v command.

Code:

        if(Repair is required.)
           then execute par2 r *.whatever
        else if(All files are correct, repair is not required.)
           then move on to extraction

Code:

       if(Repair is required. && par2 && PAR2)
          then execute par2 r *.par2
          then execute par2 r *.PAR2
       else if(Repair is required. && par2 && !PAR2)
          then execute par2 r *.par2
       else if(Repair is required. && !par2 && PAR2)
          then execute par2 r *.PAR2

So, once the par2 files are verified and repaired if necessary, it is time to extract the contents of the primary archive into a newly created directory named "filename".

so,

Code:

       
       if(.r00 && .R00 && .part01.rar)
         then extract all of those to a directory labeled filename
       else if(.r00 && !.R00 && !.part01.rar)
         then extract just those ones to a directory labeled filename
       else if(!.r00 && .R00 && !.part01.rar)
         then extract just those ones to a directory labeled filename
       else if(!.r00 && !.R00 && .part01.rar)
         then extract just those ones to a directory labeled filename

My questions are --

How do I, from a C program, collect just one instance of each file extension from multiple directory entries, when the filenames may contain any number of different characters and formats?

IE -
2008.records.bak.PAR2
Home.films-09.par2
Extremely-long-list-of-ingredients.vol83-93.par2

END RESULT SHOULD BE -
.PAR2 && .par2

IE 2 -
huffinsmcpuffins.par2
twenty-8-steps_to.fall-asleep-tonight.par2

END RESULT SHOULD BE -
.par2 && !.PAR2

My initial thought is that I could do ls *.par2 *.PAR2 > pars
strip the filenames and only keep the file extensions
then uniq the file to reduce it to either par2, PAR2, or both.

How do I, from a C program, execute another program with a wildcard (the filename is unimportant to the par2 command at this stage) followed by one or both of the 2 possible formats for the file extension?

IE -

with the previous results from either EX 1 or EX 2, do -
(what will be done regardless) [what needs to be added]

(par2 v *) [.par2]
(par2 v *) [.PAR2]

My initial thought is a loop that is iterated based on how many different formats of extensions there are. The trick would be to distinguish between either par2 or PAR2, if both weren't present, but at the moment I'm unsure as to how that might be achieved.

Next up is searching for the rar extensions. I imagine that the solution to doing it with par2 extensions would be mostly the same for the rars, so that's not my primary concern.

So, how do I strip the extensions from the output of a command like ls *.r00 *.R00 *.part01.rar, and then uniq what is left and use that for the creation of directories to extract the files into?

IE -

test01.r00
test01.r01
test01.r02
test01.r03
test02.R00
test02.R01
test02.R02
test1-334-dd-4565.part01.rar
test1-334-dd-4565.part02.rar
test1-334-dd-4565.part03.rar

END RESULT IS -

mkdir test01
mkdir test02
mkdir test1-334-dd-4565

If I can get these questions answered, I can figure the rest out. It's not so much a matter of "how do I do X" as a matter of "I've got the general idea, but some of it isn't quite clear to me. I mean, I could make it work for 20% of the cases, but not 100% of the cases... and that is unsatisfactory!"

Thanks guys, hope this post clearly conveyed my troubles.

grail · 09-20-2010, 12:35 AM

Well I guess my first question back to you would be why the need to be done in C seeing that you are constantly making calls to shell based programs??
It does appear this would be better in a script of some kind.

Assuming you still wish to go ahead with C, maybe you can show us your code that works for the 20% and outline where you get stuck trying to get to 100%?

hda7 · 09-20-2010, 11:19 AM

Quote:

Originally Posted by grail

Well I guess my first question back to you would be why the need to be done in C seeing that you are constantly making calls to shell based programs??
It does appear this would be better in a script of some kind.

Assuming you still wish to go ahead with C, maybe you can show us your code that works for the 20% and outline where you get stuck trying to get to 100%?

I would also agree that C is probably not the best language for your task. Unless you need your program to be written in C, I would use something like bash or perl.

instag · 09-20-2010, 12:38 PM

I can't help you with C, but I agree with the others that a script is enough for that kind of task.
When I deal with a problem like that, I also try to simplify things first. In your case that would be:

rename all suffixes to a standard (like lowercase .rar and .par), so the script has it easy
run par2repair on all .par2 files - if repair is not required, it will just skip that step (and you run it only once for every group of files)
let unrar handle the creation of the output dirs with the "-ad" switch

Dogs · 09-20-2010, 01:21 PM

Well, this program is only going to be a few hundred lines long, so I don't care how un-intelligent the language considerations may be (at least at this point)

I will agree, however, that a BASH script or such would be the right choice for this task, but I just want to write a C program to handle as many of the lower-level issues as I can think of!

I don't have a completed version yet, and there is a bug in my source as of right now, however it is coming along fairly well.

Code:

#include <stdio.h>
#include <stdlib.h> // system()
#include <string.h> // strcmp()


void reverse(char *, char *, int); // function for reversing an array at a later time

int main(){

	// verify we're in the right place, but if we're not, then oh well.
	printf("In directory ");
	system("pwd");
	printf("\n Verifying parity\n");
	// only checking for .par2 for now
	system("par2 v *.par2 1>var 2>/dev/null");
        
/*	if((fp = fopen("var", "r")) == NULL){
		perror("Something went terribly, terribly wrong. Hammer will fix it.");
		return 1; // Hammer did it.
	}

	char point;
	// skip to the end of the file, later this might be useful?
	while( (point = fgetc(fp)) != EOF){
		;
	}
	printf("EOF reached\n");	*/


	
	printf("Checking to see if repair is necessary\n");

	char* test_string = "All files are correct, repair is not required."; // confirmation line from par2 output
	FILE* fp;

	// check the end line in var to see if it matches the test_string
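	system("tail -n1 var > conf");
	if( (fp = fopen("conf", "r")) == NULL){
		perror("Something isn't quite right!");
		return 1; // nothing to read, so stop rather than pass NULL to fgetc
	}
	
	int testsz = 46; // length of test_string
	int i, c;
	
	// check takes place here
	char read_string[testsz + 1]; // +1 leaves room for the terminating null
	for(i = 0; i < testsz; i++){
		// fgetc returns an int so that EOF can be told apart from a real character
		if((c = fgetc(fp)) == EOF){
			printf("EOF Reached, breaking.\n");
			 break;
		}
		read_string[i] = c;
	}


	read_string[i] = 0; // terminate after the last character actually read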

	// compare the test_string and the string that was read
	if(strcmp(read_string, test_string) == 0){
		printf("Repair is not necessary\n");
	}else{
		printf("Something needs repairing.\n");
		system("par2 r *.par2 > /dev/null 2>&1"); // redirect stdout first, then point stderr at it
		printf("Repair complete! (maybe)\n");
	}
	fclose(fp);


	printf("\n\nMoving to extraction process.\n");
	
	// list all files that are useful to me
	system("ls *.[rR]00 *.part01.rar *.zip 1> rars 2>/dev/null");

	if( (fp = fopen("rars", "r")) == NULL){
		printf("Sure you're in the right directory? I'm not.\n");
		return 1; // nothing to read, so stop rather than pass NULL to fgetc
	}
	
	char *types[] = { ".r00", ".R00", ".rar", ".zip" }; // array of strings of possibilities
	char read_line[100]; // line that I read
	char read_type[5]; // last 4 characters that I read (in reverse)
	char file_ext[5]; // final usage of the last 4 characters read

	char rzero, Rzero, rar, zip; // flags for what was detected
	rzero = Rzero = rar = zip = 0;
	
	int d = 0; 
	int g = 0;
	int t = 2;
	int ti = 0;

	// the bug exists between line 89 and 134 - The nature of which is a failure to log each different filetype.
	// in this case, only one filetype is logged, however, all filetypes are detected.

	while( ( read_line[d] = fgetc(fp)) != EOF ){

		    if(read_line[d++] == '\n'){
			printf("Newline encountered at d = %d\n", d);
			printf("read_line = %s\n", read_line);
		   	 while(t < 6){
				printf("Grabbing file extension\n");
				read_type[g] = read_line[d - t];
				g++;
				t++;
			}

		d = 0;
		int g = 0;
		int t = 2;
		int ti = 0;


		printf("read_type = %s\n", read_type);
		reverse(read_type, file_ext, 5); // reverse the last 4 characters read in reverse
		printf("file_ext = %s\n", file_ext);
		
			
				// check to see which types were found
				if( strcmp(file_ext, types[0]) == 0){
					printf("types[0] is %s\n", types[0]);
					rzero = 1;
				}else if(strcmp(file_ext, types[1]) == 0){
					printf("types[1] is %s\n", types[1]);
					Rzero = 1;
				}else if(strcmp(file_ext, types[2]) == 0){
					printf("types[2] is %s\n", types[2]);
					rar = 1;
				}else if(strcmp(file_ext, types[3]) == 0){
					printf("types[3] is %s\n", types[3]);
					zip = 1;
				}else{
					printf("Next!\n");
				}

			printf("rzero found = %d, Rzero found = %d, rar found = %d, zip found = %d\n", rzero, Rzero, rar, zip);
	

		    } 
			
	}
	printf("EOF REACHED!\n");

	
}


void reverse(char back[], char forward[], int j){
	printf("Reversing the array\n");
	int x, y;
	for(x = 0, y = j-2; x < 4; x++, y--){
	/*forward[0] = back[3];
	forward[1] = back[2];
	forward[2] = back[1];
	forward[3] = back[0];*/
		forward[x] = back[y];
	}
	printf("reversed the array forward[] = %s\n", forward);
}

jiml8 · 09-20-2010, 01:25 PM

I agree that C is not an optimal language for this.

However, to get the extension, you would use pointers:

Code:

char * buf, *ptr1;

Assume that buf points to a string that contains: Extremely-long-list-of-ingredients.vol83-93.par2\0 (note the null terminator) since you have read this in.
Then, you find the end of that string. You can do this a couple of different ways, but I'll use strlen() from <string.h>. (sizeof(buf) would not work here: buf is a pointer, so sizeof gives the size of the pointer, not the length of the string.)

Code:

ptr1 = buf + strlen(buf);

This makes ptr1 point to the null terminator at the end of the string.

Now we need the extension. To do this, scan backward through the string until we encounter a period:

Code:

while(*ptr1 != '.' && ptr1 > buf) {
    --ptr1;
}

This loop will end when the last period in the string is encountered (the first one met when scanning backward), or if there is no period, it will end when ptr1 is pointing at the beginning of the string (ALWAYS take steps to avoid running out of your buffer!!!!).

So, in this case, at the end, ptr1 will point to the string ".par2", which is what you want to find.

As for the question about using wildcards, system() should work with that, since it hands the command line to the shell and the shell expands the wildcard.