[SOLVED] is there even a way to speed up file loading into a sorted linked list?
I guess this is what one could call a sorted linked list.

I've got 596,221 files, it has already been over an hour and a half, and it is still loading them into a linked list that sorts them alphabetically using strcmp.

I've googled this question but got nothing. I think it is limited by hardware speed and having to check every file first. I was just wondering if there is a quicker way of doing this.
I did search, as I posted.
What key words did you use?
The search terms I used are visible in the links I provided, for those who will follow them.
But what is actually relevant is: what searches have you done, and what have you found and tried before posting? Once more you did not include that relevant information in your post.
As I have asked repeatedly, please end this pattern of asking others to perform your basic work for you, or please consider finding help elsewhere, per the LQ posting guidelines to which I again refer you.
Do you want to speed up file loading, or sorting, or both? Do you want to sort by filename, by size, by content, or what?
If you sort by names, you do not need to load those files into memory at all. How big are those 596,221 files altogether? Will that fit into RAM at all?
You gave almost no information in your original post...
I would suggest that you take a bit of time to consider the feedback of both astrogeek and pan64. My thinking is that you have some experience with LQ, given your high post count. And given that you've started a number of programming threads as of late, and been given feedback that your questions are written either confusingly or incompletely, it would seem to make sense to put the effort forward to improve the quality of how you ask.
Think of it in the other direction. You yourself have offered advice to other LQ members. We're happy to offer advice on your programming questions. We do realize that you are progressing very rapidly, being tenacious in attacking your problems, and working very diligently toward solutions. Perhaps a better tack might be, when you find something you don't understand, to work more at it, as you obviously are doing, but not to post an LQ thread just yet, until you have considered all your options first.
What you are doing is potentially frustrating those who read your threads. First, they see a fast-changing thread, and second, they see very unclear or incomplete information.
Ask yourself how you would react to another member doing this same thing. What Linux talent are you an accomplished expert at? Say it were regex, and you saw a regex question. But then you went to answer it and found that the OP had edited their post, updating it a few times even before you were able to answer their first question. Or say they just put in some regex and said "It doesn't work", while you're asking "What is your input? What did you originally want it to do?" Sorry to say, but this is a similar position you are placing others in with threads like these.
If you want a better answer, then post the following code:
Post the code that "loads" the "files" (no, your node insert code obviously isn't it)
And also post the code that's sorting them. That wasn't in the snippet you posted either.
I really didn't want to post my entire code only to have to worry about someone that seems to like to .... about everything I do in there now.

I switched it back to the node I was using beforehand, which loads first come first served, and I loaded 1,135,597 files in a matter of seconds. So now I'm changing my plan of attack to looking into a sort algorithm that can quickly sort the list alphabetically, just to make it a feature/option after everything is loaded.
Because it seems to me that the information is already in memory, so it should be faster afterwards?
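For what it's worth, sorting the list after loading is an O(n log n) job with merge sort, which is the usual choice for linked lists since it needs no random access. Here is a minimal sketch; the node layout is an assumption (a `filename` field is implied by the insert snippet quoted elsewhere in this thread, but the real struct isn't shown):

```c
#include <string.h>

struct node {
    char *filename;
    struct node *next;
};

/* Merge two already-sorted lists into one sorted list. */
static struct node *merge(struct node *a, struct node *b)
{
    struct node head = {0}, *tail = &head;

    while (a && b) {
        if (strcmp(a->filename, b->filename) <= 0) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return head.next;
}

/* Top-down merge sort: split at the midpoint (slow/fast pointers),
 * sort each half, merge. O(n log n) comparisons total. */
struct node *merge_sort(struct node *list)
{
    if (!list || !list->next)
        return list;

    struct node *slow = list, *fast = list->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    struct node *right = slow->next;
    slow->next = NULL;          /* cut the list in half */

    return merge(merge_sort(list), merge_sort(right));
}
```

Call `filelist = merge_sort(filelist);` once after loading; for a million nodes that is around 20 million comparisons rather than the ~10^12 an insertion-sorted list costs.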
Keep in mind this is my very first time dealing with linked lists, and I am playing quick study with the mindset of only wanting the necessary information to get this job done. I am not going for a job at this, nor am I going for some Nobel prize over this. Truth be told,
the knowledge I gain on this will only be used for this one thing I am updating; that is as far as I can see it going. After it is finished, it goes into that bin in the back of one's mind.
If you still want to look at the code that does the work to get the files, here you go.
Code:
// Note: the function was posted with one parameter but is called
// recursively with two; the `level` parameter is restored here to
// match those calls and the `level ==` tests in the body.
void add_file_to_filelist_recursively(char *origpath, int level)
{
	struct stat st;
	char *path;
	int recursive = 1;

	if (!origpath)
		return;
	path = strdup(origpath);
	if (level == FILELIST_FIRST) {
		// First time through, sort out pathname
		int len = strlen(path);
		if (len > 0 && path[len - 1] == '/')
			path[len - 1] = '\0';
		char *newpath = get_absolute_path(path);
		free(path);
		path = newpath;
	}
	errno = 0;
	if (stat(path, &st)) {
		free(path);
		return;
	}
	if (S_ISDIR(st.st_mode) && (level != FILELIST_LAST)) {
		struct dirent **de;
		DIR *dir;
		int cnt, n;

		if ((dir = opendir(path)) == NULL) {
			printf("couldn't open directory %s:", path);
			free(path);
			return;
		}
		n = scandir(path, &de, NULL, NULL);
		if (n < 0) {
			switch (errno) {
			case ENOMEM:
				printf("Insufficient memory to scan directory %s:", path);
				break;
			default:
				printf("Failed to scan directory %s:", path);
			}
		} else {
			for (cnt = 0; cnt < n; cnt++) {
				if (strcmp(de[cnt]->d_name, ".")
				    && strcmp(de[cnt]->d_name, "..")) {
					char *newfile;
					// function that adds absolute path to filename.
					newfile = strjoin("", path, "/", de[cnt]->d_name, NULL);
					// gets recursive always
					if (recursive)
						add_file_to_filelist_recursively(newfile, FILELIST_CONTINUE);
					else
						add_file_to_filelist_recursively(newfile, FILELIST_LAST);
					free(newfile);
				}
				free(de[cnt]);
			}
			free(de);
		}
		closedir(dir);
	} else if (S_ISREG(st.st_mode)) {
		fcount++;
		if (opts.printlistintake == 1)
			printf("File: %s\n", path);
		// sorted list
		// insertNode(&filelist, path, fcount);
		// first come first serve
		add_2_front(&filelist, path, fcount);
	}
	free(path);
}
What that does is recursively go through the filesystem, grab the size and path for every file, and then group them by size. My program is very fast, despite being written in Python, and it does a lot of the same stuff that I see in your new code.
I'd have to say it is when it is comparing the file names in that node function -- that is all I can see why -- having to take every one of them, look at them, and set them in order.
I do not have any experience with strcmp beyond comparing 2 strings, and this is over 1 million files. I am sure that is going to take some time regardless?
Aah, okay. I think you're right that that's what's slowing it down. I can also see why now.
Code:
while (*head && strcmp((*head)->filename, filename) < 0)
	head = &(*head)->next;
You're calling this for every file, right?
When you insert the first file, you do no strcmps.
When you insert the second file, you do one strcmp.
When you insert the third file, you might be doing two strcmps.
So for n files, you might be doing as many as 1 + 2 + 3 + ... + n - 1 strcmp operations.
Do you see how that might be modeled mathematically? It's been a long time since I've actually done this, so I'll just link to this explanation:
For n files, you're doing on the order of n-squared strcmps (the sum 1 + 2 + ... + (n - 1) works out to n(n - 1)/2).
That means that for a million files, you're doing on the order of half a trillion strcmps.
And that's why you should switch from a linked list to a binary tree. It would reduce the number of strcmp operations from n-squared to n times log n. Which is MUCH fewer strcmps.
EDIT: to be clear, using a binary tree will make your program dramatically faster: for a million files, n-squared versus n log n works out to tens of thousands of times fewer comparisons. This is especially true if you go all out and implement something like a red-black tree, which stays balanced even when the input arrives already sorted.
Some comments about that code, strcmp(), and strjoin().
Yes, these functions carry call overhead.
There is no need to call a library function when you know the exact, very simple placement and pattern you are matching.
Comparing a highly specific index offset against '.' (0x2E) is far faster than calling a library function.
Thus something like (example):
Code:
#define PERIOD 0x2e
if ((str[cnt] == PERIOD) && (str[cnt+1] == PERIOD) && (str[cnt+2] == 0x00))
This will serve, and be far more rapid than strcmp() as it appears to be used in that code.
Further, I do not like strcmp() when the strings can be "whatever", such as filenames with varied patterns. Why? You can encounter an alias -- not a bash alias, but a false detection. And you will, because as experienced programmers know, if there is a remote possibility that your code could hit that one improbable condition ... IT WILL! (Strictly speaking, strcmp() against "." only matches the exact string ".", since it compares through the terminating NUL; the prefix false-match risk applies to strncmp()-style length-limited comparisons.)
Alternatives to strjoin() might be any of memmove(), memcpy(), strcat(), or byte by byte copy directly inline using a local index variable.
I honestly try as hard as possible to do zero copy code for when data is being pushed around a system.
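As a concrete illustration of those alternatives, the `path + "/" + name` join in the posted code can be done with one allocation and two memcpy() calls. The exact behavior of the poster's strjoin() isn't shown, so this `join_path` helper is an assumption, not a drop-in replacement:

```c
#include <stdlib.h>
#include <string.h>

/* Join a directory path and a filename with a single '/',
 * using one malloc and no repeated scanning of the result
 * (strcat would re-walk the destination on every append). */
char *join_path(const char *dir, const char *name)
{
    size_t dlen = strlen(dir), nlen = strlen(name);
    char *out = malloc(dlen + 1 + nlen + 1);

    if (!out)
        return NULL;
    memcpy(out, dir, dlen);
    out[dlen] = '/';
    memcpy(out + dlen + 1, name, nlen + 1);  /* +1 copies the '\0' */
    return out;
}
```

The caller frees the result, exactly like the `free(newfile)` in the posted loop.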