LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 09-05-2017, 02:39 PM   #1
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Rep: Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242
is there even a way to speed up file loading into a sorted linked list?


I guess this is what one could call a sorted link list.

I got
Quote:
596,221
files and it has already been over an hour and a half and it is still loading them into a linked list that is sorting them alphabetically by using strcmp.
Code:
void insertNode(mylist** head, char *filename, int c)
{
  //  printf("Adding: %s\n", filename);

    while (*head && strcmp((*head)->filename, filename) < 0)
        head = &(*head)->next;

    mylist *p = (mylist *) malloc(sizeof *p);
    p->filename = strdup(filename);
    p->listcount = c;
    p->next = *head;
    *head = p;
}
I've googled this question but got nothing, I think it is limited by hardware speed and having to check every file first. I was just wondering if their is a quicker way of doing this.

Last edited by BW-userx; 09-05-2017 at 02:44 PM.
 
Old 09-05-2017, 03:01 PM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194
Quote:
Originally Posted by BW-userx View Post
I've googled this question but got nothing...
Seriously? I see lots of helpful links on google and DDG.

As you have been asked before, please do your own basic research before posting questions.

Last edited by astrogeek; 09-05-2017 at 03:07 PM.
 
Old 09-05-2017, 04:25 PM   #3
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,225

Rep: Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320
Quote:
Originally Posted by BW-userx View Post
loading them into a linked list that is sorting them alphabetically by using strcmp.

I was just wondering if their is a quicker way of doing this.
Yes. Use a binary tree.

Last edited by dugan; 09-05-2017 at 04:39 PM.
 
1 members found this post helpful.
Old 09-05-2017, 05:49 PM   #4
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242
Quote:
Originally Posted by astrogeek View Post
Seriously? I see lots of helpful links on google and DDG.

As you have been asked before, please do your own basic research before posting questions.
Seriously, no I was lying.

I did as posted I did.
What key words did as you use?

Last edited by BW-userx; 09-05-2017 at 05:55 PM.
 
Old 09-05-2017, 05:51 PM   #5
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242
Quote:
Originally Posted by dugan View Post
Yes. Use a binary tree.
http://www.geeksforgeeks.org/binary-...ata-structure/

book marked.

thanks

Last edited by BW-userx; 09-05-2017 at 05:54 PM.
 
Old 09-06-2017, 03:57 AM   #6
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194Reputation: 4194
Quote:
Originally Posted by BW-userx View Post
Seriously, no I was lying.

I did as posted I did.
What key words did as you use?
The search terms I used are visible in the links I provided for those who will follow them.

But what is actually relevant is, what searches have you done and what have you found and tried before posting. But once more you did not include that relevant information in your post

As I have asked repeatedly, please end this pattern of asking others to perform your basic work for you, or please consider findling help elsewhere, per the LQ posting guidelines to which I again refer you.
 
1 members found this post helpful.
Old 09-06-2017, 04:42 AM   #7
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308Reputation: 7308
do you want to speed up file loading or sorting or both? Do you want to sort by filename, by size or by content or what?
If you sort by names you do not need to load those files into memory at all. How big is that 596,221 files altogether? Will that fit into RAM at all?
You gave almost no any information in your original post...
 
3 members found this post helpful.
Old 09-06-2017, 07:23 AM   #8
jamison20000e
Senior Member
 
Registered: Nov 2005
Location: ...uncanny valley... infinity\1975; (randomly born:) Milwaukee, WI, US( + travel,) Earth&Mars (I wish,) END BORDER$!◣◢┌∩┐ Fe26-E,e...
Distribution: any GPL that work on freest-HW; has been KDE, CLI, Novena-SBC but open.. http://goo.gl/NqgqJx &c ;-)
Posts: 4,888
Blog Entries: 2

Rep: Reputation: 1567Reputation: 1567Reputation: 1567Reputation: 1567Reputation: 1567Reputation: 1567Reputation: 1567Reputation: 1567Reputation: 1567Reputation: 1567Reputation: 1567
My SSD partitions have realtime enabled so I can use, eg: sudo systemmonitor to give even misbehaving processes too much power... and a chillpad.
 
1 members found this post helpful.
Old 09-06-2017, 07:57 AM   #9
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930
Quote:
Originally Posted by BW-userx View Post
I guess this is what one could call a sorted link list.
Quote:
596,221
I got files and it has already been over an hour and a half and it is still loading them into a linked list that is sorting them alphabetically by using strcmp.
Code:
void insertNode(mylist** head, char *filename, int c)
{
  //  printf("Adding: %s\n", filename);

    while (*head && strcmp((*head)->filename, filename) < 0)
        head = &(*head)->next;

    mylist *p = (mylist *) malloc(sizeof *p);
    p->filename = strdup(filename);
    p->listcount = c;
    p->next = *head;
    *head = p;
}
I've googled this question but got nothing, I think it is limited by hardware speed and having to check every file first. I was just wondering if their is a quicker way of doing this.
Quote:
Originally Posted by astrogeek View Post
The search terms I used are visible in the links I provided for those who will follow them.

But what is actually relevant is, what searches have you done and what have you found and tried before posting. But once more you did not include that relevant information in your post

As I have asked repeatedly, please end this pattern of asking others to perform your basic work for you, or please consider findling help elsewhere, per the LQ posting guidelines to which I again refer you.
Quote:
Originally Posted by pan64 View Post
do you want to speed up file loading or sorting or both? Do you want to sort by filename, by size or by content or what?
If you sort by names you do not need to load those files into memory at all. How big is that 596,221 files altogether? Will that fit into RAM at all?
You gave almost no any information in your original post...
I would suggest that you take a bit of time to consider the feedback of both astrogeek as well as pan64. My thinking is that you have some experience of time with LQ as well as recurrence, due to the high post count. And given that you've started a number of programming threads as of late, and been given feedback that questions are written either confusing, or incomplete, it would seem to make sense that you would put the effort forwards to improve the quality of how you ask.

Think of it in the other direction. You, yourself have offered advice to other LQ members. We're happy to offer advice towards your programming questions. We do realize that you are progressing very rapidly, being tenacious with attacking your problems, and working very diligently towards solutions. Perhaps a better tact might be that when you find something you don't understand, is to work more at it, as you obviously are doing, but to not post a LQ thread just yet until you have considered all your options first.

What you are doing is potentially frustrating those who read your threads. First, they see a fast changing thread, and second, they see very unclear, or incomplete information.

Ask yourself how you would react to another member doing this same thing? What Linux talents are you a very accomplished expert at? Say it were regex, and you saw a regex question. But then you went to answer it and found that the OP edited their post, updated it a few times even before you were able to answer their first question. Or say they just put in some regex and said "It doesn't work", meanwhile you're asking "What is your input? What did you originally want it to do?" Sorry to say, but this is a similar position you are placing others into with threads like these.
 
1 members found this post helpful.
Old 09-06-2017, 08:57 AM   #10
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,225

Rep: Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320
If you want a better answer, then post the following code:

Quote:
Originally Posted by BW-userx View Post
I got files... it is still loading them into a linked list
Post the code that "loads" the "files" (no, your node insert code obviously isn't it)

Quote:
that is sorting them alphabetically by using strcmp.
And also post the code that's sorting them. That wasn't in the snippet you posted either.
 
Old 09-06-2017, 09:36 AM   #11
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242
Quote:
Originally Posted by dugan View Post
If you want a better answer, then post the following code:



Post the code that "loads" the "files" (no, your node insert code obviously isn't it)



And also post the code that's sorting them. That wasn't in the snippet you posted either.
I really didn't want to post my entire code only to have to worry about a someone that seems to like to .... about everything I do in there now.


I switched it back to the node I was using before hand that loads first come first serve and I loaded
Code:
1,135,597
files in a matter of seconds. So now I believe I'm changing my plan of attack to looking into a short algorithm that is quickly sort it alphabetically, just to make it a feature/option after it is loaded.

Because it seems to me that the information is already in memory so it should be faster afterwords?

Keeping mind this is my very first time dealing with linked list and I am playing quick study with the mind set of only wanting the necessary information to get this job done. I am not going for a job at this nor am I going for some nobel prize over this. Truth be told.

my knowledge that I gain on this will only be used for this one thing I am updating, that is as far as I can see it going. After it is finished then it goes into that bin in the back of ones mind.

If you still want to look at the code that does the work to get the files. Here you go.
Code:
void add_file_to_filelist_recursively(char *origpath)
{
	struct stat st;
	char *path;
	int recursive = 1;

	if (!origpath)
		return;

	path = strdup(origpath);

	if (level == FILELIST_FIRST)

	{
		// First time through, sort out pathname 
		int len = 0;

		len = strlen(path);
		if (path[len - 1] == '/')
			path[len - 1] = '\0';

	char *newpath = get_absolute_path(path);

			free(path);
			path = newpath;

	}

	errno = 0;
	if (stat(path, &st)) {

		free(path);
		return;
	}

	if ((S_ISDIR(st.st_mode)) && (level != FILELIST_LAST)) {
		struct dirent **de;
		DIR *dir;
		int cnt, n;

		if ((dir = opendir(path)) == NULL) {
            printf("couldn't open directory %s:", path);
			free(path);
			return;
		}
		n = scandir(path, &de, NULL, NULL);
		if (n < 0) {
			switch (errno) {
			case ENOMEM:
				printf("Insufficient memory to scan directory %s:", path);
				break;
			default:
				printf("Failed to scan directory %s:", path);
			}
		} else {
			for (cnt = 0; cnt < n; cnt++) {
				if (strcmp(de[cnt]->d_name, ".")
						&& strcmp(de[cnt]->d_name, "..")) {
					char *newfile;
                                             // function that adds absolute path to filename.
					newfile = strjoin("", path, "/", de[cnt]->d_name, NULL);

					//gets recursive always
					if (recursive)
					{

						add_file_to_filelist_recursively(newfile, FILELIST_CONTINUE);
					}
					else

						add_file_to_filelist_recursively(newfile, FILELIST_LAST);
                        free(newfile);
				}
				free(de[cnt]);
			}
			free(de);
		}
		closedir(dir);
	} else if (S_ISREG(st.st_mode)) {

        fcount++;

      if (opts.printlistintake == 1)
            printf("File: %s\n", path);

        //sorted list
       // insertNode(&filelist, path, fcount);

       // first come first serve
        add_2_front(&filelist, path, fcount);

	}
	free(path);
	return;
}
add_2_front function
Code:
void add_2_front(mylist** filelist, char *filename, int c)
{
    mylist* new_file = (mylist *)malloc(sizeof(mylist));
	new_file->filename = strdup(filename);
	new_file->listcount = c;



	if(!(*filelist)){
		new_file->next = NULL;
		(*filelist) = new_file;
	}else{
		new_file->next = (*filelist);
		(*filelist) = new_file;
	}
 return;
}
 
Old 09-06-2017, 12:31 PM   #12
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,225

Rep: Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320
I don't immediately see in the new code where the slowdown is. It is, however, similar to something I've written that you might want to look at:

https://gist.github.com/duganchen/1e917c11fce44267b4c4

What that does is recursively go through the filesystem, grab the size and path for every file, and then group them by size. My program is very fast, despite being written in Python, and it does a lot of the same stuff that I see in your new code.

Last edited by dugan; 09-06-2017 at 12:32 PM.
 
Old 09-06-2017, 01:13 PM   #13
BW-userx
LQ Guru
 
Registered: Sep 2013
Location: Somewhere in my head.
Distribution: Slackware (15 current), Slack15, Ubuntu studio, MX Linux, FreeBSD 13.1, WIn10
Posts: 10,342

Original Poster
Rep: Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242Reputation: 2242
Quote:
Originally Posted by dugan View Post
I don't immediately see in the new code where the slowdown is. It is, however, similar to something I've written that you might want to look at:

https://gist.github.com/duganchen/1e917c11fce44267b4c4

What that does is recursively go through the filesystem, grab the size and path for every file, and then group them by size. My program is very fast, despite being written in Python, and it does a lot of the same stuff that I see in your new code.
I'd have to say it is that when it is comparing the file names in that node function -- that is all I can see why -- having to take every one of them and look at them and set them in order.

I do not have any experience in strcmp beyond 2 files, this is over 1 million files. I am sure that is going to take some time, regardless?
 
Old 09-06-2017, 01:28 PM   #14
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,225

Rep: Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320Reputation: 5320
Quote:
Originally Posted by bw-userx View Post
i'd have to say it is that when it is comparing the file names in that node function -- that is all i can see why -- having to take every one of them and look at them and set them in order.

I do not have any experience in strcmp beyond 2 files, this is over 1 million files. I am sure that is going to take some time, regardless?
Aah, okay. I think you're right that that's what's slowing it down. I can also see why now.

Code:
while (*head && strcmp((*head)->filename, filename) < 0)
        head = &(*head)->next;
You're calling this for every file, right?

When you insert the first file, you do no strcmps.

When you insert the second file, you do one strcmp.

When you insert the third file, you might be doing two strcmps.

So for n files, you might be doing as many as 1 + 2 + 3 + ... + n - 1 strcmp operations.

Do you see how that might be modeled mathematically? It's been a long time since I've actually done this, so I'll just link to this explanation:

https://brilliant.org/wiki/sum-of-n-n2-or-n3/

Okay, do you see the problem now?

For n files, you're doing roughly n-squared strcmps.

That means that for a million files, you're doing a million times a million strcmps. That's a trillion strcmps.

And that's why you should switch from a linked list to a binary tree. It would reduce the number of strcmp operations from n-squared to n times log n. Which is MUCH fewer strcmps.

EDIT: to be clear, using a binary tree will literally make your program a million times faster. This is especially true if you go all out and implement something like a red-black tree.

Last edited by dugan; 09-06-2017 at 04:57 PM.
 
1 members found this post helpful.
Old 09-06-2017, 01:47 PM   #15
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,882
Blog Entries: 13

Rep: Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930
Some comments about that code and strcmp() as well as strjoin().

Yes, these functions are inherently slow.

There is no need to call a function when you know explicit and very simple "placement" as well as "pattern".

Comparing a highly specific index offset against '.' or 0x2E is far faster than calling a library function.

Thus something like (example):
Code:
#define PERIOD 0x2e

if ((str[cnt] == PERIOD) && (str[cnt+1] == PERIOD) && (str[cnt+2] == 0x00))
This will serve and be far, far more rapid than strcmp as it appears to be, being used in that code.

Further, I do not like strcmp() only and especially if the strings can be "whatever", such as filenames allowing varied patterns. Why? You can encounter an alias, not a bash alias, but a false detection. "And you will!", because for those out there who are experienced programmers, you know that if there is a remote possibility that your code could hit that one improbable condition ... IT WILL! Thus you are doing a strcmp() against ".", well you can possibly hit that if the first character is a dot.

Alternatives to strjoin() might be any of memmove(), memcpy(), strcat(), or byte by byte copy directly inline using a local index variable.

I honestly try as hard as possible to do zero copy code for when data is being pushed around a system.
 
1 members found this post helpful.
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
How can this list be sorted by file size 'high to low' Glenn D. Linux - Newbie 9 04-18-2015 02:42 PM
[SOLVED] How to list only files, sorted by date? Calab Linux - Newbie 2 03-25-2014 05:56 PM
[SOLVED] sort photos manually then list sorted filenames to text file carolus Linux - Software 15 03-13-2014 10:54 AM
Sorted List a.haddad Programming 22 01-03-2013 02:58 AM
[SOLVED] looking for a way to list files sorted by size mark_alfred Linux - General 4 08-22-2009 09:20 AM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 06:17 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration