LinuxQuestions.org
Latest LQ Deal: Latest LQ Deals
Home Forums Tutorials Articles Register
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
User Name
Password
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Notices


Reply
  Search this Thread
Old 06-09-2013, 03:54 AM   #16
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928

Original Poster
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301

Alright, here is the final program

Code:
// output character statistics
#include <stdio.h>
#include <math.h>
#include <limits.h>

int main (void)
{
	// ! init
	unsigned char cluster[BUFSIZ];
	unsigned int i;
	unsigned int ret;

	// init
	const unsigned int uchartot = UCHAR_MAX + 1; // = 256

	unsigned long count[uchartot];

	for (i = 0; i < uchartot; i++)
	{
		count[i] = 0;
	}

	// input loop
	while ((ret = fread (cluster, 1, BUFSIZ, stdin)))
	{
		for (i = 0; i < ret; i++)
		{
			count[cluster[i]]++;
		}
	}

	// total calc
	for (i = 0; i < uchartot; i++)
	{
		printf ("%u\t%lu\n", i, count[i]);
	}

	return 0;
}
The thread is solved.
 
Old 06-10-2013, 03:41 AM   #17
mina86
Member
 
Registered: Aug 2008
Distribution: Debian
Posts: 517

Rep: Reputation: 229Reputation: 229Reputation: 229
By the way, you could do w/o uchartot:
Code:
#include <stdio.h>
#include <math.h>
#include <limits.h>

int main (void)
{
	// ! init
	unsigned char cluster[BUFSIZ];
	unsigned int i;
	unsigned int ret;

	// init
	unsigned long count[UCHAR_MAX + 1];
	for (i = 0; i < sizeof count / sizeof *count; i++)
	{
		count[i] = 0;
	}

	// input loop
	while ((ret = fread (cluster, 1, BUFSIZ, stdin)))
	{
		for (i = 0; i < ret; i++)
		{
			count[cluster[i]]++;
		}
	}

	// total calc
	for (i = 0; i < sizeof count / sizeof *count; i++)
	{
		printf ("%u\t%lu\n", i, count[i]);
	}

	return 0;
}
I usually define a macro for that:
Code:
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof *(arr))
 
Old 06-10-2013, 07:54 AM   #18
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928

Original Poster
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
That code is slightly slower, probably because you're doing extra math as part of the loop. That's why I prefer to do the math before the loop. Technically, the compiler should be able to detect and calculate it and avoid the slight overhead.
 
Old 06-10-2013, 10:21 AM   #19
mina86
Member
 
Registered: Aug 2008
Distribution: Debian
Posts: 517

Rep: Reputation: 229Reputation: 229Reputation: 229
Quote:
Originally Posted by H_TeXMeX_H View Post
That code is slightly slower, probably because you're doing extra math as part of the loop.
If your compiler does not optimise the calculation, than change your compiler… I don't believe that it does not optimise this.
 
Old 07-21-2013, 08:06 AM   #20
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928

Original Poster
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
I found a way to optimize it just a bit more, especially for 64-bit processors.

Code:
// output character statistics
#include <stdio.h>
#include <math.h>
#include <stdint.h>

int main (void)
{
	// ! init
	uint64_t cluster[BUFSIZ];
	unsigned int i;
	unsigned int ret;

	// init
	const unsigned int uint8tot = UINT8_MAX + 1; // = 256

	unsigned long count[uint8tot];

	for (i = 0; i < uint8tot; i++)
	{
		count[i] = 0;
	}

	// input loop
	while ((ret = fread (cluster, 8, BUFSIZ, stdin)))
	{
		for (i = 0; i < ret; i++)
		{
			count[ cluster[i]        & 0xff]++;
			count[(cluster[i] >>  8) & 0xff]++;
			count[(cluster[i] >> 16) & 0xff]++;
			count[(cluster[i] >> 24) & 0xff]++;
			count[(cluster[i] >> 32) & 0xff]++;
			count[(cluster[i] >> 40) & 0xff]++;
			count[(cluster[i] >> 48) & 0xff]++;
			count[(cluster[i] >> 56) & 0xff]++;
		}
	}

	// total calc
	for (i = 0; i < uint8tot; i++)
	{
		printf ("%3u\t%lu\n", i, count[i]);
	}

	return 0;
}
This does it in 0.113s, versus 0.132s.

Probably this compiles to better assembly code ?
 
Old 07-21-2013, 10:38 AM   #21
psionl0
Member
 
Registered: Jan 2011
Distribution: slackware_64 14.1
Posts: 722
Blog Entries: 2

Rep: Reputation: 124Reputation: 124
Would it not be faster to read the entire file into memory then do the computations?
 
Old 07-21-2013, 11:47 AM   #22
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928

Original Poster
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
Quote:
Originally Posted by psionl0 View Post
Would it not be faster to read the entire file into memory then do the computations?
Not really, and this would also limit the input file size. I would also prefer that the data be piped in.

Code:
// output character statistics
#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
	// parse args
	if (2 != argc)
	{
		printf ("Usage:\n%s filename\n", argv[0]);
		return 1;
	}

	// ! init
	unsigned int i;
	long fsize;

	// init
	const unsigned int uint8tot = UINT8_MAX + 1; // = 256

	unsigned long count[uint8tot];
	for (i = 0; i < uint8tot; i++)
	{
		count[i] = 0;
	}

	// open file
	FILE * fp = fopen (argv[1], "rb");
	if (NULL == fp)
	{
		printf ("ERROR: Cannot open %s\n", argv[1]);
		return 2;
	}

	// obtain file size
	fseek (fp, 0, SEEK_END);
	fsize = ftell (fp);
	rewind (fp);

	// malloc buffer
	uint8_t * buffer = (uint8_t *) malloc (fsize);
	if (NULL == buffer)
	{
		printf ("ERROR: Not enough memory to contain %ld byte %s\n", fsize, argv[1]);
		return 3;
	}

	// copy file into buffer
	if (! fread (buffer, 1, fsize, fp))
	{
		printf ("ERROR: Could not read %s\n", argv[1]);
		return 4;
	}

	// close file
	fclose (fp);

	// input loop
	for (i = 0; i < fsize; i++)
	{
		count[buffer[i]]++;
	}

	// total calc
	for (i = 0; i < uint8tot; i++)
	{
		printf ("%3u\t%lu\n", i, count[i]);
	}

	free (buffer);

	return 0;
}
This is about twice as slow: 0.201s versus 0.113s.

Last edited by H_TeXMeX_H; 07-21-2013 at 01:15 PM.
 
Old 07-23-2013, 08:25 AM   #23
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
I think your test cases are too small to make real comparisons. I assume you're not concerned with saving yourself 0.1 seconds of personal time, so why not run some tests that take several minutes or an hour?
Quote:
Originally Posted by mina86 View Post
This will fail if the file is very big, say 1T.
This really depends on pointer size and resource limits on address space. You definitely can't always count on being allowed 1TB of address space. I prefer using mmap, but the real problem here is that there's no good reason to prevent the program from reading from a descriptor that corresponds to a pipe or socket (or to a file on a filesystem without mmap support.)

Kevin Barry
 
Old 07-23-2013, 08:43 AM   #24
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928

Original Poster
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
The time scales quite linearly.

As an example, piping a PRNG into this program (slight overhead from generating the random numbers):
100 MiB = 0.177s
1000 MiB = 1.708s
10000 MiB = 17.451s
 
Old 07-23-2013, 10:05 AM   #25
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928

Original Poster
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
I have also tried an SSE2 version, but it is actually a bit slower (0.116s versus 0.113s):

Code:
// output character statistics
#include <stdio.h>
#include <math.h>
#include <stdint.h>

#include <emmintrin.h>

int main (void)
{
	// ! init
	__m128i cluster[BUFSIZ];
	unsigned int i;
	unsigned int ret;
	unsigned int chunk;

	// init
	const unsigned int uint8tot = UINT8_MAX + 1; // = 256

	unsigned long count[uint8tot];

	for (i = 0; i < uint8tot; i++)
	{
		count[i] = 0;
	}

	// input loop
	while ((ret = fread (cluster, 16, BUFSIZ, stdin)))
	{
		for (i = 0; i < ret; i++)
		{
			chunk = _mm_extract_epi16 (cluster[i], 0);
			count[chunk & 0xff]++;
			count[(chunk >> 8) & 0xff]++;

			chunk = _mm_extract_epi16 (cluster[i], 1);
			count[chunk & 0xff]++;
			count[(chunk >> 8) & 0xff]++;

			chunk = _mm_extract_epi16 (cluster[i], 2);
			count[chunk & 0xff]++;
			count[(chunk >> 8) & 0xff]++;

			chunk = _mm_extract_epi16 (cluster[i], 3);
			count[chunk & 0xff]++;
			count[(chunk >> 8) & 0xff]++;

			chunk = _mm_extract_epi16 (cluster[i], 4);
			count[chunk & 0xff]++;
			count[(chunk >> 8) & 0xff]++;

			chunk = _mm_extract_epi16 (cluster[i], 5);
			count[chunk & 0xff]++;
			count[(chunk >> 8) & 0xff]++;

			chunk = _mm_extract_epi16 (cluster[i], 6);
			count[chunk & 0xff]++;
			count[(chunk >> 8) & 0xff]++;

			chunk = _mm_extract_epi16 (cluster[i], 7);
			count[chunk & 0xff]++;
			count[(chunk >> 8) & 0xff]++;
		}
	}

	// total calc
	for (i = 0; i < uint8tot; i++)
	{
		printf ("%3u\t%lu\n", i, count[i]);
	}

	return 0;
}
That pretty much concludes my attempts to optimize this code. I think it is the best it can be, while remaining portable (no assembler code). This is completely solved now.
 
Old 07-23-2013, 01:06 PM   #26
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
I'm sure the postincrement operations are optimized out, but just in case, try using preincrement. You could also directly increment a pointer to the buffer instead of using an index, e.g.
Code:
	for (uint8_t *current = buffer, *end = buffer + fsize; current < end;)
	{
		++count[*current++];
	}
Lastly, trying to load the entire file at once isn't a good idea since you could inadvertently DoS yourself, and it won't work if you want to read from a pipe, tty, or socket (i.e. what stdin is most of the time.)

Kevin Barry
 
Old 07-24-2013, 03:44 AM   #27
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928

Original Poster
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
Tried it, but it doesn't help. I can't do exactly that, because I don't always have an fsize, but even if I use the buffer size, it is slower.
 
Old 08-03-2013, 03:18 AM   #28
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235Reputation: 235Reputation: 235
I think multiple ioctl calls with printf is what's really making it slow. Perhaps using multiple sprintf to a buffer first then run one-time call to print to stdout with fwrite would make it faster.

Edit: Oh sorry perhaps it won't really be helpful with just a small number (256). If you could consider iota() which is not standard it could also help as it skips the parsing of the format string. If you could create your own itoa() function then it would also be better. You just have to be careful. Something that could return the length of the generated string would also be nice since it could help you know where to write next on the buffer. This obviously is not certainly portable and output may differ on some architectures or platforms.

Last edited by konsolebox; 08-03-2013 at 03:33 AM.
 
Old 08-03-2013, 01:39 PM   #29
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928

Original Poster
Blog Entries: 2

Rep: Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301Reputation: 1301
You are right that printf slows it down a bit, but I also do other things with the core program that requires only a single printf.
 
Old 08-03-2013, 02:01 PM   #30
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
The output duration is going to be negligible with data sizes large enough to be important. Disk seek times and hardware interrupts will wash out all of that optimization.

Why are you using stdio.h for input? You're just adding unnecessary cycles to the input loop. Also, why not fstat the file to get the optimal block size (st_blksize) for input?

You should try your program with a huge file that's actually on the filesystem. If your process is running at less-than 100% CPU then there's a good chance you're needlessly optimizing.

Kevin Barry
 
  


Reply



Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
Need help with a simple If/Else code in C Exsiss Programming 4 10-02-2007 02:37 AM
optimize socket networking code powah Linux - Networking 0 09-26-2006 04:47 PM
Interpret this simple C code Chase_G Programming 4 04-29-2005 09:07 AM
to become giddy in simple code Hamid Moradmand Programming 2 05-18-2004 06:23 PM
How to optimize source code installation? Waldi Slackware 4 08-05-2003 12:47 PM

LinuxQuestions.org > Forums > Non-*NIX Forums > Programming

All times are GMT -5. The time now is 04:56 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration