LinuxQuestions.org
Welcome to the most active Linux Forum on the web.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Software > Linux - Kernel
User Name
Password
Linux - Kernel This forum is for all discussion relating to the Linux kernel.

Notices


Reply
  Search this Thread
Old 06-19-2014, 11:03 AM   #1
lloydsu
LQ Newbie
 
Registered: Jun 2014
Posts: 3

Rep: Reputation: Disabled
Will rewrite kernel with length prefix string make things faster


I'm new to linux kernel, during the study I start wondering if rewrite entire kernel with length prefix string instead of null terminal string (for constants also) can make kernel be more efficient?
 
Old 06-19-2014, 01:34 PM   #2
rtmistler
Moderator
 
Registered: Mar 2011
Location: USA
Distribution: MINT Debian, Angstrom, SUSE, Ubuntu, Debian
Posts: 9,879
Blog Entries: 13

Rep: Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930Reputation: 4930
There's nothing stopping someone from using length prefixed string types in any of their programming, be that kernel, device drivers, or just regular application programming.

I'm wondering how many strings the kernel is using, except debug output or logs?

Can the kernel be more efficient? I'm sure it can be, I'd recommend you visit Kernel.org and read up on their efforts; if you are able to make improvements to either data types, as above or to performance, the kernel and Linux community would certainly be happy to benefit from your efforts.

I guess the first steps would be to establish a way to measure the performance metrics of your kernel.

Build the kernel the current way and obtain your metrics.

Then rebuild the kernel by changing the type definition of a string to use length prefixed strings; figure the most painless way to make that conversion, and then rebuild and take the metrics again.
 
Old 06-22-2014, 08:07 AM   #3
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,632
Blog Entries: 4

Rep: Reputation: 3931Reputation: 3931Reputation: 3931Reputation: 3931Reputation: 3931Reputation: 3931Reputation: 3931Reputation: 3931Reputation: 3931Reputation: 3931Reputation: 3931
... but also ... try to make a business-case for why such a potentially massive and therefore disruptive change would be justifiable, and exactly how length-prefix strings would seamlessly, reliably (100% accuracy is required) work with all the existing zero-terminated string constants.
 
Old 06-22-2014, 06:27 PM   #4
lloydsu
LQ Newbie
 
Registered: Jun 2014
Posts: 3

Original Poster
Rep: Reputation: Disabled
Smile

Thanks for the reply
Like I said, I'm new to kernel, it is exactly the question that whether it is potentially worthwhile to do such data type change, and this is pure curiosity about why null terminal string is used, what's the reason behind it (except because we are using C),
At lease I think length prefix string is faster in strlen, strcmp and strcpy etc.?
So, I asked here and want to see what's the suggestion from you experienced guys
I think it might be difficult, but not impossible to change that.

As Rtmistler suggested there might be no much string operation there, so, I would try to find how many string operation there in the kernel first, thanks for the suggestion.
 
Old 06-22-2014, 06:53 PM   #5
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
Quote:
Originally Posted by rtmistler View Post
There's nothing stopping someone from using length prefixed string types in any of their programming, be that kernel, device drivers, or just regular application programming.

I'm wondering how many strings the kernel is using, except debug output or logs?

Can the kernel be more efficient? I'm sure it can be, I'd recommend you visit Kernel.org and read up on their efforts; if you are able to make improvements to either data types, as above or to performance, the kernel and Linux community would certainly be happy to benefit from your efforts.

I guess the first steps would be to establish a way to measure the performance metrics of your kernel.

Build the kernel the current way and obtain your metrics.

Then rebuild the kernel by changing the type definition of a string to use length prefixed strings; figure the most painless way to make that conversion, and then rebuild and take the metrics again.

It isn't just the kernel strings - those are fairly constant (MAJOR exception are the formatted strings, and the entire sysfs and procfs are loaded with those). There are also a ton of strings passed to the kernel by userspace applications as well (again, mostly through the sysfs and procfs).

The usual problem is knowing "how large a number to allow" for the number of bytes. An 8 bit count is totally insufficient. A 16 bit count would work... except maybe for the drivers sending lots of strings (such as syslog) - messages are concatenated into one buffer... Then you get into the problem of "how to represent the string".

sometimes the string is static - in which case, the length must also be static... sometimes the string is dynamic, possibly from format conversions... So now, it becomes harder to use generic buffers (not impossible, just harder). It also becomes necessary to know which is which.

The usual problem with strings is that you need two lengths - size of the storage allocated, and the length of the used storage for data... And a storage class (static, dynamic).

What gets worse is trying to handle strings in a generic manner - when you don't know the length ahead of time, how do you determine how much you need. And you need to ensure suitable alignment for the lengths, and flags (if any).

The usual problem is that handling strings that way is slow.
 
Old 06-23-2014, 03:18 AM   #6
lloydsu
LQ Newbie
 
Registered: Jun 2014
Posts: 3

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by jpollard View Post
It isn't just the kernel strings - those are fairly constant (MAJOR exception are the formatted strings, and the entire sysfs and procfs are loaded with those). There are also a ton of strings passed to the kernel by userspace applications as well (again, mostly through the sysfs and procfs).

The usual problem is knowing "how large a number to allow" for the number of bytes. An 8 bit count is totally insufficient. A 16 bit count would work... except maybe for the drivers sending lots of strings (such as syslog) - messages are concatenated into one buffer... Then you get into the problem of "how to represent the string".

sometimes the string is static - in which case, the length must also be static... sometimes the string is dynamic, possibly from format conversions... So now, it becomes harder to use generic buffers (not impossible, just harder). It also becomes necessary to know which is which.

The usual problem with strings is that you need two lengths - size of the storage allocated, and the length of the used storage for data... And a storage class (static, dynamic).

What gets worse is trying to handle strings in a generic manner - when you don't know the length ahead of time, how do you determine how much you need. And you need to ensure suitable alignment for the lengths, and flags (if any).

The usual problem is that handling strings that way is slow.
Thanks for the reply.
I think the size of storage is irrelevant, null terminal string has the same problem.
And for some cases, like strncat, if you know the length of both string, you can determine the length of new buffer faster.
Why you think handling string that way is slower? I expect it would be an faster because determine the length of string is very slow.
Let me take a look on sysfs and procfs

It is a good point about how many bytes are needed to keep the length, usually sizeof(int) should be enough?
Seems it does complicate things when try to use PAE, serialize it or transfer across different machine with different CPU word length.
It will need convert between null terminal (a buffer) and length prefix string back and forth sometime (a lot?).
A lot of enlightenment, thanks
 
Old 06-23-2014, 05:12 AM   #7
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,912

Rep: Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513Reputation: 1513
Quote:
Originally Posted by lloydsu View Post
Thanks for the reply.
I think the size of storage is irrelevant, null terminal string has the same problem.
And for some cases, like strncat, if you know the length of both string, you can determine the length of new buffer faster.

Why you think handling string that way is slower? I expect it would be an faster because determine the length of string is very slow.
Look at the API for strncat -
Code:
char * strncat(char *dest, const char *src, size_t n);
The destination is modified.

Without knowing the storage available (by subtracting the used space of dest from the allocated space), you always have to copy both strings to new storage instead of just appending to the end of the existing used storage of a string.

The allocation is slow, as is having to copy the first string. The existing strncat will just copy the second string to the end of the first, for at most n bytes of src. The n is the amount of remaining unused bytes of dest. No allocation done, and determining the used bytes of src takes half the time of a copy.

Next, remember that dest is being modified, and the return value must be the new allocation - so using a fixed size buffer (one that is being updated) is now impossible, so code sequences like:
Code:
char buffer[1024];
...

buffer[0] = '\0';
strncat(buffer, str1, 1024);
...
strncat(buffer, str2, 1024-strlen(buffer))
can't be used as buffer cannot be modified. These now can't use a string (the buffer) that is just updated, so all the uses of strncat have to be modified as well as the library.

Doing counted strings is completely reasonable in user space, but in the kernel there is a LOT of buffer management of fixed size blocks (it reduced memory fragmentation). So reallocation (meaning that sometimes you have to deallocate dest, after allocating a new string) just to append two strings when the storage space in the first string is available would be frowned on.

Also remember - some buffers are on the stack... thus not dynamic, and can't be deallocated except by the containing function return (this is why I indicated having flags for the storage class; at a minimum you need "const", "static", "stack", and "heap").

The last architecture that I used with a string type (not character array) was the VAX. Strings there were always represented by a structure that included a pointer to the actual character array. Thus the character array could be replaced with a new one (a reallocation if the result was larger than the actual storage), or just updated (if there was enough room). Of course, constant strings couldn't have either operation applied to them (only copied). But it also meant that the structure had to be passed around, and sometimes it was "const", "static", "stack", or "heap" - thus the storage class of the header has to be known as well (and allocated sometimes). BTW, I don't think the kernel used the hardware supported strings... too slow due to the frequent reallocations required.

BTW, another place counted strings don't work is in functions like strtok where the input string is modified (a byte gets replaced by a null)...

Take a look at http://www.gnugeneration.com/mirrors...api/book1.html for the string functions used in the kernel. Then take a look at (a now dated version, but it should still give you an idea of how much added overhead counted strings would impose)
https://www.kernel.org/pub/linux/ker...4/lib/string.c

Quite frequently "strings" just point into other "strings"... This is trivial with character arrays, but non-trivial (actually, impossible to do) with counted strings. It is possible if there is a separate "string header" that can point to the storage. But now you have the added overhead of managing the string headers...

Quote:
Let me take a look on sysfs and procfs

It is a good point about how many bytes are needed to keep the length, usually sizeof(int) should be enough?
More than enough, and it would maintain a common alignment.
Quote:
Seems it does complicate things when try to use PAE, serialize it or transfer across different machine with different CPU word length.
It will need convert between null terminal (a buffer) and length prefix string back and forth sometime (a lot?).

A lot of enlightenment, thanks

Last edited by jpollard; 06-23-2014 at 05:32 AM.
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
[SOLVED] Apache Rewrite rules - how to make "not some string" rule SkyerSK Programming 4 01-26-2011 11:20 AM
How do I get the prefix length of an IPv6 address programatically in AIX? the_pilgrim AIX 1 09-30-2010 01:39 PM
Get string length in C ++ for a c string without trailing \0 nc3b Programming 10 12-28-2007 09:46 AM
Rewrite rule with query string in the pattern string basahkuyup Linux - Newbie 2 10-17-2006 02:06 AM
Things to make Linux work faster yelp666 Linux - General 4 04-19-2003 03:25 PM

LinuxQuestions.org > Forums > Linux Forums > Linux - Software > Linux - Kernel

All times are GMT -5. The time now is 03:40 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration