LinuxQuestions.org
Old 04-27-2006, 10:17 PM   #1
true_atlantis
Member
 
Registered: Oct 2003
Distribution: fedora cor 5 x86_64
Posts: 639

Rep: Reputation: 30
c string input


How can I read user input at run time so that I get the first space-terminated word as one string and the rest of what the user typed as another? For example:


COMMAND: ADD my name is mat

The program prints COMMAND: and I want to get ADD as the first string and 'my name is mat' as the next string. Right now I'm using sscanf, but that just gets ADD as the first string and then 'my' as the second string. Any ideas? Thanks.
 
Old 04-27-2006, 10:38 PM   #2
Vagrant
Member
 
Registered: Nov 2001
Posts: 75

Rep: Reputation: 15
Try scanf() and then gets().
 
Old 04-27-2006, 11:43 PM   #3
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282Reputation: 282Reputation: 282
Or use fgets (safer than gets) and next look for the first space in the entered string using strchr or strstr
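A minimal sketch of that suggestion, assuming the command is separated from the rest by a single space. The function names split_first_word and read_command are made up for illustration:

```c
#include <stdio.h>
#include <string.h>

/* Splits `line` in place at the first space.
 * Afterwards `line` holds only the first word; the return value points
 * at the remaining text, or at an empty string if there was no space. */
char *split_first_word(char *line)
{
    /* fgets() keeps the trailing newline, so drop it if present. */
    line[strcspn(line, "\n")] = '\0';

    char *space = strchr(line, ' ');
    if (space == NULL)
        return line + strlen(line);   /* no arguments were given */

    *space = '\0';                    /* terminate the command word */
    return space + 1;                 /* everything after the space */
}

/* Reads one line from `in` into `buf` and splits it.
 * Returns 1 on success, 0 at end of input or on a read error. */
int read_command(FILE *in, char *buf, size_t size, char **rest)
{
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    *rest = split_first_word(buf);
    return 1;
}
```

Calling read_command(stdin, buf, sizeof buf, &rest) on the input "ADD my name is mat" should leave "ADD" in buf and make rest point at "my name is mat". A line longer than the buffer is simply cut off at size - 1 characters by fgets.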
 
Old 04-28-2006, 12:44 AM   #4
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374Reputation: 374Reputation: 374Reputation: 374
Regular old scanf() or any of its brethren will handle the problem. It's just a matter of tweaking the conversion string. By default, when the conversion string contains a %s, the string is assumed to end at the first whitespace character or a newline. You can work around that though. Try this:
Code:
char command[11];
char extra_input[101];

/* Completely zero-out the variables (not just set them equal to "" ) */
memset( command,     0,  11 * sizeof( char ) );
memset( extra_input, 0, 101 * sizeof( char ) );

...

printf( "COMMAND: " );
/* No trailing 's' after the scanset: %[^\n] is a complete conversion by itself */
scanf( "%10s %100[^\n]", command, extra_input );

printf( "Command entered: %s\n"
        "Additional data: %s\n", command, extra_input );
The width specifiers in scanf() prevent a buffer overrun. The only "gotcha" that may occur would be if the user enters a command of >10 characters. In that situation, extra_input will receive the "remaining" characters of the too-long command. But that's a user-input problem just like always. One method to catch the problem would be to make sure the command field can hold at least one more character than the longest command supported. After reading the input, check to see if the user entered a command that fills the whole field (i.e. strlen( command ) == 10, or command[9] != 0 with the zeroed buffer above), and if that's the case, the user entered an invalid command - ask them to "please try again."
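The check described above could be sketched like this. It uses sscanf on a line already held in memory so that it is easy to try out; LONGEST_COMMAND and parse_command are hypothetical names, and the limit of 9 is only an assumption chosen to fit the 10-character field width:

```c
#include <stdio.h>
#include <string.h>

#define LONGEST_COMMAND 9   /* assumed: longest command the program supports */

/* Returns  1 if a valid-length command was read,
 *          0 if the line held no command at all,
 *         -1 if the first word filled the whole field (too long). */
int parse_command(const char *line,
                  char command[LONGEST_COMMAND + 2],
                  char extra[101])
{
    command[0] = '\0';
    extra[0] = '\0';

    /* The width of 10 is LONGEST_COMMAND + 1, so an over-long word
     * fills the field completely and can be detected afterwards. */
    int fields = sscanf(line, "%10s %100[^\n]", command, extra);
    if (fields == EOF || fields < 1)
        return 0;

    if (strlen(command) > LONGEST_COMMAND)
        return -1;

    return 1;
}
```

With "ADD my name is mat" this should return 1, and with "abcdefghijk 123456789" it should return -1, so the caller can print "please try again" instead of treating "k 123456789" as data.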

Last edited by Dark_Helmet; 04-28-2006 at 12:46 AM.
 
Old 04-28-2006, 08:47 AM   #5
ioerror
Member
 
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536

Rep: Reputation: 34
NEVER, EVER, EVER use gets.

scanf isn't the ideal choice for this either.

If this program will be used only on Linux (or rather, a GNU system), then use getline (a souped-up fgets that automatically reallocates memory as necessary; it is a GNU extension). Otherwise, use fgets as Wim Sturkenboom has already suggested. Using scanf ties you to a fixed string length, which may work but isn't very elegant, and sooner or later someone will enter a line longer than you've allowed for and complain when it doesn't work. Program for robustness!
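A rough sketch of the getline approach, assuming a system that provides it (glibc does, and it was later standardized in POSIX.1-2008). The helper name read_whole_line is made up for illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the getline() declaration */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Reads one line of any length from `in`.
 * Returns a malloc'd string without the trailing newline, or NULL at
 * end of input or on error. The caller must free() the result. */
char *read_whole_line(FILE *in)
{
    char *line = NULL;   /* NULL and 0 tell getline() to allocate for us */
    size_t capacity = 0;
    ssize_t length = getline(&line, &capacity, in);

    if (length < 0) {
        free(line);      /* the buffer may have been allocated even on failure */
        return NULL;
    }
    if (length > 0 && line[length - 1] == '\n')
        line[length - 1] = '\0';
    return line;
}
```

The returned string can then be split with strchr or similar. The trade-off is that the line length is limited only by available memory, so a program that wants an upper bound has to enforce it itself.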
 
Old 04-28-2006, 09:45 AM   #6
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: Mint, Armbian, NetBSD, Puppy, Raspbian
Posts: 3,515

Rep: Reputation: 239Reputation: 239Reputation: 239
yep, don't bother trying to be clever accepting input scanf-stylee.
you'll tie yourself in knots.

Much better to grab the entire line (fgets as above)

then play about with it afterwards.

(look at strtok to split it up maybe)

Last edited by bigearsbilly; 04-28-2006 at 09:46 AM.
 
Old 04-28-2006, 10:17 AM   #7
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374Reputation: 374Reputation: 374Reputation: 374
Quote:
Originally Posted by ioerror
Using scanf ties you to a fixed string length, which may work but isn't very elegant and sooner or later someone will enter a line longer than you've allowed for and complain when it doesn't work. Program for robustness!
Bash has a command line limit of 32k characters on my system (as stated through runs of configure scripts). I haven't heard anyone complain this isn't long enough, and I'm sure it's a fixed-length variable. Otherwise, why have the limit?

Robustness != accept all lengths of user input

The user will be just as mad (if not more so) if a long string that worked before ends up failing because the system happens to be using a lot of memory and the allocation for the string fails.

If it's a significant concern, then change the field specifiers. Make extra_input 32k in size, and put "%32767[^\n]" in the conversion string. Then the extra input can be parsed just like anything else. And this conversion string isn't any more difficult to understand than a normal shell-based wildcard string, and far less complicated than run-of-the-mill regular expressions.

I just don't get the aversion to using scanf(). Use whatever works, but it's not like scanf() is a carrier of bird flu or anything.

Last edited by Dark_Helmet; 04-28-2006 at 10:21 AM.
 
Old 04-28-2006, 12:01 PM   #8
ioerror
Member
 
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536

Rep: Reputation: 34
Quote:
Bash has a command line limit of 32k characters on my system (as stated through runs of configure scripts). I haven't heard anyone complain this isn't long enough, and I'm sure it's a fixed-length variable. Otherwise, why have the limit?
Well, 32K is a little different to the 100 char example above.

And there is no size limit on piping data to a program.

Using fixed-length buffers is insecure unless you are careful, and pointless anyway: buffers should be resized as needed; that's what realloc is for. Besides, just because your system is limited to 32K doesn't mean everyone else's is.

Quote:
Robustness != accept all lengths of user input
If I want to pipe 100MB of input into a program it should be able to handle it (that doesn't mean it should do something useful with it, aborting with "input too large" is fine, but I don't expect it to crash).
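To illustrate "abort with input too large instead of crashing", here is a sketch built on fgets with a fixed buffer. The names line_status and read_bounded_line are hypothetical, and the policy of discarding the rest of an over-long line is an assumption, not something stated in the thread:

```c
#include <stdio.h>
#include <string.h>

enum line_status { LINE_OK, LINE_TOO_LONG, LINE_EOF };

/* Reads one line into `buf` (at most size - 1 characters).
 * If the line does not fit, the rest of it is read and thrown away
 * so that the next call starts on a fresh line. */
enum line_status read_bounded_line(FILE *in, char *buf, size_t size)
{
    if (fgets(buf, (int)size, in) == NULL)
        return LINE_EOF;

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';          /* whole line fit, newline included */
        return LINE_OK;
    }

    /* No newline was stored: peek at what follows to find out why. */
    int c = fgetc(in);
    if (c == EOF || c == '\n')
        return LINE_OK;               /* line fit exactly, or input ended */

    /* The line was longer than the buffer: discard the remainder. */
    do {
        c = fgetc(in);
    } while (c != '\n' && c != EOF);
    return LINE_TOO_LONG;
}
```

A caller can print "input too large" on LINE_TOO_LONG and carry on. Binary input with embedded NUL bytes would confuse the strlen call, but the buffer itself is never overrun.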

Quote:
If it's a significant concern, then change the field specifiers. Make extra_input 32k in size
I don't consider a 32K fixed size buffer to be particularly elegant. And when someone wants to port it to a system with limited stack space, they'll need to modify it to e.g. allocate dynamically.

Quote:
I just don't get the aversion to using scanf(). Use whatever works, but it's not like scanf() is a carrier of bird flu or anything.
scanf is designed for _formatted_ input, hence the f. User input is not formatted, thus scanf is not the most suitable function for reading user input.

Last edited by ioerror; 04-28-2006 at 12:03 PM.
 
Old 04-28-2006, 01:02 PM   #9
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374Reputation: 374Reputation: 374Reputation: 374
Differences in philosophy. We'll have to disagree on this one. I believe any aspect of a program that interacts with the user must be of the utmost in predictability. Allocation (or reallocation) by its nature is not predictable.

Quote:
Originally Posted by ioerror
scanf is designed for _formatted_ input, hence the f. User input is not formatted, thus scanf is not the most suitable function for reading user input.
Not true. If the user expects to interact with the program, all data must be in a format the program is written to accept. The method of grabbing a glob of data without regard to format merely shifts the burden of formatting away from the user to the program. Instead of forcing the user to interact with the program in the expected way, the program tries to "figure out" what the user's input means. I've never been a big fan of that philosophy because I see it (or motivations similar to it) as the reason why HTML browsers are so huge. The browser takes a blob of data (the web page) and tries to render it even if basic HTML rules are ignored (like completely missing end tags). Which is one reason why XHTML is rigidly enforcing rules--otherwise the page doesn't get displayed. We'll have to see how strongly they stick to their guns on that one.

In this particular example, the program expects a command identifier (which it knows the maximum length of) and a supporting set of data (I'm guessing any string of characters to include in a linked list or something similar). So the format for the input is simply two strings. If the input were more complicated (such as sub-commands with arguments of their own) then I might not suggest scanf() even though it would still work with further format specifiers.

Quote:
Originally Posted by ioerror
Besides, just because your system is limited to 32K doesn't mean everyone else's is.
That's not the point I'm trying to make. My 32k adjustment was in response to this:
Quote:
Originally Posted by ioerror
... sooner or later someone will enter a line longer than you've allowed for and complain when it doesn't work
The point I was trying to make is that shells appear to create a static input buffer. Your system may support more characters or less; it's still a statically sized buffer. There is a point, however, where the buffer is large enough to handle the vast majority of user input (which you appear to agree with when observing 32k is far different than 100). It therefore also serves the predictability aspect. The user knows up-front what the maximum input they can use is, and the program will always accept input to that size.

Quote:
Originally Posted by ioerror
I don't consider a 32K fixed size buffer to be particularly elegant. And when someone wants to port it to a system with limited stack space, they'll need to modify it to e.g. allocate dynamically.
Not if coded correctly. A simple #define at the top would be all that's needed. The conversion string's width specifier could be adjusted with a call to sprintf() before it's used and reference the #define's value. That code would be no more complicated than a parser and would make "porting" to the next machine a simple matter of changing the #define's value. For instance:
Code:
#define MAX_COMMAND_SIZE       10
#define INPUT_BUFFER_SIZE    1024

...

char command[MAX_COMMAND_SIZE];
char extra_input[INPUT_BUFFER_SIZE];
char user_input_conversion[100]; /* arbitrary size "large enough" to hold the string */

memset( command,     0, MAX_COMMAND_SIZE * sizeof( char ) );
memset( extra_input, 0, INPUT_BUFFER_SIZE * sizeof( char ) );

...

/* "\n" (not "\\n") so the scanset excludes a real newline; no trailing 's' after the scanset */
sprintf( user_input_conversion, "%%%ds %%%d[^\n]", MAX_COMMAND_SIZE - 1, INPUT_BUFFER_SIZE - 1 );

...

printf("COMMAND: ");
scanf( user_input_conversion, command, extra_input );
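A compile-time variation on the same idea, swapped in here as an alternative to the sprintf() call: the preprocessor's stringification operator can paste the widths into the conversion string directly. All names below (MAX_COMMAND_LEN, MAX_EXTRA_LEN, STR, INPUT_FORMAT, parse_input) are hypothetical, and the trick assumes the #define values are plain integer literals, not expressions:

```c
#include <stdio.h>
#include <string.h>

#define MAX_COMMAND_LEN 9      /* characters, not counting the terminator */
#define MAX_EXTRA_LEN   1023

/* Two-step stringification so the macro value (9), not its name, is quoted. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Adjacent string literals are joined, giving "%9s %1023[^\n]". */
#define INPUT_FORMAT "%" STR(MAX_COMMAND_LEN) "s %" STR(MAX_EXTRA_LEN) "[^\n]"

/* Returns the number of fields filled in: 0, 1 or 2. */
int parse_input(const char *line,
                char command[MAX_COMMAND_LEN + 1],
                char extra[MAX_EXTRA_LEN + 1])
{
    command[0] = '\0';
    extra[0] = '\0';

    int fields = sscanf(line, INPUT_FORMAT, command, extra);
    return fields == EOF ? 0 : fields;
}
```

Changing a buffer size then means editing one #define, and the arrays and the widths stay in step without a run-time formatting call. The cost is that the widths must be literal numbers.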
Quote:
Originally Posted by ioerror
If I want to pipe 100MB of input into a program it should be able to handle it (that doesn't mean it should do something useful with it, aborting with "input too large" is fine, but I don't expect it to crash)
I agree. And as mentioned before (input format), if the input does not follow the format the program expects, then it should print a similar error to the effect of "malformed input" or "improper data format" at the least (error reporting is a sore spot for me--brief, to the point of archaic, messages in error reporting are not a good thing, but I'm not going to go there).

EDIT: Added code example and corrected some spelling mistakes

Last edited by Dark_Helmet; 04-28-2006 at 01:39 PM.
 
Old 04-28-2006, 01:31 PM   #10
paulsm4
LQ Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
true_atlantis - please let us know what you finally wind up doing!

For whatever it's worth, here's my $0.02:

1. In this circumstance, I would perform exactly *one* I/O, get the *whole* string, and parse it from there.

2. I agree with ioerror that "gets()" is Evil.

3. On the other hand, "fgets()" is Good.
I would probably use it here.
Something like:
Code:
fgets (buff, BUFSIZE, stdin)
4. I don't think there's anything intrinsically wrong with "scanf" (or its sibling,
"sscanf"). I just don't think it's the best solution for this particular problem.

IMHO .. PSM

PS:
strtok() is handy - I use it frequently myself.
But beware - it, too, can be evil if there's any chance the string in question might be accessed by multiple threads...
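For the threading concern, POSIX provides strtok_r, which keeps its position in a caller-supplied pointer instead of hidden static state. A small sketch, assuming a POSIX system; split_words is a made-up helper name:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for the strtok_r() declaration */
#endif
#include <stddef.h>
#include <string.h>

/* Splits `line` in place on spaces, tabs and newlines.
 * Stores up to `max_words` pointers into `words` and returns how many
 * were stored. The pointers refer to memory inside `line`. */
size_t split_words(char *line, char **words, size_t max_words)
{
    char *state = NULL;   /* per-call position, so no shared static state */
    size_t count = 0;

    for (char *token = strtok_r(line, " \t\n", &state);
         token != NULL && count < max_words;
         token = strtok_r(NULL, " \t\n", &state)) {
        words[count++] = token;
    }
    return count;
}
```

Like strtok, this overwrites the delimiters in the original buffer, so it needs a writable copy of the line; two threads still must not tokenize the same buffer at the same time.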

Last edited by paulsm4; 04-28-2006 at 01:34 PM.
 
Old 04-28-2006, 01:58 PM   #11
ioerror
Member
 
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536

Rep: Reputation: 34
Quote:
I believe any aspect of a program that interacts with the user must be of the utmost in predictability.
Quote:
If the user expects to interact with the program, all data must be in a format the program is written to accept.
This is practically impossible. Users are people and some people, through stupidity or just curiosity will supply some arbitrary lump of data. Assuming that users will always supply correctly formatted data is crazy.

Quote:
The method of grabbing a glob of data without regard to format merely shifts the burden of formatting away from the user to the program. Instead of forcing the user to interact with the program in the expected way, the program tries to "figure out" what the user's input means.
I'm not sure I follow this. Input data is just a bunch of bytes (e.g. a C source file). It's up to the program to determine whether it's correctly formatted or not. Programs will always have to parse the input to determine whether it is intelligible.

Quote:
In this particular example, the program expects a command identifier (which it knows the maximum length of) and a supporting set of data (I'm guessing any string of characters to include in a linked list or something similar). So the format for the input is simply two strings.
You're assuming that users are sensible and will always supply two strings. But what if I, say, pipe an mp3 file into the program? scanf will choke on that so you'd need some error handling to deal with it. Personally, I'd rather use a more general input method to start with.

Quote:
The point I was trying to make is that shells appear to create a static input buffer. Your system may support more characters or less; it's still a statically sized buffer. There is a point, however, where the size of the buffer makes it sufficiently large enough to handle the vast majority of user input (which you appear to agree with when observing 32k is far different than 100). And therefore, also serves the predictability aspect. The user knows up-front what the maximum input they can use is, and the program will always accept input to that size.
I see your point, and basically agree, but again, you're assuming that users will supply the correct input. My point was that sometimes they won't and the program should be able to deal with any input, correct, incorrect, too big, too small, and I didn't think that scanf was the best way to do that, though for relatively simple cases it may suffice. I still wouldn't use it though, but in this case it's more of a style, rather than a technical, choice.

Quote:
Originally Posted by ioerror
I don't consider a 32K fixed size buffer to be particularly elegant. And when someone wants to port it to a system with limited stack space, they'll need to modify it to e.g. allocate dynamically.
Duh! Didn't think that through properly, ignore me.

Quote:
Not if coded correctly. A simple #define at the top would be all that's needed. The conversion string's width specifier could be adjusted with a call to sprintf() before it's used and reference the #define's value. That code would be no more complicated than a parser and would make "porting" to the next machine a simple matter of changing the #define's value.
Yeah, agree with that.
 
Old 04-28-2006, 02:05 PM   #12
ioerror
Member
 
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536

Rep: Reputation: 34
Quote:
I don't think there's anything intrinsically wrong with "scanf" (or its sibling, "sscanf").
Sure, it's just that when I was first learning C, for some reason I developed a dislike for scanf (can't remember why) and I guess it kinda stuck.
 
Old 04-28-2006, 10:59 PM   #13
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 374Reputation: 374Reputation: 374Reputation: 374
Sorry... this is long enough to qualify as a television mini-series.

I'll be the first to admit, when I first learned C, I thought scanf() was useless for strings for the very reason made by the OP. I was never taught that the conversion string had any bells and whistles for strings (i.e. you read in up to the first space and that was it).

By chance I decided to read the man page (because I had misplaced my reference book), and discovered all kinds of stuff. scanf() can actually do dynamic memory allocation for user-input strings, but to say that modifier ('a') is not portable is an understatement.

But back to the discussion, I'm not sure I follow the point about piping in an mp3. Unless I've overlooked something (which is entirely possible) the code I gave won't segfault. The conversion string for the command variable will read up to 9 characters for the first string. It may read fewer (if it encounters a space, for instance), but it will not read more than 9. Then the 10th array location is a NUL (scanf() writes the terminator itself, and memset() has zeroed the array anyway), which means it's not possible to read beyond the character array when it's displayed. Similarly, the conversion string for extra_input will read in a maximum of 1023 characters (maybe fewer, but only if a newline is encountered) with a NUL in the 1024th location. So there's no danger of overflow or reading beyond the arrays' limits. So piping in an mp3 shouldn't cause a segfault.

About the "predictability" philosophy I mentioned earlier, let me try to approach it from a different perspective. When you walk up to a convenience store and open the door, you expect someone will be at the counter to handle your purchase, right? Otherwise, the door should be locked. Similarly with a program, if the user sees a prompt for input, there's a certain level of expectation associated with it. The user will get frustrated if during one run he's able to input a string of 1023 characters but another run he can only enter 52 (from memory usage beyond his/her control). By using a static array of characters, the program won't even start unless there's enough memory available to accommodate the request (i.e. the "door" will be locked). If the program starts up, then the program will always be able to accept the same length of input.

To extend the convenience store analogy a little further, the fact that the door is open and a clerk is behind the counter says nothing of the inventory available. The clerk will be able to handle your payment, but there may be nothing available of interest in the store. Similarly, a program employing a static character array for user input does not necessarily mean the system has enough resources available to handle the request. However, it will be able to accommodate the request and explain why the request failed.

Basically what I'm getting at is "the user is king." The program needs to accommodate the user's requests for interaction. If the user is presented with a prompt, the program needs to guarantee it can accept input it accepted in any previous execution (barring revision changes or the like). If the application has a GUI, the GUI needs to run in a separate thread/process to respond to user clicks (none of this frozen-while-processing stuff). If the user is impatient and wants to quit, give them a button to cancel that sends an appropriate signal to kill the other process/thread. Predictable, responsive interaction between the program and user is paramount.

Of course, the user has some responsibility too: to provide data as the program expects (or at least expect to get yelled at by the program if they fail to do so). Sure, it's a good idea to program in anticipation of user stupidity, but there's a limit. For instance, virtually every (if not all) commands executed in the shell expect this format: command [options] [arguments]. The user can't go swapping around the individual pieces and expect meaningful results. The user must abide by some restrictions as to formatting input. To use the convenience store analogy again, to complete the purchase, the customer needs to speak the same language as the clerk. The customer can't expect the clerk to know all languages or to have a translation book for each under the counter.

If a user wants to pipe an mp3 into the program, it should be handled gracefully, but only to the point of telling the user "You're speaking gibberish. Have a nice day. Goodbye."
 
Old 04-29-2006, 02:51 AM   #14
ioerror
Member
 
Registered: Sep 2005
Location: Old Blighty
Distribution: Slackware, NetBSD
Posts: 536

Rep: Reputation: 34
Quote:
By chance I decided to read the man page (because I had misplaced my reference book), and discovered all kinds of stuff.
Something I should have done as well, it does appear to be more flexible than I had previously thought.

Quote:
But back to the discussion, I'm not sure I follow the point about piping in an mp3. Unless I've overlooked something (which is entirely possible) the code I gave won't segfault.
I didn't mean that it would necessarily segfault, just that it may not handle the data properly. For example, with your example above, if a user types 11 characters in the first string, only ten would be read and, this is my point, the eleventh would be the first char of the second string, which is not what the user intended. At least, that's how scanf works on my system. The man page states that %s matches a sequence of non-whitespace characters. But, I typed 'abcdefghijk 123456789' into your example above and I got:

Quote:
Command entered: abcdefghij
Additional data: k 123456789
which doesn't make sense to me; it certainly isn't what was intended. Why is the space after the k part of the string? Surely it should stop there. From the man page: "The input string stops at white space or at the maximum field width, whichever occurs first." The result seems to contradict the man page.
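One way to see what happened is to ask scanf where the first conversion stopped, using %n (which stores the number of characters consumed so far). In the sketch below, where_command_stopped is a hypothetical helper and sscanf stands in for scanf so the input can be given as a string:

```c
#include <stdio.h>
#include <string.h>

/* Runs the "%10s" conversion from post #4 on `line` and returns how many
 * characters of `line` it consumed, or -1 if nothing could be read. */
int where_command_stopped(const char *line, char command[11])
{
    int consumed = -1;
    command[0] = '\0';

    /* %n is not counted in the return value, so success is still 1. */
    if (sscanf(line, "%10s%n", command, &consumed) != 1)
        return -1;
    return consumed;
}
```

For 'abcdefghijk 123456789' the conversion should stop after 10 characters because of the field width, before it ever reaches the space, so the next unread character is the 'k'. The blank in the format then matches zero whitespace characters, and the scanset [^\n] excludes only the newline, so it takes "k 123456789" with the space included. Read that way, the result appears consistent with the man page: the width limit occurred first.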

Quote:
Similarly with a program, if the user sees a prompt for input, there's a certain level of expectation associated with it. The user will get frustrated if during one run he's able to input a string of 1023 characters but another run he can only enter 52 (from memory usage beyond his/her control). By using a static array of characters, the program won't even start unless there's enough memory available to accommodate the request (i.e. the "door" will be locked). If the program starts up, then the program will always be able to accept the same length of input.
I see what you're saying, I agree with that, all I meant was that the program should be able (or at least it should try) to allocate more memory if the input is larger than it initially expects. But yeah, it should have a default amount of static space to provide a minimum level of input.

Quote:
Basically what I'm getting at is "the user is king." The program needs to accommodate the user's requests for interaction.
Absolutely, I think I misunderstood you at first.

Quote:
Of course, the user has some responsibility too: to provide data as the program expects (or at least expect to get yelled at by the program if they fail to do so). Sure, it's a good idea to program in anticipation of user stupidity, but there's a limit.
Indeed. But the program must be flexible enough to determine whether the input is valid or not. I just find scanf to be a rather unintuitive function (as demonstrated above), so it's not always the most suitable function to use.

BTW, I didn't pick up on this before, but:

Quote:
Bash has a command line limit of 32k characters on my system (as stated through runs of configure scripts).
Do you mean the total length of the command line arguments? I thought this limit was imposed by the kernel. In any case, that is not quite the same as general user input (though of course, that doesn't alter the crux of your argument).

But anyway, I think we basically agree, more or less, though we differ on some points of implementation, though I would disagree with this:

Quote:
The user will get frustrated if during one run he's able to input a string of 1023 characters but another run he can only enter 52 (from memory usage beyond his/her control).
If the system is low on memory, then the user must be aware that the program could fail. I don't see anything wrong with that (it is an exceptional condition, after all). You can't use static buffers for everything.

Last edited by ioerror; 04-29-2006 at 11:52 AM.
 
Old 04-29-2006, 11:33 AM   #15
exvor
Senior Member
 
Registered: Jul 2004
Location: Phoenix, Arizona
Distribution: Gentoo, LFS, Debian,Ubuntu
Posts: 1,537

Rep: Reputation: 87
This simple program shows that scanf does what it says in the man page.


Code:
 
#include <stdio.h>

int main(void)
{
    char first[20];
    char second[20];

    printf(":>");
    /* %19s leaves room for the terminating NUL in each 20-byte array */
    scanf("%19s", first);
    scanf("%19s", second);

    printf("\n%s\t%s", first, second);

    return 0;
}

output is
Code:
 
user@calipso:~/./a.out 
:> abcdefghijk 123456789
abcdefghijk        123456789user@calipso:~/

As you can see, scanf did not mush anything together; it stopped at the space. The rest of the input that was left in the stdin buffer is then grabbed by the second call to scanf.
 
  

