Parsing strings in Lisp

indienick · 12-15-2006, 01:16 PM

(NOTE: This is not a homework question - I'm not even in school!)

I'm writing a program in Lisp that takes a string of commands from the user and interprets them. Here's the problem: I would like to break up the string and store each portion in a list. The elements in the string are separated by whitespace (of course). I've tried several approaches to get the result I want, but to no avail.

Method #1:
Scan through the string, and collect all letters from the beginning of the string to the first whitespace, store the characters, and replace them in the main string, with null characters.
This falls apart after the first whitespace, because the null characters, although they don't display, still take up space in the string. So all I would see is " yadda yadda yadda". That leading space isn't the first element of the string: five #\null characters are sitting where the first "yadda" was, assuming four yaddas were passed.

Method #2:
Instead of dealing with a string, deal with the individual characters.

Code:

(let ((command (prompt "> ")) ;Prompt for the command
      (cmd (loop for c across command collecting c)) ;Break up the command char by char
      (buf nil) ;buffer variable
      (cmdbuf nil)) ;command buffer variable
  (dotimes (i (length cmd)) ;iterate as many times as the length of the cmd list of chars
    (if (not (eql (elt cmd i) #\Space)) ;if the current list element isn't whitespace
        (push (elt cmd i) buf)) ;add it to the buffer variable
    (if (eql (elt cmd i) #\Space) ;if the current element IS a whitespace
        #'(lambda ()
            (nreverse buf) ;reverse the elements in the list (because push adds a new element to the front of the list
            (push buf cmdbuf) ;push the reversed buffer variable to the command buffer variable
            (setf buf nil))))) ;null out the buffer variable

This just gives me complete gibberish stored in a list.
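
For comparison, here is a minimal sketch of what Method #2 seems to be aiming for. It is only an illustration, not code from the thread: the name SPLIT-CHARS is hypothetical, and it takes the command string as an argument instead of calling PROMPT. It also sidesteps a few things in the code above: LET (rather than LET*) cannot see COMMAND while initialising CMD, #'(lambda ...) builds a function without ever calling it, the result of NREVERSE is thrown away, and the last word is never saved because no space follows it.

```lisp
;; Hypothetical helper: split COMMAND on spaces, one character at a time.
(defun split-chars (command)
  (let ((buf nil)      ; characters of the current word, newest first
        (words nil))   ; finished words, newest first
    (flet ((flush ()
             ;; Turn the buffered characters into a string and save it.
             (when buf
               (push (coerce (reverse buf) 'string) words)
               (setf buf nil))))
      (loop for c across command
            do (if (char= c #\Space)
                   (flush)          ; a space ends the current word
                   (push c buf)))   ; anything else extends it
      (flush))                      ; the last word has no trailing space
    (nreverse words)))

(split-chars "up eth0  now")   ; => ("up" "eth0" "now")
```

Because FLUSH does nothing when the buffer is empty, runs of spaces don't produce empty strings.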

I don't know what else I could try.

EDIT: Even if you don't know Lisp, could you possibly suggest some other algorithm I could try?

taylor_venable · 12-15-2006, 03:37 PM

Quote:

Originally Posted by indienick

Scan through the string, and collect all letters from the beginning of the string to the first whitespace, store the characters, and replace them in the main string, with null characters.

Destroying data structures makes me sad.

I used recursion instead.

Code:

(defun helper (input-string)
  (let ((start (position #\Space input-string)))
    (if start
        (let ((end (position #\Space input-string :start (+ start 1))))
          (if end
              (cons (subseq input-string (+ start 1) end)
                    (split-by-space (subseq input-string end)))
            (list (subseq input-string (+ start 1)))))
      nil)))

(defun split-by-space (input-string)
  (let ((start (position #\Space input-string)))
    (if (and start (> start 0))
        (remove-if (lambda (s) (string-equal s "")) (helper (concatenate 'string " " input-string)))
      (remove-if (lambda (s) (string-equal s "")) (helper input-string)))))

Yeah, kind of a crappy way to do it, I know. I was going to rewrite it into one elegant function, but I also wanted to answer you quickly. The problem is that HELPER needs whitespace at the beginning of the string to parse it correctly, so SPLIT-BY-SPACE will add it where necessary. Also, SPLIT-BY-SPACE will take care to remove any extra empty strings (caused by multiple continuous occurrences of spaces in the string) that may occur in the resulting list.

EDIT: Fix type error in (> start 0) when sending SPLIT-BY-SPACE an empty string.

indienick · 12-15-2006, 05:42 PM

Thanks a lot for the reply, taylor_venable.

I wasn't expecting anyone to reply, let alone with an actual code example. When I get home (from work), I'll definitely give the code a shot, and see if I can roll it all into one nice function.
I'll post the results tomorrow.

sundialsvcs · 12-16-2006, 06:45 PM

Please do.

(exists (in world) (many-of us) (who (and (actually understand parentheses) (enjoy them)))) !

indienick · 12-16-2006, 10:17 PM

I haven't had a chance to try that code out yet, but I'm definitely going to send props out to both of you in the documentation for this project of mine, and I'll more than certainly post my results when I finally get a chance to try the code out.

The project I'm working on is - probably "stupid" or has already been done, but I consider it worth my time nonetheless - a shell for dynamic networking configuration. I call it Wish (Wireless Shell). It's pretty much a project that I'm doing for my friend (and as a good excuse to flex my Lisp muscle), who had A LOT of problems keeping his wireless settings active (they'd be released and reset upon every suspend/halt/reboot). Pretty much, the shell takes up one of the TTYs (I'm aiming for TTY6), and it just sits there waiting for commands. It allows you to modify networking settings on the fly, and will re-update your connections with a magical command. The big plus I can see for it is modifying network settings on the fly, through a simple interface, with simple command structures (which is why I need to parse strings).

indienick · 12-20-2006, 01:07 PM

I'm sorry it took so long for me to get back about the code sample, but it most definitely worked!

Thanks a lot, taylor_venable!

I'll tinker with the code a bit, and see if I can roll it into one function.

tuxdev · 12-20-2006, 01:24 PM

BTW, wish is already taken by TCL, IIRC. I haven't ever heard of a nesh or nsh or netsh.

sundialsvcs · 12-20-2006, 07:21 PM

String-parsing is definitely one of those exercises that will consume every ounce of programmer creativity that you throw at it.

This is despite the fact that the hardware spends less than a few microseconds doing it, even on a very bad day.

"If it works, FISI." (FISI = "[...] It, Ship It")

(Hey, the fact that you succeeded in doing string-parsing in Lisp at all oughta win you some kind of prize...)

indienick · 12-21-2006, 11:00 AM

tuxdev:
Thanks for that.

I can't help but think that I've heard "netsh" thrown around somewhere before. But I definitely like the sound of "nesh." If all else fails, I'll take a Ukrainian word for something and anglicize it.

sundialsvcs:
hahahahahahaha. I don't get why Lisp, of all languages, has such poor string-parsing. What I did for the previous algorithms I tried was take some of my old QBASIC string parsers (from high school programming class) and translate them to Lisp (with some modifications).

taylor_venable:
Again, thank you very much for the code, and for taking the time to write it. I have one more thing to ask of you: could you explain some of your code to me? I've only been doing Lisp for about 2 months, so I'm still a little hazy on the logical structures. I get lost from "(if end " down.

makyo · 12-21-2006, 11:34 AM

Hi.

Quote:

Originally Posted by sundialsvcs

(exists (in world) (many-of us) (who (and (actually understand parentheses) (enjoy them)))) !

I suppose it goes without saying that:

Code:

(writes (in lisp) (yoda) (does))

Or to that effect, words

... cheers, makyo

indienick · 12-21-2006, 11:41 AM

hee hee

taylor_venable · 12-21-2006, 12:23 PM

indienick:

I've been out for a couple days and not looked over this in about a week, so I apologize for the delayed response. But I've now found some bugs in the code I originally provided, so I rewrote it to work a lot better. First let me try to answer your question, though, and explain how it works.

The function builds the list of parsed strings recursively, from the end back to the beginning. The (if end ...) part checks whether there is another space ahead. If so, there may be more strings left to parse, so we CONS the string we just found onto the result of parsing the rest of the input. (SUBSEQ gives you a subsequence of a sequence; here we use start and end to extract the string.) If there aren't any more spaces ahead, we presume we've arrived at the last string, which we use to form a brand new list, and return it. (As this list is returned from the recursion, previous strings get added to the beginning, eventually forming the entire list of parsed strings.)
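
To make those two building blocks concrete, here is a tiny sketch. FIRST-WORD is a hypothetical name used only for illustration; POSITION, SUBSEQ, CONS and LIST are the standard functions the code relies on.

```lisp
;; POSITION finds the next space; SUBSEQ extracts the text before it.
(defun first-word (input-string)
  (let ((end (position #\Space input-string)))
    ;; When END is NIL, SUBSEQ runs to the end of the string.
    (subseq input-string 0 end)))

(first-word "alpha beta gamma")               ; => "alpha"

;; The recursion ends with (list "gamma"); each return then conses
;; an earlier word onto the front of that list.
(cons "alpha" (cons "beta" (list "gamma")))   ; => ("alpha" "beta" "gamma")
```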

Unfortunately, these functions have some bugs. One is that the code fails (throws an error) when passed the empty string; I think it should return NIL instead. Another is that empty strings pile up in the list when multiple spaces occur side by side in the input string; these must be removed afterwards, which is inefficient. So to that end, I tried to write a clearer, less buggy version. It uses two mutually recursive functions: one to handle the start of a parsed string, and one to handle the end. Unlike the previous code I wrote, it needs no initial setup and no cleanup afterwards. So without further ado, here it is:

Code:

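(defun split-by-space-end (input-string start)
  ;; START is the index of the space just before the next word (-1 if none).
  (let ((end (position #\Space input-string :start (+ start 1))))
    (if (null end)
        ;; No more spaces: the rest is the last word, unless nothing is
        ;; left (e.g. a single trailing space).
        (if (= (+ start 1) (length input-string))
            nil
          (list (subseq input-string (+ start 1))))
      (if (= end (+ start 1))
          ;; Two adjacent spaces: skip the empty word between them.
          (if (= start (- (length input-string) 2))
              nil
            (split-by-space (subseq input-string end)))
        (cons (subseq input-string (+ start 1) end)
              (split-by-space (subseq input-string end)))))))

(defun split-by-space (input-string)
  (if (string= input-string "")
      nil
    (let ((start (position #\Space input-string)))
      (if (null start)
          (list input-string)
        ;; CHAR= is the portable way to compare characters (EQ is not).
        (if (char= (char input-string 0) #\Space)
            (split-by-space-end input-string start)
          (split-by-space-end input-string -1))))))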

It's a bit longer, but should be a little easier to understand. It's still limited to using the space as the delimiter, but you could fix that pretty easily by extending the function to accept a string of possible delimiters instead. Then you've basically got the Lisp version of strtok and StringTokenizer! Anyway, I've got some preliminary code that'll handle that; I'll post it back here if you like.
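
As a rough idea of what the delimiter-string extension could look like, here is a minimal sketch. It is an assumption for illustration only, not the preliminary code mentioned above: the name SPLIT-BY-DELIMITERS and its optional DELIMITERS argument are hypothetical, and it uses a plain loop rather than mutual recursion.

```lisp
;; Hypothetical sketch: split INPUT-STRING on any character in DELIMITERS.
(defun split-by-delimiters (input-string &optional (delimiters " "))
  (let ((words nil)   ; words found so far, newest first
        (start 0))    ; where to resume scanning
    (flet ((delimiterp (c) (find c delimiters)))
      (loop
        ;; Skip any run of delimiters to find the start of the next word.
        (let ((word-start (position-if (complement #'delimiterp)
                                       input-string :start start)))
          (when (null word-start)
            (return (nreverse words)))
          ;; The word ends at the next delimiter, or at the end of the string.
          (let ((word-end (position-if #'delimiterp input-string
                                       :start word-start)))
            (push (subseq input-string word-start word-end) words)
            (when (null word-end)
              (return (nreverse words)))
            (setf start word-end)))))))

(split-by-delimiters "up,eth0;now" ",;")   ; => ("up" "eth0" "now")
```

Called with no second argument it splits on spaces only, so it behaves like a space-only splitter in that case.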

By the way, SPLIT-BY-SPACE is the function you need to invoke; SPLIT-BY-SPACE-END is only its helper. Here are some examples:

Code:

[8]> (split-by-space "alpha beta gamma  ")
("alpha" "beta" "gamma")
[9]> (split-by-space " alpha")
("alpha")
[10]> (split-by-space "")
NIL
[11]> (split-by-space "                ")
NIL
[12]> (split-by-space "       alpha         beta         gamma            ")
("alpha" "beta" "gamma")
[13]> (split-by-space "   alpha   ")
("alpha")
[14]> (split-by-space "alpha beta gamma")
("alpha" "beta" "gamma")