LinuxQuestions.org


Lucien Lachance 02-26-2014 05:05 PM

Ruby: Matching for a delimiter
 
I'm writing a coding kata called the String Calculator. You can view more of the specs here: http://osherove.com/tdd-kata-1/. So far, I've come up with the code below. I would like to know how I can write a regex pattern that will split the string on any delimiter. Here's an example:

# When given add('1') => 1

# When given add('1,2') => 3

# When given add("//;\n1;2") => 3

Giving it the pattern:
Code:

input.split(/[,\n]|[^0-9]/)
works as expected; however, it splits "//;\n1;2" like this:

=> ["", "", "", "", "1", "2"]

How can I keep those empty strings out of the result?

Code:

def self.add(input)
  solution = input.split(/[,\n]/).map(&:to_i).reduce(0, :+)
  # \z anchors at the very end of the string ($ would match at the end of any line)
  input.match(/\n\z/) ? nil : solution
end


grail 02-26-2014 07:28 PM

Well, I am not sure which version of Ruby you are using, but on mine (2.1.0p0) your split code throws an error:
Code:

$ ruby -e 'input="//;\n1;2"; print input.split([,\n]|[^0-9])'
-e:1: syntax error, unexpected ',', expecting ']'
input="//;\n1;2"; print input.split([,\n]|[^0-9])

I am able to overcome the error by using the following split:
Code:

input.split(/[,\n]|[^0-9]/)
Now that I have the same output, my suggestion would be to use 'delete_if', as seen here.
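A minimal sketch of the delete_if suggestion, using the input from the first post. delete_if removes the empty strings in place; reject(&:empty?) is a non-mutating alternative that returns a new array:

```ruby
input = "//;\n1;2"

# Splitting on every non-digit leaves an empty string wherever two
# separators sit next to each other (and at the start of the string).
parts = input.split(/[,\n]|[^0-9]/)  # ["", "", "", "", "1", "2"]

# delete_if mutates the array, dropping every element the block accepts.
parts.delete_if(&:empty?)            # parts is now ["1", "2"]

# Non-mutating alternative: build a new array without the empty strings.
cleaned = input.split(/[,\n]|[^0-9]/).reject(&:empty?)

total = parts.map(&:to_i).reduce(0, :+)
puts total # prints 3
```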

Of course, it could just be that you are using the wrong method. Why not try:
Code:

input.scan(/[0-9]+/)
I threw in the '+' in case you have multi-digit values passed to you, i.e. 45.
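A quick illustration of what the '+' changes, on a made-up input: without it every digit is its own match, so multi-digit numbers fall apart.

```ruby
input = "12,45\n3"

# Without '+', each digit is matched separately.
single = input.scan(/[0-9]/)   # ["1", "2", "4", "5", "3"]

# With '+', a run of consecutive digits is one match.
runs = input.scan(/[0-9]+/)    # ["12", "45", "3"]

puts runs.map(&:to_i).reduce(0, :+) # prints 60
```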

Lucien Lachance 02-26-2014 07:43 PM

Sorry about that; I'm running 2.0.0, and I've fixed the error. I originally thought of using scan, but felt it wasn't really solving the problem the kata poses, which is reading the numbers between the delimiters. Scan works wonders, but do you think using it here (based on the spec) is ideal?

Also, I could say:
Code:

solution = input.scan(/\d+/).map(&:to_i).reduce(0, :+)
So a refactor might be:

Code:

def self.add(input)
  solution = input.scan(/\d+/).map(&:to_i).reduce(0, :+)
  input.end_with?("\n") ? nil : solution
end

but!

what happens if I feed add with
Code:

add("//1\n112")
where '1' is the delimiter?
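To make the concern concrete, this is what the scan-based refactor does with that input: the '1' in the "//1" header is itself a run of digits, so it gets counted as a number.

```ruby
input = "//1\n112"

# scan(/\d+/) knows nothing about the "//1\n" header, so the
# delimiter '1' is picked up as a number alongside "112".
found = input.scan(/\d+/)                # ["1", "112"]
sum   = found.map(&:to_i).reduce(0, :+)  # 113
puts sum
```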

grail 02-26-2014 09:29 PM

Now you are losing me a little bit :(

1. Why are you allowing the user (I assume this is a person entering details) to enter gibberish or non-numerical input?

2. Your last example makes no sense, because if the delimiter is '1', this becomes nothing plus 12??

3. I also don't understand the following:
Code:

input.end_with?("\n") ? nil : solution
None of your examples has a newline at the end of the 'input' string, and the scan or split removes it anyway, so a solution can still be provided. To me this seems a counterintuitive test??

Lucien Lachance 02-26-2014 09:52 PM

Okay, let me explain. For the String Calculator kata, the following input is not allowed (these are just the rules; see http://osherove.com/tdd-kata-1/):

Code:

StringCalculator.add("1,\n") # Should return nil
If a string ends in a newline, it is considered invalid, so I return nil.
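In Ruby, a trailing-newline check can be written with String#end_with? or with a regex anchored by \z. One subtlety worth a sketch: $ matches at the end of any line, so /\n$/ also fires on a blank line in the middle of a string. The helper name below is hypothetical:

```ruby
# Hypothetical helper: true only when the very last character is a newline.
def ends_in_newline?(input)
  input.end_with?("\n")
end

trailing          = "1,\n"    # the kata's invalid example
blank_line_inside = "1\n\n2"  # a newline that is NOT at the end

trailing_flag = ends_in_newline?(trailing)           # true
inside_flag   = ends_in_newline?(blank_line_inside)  # false

# \z anchors at the true end of the string, so it agrees with end_with?.
trailing_z = trailing =~ /\n\z/                      # 2 (index of the final "\n")
inside_z   = blank_line_inside =~ /\n\z/             # nil

# $ anchors at the end of *any* line, so /\n$/ also finds the inner blank line.
inside_dollar = blank_line_inside =~ /\n$/           # 1
```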

Now, on to the delimiter situation. A string calculator should do the following:

Code:

  StringCalculator.add('1') # Should return 1
  StringCalculator.add('1,2') # Should return 3
  StringCalculator.add("//;\n1;2") # Should return 3 and accepts a delimiter of ';'
  StringCalculator.add("//+\n2+2") # Should return 4 and accepts a delimiter of '+'

However, what if the character at input[2, 1] is a number? That is incorrect, because this position is supposed to hold a delimiter, not a numerical value. I hope I explained this better. What do you think is the ideal way to handle this problem?


grail 02-27-2014 12:22 AM

OK, so I have had a look at the exercise, and I am guessing you are up to:
Code:

4. Support different delimiters
  1. to change a delimiter, the beginning of the string will contain a separate line that looks like this:  “//[delimiter]\n[numbers…]” for example “//;\n1;2” should return three where the default delimiter is ‘;’ .

This being the case, I would say that none of the current solutions is correct. You should be looking specifically for the pattern mentioned above, i.e. '//[delim]\n'; once found, you need to extract this delimiter and use it on the rest of the string. So you need to step back a bit and solve the steps in the order given, as both current methods ignore the fact that you have been supplied a new delimiter to use.
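In Ruby, the '//[delim]\n' pattern can be matched and the delimiter captured in one step with String#[] and a capture-group index. A minimal sketch; HEADER and custom_delimiter are hypothetical names, and because . does not match a newline by default, a header needs exactly one character between '//' and '\n':

```ruby
# "//", one character (captured), then "\n", at the very start of the string.
HEADER = %r{\A//(.)\n}

def custom_delimiter(input)
  input[HEADER, 1]  # the captured character, or nil when there is no header
end

custom_delimiter("//;\n1;2")      # ";"
custom_delimiter("//+\n2+2")      # "+"
custom_delimiter("1,2")           # nil
custom_delimiter("//blah,1,2,3")  # nil (no single character followed by "\n")
```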

As for:
Quote:

However, what if the position of input[2, 1] is a number.
I am sorry, but this is still confusing and does not appear to make sense to me :(
Could you provide some example data and highlight what you are using as the delimiter?
If I do partially understand, I would again point out that a number used as a delimiter seems impossible to test for accurately: every digit with the same value will be treated as a delimiter and removed, which may give you an incorrect total.
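A short illustration of that point, assuming the part after the header is simply split on the delimiter. With '1' as the delimiter, any number that contains a 1 is cut apart as well (the input below is made up):

```ruby
# Hypothetical input: the numbers 10 and 2 joined by the custom delimiter '1'.
body = "10" + "1" + "2"   # "1012", the part after a "//1\n" header

parts = body.split("1")   # ["", "0", "2"]: the 1 in 10 is eaten too
sum   = parts.map(&:to_i).reduce(0, :+)

puts sum                  # prints 2, not the intended 12
```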

Lastly, as someone who does testing in their own job, I find the following statement from the page to be bad advice:
Quote:

Make sure you only test for correct inputs. there is no need to test for invalid inputs for this kata
I always like to test for invalid input as it lets me know what needs to be shored up against a user who doesn't know how to use my software (which generally is everyone else but you).
Just my 2 cents

Lucien Lachance 02-27-2014 12:31 AM

That's been the hardest part. This damn delimiter matching! I'm not the best with regex, unfortunately. There are lots of ways to solve this; on my first try, I generated a string of supported characters and checked whether it matched a string beginning with '//'. Could you help get me started a bit? I appreciate the help. Seriously. In the meantime, take a look at this method I've come up with.

Something not yet implemented:
Code:

def supported_delimiters
  [*(33..46), *(58..64)].map(&:chr).join
end
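For reference, codes 33..46 and 58..64 are the ASCII punctuation characters from '!' to '.' and from ':' to '@', so the list skips '/' (47) and the digits (48..57). A sketch of how the method might be used to validate the character after '//'; valid_delimiter? is a hypothetical helper, not part of the code above:

```ruby
def supported_delimiters
  # 21 characters: ! " # $ % & ' ( ) * + , - . : ; < = > ? @
  [*(33..46), *(58..64)].map(&:chr).join
end

# Hypothetical helper: true when the character right after "//"
# is one of the supported punctuation delimiters.
def valid_delimiter?(input)
  return false unless input.start_with?("//")

  ch = input[2, 1]
  # input[2, 1] is "" for "//", and include?("") would be true, so check the length.
  !ch.nil? && ch.length == 1 && supported_delimiters.include?(ch)
end

valid_delimiter?("//;\n1;2")  # true
valid_delimiter?("//1\n112")  # false (digits are excluded)
```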

Here's my production code:
Code:

module StringCalculator
  def self.add(input)
    validate_negatives(input)
    solution = input.scan(/\d+/).map(&:to_i).reduce(0, :+)
    input.end_with?("\n") ? nil : solution
  end

  def self.validate_negatives(input)
    negatives = input.scan(/-\d+/)
    fail "negatives not allowed: #{negatives.join(', ')}" if negatives.any?
  end

  private_class_method :validate_negatives
end


grail 02-27-2014 07:46 AM

Like most of us, I feel you have tried to jump ahead and so are missing the point of the lesson.

Each time you add an item to your solution, you should test it against data to see whether it can handle that case.
Now, although it is not mentioned directly, it is implied that the initial delimiter is a comma. So you need to back up and create a delimiter item/variable that has a default but may later be changed.

Also, if we cut back your current solution to:
Code:

module StringCalculator
  def self.add(input)
    solution = input.to_i
    input.end_with?("\n") ? nil : solution
  end
end

With this basic example, when the input ends with "\n" it returns nil; going by the advice, you should probably return an error message saying this is unacceptable input.

So you really need to go back a step and use either scan or split with just a comma as the delimiter.
If you do use scan, you will need to change it to something like this (for the basic 0, 1 or 2 numbers solution):
Code:

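delimiter = ","
# Regexp.escape keeps delimiters such as ']' or '^' from breaking the character class
solution = input.scan(/[^#{Regexp.escape(delimiter)}]+/).map(&:to_i).reduce(0, :+) # or change scan to split(delimiter)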

Then, when accepting a new delimiter, you can add a step in between that extracts the new delimiter and assigns it to the variable. The catch is that you then need to remove the new delimiter definition from the original string before summing the data.

Hope that helps

Lucien Lachance 02-27-2014 09:06 AM

I initially tested those conditions with split(','). It's difficult to include every detail, and I realize that now. I'll reiterate over this and break the tests down one by one. One last thing: I'm raising an error for negatives (as the problem states). Would it still follow the single responsibility principle if add also raised an error when the string ends in a newline? Or should that be done elsewhere?

Also, if you're interested, you can see my tests here: https://github.com/deidora/orlando_d...g_calc_spec.rb

grail 02-27-2014 10:34 AM

Well, I did watch the video from Corey, and whilst I followed his method, it seemed a little unusual: he defined the module in such a way that it would need to be extended/added to String for it to work.

Based on that, it turns out your original solution using split was very similar: once the items in the array have been mapped with to_i, the empty cells simply become 0.
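Concretely, with the split from the first post, the empty strings do no harm once they are converted, because "".to_i is 0:

```ruby
parts   = "//;\n1;2".split(/[,\n]|[^0-9]/)  # ["", "", "", "", "1", "2"]
numbers = parts.map(&:to_i)                  # [0, 0, 0, 0, 1, 2]
total   = numbers.reduce(0, :+)              # 3
```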

As for raising errors, again the video simply sidestepped the issue by converting all newlines to the delimiter, which means no string will ever end in a newline.
If you were to raise this type of error, I would suggest it be the very first test applied to the input data.

Lucien Lachance 02-27-2014 11:33 AM

It took me quite some time to clean everything up, but I've taken care of what you said about raising an error if the string ends in a newline. I think this is as clean as I can get it. I have yet to solve the delimiter issue; I left that for the final refactor because I know it will require some thinking time on my part. More often than not, the delimiter logic will end up as a method of its own because of its complexity. I'm pretty happy with this approach and the advice you've given. I think what Corey was aiming for in the video was not having to worry about invalid input: if you're trying to do the simplest thing possible, handling invalid input sidetracks you from the task. Just my $.02. I agree, though; I always test for invalid cases at work as well.

https://github.com/deidora/orlando_d...string_calc.rb

grail 02-27-2014 08:03 PM

As an option / idea for the delimiter issue:

1. Check that the string starts with '//'. If you want to be complete, you should actually check that it starts with '//any_character\n', because that is part of the description of how a new delimiter is defined. A user could enter '//blah,1,2,3', and there is no new delimiter there, as no single character is followed by a newline.

2. Extract the character immediately after '//' into the delimiter (a variable, or the return value of a method).

3.
a. Replace everything up to and including the first newline in the original string with nothing.
b. Split on the new delimiter and let to_i handle non-digit characters (i.e. set them to 0).
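One possible sketch of those three steps in Ruby. The names HEADER, header_delimiter and add are hypothetical, and turning newlines into the delimiter before splitting is an added assumption (the kata also allows newlines between numbers), not part of the steps above:

```ruby
# Steps 1 and 2: a header is "//", exactly one character, then "\n".
HEADER = %r{\A//(.)\n}

def header_delimiter(input)
  input[HEADER, 1]  # captured delimiter, or nil when there is no header
end

def add(input)
  custom    = header_delimiter(input)
  delimiter = custom || ","   # comma is the default delimiter
  # Step 3a: drop the header line, but only when a real header was found.
  body = custom ? input.sub(HEADER, "") : input
  # Assumption: newlines count as delimiters too (block form avoids
  # backslash interpretation in the replacement).
  body = body.gsub("\n") { delimiter }
  # Step 3b: split and let to_i turn non-numeric pieces (such as "") into 0.
  body.split(delimiter).map(&:to_i).reduce(0, :+)
end

add("1,2")           # 3
add("//;\n1;2")      # 3
add("//blah,1,2,3")  # 6: no header, so "//blah" is just a piece that becomes 0
```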

Lucien Lachance 02-27-2014 09:49 PM

Code:

module StringCalculator
  def self.add(string)
    string.end_with?("\n") ? fail('ends in newline') : solve(string)
  end

  def self.solve(string)
    verify(string)
    custom = delimiter(string)
    numbers = string.gsub(/\n/, custom).split(custom).map(&:to_i)
    numbers.reject { |n| n > 1000 }.reduce(0, :+)
  end

  def self.delimiter(string)
    string.match(%r{\A//}) ? string[2, 1] : ','
  end

  def self.verify(string)
    find = string.scan(/-\d+/)
    fail("negatives not allowed: #{find.join(', ')}") if find.any?
  end

  private_class_method :verify, :delimiter, :solve
end

Something like this, right? This does the job, but I probably need to raise an error when string[2, 1] is a number.
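If the goal is to reject a digit in that position, one way is to check the character before returning it. This is only a sketch: checked_delimiter is a hypothetical name, and ArgumentError is an assumed choice of exception:

```ruby
# Hypothetical variant of the delimiter method above: same ',' default,
# but a digit is refused as the custom delimiter.
def checked_delimiter(string)
  return ',' unless string.start_with?('//')

  custom = string[2, 1].to_s
  if custom =~ /\d/
    fail ArgumentError, "a digit cannot be used as a delimiter: #{custom.inspect}"
  end
  custom
end

checked_delimiter("//;\n1;2")  # ";"
checked_delimiter("1,2")       # ","
# checked_delimiter("//1\n112") raises ArgumentError
```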

grail 02-27-2014 10:20 PM

Yeah, not bad, although maybe we could have kept it a little simpler:
Code:

def self.delimiter(string)
  string[0, 2] == '//' ? string[2, 1] : ','
end
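Ruby's String#start_with? expresses the same check without slicing; a sketch of that variant, equivalent to comparing string[0, 2] with '//':

```ruby
def delimiter(string)
  # start_with? reads as the rule itself: "does the input begin with //?"
  string.start_with?('//') ? string[2, 1] : ','
end

delimiter("//;\n1;2")  # ";"
delimiter("1,2")       # ","
```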


Lucien Lachance 02-27-2014 10:42 PM

I did feel a little uncomfortable using #match in this instance, and I do see how yours is much more readable. Thanks for all the help. It's almost as if we were pair programming in the same room, haha.


All times are GMT -5.