[SOLVED] Script to print repeated values separated by line break
But it prints nothing if I use the full path:
Code:
Dir.glob("C:\XMLs\latest\*.xml").each do |file|
  # My script
end
I tried this in both Cygwin and Ruby for Windows.
I expect you need to escape the backslashes, as in C:\\XMLs\\latest\\*.xml. If you installed Ruby via Cygwin, you may need to use Cygwin paths: /cygdrive/c/XMLs/latest/*.xml
Thanks for your answers. I tried escaping the backslashes, but that didn't work. The full Cygwin path
you suggested did work, and for Ruby on Windows I had to change "C:\XMLs\latest\*.xml" to "C:/XMLs/latest/*.xml",
with forward slashes as on Linux, and it works that way.
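A note on why the forward-slash form works, going by Ruby's documented behaviour: inside a double-quoted string a backslash in front of an ordinary character such as X or l is simply dropped, and Dir.glob treats a backslash as an escape character rather than as a path separator. The sketch below normalises a directory to forward slashes before globbing. The name xml_files is a hypothetical helper, and the temporary directory only stands in for C:/XMLs/latest.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: build the glob pattern with forward slashes,
# whichever separator the caller used, and return the matching files.
def xml_files(dir)
  pattern = File.join(dir.tr('\\', '/'), '*.xml')
  Dir.glob(pattern).sort
end

# Demonstration in a throwaway directory (a stand-in for C:/XMLs/latest).
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'a.xml'))
  FileUtils.touch(File.join(dir, 'notes.txt'))
  puts xml_files(dir).map { |f| File.basename(f) }.inspect
end
```

Only a.xml is listed, because notes.txt does not match the pattern.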
The latest code I have is below. I would only like to know how to shorten it where possible, for example the long "puts" command at the end, in red, or whether the 3 hashes in red (mr, pk, m3) can be combined into a single array.
The other thing I've tried to shorten, without success, is replacing the path passed to a.elements.each(...) with a variable,
like this: a.elements.each(Var << "NA"), but I don't know why the output changes when I do that.
for grail the output is correct when does that.
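One likely reason the output changes with Var << "NA", assuming Var is a String that is reused for several calls: << appends to the string in place, so the variable grows with every call, whereas + returns a new string and leaves the original alone. The sketch below shows the difference. The prefix 'Root/Section/' and both helper names are made up for illustration.

```ruby
# Hypothetical helpers contrasting the two ways of building a path.
def path_with_shovel(prefix, name)
  prefix << name   # mutates prefix in place and returns it
end

def path_with_plus(prefix, name)
  prefix + name    # returns a new string; prefix is unchanged
end

reused = String.new('Root/Section/')
puts path_with_shovel(reused, 'NA')   # Root/Section/NA
puts path_with_shovel(reused, 'NRB')  # Root/Section/NANRB

clean = 'Root/Section/'
puts path_with_plus(clean, 'NA')      # Root/Section/NA
puts path_with_plus(clean, 'NRB')     # Root/Section/NRB
```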
So I had a little play and this may or may not be what you need but it goes towards explaining what I meant when I said, "somehow use the value as a reference in the hash":
Code:
#!/usr/bin/env ruby
require 'rexml/document'
include REXML
# Format the hash data as required
# Currently there is no checking to see if a field may not exist
# that is in the header
def print_data(hsh)
  hsh.each do |key,value|
    if value.is_a?(Hash)
      value.each_value{ |v| print '"' + v.join(",") + '"|' }
    else
      print value + (key =~ /^MXB/ ? "" : "|")
    end
  end
  puts
end
# Recurse down through all nodes / elements within the xml tree
# and store the values in a hash
def recurse(element, hsh = {})
  element.elements.each do |child|
    if child.name == 'SReport' && ! hsh.empty?
      print_data(hsh)
      hsh.clear
    end
    pk_finished = child.name =~ /^MX/ ? true : false
    hsh[:mr] = {} if child.name == 'MR_NRanges'
    hsh[:pk] = {} if child.name == 'PK_NRanges'
    if child.has_text? && child.text =~ /^[^[:space:]]+$/
      if pk_finished || ! hsh[:mr].is_a?(Hash)
        hsh[child.name] = child.text
      else
        if hsh[:pk].is_a?(Hash)
          hsh[:pk][child.name] ||= []
          hsh[:pk][child.name] << child.text
        else
          hsh[:mr][child.name] ||= []
          hsh[:mr][child.name] << child.text
        end
      end
    end
    recurse(child,hsh) if child.has_elements?
  end
  hsh
end
xmldoc = Document.new File.new("input_1.xml")
puts "RepName|RepIn|ReportType|Date|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|MXA|MXB"
print_data(recurse(xmldoc))
I did not include the option for looping, but I am sure you can add that in.
Let me know if any of it needs some clarification.
If we are lucky, one of the gurus like kurumi may jump in and show better ways to do some of the above.
Just great, it seems to work, and I'm trying to understand the way you did it.
I've tried to modify your last code in order to avoid printing every hash value surrounded with double quotes,
since quotes are only mandatory when a hash entry has more than one element. For example, I need the output as below.
In columns 5 to 14, if a hash entry has one element, don't surround it with double quotes, but if it has more than one element,
surround the values with "".
For example,
- column 5 (NA), in line 1, has 3 values, so it should be surrounded in the output, like this: ...|"763,358,852"|...
- column 5 (NA), in line 2, has 1 value, so it shouldn't be surrounded with "", like this: ...|256|...
I'm trying to understand your code in order to add more nodes to the extraction of values. For example, could you
explain the lines in red a little?
Code:
def print_data(hsh)
  hsh.each do |key,value|
    if value.is_a?(Hash)
      value.each_value{ |v| print '"' + v.join(",") + '"|' }
    else
      print value + (key =~ /^MXB/ ? "" : "|")   # Is this supposed to check the last node to print?
    end
  end
  puts
end
# Recurse down through all nodes / elements within the xml tree
# and store the values in a hash
def recurse(element, hsh = {})
  element.elements.each do |child|
    if child.name == 'SReport' && ! hsh.empty?
      print_data(hsh)
      hsh.clear
    end
    # What is this line for? Instead of MXA and MXB I have MXA and MZV; how would it change?
    # Instead of /^MX/ I've tried /^MXA/||/^MZV/. Is this correct?
    pk_finished = child.name =~ /^MX/ ? true : false
    hsh[:mr] = {} if child.name == 'MR_NRanges'
    hsh[:pk] = {} if child.name == 'PK_NRanges'   # Can I add more nodes here to extract more values?
    if child.has_text? && child.text =~ /^[^[:space:]]+$/   # Does this regex match the closing of the top node?
      if pk_finished || ! hsh[:mr].is_a?(Hash)
        hsh[child.name] = child.text
      else
        if hsh[:pk].is_a?(Hash)
          hsh[:pk][child.name] ||= []
          hsh[:pk][child.name] << child.text   # What do these 2 lines mean?
        else
          hsh[:mr][child.name] ||= []
          hsh[:mr][child.name] << child.text
        end
      end
    end
    recurse(child,hsh) if child.has_elements?
  end
  hsh
end
Code:
print value + (key =~ /^MXB/ ? "" : "|")   # Is this supposed to check the last node to print?
Yes. This assumes that an element whose name starts with MXB will be last and hence will not need a pipe (|) after it.
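As an alternative to knowing which key comes last, the fields could be gathered into an array and joined once with "|", so that no trailing pipe is produced in the first place. This is only a sketch, not the code used in the thread. The helpers format_values and row_for are hypothetical names, the sample values are made up, and the quoting rule follows the request above (quotes only when an entry has more than one value).

```ruby
# Hypothetical helper: quote a list of values only when it holds more than one.
def format_values(values)
  values.size == 1 ? values.first : %("#{values.join(',')}")
end

# Hypothetical helper: flatten the hash into one array of fields,
# then join the fields once with "|".
def row_for(hsh)
  fields = hsh.flat_map do |_key, value|
    value.is_a?(Hash) ? value.values.map { |v| format_values(v) } : [value]
  end
  fields.join('|')
end

# Made-up sample shaped like the hash built by recurse.
sample = {
  'RepName' => 'R1',
  :mr => { 'NA' => %w[763 358 852], 'NRB' => %w[12] },
  'MXA' => '5',
  'MXB' => '9'
}
puts row_for(sample)   # R1|"763,358,852"|12|5|9
```

Because the separator is added by join, the order of the keys no longer matters for the pipe handling.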
Code:
pk_finished = child.name =~ /^MX/ ? true : false
# What is this line for? Instead of MXA and MXB I have MXA and MZV; how would it change? Instead of /^MX/ I've tried /^MXA/||/^MZV/. Is this correct?
This identifies when we are out of PK_NRanges, so that the following elements are not made part of that hash and do not need arrays created for them.
As for a change, you only need whatever will be the next element name after the pk section, so if MZV is not going to be next, i.e. MXA will come before it, then you do not need to change anything.
If, on the other hand, you are not sure which will appear first, then the change would be:
Code:
/^M(X|Z)/
You would probably need to check that these names cannot appear elsewhere in the data, as that would cause issues.
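To test for either of two names, the alternatives go inside a single pattern. The sketch below compares the suggested pattern with a stricter variant that accepts only the two exact names. The stricter pattern is an assumption about what is wanted, and both method names are hypothetical.

```ruby
# Pattern from the reply above: any name starting with MX or MZ.
def loose_match?(name)
  name.match?(/^M(X|Z)/)
end

# Stricter variant (an assumption about the intent): exactly MXA or MZV.
def strict_match?(name)
  name.match?(/\A(?:MXA|MZV)\z/)
end

%w[MXA MZV MXB MZ_Other PK_NRanges].each do |name|
  puts "#{name}: loose=#{loose_match?(name)} strict=#{strict_match?(name)}"
end
```

The loose pattern also accepts names such as MXB or MZ_Other, which is the situation the warning about names appearing elsewhere in the data refers to.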
Code:
hsh[:pk] = {} if child.name == 'PK_NRanges'   # Can I add more nodes here to extract more values?
I am not sure what you mean here. Will there be new sections where you need to append data, like the NA ranges?
Code:
if child.has_text? && child.text =~ /^[^[:space:]]+$/   # Does this regex match the closing of the top node?
This has nothing to do with nodes per se; it checks whether an element contains text and, if so, whether that text consists only of non-whitespace characters.
Code:
<NA>731</NA> #valid as text is not whitespace
<SReport>
<blah> #the whitespace prior to <blah> is returned as the text part of that element, so not what we wanted
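The same check can be tried on a tiny document. This is a minimal sketch assuming the rexml library is available (it ships with standard Ruby installations). The XML fragment is made up, and value_text? is a hypothetical wrapper around the condition used in the script.

```ruby
require 'rexml/document'

# Same test as in the script: the element has text, and that text
# contains no whitespace characters.
def value_text?(element)
  element.has_text? && !(element.text =~ /^[^[:space:]]+$/).nil?
end

# Made-up fragment: SReport holds only whitespace plus a child element.
xml = "<SReport>\n  <NA>731</NA>\n</SReport>"
root = REXML::Document.new(xml).root

puts value_text?(root)                 # text of SReport is a newline and spaces
puts value_text?(root.elements['NA'])  # text of NA is "731"
```

The first call prints false and the second prints true. Note that ^ and $ are line anchors in Ruby, so \A and \z would be the stricter choice if a text value could span several lines.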
Code:
hsh[:pk][child.name] ||= []
hsh[:pk][child.name] << child.text   # What do these 2 lines mean?
The first checks whether this value has already been initialised to an array and, if not, sets it to an empty array.
The second then appends our data to the array. If you try to append before initialising, Ruby does not know the type, so the value will be nil (NilClass), which does not
have an append method.
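The idiom can be tried on a plain hash. The sketch below shows the two lines at work, what happens without the first one, and, as an aside not used in the thread, a hash with a default block that creates the array on first access. The name append_value is hypothetical.

```ruby
# Sketch of the initialise-then-append idiom on a plain hash.
def append_value(store, name, text)
  store[name] ||= []   # create the array the first time this name is seen
  store[name] << text  # then append to it
  store
end

pk = {}
append_value(pk, 'NA', '763')
append_value(pk, 'NA', '358')
p pk   # {"NA"=>["763", "358"]}

# Without the first line, the lookup returns nil and << fails.
empty = {}
begin
  empty['NA'] << '763'
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end

# Alternative: a hash whose default block creates the array on first access.
auto = Hash.new { |hash, key| hash[key] = [] }
auto['NA'] << '763'
p auto   # {"NA"=>["763"]}
```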
I've been trying to modify your last code in order to handle a slightly different XML input, but it shows different output.
I was wondering if your last code could be forced to extract only the values of the desired nodes, since the current code prints different output
if the XML input has some other nodes before <SReport> and after <MAX03_NRanges>. Those other nodes are not of interest, but it seems the code is printing the values of other nodes.
Besides that, if I have another node similar to MR_NRanges and PK_NRanges, namely YU_NRanges, which comes after PK_NRanges, how must the code be changed?
I've tried adding another line, as below in red, but it is not working.
Code:
hsh[:mr] = {} if child.name == 'MR_NRanges'
hsh[:pk] = {} if child.name == 'PK_NRanges'
hsh[:pk] = {} if child.name == 'YU_NRanges'
Below I put the slightly different input for reference. The nodes I want to print are the same, but adding YU_NRanges too.
---------- Post added 04-14-14 at 04:20 AM ----------
Hello grail,
Thanks for explanation.
The code as written will extract all nodes / elements inside an SReport (so anything before it does not matter), but as for anything after MAX03 that is still inside the report, you simply need
to put something in print_data to tell it to stop at the MX info.
As for adding another hash, you were correct except that you did not assign a new name as the key:
Code:
hsh[:mr] = {} if child.name == 'MR_NRanges'
hsh[:pk] = {} if child.name == 'PK_NRanges'
hsh[:yu] = {} if child.name == 'YU_NRanges'
You will then of course need to add the corresponding section where you set the internal arrays, and change from looking for when 'pk' finishes to looking for when the last node (yu in this case)
finishes.
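One way to keep such additions manageable, sketched here under assumptions and not taken from the thread, is to keep a table from container name to hash key and remember which section is current, instead of testing each hash with is_a?. The events array stands in for the elements met during recursion, in document order. The names section_keys and collect, and all sample values, are hypothetical.

```ruby
# Hypothetical table: container element name => hash key.
def section_keys
  { 'MR_NRanges' => :mr, 'PK_NRanges' => :pk, 'YU_NRanges' => :yu }
end

# `events` is a list of [name, text] pairs in document order;
# container elements carry nil as their text.
def collect(events)
  hsh = {}
  current = nil
  events.each do |name, text|
    if section_keys.key?(name)
      current = section_keys[name]   # entering a new range section
      hsh[current] = {}
    elsif name.start_with?('MX')
      current = nil                  # ranges are over, back to plain values
    end
    next if text.nil?
    if current
      (hsh[current][name] ||= []) << text
    else
      hsh[name] = text
    end
  end
  hsh
end

events = [
  ['RepName', 'R1'], ['MR_NRanges', nil], ['NA', '763'], ['NA', '358'],
  ['PK_NRanges', nil], ['NA', '256'], ['YU_NRanges', nil], ['NA', '101'],
  ['MXA', '5'], ['MXB', '9']
]
p collect(events)
```

With this layout, adding a further section means adding one entry to the table rather than another elsif branch.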
Yes, I think I am changing the code correctly to get the values for YU_NRanges, but with the code I have so far, I don't know why, it
prints a completely different output using the input I posted in post #24.
Thanks for the help so far.
Code:
#!/usr/bin/env ruby
require 'rexml/document'
include REXML
# Format the hash data as required
# Currently there is no checking to see if a field may not exist
# that is in the header
def print_data(hsh)
  hsh.each do |key,value|
    if value.is_a?(Hash)
      #value.each_value{ |v| print '"' + v.join(",") + '"|' }
      value.each_value{|v| print (v.size == 1 ? v[0] : ('"' + v.join(",") + '"')) + "|" }
    else
      print value + (key =~ /^MXB/ ? ("") : ("|"))
    end
  end
  puts
end
# Recurse down through all nodes / elements within the xml tree
# and store the values in a hash
def recurse(element, hsh = {})
  element.elements.each do |child|
    if child.name == 'SReport' && ! hsh.empty?
      print_data(hsh)
      hsh.clear
    end
    yu_finished = child.name =~ /^MX/ ? true : false
    hsh[:mr] = {} if child.name == 'MR_NRanges'
    hsh[:pk] = {} if child.name == 'PK_NRanges'
    hsh[:yu] = {} if child.name == 'YU_NRanges'
    if child.has_text? && child.text =~ /^[^[:space:]]+$/
      if yu_finished || ! hsh[:mr].is_a?(Hash) || ! hsh[:pk].is_a?(Hash)
        hsh[child.name] = child.text
      else
        if hsh[:yu].is_a?(Hash)
          hsh[:yu][child.name] ||= []
          hsh[:yu][child.name] << child.text
        elsif hsh[:pk].is_a?(Hash)
          hsh[:pk][child.name] ||= []
          hsh[:pk][child.name] << child.text
        else
          hsh[:mr][child.name] ||= []
          hsh[:mr][child.name] << child.text
        end
      end
    end
    recurse(child,hsh) if child.has_elements?
  end
  hsh
end
xmldoc = Document.new File.new("input_1.xml")
puts "RepName|RepIn|ReportType|Date|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|NA|NRB|SubRangeB|SubRangeE|MXA|MXB"
print_data(recurse(xmldoc))
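One possible contributor to the changed output, judging only from the code as posted since the input file is not shown here: the extra test on hsh[:pk] in the first condition is true for the whole MR_NRanges section, because :pk does not exist yet at that point. Values inside MR_NRanges would then be stored as plain entries, each overwriting the previous one, instead of being appended to hsh[:mr]. The sketch below evaluates the posted condition and a narrowed one for that state. Both method names are hypothetical.

```ruby
# Condition as posted: true means "store as a plain value, not in a range array".
def scalar_branch_posted?(hsh, finished)
  finished || !hsh[:mr].is_a?(Hash) || !hsh[:pk].is_a?(Hash)
end

# Narrowed condition, as in the earlier two-section version of the script.
def scalar_branch_narrowed?(hsh, finished)
  finished || !hsh[:mr].is_a?(Hash)
end

# State while inside MR_NRanges: :mr exists, :pk and :yu do not yet.
in_mr = { mr: {} }

puts scalar_branch_posted?(in_mr, false)    # true: MR values bypass hsh[:mr]
puts scalar_branch_narrowed?(in_mr, false)  # false: MR values reach hsh[:mr]
```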
Thank you, it works fine now; it is no longer printing the nodes before SReport that I don't want.
The following line
Code:
if n.name == 'SReport' && h.has_key?("RepName")
Does it work like saying, "when the node "RepName" is matched, begin printing node values"? In other words, does this line
say that "RepName" is the first node I'd like to print?
Since there are nodes within "SReport/RepIni/ReportData/MainSec" that appear after the node "MAX03_NRanges" that I don't
want to print, how can I say that the last node I want to print is "MAX03_NRanges"?
For example, I'm not interested in printing the nodes in red below, which are currently being printed.
Does it work like saying, "when the node "RepName" is matched, begin printing node values"? In other words, does this line
say that "RepName" is the first node I'd like to print?
Not exactly. What it says is that the hash must contain "RepName" as a key for the condition to be true. However, I found that this did not remove the guff data, but
rather just printed it along with the rest when print was called. The fix was simply to move the check inside the if:
Code:
if n.name == 'SReport'
  print_data(h) if h.has_key?("RepName")
  h.clear
end
The second issue has me a little more perplexed.
Firstly, because of our code that checks whether yu has finished, it is actually adding the data from the extra cells into the yu hash.
Secondly, because we are now using a recursive function to get the data, I think our return needs to happen either at the end of the file or when it reaches MXB.
The problem is that we cannot tell it to return when it reaches MXB, as it would then not store MXB to be printed.
So I will have a think on this one and let you know. If you come up with a solution, I would be keen to see it.
I think it is simple, but I do not seem to be seeing the forest for the trees at present.
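One direction that might work, offered only as a sketch under assumptions: rather than returning early, store the last wanted element and then set a flag so that everything after it is ignored until the next SReport starts. The XML below is made up, the range arrays are left out to keep the example short, and the names walk and extract are hypothetical. It also uses strip instead of the regex from the script, so values containing inner spaces would be kept.

```ruby
require 'rexml/document'

# Walk the tree. Nothing is stored before the first SReport, and once
# `last_name` has been stored, everything else is skipped until the next
# SReport. `rows` collects one hash per report.
def walk(element, last_name, state, rows)
  element.elements.each do |child|
    if child.name == 'SReport'
      rows << state[:data] unless state[:data].empty?
      state[:data] = {}
      state[:done] = false
    end
    text = child.text.to_s.strip
    unless state[:done] || text.empty?
      state[:data][child.name] = text
      state[:done] = true if child.name == last_name
    end
    walk(child, last_name, state, rows) if child.has_elements?
  end
end

def extract(xml, last_name)
  state = { data: {}, done: true }   # done until the first SReport is seen
  rows = []
  walk(REXML::Document.new(xml), last_name, state, rows)
  rows << state[:data] unless state[:data].empty?
  rows
end

# Made-up input: Header and Extra are the unwanted nodes.
xml = <<~XML
  <Root>
    <Header>ignored</Header>
    <SReport><RepName>R1</RepName><MXB>9</MXB><Extra>x</Extra></SReport>
    <SReport><RepName>R2</RepName><MXB>4</MXB><Extra>y</Extra></SReport>
  </Root>
XML

extract(xml, 'MXB').each { |row| puts row.values.join('|') }
```

With this input the sketch prints R1|9 and R2|4, leaving out both the Header before the first report and the Extra nodes after MXB.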