Using awk to get blocks of data from a text file
I want to extract multiline blocks of data from a text file into a multiline string array. The data file looks like
Code:
[DATATYPE1]
multiple unknown number of lines of
and I would like to end up with something like
Code:
$dataarray[1]="multiple unknown number of lines of"
in a shell script starting with
Code:
#!/bin/sh
I'm confused. Are you trying to extract the lines with awk for a shell script array, or trying to set the lines in an awk array, or what? Could you explain the context for your request a bit more?
In awk, you'd probably have to set RS to a string that matches each block, then further process it to exclude the lines you don't want, perhaps with sub/gsub. After that, it would depend on what you want to do with it, e.g.
Code:
$dataarray[1]="text"
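To illustrate the RS idea mentioned above, here is a minimal sketch. It assumes GNU awk (which allows RS to be a regular expression), and the sample file contents and path are invented for illustration:

```shell
#!/bin/sh
# Create a throwaway sample file shaped like the one in the question.
cat > /tmp/sample_rs.txt <<'EOF'
[DATATYPE1]
first block line 1
first block line 2
[DATATYPE2]
second block line 1
EOF

# With RS set to the header pattern, each whole block becomes one awk
# record ($0). Record 1 is the (empty) text before the first header,
# so we skip it with NR > 1.
awk 'BEGIN { RS = "\\[DATATYPE[0-9]+\\]\n" }
     NR > 1 { printf "block %d:\n%s", NR - 1, $0 }' /tmp/sample_rs.txt
```

Running it prints each block preceded by a "block N:" label, which shows that the header lines have been consumed as record separators.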
First, thanks for looking. Second, sorry; please disregard the description above of what I am trying to achieve, as it is unclear and uses incorrect syntax.
To be more clear: I do want to extract the desired lines into a shell array, with one block of lines at each position in the array. In the example above I would expect to be able to add the following to the end of the script
Code:
echo ${dataarray[1]} > firstblock.txt
and have firstblock.txt contain
Code:
multiple unknown number of lines of
Also, I'd rather do it by adding dimensions than by adding functions. Again, sorry for being unclear. I would have thought that pulling blocks of multiline text from a text file against a regex would be common as dirt, but I'm having real trouble figuring it out. There are lots of hits on Google, but they're either far too complicated for me to follow for my "simple" problem or too far off topic.
Maybe this helps: put the contents in a text file named "solve_problem.txt"
Code:
more solve_problem.txt
Code:
#!/bin/bash
Quote:
Code:
$~ IFS=$'\n' mapfile -t dataarray < <(awk ' |
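The truncated mapfile snippet above can be fleshed out along the following lines. This is a hedged sketch, not the poster's exact script: it assumes bash 4.4+ (for mapfile's -d option), and the sample file, its path, and the "@" delimiter are all invented for illustration:

```shell
#!/bin/bash
# Throwaway sample file shaped like the data in the question.
cat > /tmp/sample_map.txt <<'EOF'
[DATATYPE1]
alpha
beta
[DATATYPE2]
gamma
EOF

# awk glues each block's lines back together and terminates every block
# with "@"; mapfile then splits on "@", so each array element is one
# whole multiline block. -t strips the trailing delimiter.
mapfile -t -d '@' dataarray < <(
  awk '/^\[DATATYPE/ { if (block != "") printf "%s@", block; block = ""; next }
       { block = block $0 "\n" }
       END { if (block != "") printf "%s@", block }' /tmp/sample_map.txt
)

printf 'element %d:\n%s\n' 0 "${dataarray[0]}" 1 "${dataarray[1]}"
```

Here ${dataarray[0]} holds the two lines of the first block and ${dataarray[1]} the single line of the second.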
Thanks for the clarification. I also just noticed that you only want to extract one datatype. That makes sed easier to use here.
Here's my quick solution:
Code:
while read -r -d "@" line ; do
Be sure to change the "@" to a different character if at-marks could exist in the text, of course. Running it on the example text above, here's the output of the array:
Code:
$ printf '[%s]\n\n' "${array[@]}"
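A fuller sketch of this read -d approach might look like the following. It is a reconstruction under stated assumptions, not the original posted script: the sample file and path are invented, sed rewrites each [DATATYPE...] header into the "@" delimiter, and a trailing "@" is appended so the last block is not dropped by read:

```shell
#!/bin/bash
# Throwaway sample file shaped like the data in the question.
cat > /tmp/sample_sed.txt <<'EOF'
[DATATYPE1]
line one
line two
[DATATYPE2]
line three
EOF

array=()
# read -d "@" consumes everything up to the next "@" as one chunk, so
# each whole block lands in one array element. The empty chunk before
# the first header is skipped.
while read -r -d "@" block; do
    [[ -n $block ]] && array+=("$block")
done < <(sed 's/^\[DATATYPE[0-9]*\]$/@/' /tmp/sample_sed.txt; printf '@')

printf '[%s]\n\n' "${array[@]}"
```

As in the post above, the "@" must be a character that cannot occur in the real data.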
Just a note for people using this. Please be aware of the difference between \n and \r, which cost me a day of grief.
Here's the final version for processing the lines from each data block of my data file one by one. Note, as above, that I needed to run my data file through dos2unix to strip the \r characters. Also, I had to ensure that my dummy delimiter character @ was not used in my file.
Code:
#create block array
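The two-stage "final version" described here can be sketched as follows: first capture whole blocks into an array (the read -d "@" trick from earlier in the thread), then walk each captured block line by line. The sample data, file path, and variable names are invented for illustration:

```shell
#!/bin/bash
# Throwaway sample file shaped like the data in the question
# (already dos2unix'd, and containing no literal "@").
cat > /tmp/sample_final.txt <<'EOF'
[DATATYPE1]
alpha
beta
[DATATYPE2]
gamma
EOF

#create block array
blocks=()
while read -r -d "@" blk; do
    [[ -n $blk ]] && blocks+=("$blk")
done < <(sed 's/^\[DATATYPE[0-9]*\]$/@/' /tmp/sample_final.txt; printf '@')

#process each block one line at a time
for b in "${!blocks[@]}"; do
    n=0
    while IFS= read -r line; do
        [[ -z $line ]] && continue      # skip the blank left by the header
        n=$((n + 1))
        echo "processing line $n of block $((b + 1)): $line"
    done <<< "${blocks[b]}"
done
```

Each block is fully captured before its inner loop runs, which is the behaviour asked for in the thread.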
I am presuming this is merely an exercise, as you could have simply read the file in a while loop at the start and delivered the same final output?
Thanks for looking, grail. No, this was a real problem successfully solved by David. I'm curious: how would your script/code differ from that posted? Are you suggesting the use of switches? The use of case? How would you capture the data block described in the OP differently from David's method? Perhaps you have a link or example code where this problem was solved more efficiently by your alternate method? I'd definitely be happy to hear of something more efficient.
Well, I was looking at your final code, and using the input provided in your first post, an example of the output would be:
Code:
processing line 1 of block 1
Code:
#!/bin/bash
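The single-pass alternative being discussed here (process each line as you hit it, with no intermediate array) might look like the sketch below. It uses bash's [[ ... =~ ... ]] regex test to spot header lines; the sample data, path, and function name are invented for illustration:

```shell
#!/bin/bash
# Throwaway sample file shaped like the data in the question.
cat > /tmp/sample_single.txt <<'EOF'
[DATATYPE1]
alpha
beta
[DATATYPE2]
gamma
EOF

# One pass over the file: a header line bumps the block counter and
# resets the line counter; every other line is processed immediately.
process_file() {
    local block=0 lineno=0 line
    while IFS= read -r line; do
        if [[ $line =~ ^\[DATATYPE[0-9]+\]$ ]]; then
            block=$((block + 1))
            lineno=0
            continue
        fi
        lineno=$((lineno + 1))
        echo "processing line $lineno of block $block: $line"
    done < "$1"
}

process_file /tmp/sample_single.txt
```

No sed, awk, or temporary array is needed, which is the efficiency point being made: the data is worked on as it is read.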
That's nice tidy code and makes immediate sense to me, thanks. I didn't know I could use regexes in simple string comparisons in bash. However, now that I've seen your solution, I can see that I needed to be more explicit: I needed/wanted to import an entire block before I began processing that block, hence my request to get the whole blocks into a multiline string array.
I'm not sure how I might modify your code to capture whole blocks at a time before processing the blocks, but presumably it's straightforward. One might write the blocks to temporary files, or capture multiline strings into a string array as above, giving essentially the same result. If, on the other hand, your loop could be split into two nested parts, somehow capturing blocks of data in the inner loop and processing them in an outer loop, that would be the most efficient and useful method of all, especially compared to sed, since different "start_regex" variables could be used to parse out different kinds of data blocks that might require different processing tasks in the outer loop. Thanks a lot for having a look! I don't mean to muck anyone around here; it's just that I see data files laid out like this all the time, so I think a solution for parsing them in bash would be very useful to lots of people.
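The two-phase split requested here (capture whole blocks first, process them afterwards, with the header pattern kept in a "start_regex" variable so different block types can be parsed the same way) can be sketched in pure bash. Everything below is illustrative: the sample data, path, and variable names are invented:

```shell
#!/bin/bash
# The header pattern lives in a variable, so a different block type
# only needs a different regex, not different capture code.
start_regex='^\[DATATYPE[0-9]+\]$'

# Throwaway sample file shaped like the data in the question.
cat > /tmp/sample_nested.txt <<'EOF'
[DATATYPE1]
alpha
beta
[DATATYPE2]
gamma
EOF

# Capture phase: accumulate lines into $block until the next header,
# then push the finished block onto the array.
block="" blocks=()
while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $start_regex ]]; then
        [[ -n $block ]] && blocks+=("$block")
        block=""
    else
        block+="$line"$'\n'
    fi
done < /tmp/sample_nested.txt
[[ -n $block ]] && blocks+=("$block")

# Processing phase: each whole block is now available for whatever
# task that block type needs.
for i in "${!blocks[@]}"; do
    echo "block $((i + 1)):"
    printf '%s' "${blocks[i]}"
done
```

Only bash builtins are used, so no sed or awk subprocess is spawned, and swapping start_regex is enough to target a different kind of block.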
You may have to explain a little further, as I am not sure I see why you would get all the data first and then perform further tasks on data you have already parsed. Is it not more expedient to work on the data as you hit it? The only thing I can think of off the top of my head is if you wish to perform tasks out of order? Sorry if I missed the point here :( If you want the data stored in an array first, then the examples presented by others would seem appropriate. Instead of echoing the data, you could store it in an array to be used later, so that the entire solution is bash instead of using other commands as well.
Your suggestion to write the lines to a multiline string array is probably what I would do to modify your code. Thanks again.