LinuxQuestions.org
Programming: This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

09-18-2010, 10:25 AM   #1
DJCharlie
Member
 
Registered: Sep 2010
Posts: 37

Rep: Reputation: 4
bash: Split a text file into an array? (NOT line-by-line)


Hi all. First-time poster here...

Say I have a file (called twitterstatus.tmp) that looks like this:

Code:
<status>
  <id>24854489768</id>
  <text>Are we gonna ride the sun home?</text>
    <id>55266987</id>
    <screen_name>dj_johnnyfever</screen_name>
</status>
<status>
  <id>24852047832</id>
  <text>@dj_johnnyfever Hey Johnny! Can you see this yet?</text>
    <id>51269031</id>
    <screen_name>DJCharlieKJSR</screen_name>
</status>
<status>
  <id>24845941995</id>
  <text>Dog... donkey... Well, they both start with the letter &quot;N&quot;...</text>
    <id>55266987</id>
    <screen_name>dj_johnnyfever</screen_name>
</status>
How could I feed this into an array, with each element containing everything between the <status> </status> tags?

Thanks in advance!
 
09-18-2010, 11:36 AM   #2
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
Are you going to convert it to CSV?
 
09-18-2010, 11:43 AM   #3
DJCharlie
Member
 
Registered: Sep 2010
Posts: 37

Original Poster
Rep: Reputation: 4
No. Once I have each segment set, I'll be searching for a specific string contained in the segment to act on.

So, from the sample I posted, say the script sees this segment:

Code:
<status>
  <id>24852047832</id>
  <text>@dj_johnnyfever Hey Johnny! Can you see this yet?</text>
    <id>51269031</id>
    <screen_name>DJCharlieKJSR</screen_name>
</status>
It would trigger on the @dj_johnnyfever keyword, and act accordingly.

The trouble I'm having is splitting the file into segments.
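A minimal sketch of the per-segment keyword test, assuming each segment is already held in a single string (as it would be after any of the splitting approaches in this thread). `handle_segment` and the `keyword` variable are hypothetical names used only for illustration; quoting `$keyword` inside the `*...*` pattern makes `[[ ]]` do a literal substring match.

```shell
#!/usr/bin/env bash
# Sketch: decide whether one <status> segment should trigger an action.
# handle_segment is a hypothetical name; replace the echo calls with real work.

keyword='@dj_johnnyfever'   # assumption: the trigger string from the example

handle_segment() {
  local seg=$1
  # Quoting "$keyword" stops characters such as * or ? in it from
  # acting as glob patterns, so this is a plain substring test.
  if [[ $seg == *"$keyword"* ]]; then
    echo "triggered"
  else
    echo "skipped"
  fi
}

seg='<status>
  <id>24852047832</id>
  <text>@dj_johnnyfever Hey Johnny! Can you see this yet?</text>
    <id>51269031</id>
    <screen_name>DJCharlieKJSR</screen_name>
</status>'

handle_segment "$seg"
```

A segment that only mentions `dj_johnnyfever` as its `screen_name` (no `@`) would not trigger, which seems to be what you want here.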
 
09-18-2010, 12:02 PM   #4
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187

Rep: Reputation: 354
Since that's XML, why not just use the XML functionality to do what you want?

By the way, since this is your first post, I suggest that you "Report" your thread to the moderators and request that they move it to the Programming sub-forum where you'd get much better responses. (It's not, really, a "general" question.)
 
09-18-2010, 12:04 PM   #5
DJCharlie
Member
 
Registered: Sep 2010
Posts: 37

Original Poster
Rep: Reputation: 4
Well, ideally, I'd prefer it not be in XML. I'm actually stripping out the XML further along in the script. I need plain-text variables for that. But first I need to divide it into easily digestible segments bound by the <status> </status> tags.
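If the XML is going to be stripped anyway, one rough way to turn a segment into plain-text variables is sketched below. It assumes each tag sits on its own line with no attributes, as in the sample, and it only decodes `&quot;`. `get_field` is a hypothetical helper; a real XML tool would be more robust if the feed format changes.

```shell
#!/usr/bin/env bash
# Sketch: pull plain-text fields out of one <status> segment.
# Assumptions: one tag per line, no attributes, and &quot; is the only
# entity that needs decoding.

seg='<status>
  <id>24845941995</id>
  <text>Dog... donkey... Well, they both start with the letter &quot;N&quot;...</text>
    <id>55266987</id>
    <screen_name>dj_johnnyfever</screen_name>
</status>'

# get_field (hypothetical): print the contents of the first <tag>...</tag> line.
get_field() {
  local tag=$1
  printf '%s\n' "$seg" |
    sed -n "s/.*<$tag>\(.*\)<\/$tag>.*/\1/p" |
    head -n 1
}

text=$(get_field text)
text=${text//'&quot;'/'"'}    # decode the one entity seen in the sample
screen_name=$(get_field screen_name)
status_id=$(get_field id)     # first <id> is the status id, the second is the user id

echo "$screen_name: $text"
```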

And thanks, I'll report it.
 
09-18-2010, 12:44 PM   #6
quanta
Member
 
Registered: Aug 2007
Location: Vietnam
Distribution: RedHat based, Debian based, Slackware, Gentoo
Posts: 724

Rep: Reputation: 101
I don't have a solution that converts it directly into an array, but I found the following command, which splits the file into multiple files (twitter1.status, twitter2.status, ...):
Code:
awk '/<status>/{ close("twitter"c".status"); c++ } { print $0 > ("twitter"c".status") }' twitterstatus.tmp
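To get from those files back to an array, something like the following sketch could work. It runs the same split with the output file name in parentheses (some awk implementations parse an unparenthesized concatenation after `>` differently), then walks the numeric counter instead of a glob, since `twitter*.status` would sort `twitter10` before `twitter2`. The two-post sample input is shortened for illustration.

```shell
#!/usr/bin/env bash
# Sketch: split into twitterN.status files, then load them into an array in order.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened sample input; the real script already has twitterstatus.tmp.
cat > twitterstatus.tmp <<'EOF'
<status>
  <text>Are we gonna ride the sun home?</text>
</status>
<status>
  <text>@dj_johnnyfever Hey Johnny! Can you see this yet?</text>
</status>
EOF

awk '/<status>/{ close("twitter"c".status"); c++ } { print $0 > ("twitter"c".status") }' twitterstatus.tmp

statuses=()
n=1
while [[ -f twitter$n.status ]]; do
  statuses+=("$(< "twitter$n.status")")   # $(< file) reads a file without cat
  ((n++))
done

echo "${#statuses[@]} segments"
```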

Last edited by quanta; 09-18-2010 at 01:13 PM.
 
09-18-2010, 01:06 PM   #7
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
Try
Code:
eval arr=("$(sed "s/'/'\"'\"'/g; s/<status>/'&/; s/<\/status>/&'/" file)")

echo "${arr[0]}"
<status>
  <id>24854489768</id>
  <text>Are we gonna ride the sun home?</text>
    <id>55266987</id>
    <screen_name>dj_johnnyfever</screen_name>
</status>
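sed wraps each block in single quotes.
The s/'/'\"'\"'/g expression (sed receives it as s/'/'"'"'/g once bash removes the backslashes) protects any literal single quotes from eval: it ends the single-quoted string, adds the quote inside double quotes, and starts a new single-quoted string, e.g.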
Code:
......It's now or never....
would become
'......It'"'"'s now or never....'
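If `eval` feels risky, the same array can be built with only `read` and a string test, as in this sketch. It assumes `</status>` always appears on its own line, as in the sample; `split_statuses` is a hypothetical function name and the sample input is shortened.

```shell
#!/usr/bin/env bash
# Sketch: one array element per <status>...</status> block, no eval needed.

split_statuses() {
  local line block=
  statuses=()                              # global result array
  # IFS= keeps leading spaces, -r keeps backslashes; the || test
  # also handles a last line with no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    block+=${block:+$'\n'}$line            # newline only between lines
    if [[ $line == *'</status>'* ]]; then
      statuses+=("$block")
      block=
    fi
  done
}

# Real script: split_statuses < twitterstatus.tmp
split_statuses <<'EOF'
<status>
  <text>It's now or never</text>
</status>
<status>
  <text>Are we gonna ride the sun home?</text>
</status>
EOF

echo "${#statuses[@]} segments"
printf '%s\n' "${statuses[0]}"
```

Because the text is never re-parsed by the shell, a literal `'` (or `$`, or a backtick) inside a tweet needs no special handling.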
 
09-18-2010, 01:10 PM   #8
DJCharlie
Member
 
Registered: Sep 2010
Posts: 37

Original Poster
Rep: Reputation: 4
Solved it! It's a bit crufty, but it works.

Basically, each segment I need is 5 lines long. So I do a post=`head -5 twitterstatus.tmp`, scan it for the keyword, and then using sed, delete the top 5 lines of the file. If the keyword is found, split $post into individual variables, and process from there!
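The fixed-length idea can also be done without rewriting the file on every pass. This sketch reads the file once with `mapfile` (bash 4 or later) and slices the resulting array. `lines_per_post` is an assumption to set from the real feed: the sample blocks at the top of the thread are 6 lines each when both tags are counted.

```shell
#!/usr/bin/env bash
# Sketch: walk the file in fixed-size chunks, reading it only once.
# Assumption: every post is exactly lines_per_post lines long.

lines_per_post=6
keyword='@dj_johnnyfever'
matches=()

# Real script: mapfile -t lines < twitterstatus.tmp
mapfile -t lines <<'EOF'
<status>
  <id>24854489768</id>
  <text>Are we gonna ride the sun home?</text>
    <id>55266987</id>
    <screen_name>dj_johnnyfever</screen_name>
</status>
<status>
  <id>24852047832</id>
  <text>@dj_johnnyfever Hey Johnny! Can you see this yet?</text>
    <id>51269031</id>
    <screen_name>DJCharlieKJSR</screen_name>
</status>
EOF

for ((i = 0; i < ${#lines[@]}; i += lines_per_post)); do
  # ${lines[@]:i:n} is the slice of n elements starting at index i
  post=$(printf '%s\n' "${lines[@]:i:lines_per_post}")
  if [[ $post == *"$keyword"* ]]; then
    matches+=("$post")
  fi
done

echo "${#matches[@]} matching post(s)"
```

If a post ever has an extra or missing line, every later chunk shifts, so splitting on the `</status>` tag is the safer choice when the layout is not guaranteed.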

Thanks for letting me bounce ideas off you, everyone!
 
09-18-2010, 01:59 PM   #9
XavierP
Moderator
 
Registered: Nov 2002
Location: Kent, England
Distribution: Debian Testing
Posts: 19,192
Blog Entries: 4

Rep: Reputation: 475
As requested, moved to Programming
 
09-19-2010, 09:22 PM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,008

Rep: Reputation: 3193
Well, I'm guessing it depends on what else you want to do, but here is something you could consider:
Code:
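#!/usr/bin/awk -f

# Everything up to each </status> becomes one record. A multi-character RS
# is an extension (GNU awk and mawk support it); POSIX leaves it unspecified.
BEGIN { RS = "</status>" }

# Replace the print with whatever you need to do to the matching record.
/@dj_johnnyfever/ { print $0 RS }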
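A variation, sketched under the same multi-character `RS` assumption: call the awk from bash and pass the keyword in with `-v`, using `index()` so the keyword is matched as a literal string rather than as a regex. The two-post sample input is shortened for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run the record-per-</status> awk from bash with a keyword variable.
# Assumption: an awk that accepts a multi-character RS (GNU awk, mawk).

keyword='@dj_johnnyfever'

# Real script: replace the here-document with twitterstatus.tmp
matches=$(
  awk -v kw="$keyword" '
    BEGIN { RS = "</status>" }
    index($0, kw) { print $0 RS }   # literal substring match, not a regex
  ' <<'EOF'
<status>
  <text>Are we gonna ride the sun home?</text>
</status>
<status>
  <text>@dj_johnnyfever Hey Johnny! Can you see this yet?</text>
</status>
EOF
)

printf '%s\n' "$matches"
```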
 
  


All times are GMT -5.
