LinuxQuestions.org
Old 04-22-2012, 07:14 AM   #1
phazeman
LQ Newbie
 
Registered: Mar 2002
Distribution: Mandrake 9.0
Posts: 9

Rep: Reputation: 0
Data distribution among lines within a file with bash


Hi all,

I need to create a text file and distribute some numbers among its lines by percentage. What I mean exactly: I want to set a percentage for each number and then fill the lines according to those percentages:
500 - 20%
501 - 30%
502 - 50%
That is, 20% of the lines should contain the number "500", 30% the number "501", and the remaining 50% the number "502". The numbers must be placed in random order, not in consecutive runs one after another.

Any help will be appreciated!
 
Old 04-22-2012, 07:47 AM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Well, you can fill the file with numbers in sequence (according to their percentage) and scramble it later. You can try the shuf command or the shuffle function in perl, e.g.
Code:
shuf file
or
Code:
cat file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);'
Hope this helps.
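
As a rough sketch of this fill-then-shuffle idea: the 10-line total and the variable names below are made up for illustration, and shuf is assumed to be installed (it ships with recent coreutils).

```shell
# Hypothetical example: 10 lines split 20% / 30% / 50%.
total=10
n500=$(( total * 20 / 100 ))
n501=$(( total * 30 / 100 ))
n502=$(( total - n500 - n501 ))   # the last bucket absorbs any rounding loss

# Write the numbers in sequence according to their counts, then scramble the lines.
shuffled=$(
  awk -v a="$n500" -v b="$n501" -v c="$n502" 'BEGIN {
    for (i = 0; i < a; i++) print 500
    for (i = 0; i < b; i++) print 501
    for (i = 0; i < c; i++) print 502
  }' | shuf
)

printf '%s\n' "$shuffled"
```

The order changes from run to run, but the counts per number stay fixed because the shuffle only permutes lines that already exist.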
 
Old 04-22-2012, 08:43 AM   #3
phazeman
LQ Newbie
 
Registered: Mar 2002
Distribution: Mandrake 9.0
Posts: 9

Original Poster
Rep: Reputation: 0
Since each line contains more fields than just this number (I left them out to avoid clutter), shuffling the whole file won't work as is.

A line will look like:
<some number>, <some number>, 500, <some number>

I'm generating the additional numbers with "for" loops and that's not a problem, but I can't think of a mechanism for this specific distribution...
 
Old 04-22-2012, 09:11 AM   #4
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
It would help to see the relevant piece of your script. I would first generate the shuffled sequence of numbers, then insert them into the original text one at a time in a loop.
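
One way to read this suggestion, as a minimal sketch: feed the already shuffled numbers into a read loop and build one output line per number. The fixed list and the A/B/C placeholder fields are hypothetical stand-ins for the real shuffled sequence and the real generated values.

```shell
# Hypothetical stand-in for an already shuffled sequence, one number per line.
numbers='502
500
501
502'

# Consume one number per iteration and place it in the third field of the line.
result=$(
  printf '%s\n' "$numbers" | {
    i=0
    while read -r num; do
      i=$(( i + 1 ))
      echo "A${i}, B${i}, ${num}, C${i}"
    done
  }
)

printf '%s\n' "$result"
```

Because the loop reads exactly one number per generated line, the percentages of the shuffled list carry over unchanged to the output file.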
 
Old 04-22-2012, 09:14 AM   #5
phazeman
LQ Newbie
 
Registered: Mar 2002
Distribution: Mandrake 9.0
Posts: 9

Original Poster
Rep: Reputation: 0
This is the script I have so far:

for i in `seq -w 0 255`; do
  for j in `seq -w 0 255`; do
    echo -e "930000${i}${j},<here should come the distributed number>,,0,,930000${i}${j},930000${i}${j},English" >> test.txt
  done
done
 
Old 04-22-2012, 10:21 AM   #6
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Ok. I would assign the shuffled sequence of numbers to an array and then use the Nth element of the array inside the loops, increasing the array index by one at each iteration. Here we go:
Code:
#!/bin/bash
lines=65536                      # 256 * 256 lines in total
p1=$(( lines * 20 / 100 ))       # number of lines that get 500
p2=$(( lines * 30 / 100 ))       # number of lines that get 501
p3=$(( lines * 50 / 100 ))       # number of lines that get 502
p4=$(( lines - p1 - p2 - p3 ))   # lines lost to integer division (here 1), also given 502
sequence=( $(echo "$(seq 1 $p1 | awk '{print 500}' && seq 1 $p2 | awk '{print 501}' && seq 1 $p3 | awk '{print 502}' && seq 1 $p4 | awk '{print 502}')" | shuf) )
c=0                              # array index, starts at the first element
for i in $(seq -w 0 255)
do
  for j in $(seq -w 0 255)
  do
    echo "930000${i}${j},${sequence[((c++))]},,0,,930000${i}${j},930000${i}${j},English"
  done
done > test.txt
The ((c++)) in the array subscript increases the variable c by one at each iteration. With this notation (inherited from the C language) the variable is increased after it is evaluated. This means that at the first iteration the index is still 0, at the second iteration it is 1, and so on. This is exactly what we want, since array elements in bash are numbered starting from 0.

Hope this helps.
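
To see the post-increment behaviour in isolation, here is a tiny sketch with a hypothetical three-element array (bash only, since it relies on arrays and arithmetic subscripts):

```shell
#!/bin/bash
# Hypothetical three-element array standing in for the shuffled sequence.
sequence=(500 501 502)
c=0

first=${sequence[c++]}    # reads index 0, then c becomes 1
second=${sequence[c++]}   # reads index 1, then c becomes 2

echo "$first $second c=$c"   # prints: 500 501 c=2
```

With ++c instead, the index would be increased before the lookup, so the first read would skip element 0.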
 
Old 04-23-2012, 01:39 AM   #7
phazeman
LQ Newbie
 
Registered: Mar 2002
Distribution: Mandrake 9.0
Posts: 9

Original Poster
Rep: Reputation: 0
This looks very promising! But I can't run it, since my system doesn't have shuf and I can't find an rpm for it anywhere (RHEL 5.5 Tikanga, 32-bit). The official ISO doesn't include one either. Apparently the coreutils rpm in RHEL 5.5 doesn't provide it...
 
Old 04-23-2012, 02:17 AM   #8
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Indeed, shuf is available in more recent versions of coreutils. You can try the perl command, that is:
Code:
sequence=( $(echo "$(seq 1 $p1 | awk '{print 500}' && seq 1 $p2 | awk '{print 501}' && seq 1 $p3 | awk '{print 502}' && seq 1 $p4 | awk '{print 502}')" | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);') )
This should also work on older systems that lack shuf.
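
If neither shuf nor Perl is at hand, a decorate-sort-undecorate shuffle with awk, sort and cut is another option. This is only a sketch and not something used in this thread: shuffle_lines is a made-up helper name, and since awk's srand() seeds from the clock, two runs within the same second may produce the same order.

```shell
# shuffle_lines: hypothetical helper, not part of coreutils.
# Prefix each line with a random key, sort by that key, then strip the key again.
shuffle_lines() {
  awk 'BEGIN { srand() } { printf "%.10f\t%s\n", rand(), $0 }' |
    sort -n |
    cut -f2-
}

# Small made-up input with the 20% / 30% / 50% split from the question.
input=$(printf '%s\n' 500 500 501 501 501 502 502 502 502 502)
output=$(printf '%s\n' "$input" | shuffle_lines)

printf '%s\n' "$output"
```

The helper reads standard input and writes standard output, so it could take the place of shuf or the perl one-liner at the end of the pipeline.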
 
Old 04-23-2012, 03:43 AM   #9
phazeman
LQ Newbie
 
Registered: Mar 2002
Distribution: Mandrake 9.0
Posts: 9

Original Poster
Rep: Reputation: 0
Thank you very much! Looks like it solved the problem!
 
Old 04-23-2012, 03:49 AM   #10
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Glad to hear it! Please, mark this thread as SOLVED. Thanks!
 
  

