LinuxQuestions.org
Old 06-29-2005, 07:03 PM   #1
sub_moa
LQ Newbie
 
Registered: Dec 2004
Location: Great Falls, Montana
Distribution: Mandrake 10.1, Fedora C3,4. Red Hat 8, LTSP
Posts: 16

Rep: Reputation: 0
wget a random file on phone server


Howdy all, I've never posted here, but I will admit I use LinuxQuestions as my bible. I've learned a lot and gotten a lot of my answers here. So thanks in advance for your help.

Here's my situation:
I have a server filled with a bunch of audio files. I would like to be able to pull several of the files off at one time, but in a random order. The server is a phone server and it records all the phone calls that come into the call center. I have to do call evals on each person once a week, so instead of getting 30 files by hand, I would like to have a script that would wget one audio file for each agent for that week. The files are named on the server in a very complicated way; here's an example: agent-10-1117354150-4709.gsm. I've tried to randomize that last set of numbers but I haven't been able to get it working yet.

Is there any command that will go to the server and download, say, agent-10-111735* ?

Any help is greatly appreciated.

Thanks.
 
Old 06-29-2005, 09:38 PM   #2
Crashed_Again
Senior Member
 
Registered: Dec 2002
Location: Atlantic City, NJ
Distribution: Ubuntu & Arch
Posts: 3,503

Rep: Reputation: 57
Okay, so the first two sets of digits (e.g. 10-1117354150) show which employee the recording belongs to, right? You want to get 30 random recordings but only one per employee? Also, this is all going to be done remotely, correct? You don't have direct access to the server itself?

I'm just trying to get this straight here. I think what I've deduced is correct but I want to double check.
 
Old 06-30-2005, 03:22 AM   #3
sub_moa
LQ Newbie
 
Registered: Dec 2004
Location: Great Falls, Montana
Distribution: Mandrake 10.1, Fedora C3,4. Red Hat 8, LTSP
Posts: 16

Original Poster
Rep: Reputation: 0
Correct, the first two digits are the agent's ID and the second set is a random number created by the server to store the files. I haven't been able to find a pattern in the way it creates that number, except for the first few digits (1117), so when I wget, somewhere in the command line there is going to have to be a wildcard (wget agent-10-*). You're also correct that I'm not going to have direct access to the server, although I may be able to gain access if it would help.
 
Old 06-30-2005, 05:41 AM   #4
Crashed_Again
Senior Member
 
Registered: Dec 2002
Location: Atlantic City, NJ
Distribution: Ubuntu & Arch
Posts: 3,503

Rep: Reputation: 57
Well, if you were able to get a listing of all the possible files to download you could easily process them the way you want. I'm just not sure how to get a listing of remote files to download.

How do you download the files? FTP or HTTP?

One last thing. Reading your post above, I'm wondering why you would wget agent-10-*. Wouldn't that get you all the files belonging to one employee? I thought you didn't want more than 1 file from each employee?

Also, could you possibly post a data sample of the audio files? I'm sure it would be huge but I think it would help to see what you're dealing with.

Last edited by Crashed_Again; 06-30-2005 at 06:47 AM.
 
Old 06-30-2005, 10:23 AM   #5
sub_moa
LQ Newbie
 
Registered: Dec 2004
Location: Great Falls, Montana
Distribution: Mandrake 10.1, Fedora C3,4. Red Hat 8, LTSP
Posts: 16

Original Poster
Rep: Reputation: 0
Yeah, I've been trying to think of how to pull a list of the files that are on it, but I haven't been having any luck. All the files have to be downloaded over HTTP.

As far as wget agent-10-* goes: since I'm doing call evals, it has to be a random call to be "fair" for everyone, so that none of my supervisors think I'm favoring anyone. I was hoping that I could pull one file off the server for one agent, stop there, and move on to the next agent.

Also, as far as a data sample, what kind would you like? Really all I can give you is a sample list of the audio files. I'll post that next. If you would like anything else, let me know. It's great to have someone helping me; I've been giving myself a headache over this for around a week now.
 
Old 06-30-2005, 10:46 AM   #6
sub_moa
LQ Newbie
 
Registered: Dec 2004
Location: Great Falls, Montana
Distribution: Mandrake 10.1, Fedora C3,4. Red Hat 8, LTSP
Posts: 16

Original Poster
Rep: Reputation: 0
Here's a list of the audio files so you can see the names of the files I've been working with:

agent-82-1117350474-4707.gsm
agent-10-1117354150-4709.gsm
agent-10-1117354173-4711.gsm
agent-10-1117357683-4713.gsm
agent-10-1117358274-4715.gsm
agent-82-1117359132-4720.gsm
agent-38-1117370905-4729.gsm
agent-38-1117370987-4734.gsm
agent-38-1117371411-4736.gsm
agent-38-1117371537-4739.gsm
agent-46-1117371692-4744.gsm
agent-38-1117372138-4749.gsm
agent-46-1117372206-4756.gsm
agent-46-1117372802-4764.gsm
agent-46-1117372972-4769.gsm
agent-46-1117373758-4774.gsm
agent-78-1117376160-4780.gsm
agent-78-1117378364-4788.gsm
agent-46-1117378444-4793.gsm
agent-38-1117379386-4795.gsm
agent-46-1117379472-4800.gsm
agent-78-1117379535-4805.gsm
agent-46-1117379770-4815.gsm
agent-46-1117380137-4821.gsm
agent-78-1117380338-4828.gsm
agent-26-1117380475-4831.gsm
agent-26-1117380596-4836.gsm
agent-78-1117380693-4838.gsm
agent-78-1117380836-4840.gsm
agent-78-1117380917-4842.gsm
agent-26-1117380921-4844.gsm
agent-46-1117381024-4846.gsm
agent-78-1117381066-4848.gsm
agent-78-1117381139-4850.gsm
agent-78-1117381237-4855.gsm
agent-46-1117381259-4857.gsm
agent-26-1117381343-4859.gsm
agent-46-1117381390-4861.gsm
agent-46-1117381483-4874.gsm
agent-46-1117381802-4876.gsm
agent-26-1117381888-4881.gsm

Ok, that should be enough copying and pasting. This is the order of yesterday's calls over about an hour. Now, if I could wget agent-26-11173*, then move on to the next agent, agent-27-11173*, and so on until I have one call from each agent. I've already tried a randomized set of numbers, but that never actually hit a correct file in the wget because the server doesn't save the files in a particular order.

I hope this helps.
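
The names in the list above appear to follow the pattern agent-<agent id>-<number>-<number>.gsm. As a rough Python sketch, they could be split into fields and grouped by agent as shown below. The function names parse_name and group_by_agent are made up for illustration. Reading the middle field as a Unix timestamp is only a guess; nothing in the thread confirms how the server generates it.

```python
from collections import defaultdict
from datetime import datetime, timezone

def parse_name(filename):
    """Split 'agent-10-1117354150-4709.gsm' into its three fields."""
    stem = filename.rsplit('.', 1)[0]          # drop the .gsm extension
    _, agent, stamp, serial = stem.split('-')  # 'agent', '10', '1117354150', '4709'
    return agent, int(stamp), int(serial)

def group_by_agent(filenames):
    """Map each agent id to the list of that agent's file names."""
    groups = defaultdict(list)
    for name in filenames:
        agent, _, _ = parse_name(name)
        groups[agent].append(name)
    return dict(groups)

# A few names copied from the list above.
sample = [
    'agent-82-1117350474-4707.gsm',
    'agent-10-1117354150-4709.gsm',
    'agent-10-1117354173-4711.gsm',
    'agent-26-1117380475-4831.gsm',
]
groups = group_by_agent(sample)
for agent in sorted(groups):
    print(agent, groups[agent])

# GUESS ONLY: if the middle field is seconds since 1970, this is the call time in UTC.
print(datetime.fromtimestamp(1117354150, tz=timezone.utc))
```

Once the names are grouped like this, picking one recording per agent is a matter of choosing one entry from each list.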
 
Old 06-30-2005, 12:12 PM   #7
Crashed_Again
Senior Member
 
Registered: Dec 2002
Location: Atlantic City, NJ
Distribution: Ubuntu & Arch
Posts: 3,503

Rep: Reputation: 57
Is this a public server that we could access? I need to see an example of the html page that is generated to download the files. I think you would have to write some code that would parse the html code and take out each file name. I know how to do this but I need some html code to work off of.

Can I look at this page that you are downloading from or can you give me another example?
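
A minimal sketch of the parsing idea in Python, assuming the server offers a page with ordinary <a href="..."> links to the .gsm files. The real page was never posted in this thread, so the HTML below is a hypothetical stand-in, and the names GsmLinkParser and extract_gsm_files are invented for the example.

```python
from html.parser import HTMLParser

class GsmLinkParser(HTMLParser):
    """Collect the target of every <a> tag that points at a .gsm file."""

    def __init__(self):
        super().__init__()
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value and value.endswith('.gsm'):
                # Keep only the file name in case the link carries a path.
                self.files.append(value.rsplit('/', 1)[-1])

def extract_gsm_files(html_text):
    """Return the .gsm file names linked from the given HTML text."""
    parser = GsmLinkParser()
    parser.feed(html_text)
    return parser.files

# HYPOTHETICAL listing page; the real server's HTML is unknown.
page = """
<html><body>
<a href="../">Parent Directory</a>
<a href="agent-10-1117354150-4709.gsm">agent-10-1117354150-4709.gsm</a>
<a href="/calls/agent-82-1117350474-4707.gsm">agent-82-1117350474-4707.gsm</a>
</body></html>
"""
print(extract_gsm_files(page))
```

If the real page builds its links differently (for example through a form or JavaScript), the handle_starttag check would need to be adapted to whatever markup it actually uses.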
 
Old 06-30-2005, 12:48 PM   #8
sub_moa
LQ Newbie
 
Registered: Dec 2004
Location: Great Falls, Montana
Distribution: Mandrake 10.1, Fedora C3,4. Red Hat 8, LTSP
Posts: 16

Original Poster
Rep: Reputation: 0
Sorry, it's not a public server. What are you looking for in the html? I may be able to give you some of that.
 
Old 06-30-2005, 03:38 PM   #9
rstewart
Member
 
Registered: Feb 2005
Location: Sunnyvale, CA
Distribution: Ubuntu
Posts: 205

Rep: Reputation: 38
Hi,

If you are familiar with coding web pages in PHP, why don't you create a simple web page that algorithmically generates the file names to be downloaded and then sends the specific files to you one at a time through a browser? (This assumes the page runs on the server that holds the recordings.) I am sure that by using loops and the glob() and rand() PHP functions you can generate a set of file names that is random in nature but assures you complete coverage. Once you have a file name, you can send a "Location" header with PHP's header() function to redirect your browser to the file and save it like a normal web download.
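
The same glob-and-random idea can be sketched in Python rather than PHP. This is only an illustration and rests on one big assumption: the script runs on a machine that can see the recordings as ordinary files in a directory. The function name pick_one_per_agent and its parameters are hypothetical.

```python
import glob
import os
import random

def pick_one_per_agent(directory, agent_ids, rng=random):
    """Return {agent_id: file path} with one randomly chosen recording each.

    Agents with no matching recordings are left out of the result.
    """
    picks = {}
    for agent in agent_ids:
        # The trailing '-' after the id keeps agent 10 from matching agent 100.
        pattern = os.path.join(directory, 'agent-%s-*.gsm' % agent)
        matches = glob.glob(pattern)
        if matches:
            picks[agent] = rng.choice(matches)
    return picks
```

It could be called as pick_one_per_agent('/path/to/recordings', ['10', '26', '82']), where the directory is a placeholder for wherever the phone server actually stores its files.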
 
Old 06-30-2005, 04:08 PM   #10
sub_moa
LQ Newbie
 
Registered: Dec 2004
Location: Great Falls, Montana
Distribution: Mandrake 10.1, Fedora C3,4. Red Hat 8, LTSP
Posts: 16

Original Poster
Rep: Reputation: 0
Sadly, I don't know PHP. I'm trying to get all this done in a shell script. Since all of this is giving me a headache, I may just look into PHP and see what I can get done. Got any examples to get me started?
 
Old 06-30-2005, 06:40 PM   #11
Dave Kelly
Member
 
Registered: Aug 2004
Location: Todd Mission Texas
Distribution: Linspire
Posts: 215

Rep: Reputation: 31
It should give you a headache. If it was easy your girlfriend would do it for you.

Looks like you know what you want to happen but not how to get there. I would suggest that now is the time to put the computer aside and get out the old-fashioned pencil and paper. Something you can scratch through and draw long arrows on to move things down the page; for now it only needs to make sense to you, not to anyone else.

What you need is a flow chart. Use your pencil and pad to develop this flow chart.
A flow chart should give you several things: a check that your logic is correct; the program flow and the results you expect from each function, along with where the program goes if the expected result fails; and, by using long function names, what each function does.

Hint: When you name a function, start it with a number. Example:
3 Extreeeeeeeeeeeeemllllllllllllly long descriptive function name
Later on, if you need to loop back to this function, it is easier to write 'goto 3' than the long name.

When you have the outline finished post it here. We can continue with the writing of the functions that are giving you trouble.

HTH
Dave
 
Old 06-30-2005, 07:13 PM   #12
sub_moa
LQ Newbie
 
Registered: Dec 2004
Location: Great Falls, Montana
Distribution: Mandrake 10.1, Fedora C3,4. Red Hat 8, LTSP
Posts: 16

Original Poster
Rep: Reputation: 0
Ok, I know what I want, and I know about 80% of how to get there.

Today I was googling and found a better command to use instead of wget: I can use curl. But as I was playing around with curl, I came across a small problem. It will write an output file even if the file doesn't exist on the server. Is there any way to make curl write an output file only if it finds the file on the server? Say:
if [ -f http://server/calls/agent-81-1777(00000-99999)-(0000-9999).gsm ]
then
curl http://server/calls/agent-81-1777(whatever numbers it found to be valid).gsm -O

If I run curl http://server/calls/agent-81-1777[0-9999] -O it writes an output file for every number in the range, including the one valid file. So basically, how would I get it to verify that the file is valid before it copies it to my computer? I hope I'm making sense; it's the end of the day and my brain has had it. I'm gonna go home and have a beer. Thanks again, all.
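
One way to get the "only save it if it really exists" behaviour is to let the HTTP status decide. The Python sketch below writes the output file only after the server has answered successfully, so a 404 leaves nothing on disk. The function name download_if_exists and the opener parameter are made up for this example; opener exists only so the function can be tried without a real server. On the curl side, the -f/--fail option is meant for this kind of situation, though whether it fits this particular server is untested here.

```python
import shutil
import urllib.error
import urllib.request

def download_if_exists(url, dest, opener=urllib.request.urlopen):
    """Write url to dest only if the server returns the file.

    Returns True when the file was saved and False when the request
    failed (for example with a 404), in which case nothing is written.
    """
    try:
        response = opener(url)
    except urllib.error.URLError:
        # HTTPError (404, 403, ...) is a subclass of URLError.
        return False
    # The output file is opened only after the request has succeeded.
    with response, open(dest, 'wb') as out:
        shutil.copyfileobj(response, out)
    return True
```

Note that this still needs a real file name to ask for. Guessing every possible combination of digits means an enormous number of requests per agent, which is why the replies in this thread keep coming back to getting a list of file names first.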
 
Old 06-30-2005, 07:39 PM   #13
Crashed_Again
Senior Member
 
Registered: Dec 2002
Location: Atlantic City, NJ
Distribution: Ubuntu & Arch
Posts: 3,503

Rep: Reputation: 57
Well here is my idea and the reason I asked for the sample html page.

The first step is to get an array of all the possible files to download. This is why you would need to parse the html code to get the file names. The next step is to randomly select 1 file from every employee. From that data set you could randomly select 30 files and download them one at a time.

I think I could implement this in python but I would need a sample of the html from the page that serves the files.
 
Old 06-30-2005, 09:42 PM   #14
Dave Kelly
Member
 
Registered: Aug 2004
Location: Todd Mission Texas
Distribution: Linspire
Posts: 215

Rep: Reputation: 31
http://www.linuxvoodoo.com/resources/guides/abs-guide/
http://shamrockshire.yi.org/scripts.html
http://www.linuxlinks.com/Software/U...ts/index.shtml

Last edited by Dave Kelly; 06-30-2005 at 09:53 PM.
 
Old 07-01-2005, 10:00 AM   #15
Crashed_Again
Senior Member
 
Registered: Dec 2002
Location: Atlantic City, NJ
Distribution: Ubuntu & Arch
Posts: 3,503

Rep: Reputation: 57
Okay check this out:

Code:
#!/usr/bin/env python

import os, random

# SET THIS TO THE URL WHERE YOU DOWNLOAD THE FILES FROM.
# DON'T FORGET THE / AT THE END OF THE URL.
url = 'http://www.siteyourdownloadingfrom.com/directory/files/'

files = ['agent-10-1117354150-4709.gsm', 'agent-10-1817354150-4709.gsm', 'agent-10-1114554150-4709.gsm',
         'agent-11-4567354150-4709.gsm', 'agent-11-1165454150-4709.gsm', 'agent-11-1458654150-4709.gsm',
         'agent-12-1117534550-4709.gsm', 'agent-12-1345674150-4709.gsm', 'agent-12-1116785450-4709.gsm',
         'agent-13-1132454350-4709.gsm', 'agent-13-1454324450-4709.gsm', 'agent-13-3454363150-4709.gsm',
         'agent-14-1134534150-4709.gsm', 'agent-14-3453354150-4709.gsm', 'agent-14-1113454550-4709.gsm',
         'agent-15-1767654150-4709.gsm', 'agent-15-1456456150-4709.gsm', 'agent-15-1456454150-4709.gsm',
         'agent-16-4621354150-4709.gsm', 'agent-16-1113453330-4709.gsm', 'agent-16-1114564450-4709.gsm',
         'agent-17-1523454150-4709.gsm', 'agent-17-1115454544-4709.gsm', 'agent-17-1117345640-4709.gsm',
         'agent-18-1153244440-4709.gsm', 'agent-18-1114545320-4709.gsm', 'agent-18-1135343150-4709.gsm',
         'agent-19-6324554335-4709.gsm', 'agent-19-1164545150-4709.gsm', 'agent-19-1163463350-4709.gsm',
         'agent-20-1115434555-4709.gsm', 'agent-20-1115443330-4709.gsm', 'agent-20-1766777450-4709.gsm',
         'agent-21-1115345454-4709.gsm', 'agent-21-1117354545-4709.gsm', 'agent-21-1114564550-4709.gsm',
         'agent-22-4534544150-4709.gsm', 'agent-22-1453354150-4709.gsm', 'agent-22-1117456350-4709.gsm',
         'agent-23-1134545150-4709.gsm', 'agent-23-4534454150-4709.gsm', 'agent-23-1134545150-4709.gsm',
         'agent-24-1455223330-4709.gsm', 'agent-24-1345454150-4709.gsm', 'agent-24-1116556560-4709.gsm',
         'agent-25-5634545440-4709.gsm', 'agent-25-3454224150-4709.gsm', 'agent-25-1117375656-4709.gsm',
         'agent-26-1145454634-4709.gsm', 'agent-26-1154545450-4709.gsm', 'agent-26-1254556345-4709.gsm',
         'agent-27-1116346666-4709.gsm', 'agent-27-1113456633-4709.gsm', 'agent-27-1113456777-4709.gsm',
         'agent-28-5845865485-4709.gsm', 'agent-28-1117345577-4709.gsm', 'agent-28-3434534322-4709.gsm',
         'agent-29-1485958757-4709.gsm', 'agent-29-1534518676-4709.gsm', 'agent-29-1463346740-4709.gsm',
         'agent-30-1958403048-4709.gsm', 'agent-30-1176574565-4709.gsm', 'agent-30-1444454150-4709.gsm',
         'agent-31-1587639490-4709.gsm', 'agent-31-1175745670-4709.gsm', 'agent-31-1116734550-4709.gsm',
         'agent-32-9847364857-4709.gsm', 'agent-32-1117674566-4709.gsm', 'agent-32-1543234421-4709.gsm',
         'agent-33-9564345857-4709.gsm', 'agent-33-1456456546-4709.gsm', 'agent-33-1654645221-4709.gsm',
         'agent-34-9847564565-4709.gsm', 'agent-34-1117776567-4709.gsm', 'agent-34-1144456354-4709.gsm',
         'agent-35-9456456456-4709.gsm', 'agent-35-7567765464-4709.gsm', 'agent-35-1656435421-4709.gsm',
         'agent-36-4345344857-4709.gsm', 'agent-36-1767347546-4709.gsm', 'agent-36-1654643321-4709.gsm',
         'agent-37-9645432237-4709.gsm', 'agent-37-1117856807-4709.gsm', 'agent-37-1145454644-4709.gsm',
         'agent-38-9865748940-4709.gsm', 'agent-38-0594736455-4709.gsm', 'agent-38-1564563255-4709.gsm',
         'agent-39-9847564645-4709.gsm', 'agent-39-1456456565-4709.gsm', 'agent-39-1165678896-4709.gsm',
         'agent-40-9834444857-4709.gsm', 'agent-40-1565345577-4709.gsm', 'agent-40-6563345781-4709.gsm',
         'agent-41-2424677347-4709.gsm', 'agent-41-6456734554-4709.gsm', 'agent-41-1145534524-4709.gsm',
         'agent-42-9845456367-4709.gsm', 'agent-42-1115434523-4709.gsm', 'agent-42-7674545671-4709.gsm',
         'agent-43-9345345345-4709.gsm', 'agent-43-1656564324-4709.gsm', 'agent-43-1146456345-4709.gsm']


dic = {}
for entry in files:
    # GET THE FIRST NUMBER AFTER 'agent-'
    number = entry.split('-')[1]
    # GROUP THE FILE NAMES BY AGENT NUMBER, CREATING THE
    # LIST FOR AN AGENT THE FIRST TIME ITS NUMBER IS SEEN
    dic.setdefault(number, []).append(entry)

tmp_list = []
# RANDOMLY CHOOSE ONE FILE FROM EACH EMPLOYEE
for vals in dic.values():
    tmp_list.append(random.choice(vals))

# RANDOMLY CHOOSE UP TO 30 DIFFERENT FILES FROM THAT LIST.
# random.sample NEVER PICKS THE SAME ENTRY TWICE, AND THE min()
# KEEPS IT FROM FAILING WHEN THERE ARE FEWER THAN 30 EMPLOYEES.
final_list = random.sample(tmp_list, min(30, len(tmp_list)))

for fn in final_list:
    print('Downloading %s...' % fn)
    # DOWNLOAD THE FILE USING WGET. os.system DOES NOT RAISE AN
    # EXCEPTION WHEN WGET FAILS, SO CHECK ITS EXIT STATUS INSTEAD.
    status = os.system('wget ' + url + fn)
    if status != 0:
        print("Could not download %s.  Server down?  File doesn't exist?" % fn)
    else:
        print('%s finished downloading.\n' % fn)

print('\n\nDownloading complete.  Have a nice day!')

I tried to comment as much as I could so you can follow what's going on. This should do the trick for you. The only problem is that, as you can see, I provided the data set of file names myself. In order to write something that parses the HTML to get the file names available for download, I'd need to see the HTML of the page that you download from.

Last edited by Crashed_Again; 07-01-2005 at 11:07 AM.
 
  

