Old 10-11-2012, 08:25 PM   #1
amboxer21
Member
 
Registered: Mar 2012
Location: New Jersey
Distribution: Gentoo
Posts: 291

Rep: Reputation: Disabled
preserving file contents while using wget or curl


Anyone know how to preserve file contents while using wget or curl? I have to download a series of JSON objects and want to save myself the trouble of having to write a crazy awk/sed parser. Everything has been merged into one freaking line. I would like the contents to remain intact, the way they appear on the internet.

EDIT:
At the rate this thread is moving, it would be faster to write a parsing script, which is what I am going to do! If someone wants to reply with an answer for future reference, or for other people searching for the same answer, then feel free. Thanks
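For anyone who finds this later: if the server itself serves the JSON minified (all on one line), then wget/curl are faithfully giving you exactly those bytes, and a pretty-printer will restore readable formatting. A minimal sketch (the URL here is just a placeholder):

wget -qO- 'https://example.com/feed.json' | python -m json.tool > feed.json

python -m json.tool ships with Python, so there is nothing extra to install.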

Last edited by amboxer21; 10-11-2012 at 08:49 PM.
 
Old 10-12-2012, 07:32 AM   #2
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
You... gave up on an answer after half an hour? Bearing in mind that in my time zone you posted at 2 AM, you're only really going to get an answer from people living near you (globally speaking) or with freakish sleep patterns. Anyway.

Do you have an example of the file you're downloading? Whenever I have used wget or curl, the file has downloaded exactly as it was on the web. The only things I can think of are:
  • Different line endings (shouldn't really be a problem; I would expect the all-on-one-line symptom if you were downloading a file with Unix line endings onto Windows)
  • The file uses <br/> for line endings and has no actual line breaks

However, it would be easier to work out what's going wrong if we had a link to the actual file.
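In the meantime, a quick way to narrow down which of those it is: look at the raw bytes of the download. A rough sketch (placeholder URL):

curl -s 'https://example.com/file.json' -o file.json
file file.json                  # mentions CRLF line terminators if endings are the issue
head -c 200 file.json | od -c   # shows \r, \n and \t literally; <br/> tags show up as text

If od -c shows no \n at all, the file really is one line on the server.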
 
Old 10-12-2012, 08:49 AM   #3
arunchinnachamy
LQ Newbie
 
Registered: Dec 2010
Posts: 4
Blog Entries: 1

Rep: Reputation: 0
amboxer21,
Like Snark1994 mentioned, wget and curl download the file exactly as it is on the internet. What you see in the browser probably differs from the wget/curl output because the browser formats the content based on its content type. A link to the file might help us understand the issue better.
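To check what the server actually declares, you can inspect the response headers; a minimal sketch with a placeholder URL:

curl -sI 'https://example.com/feed.json' | grep -i '^content-type'

If that comes back as application/json, the browser is pretty-printing it for display, and the single-line download is the true content.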
 
Old 10-12-2012, 10:46 AM   #4
amboxer21
Member
 
Registered: Mar 2012
Location: New Jersey
Distribution: Gentoo
Posts: 291

Original Poster
Rep: Reputation: Disabled
Like I said, I already wrote a one-liner to parse the info. For some odd reason, wget was removing tabs and newline characters and saving everything as one huge line. Whatever.

Last edited by amboxer21; 10-12-2012 at 10:48 AM.
 
Old 10-13-2012, 07:13 AM   #5
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
Well done for solving your own problem, and for marking the thread as 'SOLVED'. If you could post your one-liner, it would help other people who are having a similar problem.

Thanks,
 
Old 10-13-2012, 10:18 AM   #6
amboxer21
Member
 
Registered: Mar 2012
Location: New Jersey
Distribution: Gentoo
Posts: 291

Original Poster
Rep: Reputation: Disabled
...See below.

Last edited by amboxer21; 10-13-2012 at 01:49 PM.
 
Old 10-13-2012, 01:49 PM   #7
amboxer21
Member
 
Registered: Mar 2012
Location: New Jersey
Distribution: Gentoo
Posts: 291

Original Poster
Rep: Reputation: Disabled
Updated....

Last edited by amboxer21; 10-20-2012 at 02:53 AM.
 
Old 10-20-2012, 02:53 AM   #8
amboxer21
Member
 
Registered: Mar 2012
Location: New Jersey
Distribution: Gentoo
Posts: 291

Original Poster
Rep: Reputation: Disabled
I figured I would share something I am working on at the moment: a program to download a whole photostream. This is the parser part of it.

The file contains 1,243 words, so I cannot post it here. Here's a link -> http://www.4shared.com/file/UMrpIUld/photostream.html

That's just a file that consists of a bunch of JSON objects with the newlines and tabs removed. I have parsed out all of the http(s) URLs.

The parser ->
# split the JSON on commas, strip quotes/braces/backslashes and leading keys, then print only lines that look like http(s) URLs ending in jpg
awk '{gsub(",", "\n"); print}' photostream | sed -n 's/["{\\]//g;s/^[a-zA-Z0-9]*\://g;/\(^[http].*\:\).*\([jpg]$\)/p'

The file is obtained with an access token that allows you to fetch the JSON; the URLs are then parsed out with the parser above. What do you think? If anyone wants to tighten up the parser, that would be cool, and I welcome any suggestions!
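One possible tightening, untested against the actual feed, so treat it as a sketch: the bracket expressions [http] and [jpg] in the sed above each match a single character rather than the literal words (they happen to work here), and grep -o can pull the URLs out in one step. The tr handles feeds that escape slashes as \/:

tr -d '\\' < photostream | grep -oE 'https?://[^"[:space:]]+\.jpg'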

I keep procrastinating and pushing this off to the side because the next step is difficult. I wrote the downloader, but haven't accounted for the next URL set, which resides at the bottom of the file but is currently parsed out. I would have to separate the photo URLs from the next-set URL, run the parsed URLs through the downloader, and fetch the next photoset before proceeding. It's a bigger pain than it sounds!
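For what it's worth, once the next-set URL can be told apart from the photo URLs, the whole thing is a small loop. A rough sketch, assuming (this is a guess at the feed's structure) that the next-set link is the last non-.jpg URL on each page; the starting URL is a placeholder:

next='https://example.com/photostream?page=1'   # placeholder starting URL
while [ -n "$next" ]; do
    curl -s "$next" | tr -d '\\' > page.json
    # download every photo URL on this page
    grep -oE 'https?://[^"[:space:]]+\.jpg' page.json | while read -r url; do
        wget -q "$url"
    done
    # guess: the next-set link is the last non-.jpg URL in the page; empty when there is none
    next=$(grep -oE 'https?://[^"[:space:]]+' page.json | grep -v '\.jpg$' | tail -n 1)
done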

Maybe someone wants to help write this? We could incorporate some Perl to automate the access-token process, and a GUI with GTK!?

THOUGHTS? ADVICE?
 
  


