LinuxQuestions.org: Linux - Newbie. This Linux forum is for members that are new to Linux. Just starting out and have a question? If it is not in the man pages or the how-to's, this is the place!
06-04-2012, 02:45 PM, #1
LQ Newbie
Registered: Jun 2012
Posts: 2
How do I generate a new text file for each line of text in a document?
Hi all,
I have a text file of several thousand lines, and each line needs to be output to a separate text file.
The split command doesn't work, because it runs out of output-file suffixes with so many lines. This is what I have tried so far:
while read LINE; do echo $LINE>$LINE.txt; done <all_texts_new.txt
This works until the >$LINE.txt redirection, where it seems to fail.
Any ideas? Thanks a lot
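A side note on the split failure: by default split uses two-letter suffixes (aa through zz, 676 names), which is why it runs out on several thousand lines. A minimal sketch, assuming GNU coreutils split (-d and --additional-suffix are GNU extensions, not POSIX); split_one_per_line is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU coreutils split: write each line of the input file to
# its own file s00000.txt, s00001.txt, ... in the current directory.
#   -l 1                  one line per output file
#   -a 5                  five-character suffixes instead of the default two
#   -d                    numeric suffixes instead of aa, ab, ... (GNU extension)
#   --additional-suffix   append .txt to every name (GNU extension)
split_one_per_line() {
    split -l 1 -a 5 -d --additional-suffix=.txt "$1" s
}
# usage: split_one_per_line all_texts_new.txt
```

With -a 5 and -d the names run from s00000.txt to s99999.txt, so up to 100,000 lines fit.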

06-04-2012, 03:39 PM, #2
Senior Member
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187
Try
while read -r LINE; do echo "${LINE}" > "${LINE}.txt"; done <all_texts_new.txt
(Without the quotes, bash splits the expansion of LINE into separate words at any blanks, so a line such as "foo bar" gives the redirection two targets and it fails with an "ambiguous redirect" error.)
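A minimal sketch of the quoted loop as a reusable bash function; the name lines_to_named_files is hypothetical, and skipping empty lines is an added assumption (an empty line cannot be a file name):

```shell
#!/usr/bin/env bash
# Sketch: name each output file after the line it contains.
# Quoting "$line" keeps a line such as "hello world" a single word,
# so the redirection target is exactly one file name.
lines_to_named_files() {
    local line
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal
    while IFS= read -r line; do
        [[ -z $line ]] && continue          # assumption: skip empty lines
        printf '%s\n' "$line" > "$line.txt"
    done < "$1"
}
# usage: lines_to_named_files all_texts_new.txt
```

Lines containing a / or longer than the file system's name limit will still fail, since those cannot be single file names.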
There are other, more sophisticated ways to do this sort of thing. And the .txt, while it may be useful as a "tag" for you, is not normally required. (Linux doesn't rely on file name extensions: tools such as file identify a file's type from "magic numbers" in its contents, and, although extensions often help users, most programs don't need them.)
By the way, you're going to end up with a directory containing files whose contents are redundant with their names. You could, instead, create empty files without losing any information: while read -r LINE; do touch "${LINE}"; done <all_texts_new.txt

06-04-2012, 03:46 PM, #3
LQ Newbie
Registered: Jun 2012
Posts: 2
Original Poster
Thank you for your suggestion, PTrenholme! Unfortunately, this did not work in my case, probably because the lines are too long to work as file names?
Anyway, I found a solution just a couple of minutes ago:
awk '{print >> "s" sprintf("%03d",++c) ".txt"}' all_texts_new.txt
Phew, glad this is done, it was a loong evening!
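A variant of the awk one-liner, assuming a larger input: it closes each file after writing (some awk implementations other than gawk limit how many files may be open at once) and pads to five digits so the names keep sorting in line order past 999. The function name one_file_per_line is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: one output file per input line, numbered s00001.txt, s00002.txt, ...
# close(f) releases each file right away, so the number of open files
# never grows with the size of the input.
one_file_per_line() {
    awk '{
        f = sprintf("s%05d.txt", ++c)   # five-digit, zero-padded counter
        print > f
        close(f)
    }' "$1"
}
# usage: one_file_per_line all_texts_new.txt
```

Because each file is opened only once, > is used instead of >>, so re-running the command overwrites the old files rather than appending to them.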

06-04-2012, 04:05 PM, #4
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,028
What if the line is blank? Do you want an empty file for that line?

06-04-2012, 08:25 PM, #5
Senior Member
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,187
Quote:
Originally Posted by dio
Thank you for your suggestion, PTrenholme! Unfortunately, this did not work in my case, probably because lines are too long to function as file names???
Anyways, found the solution just a couple of minutes ago:
awk '{print >> "s" sprintf("%03d",++c) ".txt"}' all_texts_new.txt
Phew, glad this is done, it was a loong evening!
The file name length limit is 255 bytes on most Linux file systems (ext4, for example); it does not depend on whether the OS is 32- or 64-bit.
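To check this on a given system, getconf reports the name limit for the file system that holds a directory. A minimal sketch (the helper name long_lines is hypothetical) that lists the input lines too long to be used as a file name once .txt is appended:

```shell
#!/usr/bin/env bash
# Sketch: print the number and byte length of every line that would not fit
# in a file name (line + ".txt") in the current directory.
long_lines() {
    local max
    max=$(getconf NAME_MAX .)      # per-name limit in bytes, commonly 255
    # LC_ALL=C makes awk's length() count bytes rather than characters
    LC_ALL=C awk -v max="$max" 'length($0) + 4 > max { print NR ": " length($0) " bytes" }' "$1"
}
# usage: long_lines all_texts_new.txt
```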
Anyhow, if you just wanted to name the file sequentially, you should have said so in your first post. The AWK code to do what you first said you wanted would be something like
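awk '{ f = $0 ".txt"; print > f; close(f) }' all_texts_new.txt
(Unlike a shell redirection, AWK uses the string after > as the file name verbatim, so names containing blanks or quotes need no extra escaping; close() keeps the number of files open at once from growing with the input.)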
And the bash code to do what you ended up doing would be:
c=0;while read LINE; do c=$((++c));echo $LINE>$(printf "s%03d" ${c});done <all_texts_new.txt
(But the AWK version would be much faster, since the bash loop starts a subshell for the $(...) command substitution on every line.)
Last edited by PTrenholme; 06-05-2012 at 01:17 PM.
Reason: Typo in the bash solution.

06-05-2012, 03:09 AM, #6
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,028
Well, I will assume you do not want files for blank lines, but that the line number of each entry is important, so this could work:
Code:
awk 'NF{print > sprintf("s%03d.txt",NR)}' file

06-05-2012, 03:31 PM, #7
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852
Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.
The ++ increment operator updates the variable directly, so there's no need for the c= assignment.
In bash or ksh, you can use the ((..)) arithmetic command on its own. Alternatively, prefix $((..)) with the : (true) command, so that its result is not run as a command, or use let.
Code:
(( ++c ))
: $(( ++c ))
let ++c
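A quick check that each form increments the counter in bash; the last line adds the plain assignment form, which is the one to use in a strict POSIX sh, since POSIX does not require ++ in arithmetic expansion:

```shell
#!/usr/bin/env bash
# Each line below increments c by one.
c=0
(( ++c ))          # arithmetic command (bash, ksh)
: $(( ++c ))       # arithmetic expansion, result discarded by the : builtin
let ++c            # let builtin (bash, ksh)
c=$(( c + 1 ))     # plain assignment; works in any POSIX sh
echo "$c"          # prints 4
```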
In this case, however, we want to pass the value directly to printf, so we use $((..)).
Also, don't forget to quote your expansions, to avoid word splitting.
So, to flesh out the loop (assuming bash):
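Code:
c=0
while IFS= read -r line || [[ -n $line ]]; do
    [[ -z $line ]] && continue    # skip empty lines
    # printf -v stores the name in the current shell; an increment made
    # inside $(...) would be lost when that subshell exits
    printf -v file 's%03d.txt' "$((++c))"
    printf '%s\n' "$line" > "$file"
done <all_texts_new.txt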
The [[ -n $line ]] test catches a final line that has no trailing newline: at end of file read returns a non-zero status even when it has filled line with that last, unterminated line.
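A small demonstration of that case; the two function names are hypothetical, and they count lines only to make the difference visible:

```shell
#!/usr/bin/env bash
# Count lines read from stdin, without and with the final-line guard.
count_lines_plain() {
    local line n=0
    while IFS= read -r line; do n=$((n + 1)); done
    echo "$n"
}
count_lines_guarded() {
    local line n=0
    while IFS= read -r line || [[ -n $line ]]; do n=$((n + 1)); done
    echo "$n"
}
printf 'a\nb' | count_lines_plain     # prints 1: the unterminated "b" is dropped
printf 'a\nb' | count_lines_guarded   # prints 2
```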
Finally, since environment variables are generally all upper-case, it's good practice to keep your own user variables in lower-case or mixed-case to help differentiate them.
All times are GMT -5.