LinuxQuestions.org
09-02-2017, 10:47 PM   #1
allan.sammar@vfemail.net
LQ Newbie
 
Registered: Sep 2017
Posts: 2

Rep: Reputation: Disabled
split large text file


How can I split a 1 GB text file (a wordlist) into multiple smaller text files?
 
09-02-2017, 11:09 PM   #2
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Ubuntu MATE, Mageia, and whatever VMs I happen to be playing with
Posts: 19,323
Blog Entries: 28

Rep: Reputation: 6142
See man split.

Oh, and thanks for the question. I didn't know about the split command before.

Last edited by frankbell; 09-02-2017 at 11:10 PM.
 
2 members found this post helpful.
09-03-2017, 03:52 AM   #3
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,842

Rep: Reputation: 7308
Oh yes, and you can use awk too (or Perl, Python, or whatever) if you need to split on a not-so-trivial condition.
https://is.gd/j4MZnR
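
As a rough illustration of a "not-so-trivial condition", here is a minimal sketch that sorts words into files by word length. The sample file `words` and the `len_N.txt` output names are made up for the example; they are not from the thread.

```shell
# Work in a scratch directory so the output files do not clutter anything.
cd "$(mktemp -d)"

# Hypothetical sample wordlist, one word per line.
printf '%s\n' apple bob cat delta echo fig > words

# Send every word to a file named after its length: len_3.txt, len_4.txt, ...
awk '{ out = "len_" length($0) ".txt"; print > out }' words

# List the files that were created.
ls len_*.txt
```

The same pattern works for other rules (first letter, a regular expression match, and so on) by changing how `out` is built.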
 
09-03-2017, 08:47 AM   #4
Sefyir
Member
 
Registered: Mar 2015
Distribution: Linux Mint
Posts: 634

Rep: Reputation: 316
split was my immediate thought, and it works, but I'm going to assume the OP wants each word kept intact.
In the example below, you'll notice that -n2 splits by size alone, so the line ok4 is cut into [ok] and [4]. Not good if you have a wordlist.

Code:
$ cat foobar
ok1
ok2
ok3
ok4
ok5
ok6
ok7
Code:
$ split -n2 foobar
$ cat xaa; echo -e '\n\n'; cat xab
...
ok3
ok


4
ok5
...
To make this work, look at the CHUNKS forms accepted by the -n, --number=CHUNKS option, in particular l/N:
Code:
       N      split into N files based on size of input

       K/N    output Kth of N to stdout

       l/N    split into N files without splitting lines/records

       l/K/N  output Kth of N to stdout without splitting lines/records

       r/N    like 'l' but use round robin distribution

       r/K/N  likewise but only output Kth of N to stdout
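
A minimal sketch of the l/N form applied to the same sample file, assuming GNU split (the one whose man page is quoted above):

```shell
# Work in a scratch directory so the output files do not clutter anything.
cd "$(mktemp -d)"

# Recreate the 7-line sample file from the example above.
printf 'ok%s\n' 1 2 3 4 5 6 7 > foobar

# l/2: two output files, cut only at line boundaries.
split -n l/2 foobar

# Every output file should now end on a complete line.
cat xaa
echo '-----'
cat xab
```

The two files will not necessarily be the same size, because split moves each cut to the nearest line end instead of cutting at the exact byte offset.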
There's another option, for separating records on a character other than the newline; however, I could not get it to work:

Code:
       -t, --separator=SEP
              use  SEP instead of newline as the record separator; '\0' (zero)
              specifies the NUL character
Code:
split -t '\n' foobar 
split: multi-character separator ‘\\n’
I'm not sure why the above does not work, or more specifically, how to specify a newline (or tab, etc.).
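
A likely explanation: inside single quotes the shell hands '\n' to split as two literal characters, a backslash and an n, and -t accepts only a single character. Newline is already the default separator, so -t is not needed for it. For a tab, the sketch below passes a real tab character; the file name `tabbed` is made up for the example, and -t assumes a GNU split recent enough to list it in its man page.

```shell
# Work in a scratch directory so the output files do not clutter anything.
cd "$(mktemp -d)"

# A file whose records are separated by tabs instead of newlines.
printf 'ok1\tok2\tok3\tok4\tok5\tok6\tok7\t' > tabbed

# Store one real tab character. In bash, $'\t' would do the same thing.
tab=$(printf '\t')

# Two output files, cut only where a tab ends a record.
split -t "$tab" -n l/2 tabbed

# Show the pieces with tabs turned into newlines for readability.
tr '\t' '\n' < xaa
echo '-----'
tr '\t' '\n' < xab
```

In bash, `split -t $'\t' -n l/2 tabbed` is the shorter equivalent, since `$'...'` expands backslash escapes before split sees the argument.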

Last edited by Sefyir; 09-03-2017 at 09:10 AM.
 
1 members found this post helpful.
09-03-2017, 11:27 PM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,359

Rep: Reputation: 2751
For really simple stuff, e.g. just N lines, you could use head/tail, but split is more capable.
As above, for non-trivial rules or situations, use Perl (or similar).
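
A small sketch comparing the two approaches on the sample file from post #4. The output names `first_part`, `second_part` and the `chunk_` prefix are arbitrary choices for the example.

```shell
# Work in a scratch directory so the output files do not clutter anything.
cd "$(mktemp -d)"

# Recreate the 7-line sample file.
printf 'ok%s\n' 1 2 3 4 5 6 7 > foobar

# head/tail: the first 3 lines go to one file, line 4 onwards to another.
head -n 3 foobar > first_part
tail -n +4 foobar > second_part

# split: the same idea for any number of pieces, 3 lines per output file.
split -l 3 foobar chunk_

# Line counts of everything that was produced.
wc -l first_part second_part chunk_*
```

For a large wordlist the same -l form scales directly, for example `split -l 1000000 wordlist.txt part_`, where the file name and line count are placeholders.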
 
09-13-2017, 09:50 PM   #6
allan.sammar@vfemail.net
LQ Newbie
 
Registered: Sep 2017
Posts: 2

Original Poster
Rep: Reputation: Disabled
Thanks, Sefyir, that helped a lot. It works fine. Thanks again.
 
  

