LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
09-02-2017, 10:47 PM   #1
allan.sammar@vfemail.net
LQ Newbie
 
Registered: Sep 2017
Posts: 2

Rep: Reputation: Disabled
split large text file


How can I split a 1 GB text file (a wordlist) into multiple smaller text files?
 
09-02-2017, 11:09 PM   #2
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Debian, Mageia, and whatever VMs I happen to be playing with
Posts: 13,829
Blog Entries: 24

Rep: Reputation: 3685
See man split.

Oh, and thanks for the question. I didn't know about the split command before.
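For the question as asked, here is a minimal sketch of the two most common invocations, assuming GNU split. The file name wordlist.txt, the output prefixes, and the tiny sizes are placeholders chosen so the example runs quickly; the pieces are written to the current directory.

```shell
# Stand-in for the real wordlist (hypothetical name and contents): 10 short lines.
printf 'word%d\n' 1 2 3 4 5 6 7 8 9 10 > wordlist.txt

# Fixed number of lines per piece: creates by_lines_aa, by_lines_ab, ...
split -l 4 wordlist.txt by_lines_

# Size cap per piece without cutting a line in half
# (-C is --line-bytes; here at most 20 bytes per piece).
split -C 20 wordlist.txt by_size_

# Show how many lines ended up in each piece.
wc -l by_lines_* by_size_*
```

For a 1 GB wordlist, something like `split -l 1000000` or `split -C 100M` would be more realistic values.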

Last edited by frankbell; 09-02-2017 at 11:10 PM.
 
2 members found this post helpful.
09-03-2017, 03:52 AM   #3
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 11,141

Rep: Reputation: 3329
Oh yes, and you can use awk too (or perl/python/whatever) if you need to split on a not-so-trivial condition.
https://is.gd/j4MZnR
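As an illustration of a condition that split itself cannot express, this sketch uses awk to route each word to a file chosen by the word's length. The input file words.txt, its contents, and the len_N.txt naming scheme are all made up for the example.

```shell
# Sample input (hypothetical contents).
printf '%s\n' apple avocado banana blueberry cherry > words.txt

# Send every line to a file named after its length:
# len_5.txt, len_6.txt, len_7.txt, len_9.txt
awk '{ out = "len_" length($0) ".txt"; print > (out) }' words.txt

# Inspect one of the results.
cat len_6.txt
```

With a very large number of distinct output files, some awk implementations run into open-file limits; calling close(out) after each print (and appending with >> instead of >) is the usual workaround.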
 
09-03-2017, 08:47 AM   #4
Sefyir
Member
 
Registered: Mar 2015
Distribution: Linux Mint
Posts: 579

Rep: Reputation: 267
split was my immediate thought, and it works, but I'm going to assume the OP wants each word kept intact.
In the example below, split -n2 divides the file by byte count, so the line ok4 gets cut into [ok] and [4]. Not good if you have a wordlist.

Code:
$ cat foobar
ok1
ok2
ok3
ok4
ok5
ok6
ok7
Code:
$ split -n2 foobar
$ cat xaa; echo -e '\n\n'; cat xab
...
ok3
ok


4
ok5
...
To make this work, look at the CHUNKS forms accepted by the -n, --number=CHUNKS option (l/N is the one that keeps lines intact):
Code:
       N      split into N files based on size of input

       K/N    output Kth of N to stdout

       l/N    split into N files without splitting lines/records

       l/K/N  output Kth of N to stdout without splitting lines/records

       r/N    like 'l' but use round robin distribution

       r/K/N  likewise but only output Kth of N to stdout
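Applied to the sample file, the l/N form can be tried like this. This is a sketch assuming GNU split; the nl_ output prefix is an arbitrary choice so the pieces do not overwrite the xaa/xab files from the earlier run.

```shell
# Recreate the 7-line sample file from above.
printf 'ok%d\n' 1 2 3 4 5 6 7 > foobar

# Split into 2 files of roughly equal size, breaking only at line ends.
split -n l/2 foobar nl_

# Print both pieces with a visible divider between them.
cat nl_aa; echo '-----'; cat nl_ab
```

Every piece should now end on a complete line, so no word is cut in two.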
There's another option, -t, for separating on a character other than the newline; however, I could not get it to work.

Code:
       -t, --separator=SEP
              use  SEP instead of newline as the record separator; '\0' (zero)
              specifies the NUL character
Code:
split -t '\n' foobar 
split: multi-character separator ‘\\n’
I'm not sure why the above does not work, or more specifically, how to specify a newline (or tab, etc.).
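A likely explanation: inside single quotes the shell does not interpret backslash escapes, so split receives the two characters \ and n, and per the man page excerpt only '\0' is treated specially. Passing an actual single character should work. The sketch below assumes GNU split with -t support; the file name records.txt and the tab_ prefix are made up.

```shell
# Four tab-separated records (no trailing separator after the last one).
printf 'a\tb\tc\td' > records.txt

# Build a literal tab portably. In bash, ksh, or zsh you could write $'\t'
# instead (and $'\n' for a newline).
tab="$(printf '\t')"

# Use the tab as record separator, two records per output file: tab_aa, tab_ab
split -t "$tab" -l 2 records.txt tab_

# Show the pieces with tabs made visible as \t.
for f in tab_*; do printf '%s: ' "$f"; od -An -c "$f"; done
```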

Last edited by Sefyir; 09-03-2017 at 09:10 AM.
 
1 member found this post helpful.
09-03-2017, 11:27 PM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.10, Centos 7.3
Posts: 17,537

Rep: Reputation: 2420
For really simple stuff, e.g. just the first N lines, you could use head/tail, but split is more capable.
As above, for non-trivial rules/situations, use Perl (or similar).
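The head/tail approach for a two-way split could look like this. It is a small sketch; the file names ht_foobar, ht_first, and ht_rest are placeholders.

```shell
# 7-line sample file, same contents as the earlier foobar example.
printf 'ok%d\n' 1 2 3 4 5 6 7 > ht_foobar

# First piece: lines 1 to 3.
head -n 3 ht_foobar > ht_first

# Second piece: everything from line 4 onward.
tail -n +4 ht_foobar > ht_rest
```

This gets tedious beyond two or three pieces, which is where split earns its keep.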
 
09-13-2017, 09:50 PM   #6
allan.sammar@vfemail.net
LQ Newbie
 
Registered: Sep 2017
Posts: 2

Original Poster
Rep: Reputation: Disabled

Quote:
Originally Posted by Sefyir
Thanks, Sefyir, that helps a lot. It works fine. Thanks again.
 
  


All times are GMT -5.
