Old 11-25-2012, 10:36 AM   #16
ASTRAPI
Member
 
Registered: Feb 2007
Posts: 210

Original Poster
Rep: Reputation: 16

@ntubski

Yes, I need it for every character, whether it is a letter/number/special character...

So for the first job I think the best is:

Code:
cat file1.txt file2.txt | sort | uniq > output.txt
(I don't care to view the content after the job is done. I just need to have the correct stuff inside.)
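
As a hedged aside (not part of the original post): sort -u folds the sort | uniq pair into a single step and gives the same result for whole-line comparisons:

Code:
sort -u file1.txt file2.txt > output.txt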

For the second one:

Code:
egrep -v '^[[:.:]]{1,4}$' output2.txt > output3.txt
Is the above correct? [[:.:]]

or

Code:
awk 'length($0) >= 5' output2.txt > final_output.txt
?

Thank you

Last edited by ASTRAPI; 11-25-2012 at 10:39 AM.
 
Old 11-25-2012, 10:45 AM   #17
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2081
Use just "." not "[[:.:]]".

Any of

Code:
egrep -v '^.{1,4}$' output2.txt > output3.txt
egrep '^.{5,}$' output2.txt > output3.txt
awk 'length($0) >= 5' output2.txt > final_output.txt
should work fine. Although the first command lets blank lines through as well (you could change to {0,4} to fix that).
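
A quick hypothetical demo of that caveat (not from the thread; the sample input is made up):

Code:
printf 'ab\n\nabcdef\n' | egrep -v '^.{1,4}$'    # keeps the blank line and "abcdef"
printf 'ab\n\nabcdef\n' | egrep -v '^.{0,4}$'    # keeps only "abcdef"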
 
Old 11-25-2012, 03:07 PM   #18
ASTRAPI
Member
 
Registered: Feb 2007
Posts: 210

Original Poster
Rep: Reputation: 16
All seems to work great

Can I use this for many files?

Code:
cat file1.txt file2.txt file3.txt file4.txt file5.txt file6.txt | sort | uniq > output.txt
Is there any system limit for big txt files, or the possibility of a timeout or something?

I have a very fast PC with a Core i7 3770K and 16 GB of RAM.

Thank you
 
Old 11-25-2012, 03:57 PM   #19
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2081
Quote:
Originally Posted by ASTRAPI View Post
Can I use this for many files?
Yup.

Quote:
Is there any system limit for big txt files, or the possibility of a timeout or something?
Nope, the only timeout will be your patience running out.
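
As a hedged aside (not part of the reply above): GNU sort handles inputs larger than memory by spilling sorted chunks to temporary files, so the practical limits are disk space for those temp files and time. If needed, the buffer size and temp directory can be tuned; the values below are only examples:

Code:
# assumption: GNU coreutils sort; -S sets the memory buffer, -T the temp-file directory,
# and LC_ALL=C forces fast byte-wise comparison
LC_ALL=C sort -u -S 4G -T /var/tmp file1.txt file2.txt > output.txt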
 
Old 11-25-2012, 09:39 PM   #20
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
A one-shot solution for both jobs:
Code:
awk 'length($0) >= 5' file1.txt file2.txt | sort -u > output.txt
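
For many input files, the same one-shot command can take a shell glob instead of listing every name (a sketch that assumes the files really are named file1.txt ... file6.txt):

Code:
awk 'length($0) >= 5' file?.txt | sort -u > output.txt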
 
Old 11-26-2012, 01:07 AM   #21
ASTRAPI
Member
 
Registered: Feb 2007
Posts: 210

Original Poster
Rep: Reputation: 16
OK, all done, great.

Now I have one big file of 8 GB...

How can I split it into 500 MB parts, or better, 1 GB parts?

Thank you

Last edited by ASTRAPI; 11-26-2012 at 01:08 AM.
 
Old 11-26-2012, 02:22 AM   #22
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Let's say you have a file longfile.txt that has 100 lines. You can then split it into parts using the head and tail commands, and save those parts in separate files.
For instance, for the first 20 lines:
Code:
head -20 longfile.txt > output1.txt
For the next 20 lines, i.e. lines 21-40, and the 20 after that:
Code:
head -40 longfile.txt | tail -20 > output2.txt    # lines 21-40
head -60 longfile.txt | tail -20 > output3.txt    # lines 41-60
And so on...
To check how many lines longfile.txt has, use:
Code:
cat longfile.txt | wc -l
Suppose it gives 500000 as the result; then you can use:
Code:
head -125000 longfile.txt > output1.txt                   # lines 1-125000
head -250000 longfile.txt | tail -125000 > output2.txt    # lines 125001-250000
head -375000 longfile.txt | tail -125000 > output3.txt    # lines 250001-375000
And so on, for as many parts as you want.
You could also use the head and tail commands with the -c option to split the data on a byte basis, but that would not be as convenient, so it's better to proceed as described above.
Also read the man pages of head and tail for a better understanding.
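
As a hedged alternative (not from the post above), the same line-based chunking can be done in a single pass with awk; the chunk size n and the part-file prefix are only illustrative:

Code:
# writes lines 1-125000 to part1.txt, lines 125001-250000 to part2.txt, and so on
awk -v n=125000 '{ f = "part" (int((NR - 1) / n) + 1) ".txt"; print > f }' longfile.txt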

Last edited by shivaa; 11-26-2012 at 02:30 AM.
 
Old 11-26-2012, 03:05 AM   #23
ASTRAPI
Member
 
Registered: Feb 2007
Posts: 210

Original Poster
Rep: Reputation: 16
Will it be OK if I use:

Code:
cat longfile.txt | wc -l
result: 8.000.000

and then use:

Code:
split -l 1000000 longfile.txt new
Will it create eight files (new.txt, newb.txt, newc.txt, newd.txt, newe.txt, newf.txt, newg.txt, newh.txt) with 1000000 lines inside each?

Thank you
 
Old 11-26-2012, 03:20 AM   #24
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
The split command is also a good option; you can use it as:
Code:
split -n <number_of_parts> <filename>
For example, to split the file filename into 10 parts, do:
Code:
split -n 10 filename
It will then create 10 new files of equal size, named xaa, xab, xac, and so on, in the current working directory.
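
If your split supports -n, it also accepts an optional prefix argument so the parts don't all start with "x" (the prefix below is just an example):

Code:
split -n 10 filename part_    ## creates part_aa, part_ab, ... part_aj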

Last edited by shivaa; 11-26-2012 at 03:23 AM.
 
Old 11-26-2012, 03:41 AM   #25
ASTRAPI
Member
 
Registered: Feb 2007
Posts: 210

Original Poster
Rep: Reputation: 16
I just tried:

Code:
split -n 4 longfile.txt
And I got this:

Code:
split -n 4 output2.txt
split: invalid option -- 'n'
Try `split --help' for more information.
It would be great to be able to split the file into four equal-size parts...


Thank you

Last edited by ASTRAPI; 11-26-2012 at 05:34 AM.
 
Old 11-26-2012, 08:14 AM   #26
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Option -n is only available with newer versions of split (it is there in recent Ubuntu releases; I am not sure about other Linux flavors), and apparently in your case it's not available. You should instead use:
Code:
split -l 1000000 longfile.txt new    ## To create 8 new files named newaa, newab, newac...
split -l 2000000 longfile.txt new    ## To create 4 new files named newaa, newab, newac...
Note: The file sizes will also be roughly equal in this case (each part has the same number of lines).
Or, if you want to split the file on a size basis, check the size of the file (I assume here it is 8 GB), calculate its 8th part, convert that into bytes (size in MB x 1024 x 1024), and split:
Code:
du -sh longfile.txt
8192M
split -b 1073741824 longfile.txt new   ## To create 8 new files of 1 GB each, named newaa, newab, newac...
split -b 2147483648 longfile.txt new   ## To create 4 new files of 2 GB each, named newaa, newab, newac...
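
A quick way to double-check either split invocation (a hedged sketch; it assumes all the part files share the "new" prefix and nothing else in the directory does):

Code:
ls -lh new*                      ## list the part sizes
cat new* | cmp - longfile.txt    ## no output means the parts reassemble into the original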
 
Old 11-26-2012, 10:15 AM   #27
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
Quote:
Originally Posted by ASTRAPI View Post
So for the first job I think the best is: cat file1.txt file2.txt | sort | uniq > output.txt ... Is the above correct? [[:.:]]
Quote:
Originally Posted by ASTRAPI View Post
Can I use this for many files? ... Is there any system limit for big txt files, or the possibility of a timeout or something?

Why don't you just try stuff to find out whether it works to your satisfaction?

Last edited by schneidz; 11-26-2012 at 10:17 AM.
 
1 member found this post helpful.
Old 11-26-2012, 12:30 PM   #28
ASTRAPI
Member
 
Registered: Feb 2007
Posts: 210

Original Poster
Rep: Reputation: 16
All working great

Is it safe to rename the newaa file to newaa.txt?

Thank you
 
Old 11-26-2012, 08:11 PM   #29
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
As you wish, there is no problem in renaming. :-)
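
For instance (a minimal sketch, assuming the parts are named newaa, newab, ...):

Code:
mv newaa newaa.txt                           ## rename a single part
for f in new??; do mv "$f" "$f.txt"; done    ## or append .txt to every part at once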

BTW, if there are no more issues, please mark the thread as solved from the top right side of the page, under the Thread Tools option.
 
  

