LinuxQuestions.org
Old 05-22-2011, 03:44 AM   #1
Seregwethrin
Member
 
Registered: Feb 2008
Posts: 112

Rep: Reputation: 16
Best software to replace a string in a big text file?


Hi,

I have a big text file (350 MB) and I need to replace some strings in it.

Vim is too slow. Also, I don't need to open the file and view its contents. I just need to run a command that does the replacement, for faster processing.

Which software would be the best for this purpose?

Thanks
 
Old 05-22-2011, 03:56 AM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2025
It might help if you showed us exactly what the format of the file is like, and what kind of strings you're trying to replace, so we could give you a clearer answer.

But in general, for simple replacement, sed is usually your best bet. The syntax is very similar to vim's replacement command, and it can edit files in place, although it does work through a background temp file to do so.

The basic command:
Code:
sed -i 's/string/replacement/' file
Sed only processes single lines by default though. If your text spans multiple lines, things get more complex.
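To illustrate the line-by-line behaviour, here is a minimal sketch on a scratch file. The file contents and the strings foo/bar are made up for the example. Note that without the g flag sed replaces only the first match on each line, and that giving -i a suffix (here .bak) keeps a copy of the original next to the edited file.

```shell
# Scratch file standing in for the real data (contents are made up).
tmp=$(mktemp)
printf 'foo one foo\nfoo two\n' > "$tmp"

# Without the g flag, only the first match on each line is replaced.
first_only=$(sed 's/foo/bar/' "$tmp")

# With g, every match on every line is replaced. -i.bak edits the file
# in place and keeps the original as "$tmp.bak".
sed -i.bak 's/foo/bar/g' "$tmp"
all_matches=$(cat "$tmp")

printf '%s\n---\n%s\n' "$first_only" "$all_matches"
rm -f "$tmp" "$tmp.bak"
```

On a 350 MB file the backup copy doubles the disk space needed, so it may be worth checking free space first.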

A few useful sed references.
http://www.grymoire.com/Unix/Sed.html
http://sed.sourceforge.net/sedfaq.html
http://sed.sourceforge.net/sed1line.txt
 
Old 05-22-2011, 04:04 AM   #3
Seregwethrin
Member
 
Registered: Feb 2008
Posts: 112

Original Poster
Rep: Reputation: 16
The file is an SQL script dumped by the mysqldump tool. Counting all tables, it has over 1 million rows in total.
I want to replace some invalid Unicode characters with valid ones. They were corrupted when the database was transferred with the wrong charsets.

Do you think it is appropriate to use sed, David?
 
Old 05-22-2011, 04:49 AM   #4
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1292
For single-character, global replacement I would use 'tr' instead.
 
Old 05-22-2011, 04:57 AM   #5
Seregwethrin
Member
 
Registered: Feb 2008
Posts: 112

Original Poster
Rep: Reputation: 16
But the broken characters are generally shown as two weird characters.
 
Old 05-22-2011, 05:08 AM   #6
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1292
It doesn't matter how they are displayed, as long as they are one character. Use the octal format, see 'man tr'.
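A minimal sketch of the octal form, assuming the broken character is the single byte 0x92 (octal 222), which is the right single quotation mark in Windows-1252; that byte and the sample text are chosen purely for illustration. GNU tr works on bytes, so this only fits a character that occupies one byte in the file. A character stored as several bytes needs sed or a similar tool instead.

```shell
# Hypothetical sample: "it?s", where ? is the single byte 0x92 (octal 222).
broken=$(mktemp)
printf 'it\222s\n' > "$broken"

# tr reads stdin and writes stdout, so redirect instead of editing in place.
# LC_ALL=C makes tr treat the input as plain bytes.
fixed=$(LC_ALL=C tr '\222' "'" < "$broken")

printf '%s\n' "$fixed"
rm -f "$broken"
```

For a large file you would redirect the output to a new file rather than capture it in a variable.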
 
Old 05-22-2011, 05:29 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2025
What character encoding is mysqldump outputting it as? If you can figure that out, then you should be able to use iconv to convert it directly to utf-8. I'd guess it's most likely ISO_8859-1 or a variant, offhand.

Even better would be to see if mysqldump has the ability to change its output encoding directly to utf-8.
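As a rough sketch of the iconv route, assuming the source really is ISO-8859-1 (only a guess, as said above): the sample below writes one Latin-1 byte to a scratch file and converts it. The file and its contents are made up for the example.

```shell
# Hypothetical sample: "cafe" with an acute e, stored as Latin-1,
# where the accented letter is the single byte 0xE9 (octal 351).
latin1=$(mktemp)
printf 'caf\351\n' > "$latin1"

# -f names the source encoding, -t the target; iconv writes to stdout.
utf8=$(iconv -f ISO-8859-1 -t UTF-8 "$latin1")

printf '%s\n' "$utf8"
rm -f "$latin1"
```

If the guessed source encoding is wrong, iconv may still produce output without complaint, so it is worth spot-checking a few known rows afterwards.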
 
Old 05-22-2011, 05:51 AM   #8
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2025
I looked up the mysqldump man page online. Under the option --default-character-set, it says this:

Quote:
Use charset_name as the default character set. See Section 9.2,
"The Character Set Used for Data and Sorting". If no character set
is specified, mysqldump uses utf8, and earlier versions use latin1.

This option has no effect for output data files produced by using
the --tab option. See the description for that option.
It says it outputs to utf-8 by default, so you shouldn't be seeing any problem unless you are using the --tab option. For that it says this:
Quote:
Column values are dumped using the binary character set and the
--default-character-set option is ignored. In effect, there is no
character set conversion. If a table contains columns in several
character sets, the output data file will as well and you may not
be able to reload the file correctly.
So it looks like the individual fields will be printed out in whatever character encoding they were originally created under. That means that if there were multiple input encodings, then there's probably no easy conversion solution. You'd have to use tr or sed or something and do them all individually. But if you can be sure that all the fields are stored in the same encoding, then it may be possible to filter the output through iconv.
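Doing them individually with sed could look something like the sketch below. It assumes one common kind of damage: UTF-8 text that was read as Latin-1 and saved again, so each accented letter turns into two odd-looking characters. The two sequences and the sample text are hypothetical; the real dump would need its own list of broken sequences.

```shell
# Build the byte sequences with printf so the script itself stays ASCII.
bad_e=$(printf '\303\203\302\251')   # double-encoded e-acute (4 bytes)
good_e=$(printf '\303\251')          # correct UTF-8 e-acute (2 bytes)
bad_u=$(printf '\303\203\302\274')   # double-encoded u-umlaut (4 bytes)
good_u=$(printf '\303\274')          # correct UTF-8 u-umlaut (2 bytes)

# Scratch file standing in for the dump (contents are made up).
dump=$(mktemp)
printf 'caf%s M%snchen\n' "$bad_e" "$bad_u" > "$dump"

# One -e expression per broken sequence; LC_ALL=C makes sed match raw bytes.
repaired=$(LC_ALL=C sed -e "s/$bad_e/$good_e/g" -e "s/$bad_u/$good_u/g" "$dump")

printf '%s\n' "$repaired"
rm -f "$dump"
```

All expressions run in a single pass over the file, which matters at 350 MB.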

There are a couple more charset options, but this is getting beyond my level of understanding. I'm not a database expert.
 
  


All times are GMT -5.
