Hi, I'm not really new to Linux, but I usually use GUIs and am gradually coming out of my shell (pun intended). I'm trying to use sed to insert some semicolons into a vocabulary file that contains Japanese and English words. Each line is currently in the first format below and should end up in the second:
Kanjicharacters: Kanacharacters Englishdef1; Englishdef2;
Kanjicharacters: Kanacharacters; Englishdef1; Englishdef2;
(The first has 2 semicolons; the second has an additional one after the "Kanacharacters".)
Here's the real deal:
無料: むりょう free; no charge;
酒: さけ alcohol; sake;
Would become:
無料: むりょう; free; no charge;
酒: さけ; alcohol; sake;
I've tried numerous combinations of regexes and am getting nowhere. Does anyone out there have an idea?
One possibility might be to put a semicolon before the second space in each line (after ensuring that there are no double spaces in the file). This ought to be simple but I'm just not getting it...sigh.
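The "semicolon before the second space" idea can be sketched with sed's numeric flag on the s command, which replaces only the Nth match on a line. This is a minimal sketch, not necessarily the command that was posted in the thread; the function name insert_after_kana is made up for illustration, and it assumes the fields are separated by single ASCII spaces.

```shell
# Hypothetical helper: turn the 2nd space on each line into "; ".
# Assumes single ASCII spaces between fields (no double spaces).
insert_after_kana() {
    sed 's/ /; /2'
}

# Sample lines taken from the question above.
printf '%s\n' '無料: むりょう free; no charge;' '酒: さけ alcohol; sake;' |
    insert_after_kana
# 無料: むりょう; free; no charge;
# 酒: さけ; alcohol; sake;
```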
Okay, that was absurdly simple. Thank you so much...
However, having run this I now see that it's not quite as simple after all. There's an additional wrinkle to the equation, which is that some of these lines have an additional part of speech enclosed within parentheses and I'd ideally like the semicolon to come _after_ the ")" if indeed the line _has_ one. For example:
And the result would end up as:
Kanjicharacters: Kanacharacters (n, vs); Englishdef1; Englishdef2;
Kanjicharacters: Kanacharacters; Englishdef1; Englishdef2;
Kanjicharacters: Kanacharacters; Englishdef1; Englishdef2;
Kanjicharacters: Kanacharacters (n); Englishdef1; Englishdef2;
I _did_ manage to write something that will put the semicolon _into_ the ones with parens in the right place, but if I then use the line above it will give the undesired result of:
Essentially, I need to add a condition: "only add a semicolon before the second space if the line doesn't have something in parentheses right after it" (because sometimes the definitions also have parentheses).
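One way to express that condition in a single pass is to make the parenthesized part an optional piece of the match, so the semicolon lands after ")" when a tag is present and after the kana otherwise. A rough sketch, assuming a sed that supports -E (GNU and BSD sed do) and single ASCII spaces between fields; add_semicolon is a hypothetical name. Note that a definition that itself begins with "(" right after the kana would be mistaken for a tag.

```shell
# Hypothetical helper: put "; " after the second field, or after a
# parenthesized group that directly follows the second field.
add_semicolon() {
    sed -E 's/^([^ ]+ [^ ]+( \([^)]*\))?) /\1; /'
}

printf '%s\n' \
    'Kanjicharacters: Kanacharacters (n, vs) Englishdef1; Englishdef2;' \
    'Kanjicharacters: Kanacharacters Englishdef1; Englishdef2;' |
    add_semicolon
# Kanjicharacters: Kanacharacters (n, vs); Englishdef1; Englishdef2;
# Kanjicharacters: Kanacharacters; Englishdef1; Englishdef2;
```

Parentheses that appear later in a definition are left alone, because the optional group is only tried immediately after the second field.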
You could simply follow up with a search and replace that turns "; (" into " (" in the output of your modified approach.
If you have perl available, it is very fast for this sort of purpose. You can run it from the command line - if your file name is kanji.txt the following will do it:
Code:
perl -pi -e 's/; \(/ (/g' kanji.txt
Hmmm, the hardest part seems to be developing an eye for simpler searches. That helped a lot; however, the final wrinkle was that I needed it to remove a semicolon only when it comes before an opening parenthesis with a v, n, or a right after it (for example, (adv), (n), (v), and so on).
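That restriction can be folded into the search itself by capturing the first letter after the parenthesis and putting it back. A minimal sketch in basic sed syntax, where "(" is literal and "\(...\)" is a capture group; the name drop_semicolon_before_tag is made up here. It only looks at the first letter, so a definition starting with something like "(archaic)" would also lose its semicolon.

```shell
# Hypothetical helper: remove the semicolon in "; (" only when the
# parenthesis is followed by v, n, or a.
drop_semicolon_before_tag() {
    sed 's/; (\([vna]\)/ (\1/g'
}

printf '%s\n' \
    'Kanjicharacters: Kanacharacters; (n) Englishdef1; Englishdef2;' \
    'Kanjicharacters: Kanacharacters; Englishdef1; (formal) Englishdef2;' |
    drop_semicolon_before_tag
# Kanjicharacters: Kanacharacters (n) Englishdef1; Englishdef2;
# Kanjicharacters: Kanacharacters; Englishdef1; (formal) Englishdef2;
```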
I think I've found a really clunky way: insert the semicolons, tag the incorrect ones with an '@' symbol, and then remove every '@;'. Here's my current complete script (I'm sure there are easier ways to do this with Perl and so on, but it took me so long to grok sed that I'm trying to stick with something I know):
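# Usage: pass the vocabulary files as arguments; each is edited in place.
for file
do
    echo "$file"
    mv "$file" "$$.tempfile"
    # sed steps, in order:
    #   1-2: add "; " at the 2nd and 3rd spaces
    #   3-4: tag the semicolon before "(v", "(n", "(a" with "@", then drop "@;"
    #   5-6: collapse doubled semicolons
    #   7:   turn colons into semicolons
    sed 's/ /; /2
s/ /; /3
s/; ([vna]/@&/g
s/@;//g
s/;;/;/g
s/; ;/; /g
s/:/;/g' "$$.tempfile" > "$file"
    rm "$$.tempfile"
done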
There's probably a cleaner, simpler way to do this, but for now I guess I'm in good shape. If you have time to clean it up or simplify it, great...but otherwise, I think I'm all set for now.
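For what it's worth, here is one possible simplification, offered as a sketch rather than a drop-in replacement for the script above. It assumes a sed with -E and the mktemp utility, and the names convert_vocab and convert_files are invented for illustration. Unlike the script above, it only converts the first colon on each line and only treats a parenthesized group as a tag when it starts with v, n, or a.

```shell
# Hypothetical filter: semicolon after the kana (or after a tag such as
# "(n)" or "(adv)" that directly follows it), then the first ":" -> ";".
convert_vocab() {
    sed -E \
        -e 's/^([^ ]+ [^ ]+( \([vna][^)]*\))?) /\1; /' \
        -e 's/:/;/'
}

# Hypothetical wrapper: rewrite each file only if sed succeeded.
convert_files() {
    for file in "$@"; do
        tmp=$(mktemp) || return 1
        if convert_vocab < "$file" > "$tmp"; then
            # Copy back with cat so the original file keeps its permissions.
            cat "$tmp" > "$file"
            rm -f "$tmp"
        else
            rm -f "$tmp"
            return 1
        fi
    done
}
```

It would be called as convert_files kanji.txt other.txt. The original is left untouched if sed fails, which the mv-first approach does not guarantee.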
With sed, what would the syntax be for "the first block of text before a space"? It'd be great to identify and repeat the first block of text if, and only if, there is only one Japanese-encoded block. Some lines do not have a Kanjicharacters field, just a lone Kanacharacters field, and thus I need to manually repeat it so that I have a properly fleshed-out line.
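In sed, "the first block of text before a space" can be written as ^[^ ]* (start of line, then any run of non-space characters), and wrapping it in \( \) lets the replacement repeat it as \1. The sketch below rests on an assumption the thread does not confirm: that a line with a kanji field always starts with a field ending in ":", so any line whose first field lacks the colon is a lone-kana line. The name repeat_lone_kana and the "kana: kana" output shape are also just guesses at what a fleshed-out line should look like.

```shell
# Hypothetical helper: if the first field does not end in ":", copy it
# in front of itself as the kanji field.
repeat_lone_kana() {
    sed '/^[^ ]*: /!s/^\([^ ]*\) /\1: \1 /'
}

printf '%s\n' 'さけ alcohol; sake;' '酒: さけ alcohol; sake;' |
    repeat_lone_kana
# さけ: さけ alcohol; sake;
# 酒: さけ alcohol; sake;
```

The /pattern/! prefix means "run the following command only on lines that do not match", which is how the "if, and only if" part is handled here.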