LinuxQuestions.org
Old 02-03-2009, 10:21 AM   #1
Niels Olson
LQ Newbie
 
Registered: Apr 2007
Posts: 9

Rep: Reputation: 0
scripting: how to change markdown links to wikitext links?


Hello,

I have a personal wiki of notes, with now thousands of links in markdown format:

[link text](http://example.com)

but now that fckeditor is available for mediawiki (very beta), it has become much better to just stick with wikitext format. There are only a few conversions to do: tables, links, and bulleted lists. The lists are a fairly simple regex and fckeditor magically reformats the tables, so all I'm left with is the links. But I'm not a regex master. How do I reformat

[link text](http://example.com)

to this

[http://example.com link text]

The steps, in no particular order are

* delete ")"
* add a space after "example.com" ==> "example.com "
* move "link text](" to the end
* delete "("

any suggestions would be greatly appreciated.
 
Old 02-03-2009, 10:37 AM   #2
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

Rep: Reputation: 1956
You can try the following sed command:
Code:
sed -i.bck 's/\[link text\](\(.*\))/\[\1 link text\]/g' file
The -i.bck option saves a backup copy of the original file, with .bck appended to its name, before editing the file in place.
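
To see what the backup option does without touching real notes, the command can be tried on a scratch copy first. This is only a sketch: the temporary directory and the file name notes.txt are made up for the illustration, and the suffix-attached form of -i is assumed to behave as it does in GNU sed.

```shell
# Work in a throwaway directory so no real notes are touched.
workdir=$(mktemp -d)
printf '%s\n' '[link text](http://example.com)' > "$workdir/notes.txt"

# Same substitution as above; -i.bck edits notes.txt in place
# and leaves the original content in notes.txt.bck.
sed -i.bck 's/\[link text\](\(.*\))/\[\1 link text\]/g' "$workdir/notes.txt"

edited=$(cat "$workdir/notes.txt")
backup=$(cat "$workdir/notes.txt.bck")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

# Clean up the scratch directory.
rm -rf "$workdir"
```

If the edited line looks wrong, the .bck file can simply be copied back over the original.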
 
Old 02-03-2009, 01:25 PM   #3
Niels Olson
LQ Newbie
 
Registered: Apr 2007
Posts: 9

Original Poster
Rep: Reputation: 0
Awesome, thanks so much. That is a really nice way to do it. Now I just need to figure out how to identify the stuff in between [] and, I guess, make that a variable in place of "link text" in that sed command. Is this best done in a particular language? I really wish I understood how to do this kind of very simple thing . . .


Code:
** [Varicella zoster](http://en.wikipedia.org/wiki/Varicella_zoster_vaccine)
** [Intranasal influenza](http://en.wikipedia.org/wiki/Intranasal_influenza_vaccine)
** [Oral polio (Sabin's)](http://en.wikipedia.org/wiki/Sabin_vaccine)
** [Yellow fever](http://en.wikipedia.org/wiki/Yellow_fever_vaccine)
** [Rotavirus](http://en.wikipedia.org/wiki/Rotavirus_vaccine)
** [Smallpox](http://en.wikipedia.org/wiki/Smallpox_vaccine)
 
Old 02-03-2009, 01:41 PM   #4
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
This perl code should perform the conversion, even if the tagged data is embedded in the middle of a line of text, as I expect would be the norm for your application.
Code:
#! /usr/bin/perl -w
use strict;
    while(<>){
        if( $_ =~ m/\[(.+)\]\s*\((.+)\)/ ){
            print $`."[".$2." ".$1."]".$';
        }
        else{
            print $_;
        }
    }
Run it with the name of the input file as an argument.

--- rod.
 
Old 02-03-2009, 01:55 PM   #5
Niels Olson
LQ Newbie
 
Registered: Apr 2007
Posts: 9

Original Poster
Rep: Reputation: 0
You know, I thought it would come down to Perl. I have been whittling away at simple Python programs and hacking PHP and CSS on a couple of server projects, avoiding perl like the plague, but you just convinced me. I need to go buy a Perl book. Because you just saved me so much time I could literally drag myself to the bookstore with my tongue and figure out how that script works while reading upside down and backwards and that little script would still save me an order of magnitude more time. What timezones are you guys in, so that I may bow in your direction?
 
Old 02-03-2009, 01:58 PM   #6
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

Rep: Reputation: 1956
Maybe this:
Code:
sed -i.bck 's/\[\(.*\)\](\(.*\))/\[\2 \1\]/g' file
To record part of the matching text, enclose it in escaped parentheses in the regular expression. Then use \1 for the first recorded text, \2 for the second one, and so on. Here is a colorful version of the command above to distinguish the two recorded patterns:
Code:
s/\[\(.*\)\](\(.*\))/\[\2 \1\]/g
The other relevant parts of this sed expression are the escaped square brackets, otherwise sed interprets them as character lists. Finally the .* symbol matches any sequence of characters.

Here is a good tutorial about sed programming, if you want to dig a bit deeper.
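
One caveat about .* is that it is greedy: it grabs as much of the line as it can. With one link per line, as in the sample list, that is harmless, but with two links on the same line the first .* can run from the first "[" to the last "](" and mangle the result. A possible variant, offered as a sketch rather than a tested replacement, uses negated bracket expressions so that the link text stops at the first "]" and the URL at the first ")". The function name md_links_to_wiki is invented for this example, and URLs that themselves contain ")" are assumed not to occur.

```shell
# Hypothetical helper: reads markdown on stdin, writes wikitext links on stdout.
# [^]]* matches up to (not including) the first "]",
# [^)]* matches up to (not including) the first ")".
md_links_to_wiki() {
    sed 's/\[\([^]]*\)\](\([^)]*\))/[\2 \1]/g'
}

# Two links on one line are converted independently.
printf '%s\n' 'See [foo](http://a.example) and [bar](http://b.example).' |
    md_links_to_wiki
```

Link text containing parentheses, such as "Oral polio (Sabin's)", still works with this form, because only "]" ends the first group.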
 
Old 02-03-2009, 05:12 PM   #7
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
Quote:
Originally Posted by Niels Olson View Post
You know, I thought it would come down to Perl. I have been whittling away at simple Python programs and hacking PHP and CSS on a couple of server projects, avoiding perl like the plague, but you just convinced me. I need to go buy a Perl book. Because you just saved me so much time I could literally drag myself to the bookstore with my tongue and figure out how that script works while reading upside down and backwards and that little script would still save me an order of magnitude more time. What timezones are you guys in, so that I may bow in your direction?
Ah shucks. Actually, colucix did it best with sed (assuming it actually works). Python and PHP are definitely not best for this job. Awk would also be high on the list. Perl is just like sed with a programming language built around it, and since I like to write, it feels right for me. I don't know about all those colors in the sed code, though; kinda makes me dizzy along with all of those backslashes.
--- rod.

Last edited by theNbomr; 02-03-2009 at 05:14 PM.
 
Old 02-03-2009, 06:09 PM   #8
Niels Olson
LQ Newbie
 
Registered: Apr 2007
Posts: 9

Original Poster
Rep: Reputation: 0
Okay, I tried sed for my grand markdown-2-wikitext translator. I actually got a good sed command for the lists, I thought, before I posted, so I thought I'd try to go all sed. (That and Larry Wall's "Programming Perl" seems to require more than an hour or two to grok). Anyway, here's my script, which fails miserably, and I'm trying to run it with this command:

niels@school$ sh md2wt.sh immunizations_md

md2wt.sh:
Code:
#! /bin/bash

# First, headlines. Every # should be replaced with surrounding = =
# that is, #Title becomes =Title= and ##Subtitle becomes ==Subtitle==
# This also needs a while loop.

sed 's/\#\(.*\)/\=\1\=/g' |

# Second, top level lists, which, in markdown, are numerical. But, and 
# here's a nasty but, sometimes, if 2. immediately follows 1., then 
# fckeditor also strips the newline, so 2. is on the same line. I need 
# to put the newline back in also. fckeditor will take out redundant 
# lines later.

sed 's/[0-9]\.\ /\n\*/g' | /bin/echo |

# Third, the deeper lists. Markdown uses a sort of pythony 4 spaces for
# each level of indention, but wikitext just uses another * for each
# level, so "    *" becomes "**" and "        *" becomes "***"
# this also needs a while loop

sed 's/\ \ \ \ \*/\*\*/g' |

# Finally, the grand poo bah, the links

sed 's/\[\(.*\)\](\(.*\))/\[\2 \1\]/g'
Where did I go wrong?

Edit: I added examples of input (immunizations_md.txt) and desired output (immunizations_perfect.txt).

Edit2: added comments that I need a while loop in here.

Edit3: went looking for how to parse the perl. Regarding =~, perl.org says "everyone knows how =~ and =! work" (http://dev.perl.org/perl6/rfc/164.html). Um, I don't know.
Attached Files
File Type: txt immunizations_perfect.txt (1.4 KB, 4 views)
File Type: txt immunizations_md.txt (1.5 KB, 3 views)

Last edited by Niels Olson; 02-03-2009 at 06:58 PM. Reason: added file attachment
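
Two things in the script above look like the likely culprits. The first sed is given no file name, so it reads standard input and the immunizations_md argument is never used. And echo does not read standard input at all, so piping into /bin/echo throws the text away and passes on only an empty line. The sketch below illustrates both points; the function name convert_file, the sample text, and the two simplified stages are made up for the illustration and are not the full translator.

```shell
# echo never reads standard input, so whatever is piped into it is dropped.
lost=$(printf '%s\n' '1. First item' | echo)
printf 'after echo: "%s"\n' "$lost"

# Hypothetical two-stage pipeline: the first sed is handed the file name
# from the command line ("$1"); the later stage reads from the pipe.
convert_file() {
    sed 's/^[0-9][0-9]*\. /* /' "$1" |
    sed 's/^    \*/**/'
}

# Try it on a scratch file.
sample=$(mktemp)
printf '%s\n' '1. First item' '    * Nested item' > "$sample"
converted=$(convert_file "$sample")
printf '%s\n' "$converted"
rm -f "$sample"
```

Note also that the script starts with a bash shebang but is run with sh, which may or may not be bash depending on the system.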
 
Old 02-03-2009, 07:00 PM   #9
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
Okay, since you probably wanted a one-liner, and since I got way too wordy on my first effort:
Code:
perl -e 'while(<>){ $_ =~ s/\[(.+)\]\s*\((.+)\)/[$2 $1]/g; print;}'  filename.whatever
I still think that's more readable than the sed version (leaves out a few tilty sticks)...
--- rod.

Last edited by theNbomr; 02-03-2009 at 07:01 PM.
 
Old 02-03-2009, 07:18 PM   #10
Niels Olson
LQ Newbie
 
Registered: Apr 2007
Posts: 9

Original Poster
Rep: Reputation: 0
Don't worry, I have a degree in physics. I mean . . . don't take that the wrong way, but I feel comfortable with nested functions and multiple lines, and I like the general idea that languages like perl leave you free to use whitespace as you like. If I could grok the perl *generally* I would probably prefer to do it that way. In fact, that is really my more general goal. I'm comfortable with the sed function (right now), and I'd really rather get better at parsing a more general language, and, since I tend to do more sysadmin than anything, perl seems to be a logical choice. I've hacked other people's perl scripts, for cronjobs, rsync, etc, just haven't written my own script to solve my own problem, hence, in part, taking advantage of this real world exercise.

For instance, what is the s* for? Is that "second argument" or "new stanza" or what? and the =~, is that for "approximately equal", and if so, approximately equal to what?

And how would I introduce the other three functions in my "md2wt" translator (above) if I rewrote it in perl? Would I need to nest the functions inside your first while function (and then nest additional while functions inside)? How does one syntactically do that? What's "$_" and what's up with the parens and angle brackets for the while loop?

Last edited by Niels Olson; 02-03-2009 at 07:42 PM.
 
Old 02-03-2009, 07:54 PM   #11
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,488

Rep: Reputation: 1956
Well, here is a slightly modified version of your script. Does it work as expected?
Code:
#!/bin/bash
#
# Repeat the sequence of hashes at the end of the line
#
sed 's/^\(#*\)\(.*\)/\1\2\1/g' immunizations_md.txt |
#
# Substitute all the hashes with equal signs
#
sed 's/#/=/g' |
#
# Substitute numbered-list markers (a digit, a dot, a space) with an asterisk
#
sed 's/[0-9]\.\ /\*/g' |
#
# First take care of the inner item in the list
#
sed 's/         \*/\*\*\*/g' |
#
# Second take care of the rest
#
sed 's/    \*/\*\*/g' |
#
# Change the links
#
sed 's/\[\(.*\)\](\(.*\))/\[\2 \1\]/g'
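
The headline trick in the first two stages can be tried on its own. One thing to watch is that s/#/=/g changes every hash in the file, including one inside a URL fragment on an ordinary line. The sketch below restricts the replacement to lines that start with a hash by putting an address in front of the second command. The function name headlines and the sample lines are invented for this example, and a hash inside a heading line would still be converted.

```shell
# Hypothetical helper: markdown headings on stdin, wikitext headings on stdout.
# Command 1 copies the leading run of hashes to the end of the line.
# Command 2 turns hashes into equal signs, but only on lines starting with "#".
headlines() {
    sed -e 's/^\(#\{1,\}\)\(.*\)/\1\2\1/' -e '/^#/ s/#/=/g'
}

printf '%s\n' '#Title' '##Subtitle' 'see http://x.example/page#top' | headlines
```

Running both commands in a single sed also saves one process compared with two piped seds.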
 
Old 02-03-2009, 10:26 PM   #12
Niels Olson
LQ Newbie
 
Registered: Apr 2007
Posts: 9

Original Poster
Rep: Reputation: 0
Works like a champ. Awesome. One note, I had a typo in the sample text, so there should be only spaces in groups of 4. I apparently had 9 in there. Sorry.

here's what I've got right now
Code:
#!/bin/bash
#
# Repeat the sequence of hashes at the end of the line
#
sed 's/^\(#*\)\(.*\)/\1\2\1/g' md2wt.pad |
#
# Substitute all the hashes with equal signs
#
sed 's/#/=/g' |
#
# Substitute numbered-list markers (a digit, a dot, a space) with an asterisk
#
sed 's/[0-9]\.\ /\*/g' |
#
# This really seems like it needs some recursion, doesn't it?
#
sed 's/                        \*/\*\*\*\*\*\*\*/g' |
sed 's/                    \*/\*\*\*\*\*\*/g' |
sed 's/                \*/\*\*\*\*\*/g' |
sed 's/            \*/\*\*\*\*/g' |
sed 's/        \*/\*\*\*/g' |
sed 's/    \*/\*\*/g' |
#
# Change the links
#
sed 's/\[\(.*\)\](\(.*\))/\[\2 \1\]/g'
One thing this has highlighted is that there really are some newlines that I need to clean up and I'm not clear yet whether they were manually put in by me, or if it's an effect of fckeditor's parsing.
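
On the recursion question: sed has labels and a conditional branch (t jumps to a label if the last substitution succeeded), which can stand in for the stack of nearly identical commands. The following is a sketch under a few assumptions: indentation is always a multiple of four spaces, list markers sit at the start of the line (the pattern is anchored with ^, unlike the commands above), and the function name nest_lists is made up for the example.

```shell
# Hypothetical helper: each pass swaps the last four leading spaces
# in front of the asterisks for one more asterisk, then "t again"
# repeats the substitution until it no longer matches.
nest_lists() {
    sed -e ':again' -e 's/^\( *\)    \*/\1**/' -e 't again'
}

printf '%s\n' '* top' '    * two' '        * three' '            * four' |
    nest_lists
```

Because the loop handles any depth, no special ordering from deepest to shallowest is needed.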
 
Old 02-04-2009, 10:17 AM   #13
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
Quote:
Originally Posted by Niels Olson View Post
For instance, what is the s* for? Is that "second argument" or "new stanza" or what? and the =~, is that for "approximately equal", and if so, approximately equal to what?
In perl regular expressions, '\s' is shorthand for 'whitespace'. The '*' modifier says 'zero or more of the previous token'. The net effect is that it allows zero or more whitespace characters between the two parts of the original parsed text. As a general practice, I like to put these into my regex's to make them more general. Some syntaxes allow whitespace in various places, and when allowed, whitespace is not always applied consistently. I don't think sed has the shorthand notations that perl has, and I know it doesn't have a lot of the extended regular expression syntax that perl has. I do find the perl shorthand notations handy, but the extended regex's are not often used, although I'm sure there are times when I could have used them to good effect if I had the notation in my head, instead of having to look them up. I suppose that's kind of where you're at with the whole language and regular expression thing.
--- rod.
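
For comparison, sed can express the same "optional whitespace" idea with the POSIX character class [[:space:]], which works inside a bracket expression in basic regular expressions. A minimal sketch, with the function name md_links_ws invented for the example and the same negated bracket expressions assumed for the link text and URL:

```shell
# Hypothetical helper: like the link conversion, but tolerating
# zero or more whitespace characters between "]" and "(".
md_links_ws() {
    sed 's/\[\([^]]*\)][[:space:]]*(\([^)]*\))/[\2 \1]/g'
}

printf '%s\n' '[Smallpox]  (http://en.wikipedia.org/wiki/Smallpox_vaccine)' |
    md_links_ws
```

GNU sed also accepts \s as an extension, but the bracketed class is the more portable spelling.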
 
  

