LinuxQuestions.org

corfuitl 12-19-2012 10:13 AM

Language pairing of files
 
Hi,

I have a directory containing files in several languages, with names like:

bb_aaaaaa-aaaa-en.html
bb_aaaaaa-aaaa-es.html
bb_aaaaaa-aaaa-de.html
vv_aaaaaa-aaaa-en.html
vv_aaaaaa-aaaa-es.html
vv_aaaaaa-aaaa-de.html
ff_aaaaaa-aaaa-en.html
ff_aaaaaa-aaaa-es.html
ff_aaaaaa-aaaa-de.html

I want to create a two-column list for a given language pair. For example, for the en/de language pair:

bb_aaaaaa-aaaa-en.html bb_aaaaaa-aaaa-de.html
vv_aaaaaa-aaaa-en.html vv_aaaaaa-aaaa-de.html
ff_aaaaaa-aaaa-en.html ff_aaaaaa-aaaa-de.html

I have a bash script but it doesn't work.

Code:

#!/bin/sh

L1=en
L2=de

cat list.txt | awk '/*__.*__'$L1'.*__'$L2'/{
  for (i=1; i <= NF; i++)
    if (index($i, "__'$L1'.") > 0) { printf("%s", $i); break; }
  for (  ; i <= NF; i++)
    if (index($i, "__'$L2'.") > 0) printf(" %s", $i);
  printf("\n"); }'

Could you help me please? Thank you in advance!

PTrenholme 12-19-2012 08:05 PM

Well, there are several problems with your code. For example:
  1. You are using cat to send your data to awk, but awk can read its own input. The form awk '<code>' <input> is normally preferred to cat <input> | awk '<code>' since it avoids an unnecessary process and pipe.
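  2. The RE you're using ('/*__.*__'$L1'.*__'$L2'/') does not match any line in your (sample) input file: it looks for double underscores, and your sample names contain none. I suspect you may have meant to use something like '/^[[:alpha:]_][[:alnum:]_-]*[_-]+'${L1}'[.]+.* +[[:alpha:]_][[:alnum:]_-]*[_-]+'${L2}'[.]*[[:alnum:]]*$/' (with the hyphen placed last in each bracket expression so that it is taken literally).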
  3. If you pasted your code into your post, perhaps the backslashes escaping your dots were stripped. If not, and you want a literal dot in your RE, the [.] form should work.
  4. The logic in your code assumes that $L1 will always precede $L2 for any pair, but you don't check that your input file list is sorted, nor that $L1 sorts before $L2, nor do you check the value of LC_ALL or LC_COLLATE, which control the sort order. (Check the output of the locale command.)
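
To illustrate the points above, here is a minimal sketch rather than a definitive fix. It assumes list.txt holds one file name per line in the form <prefix>-<lang>.html, which is a guess at the layout of your list. The function name pair_langs and the awk variable names are made up for the example. awk reads the file itself (no cat), the languages are passed in with -v instead of being spliced into the program text, and the result does not depend on how the list is sorted.

```shell
#!/bin/sh
# pair_langs LANG1 LANG2 LISTFILE
# Hypothetical helper: prints "file-LANG1 file-LANG2" for every prefix
# that has both languages. Assumes one name per line, <prefix>-<lang>.html.
pair_langs() {
    awk -v l1="$1" -v l2="$2" '
        {
            # Find the trailing "-<lang>.html"; skip lines that lack it.
            if (!match($0, /-[^-.]+\.html$/)) next
            prefix = substr($0, 1, RSTART - 1)
            lang   = substr($0, RSTART + 1, RLENGTH - 6)   # drop "-" and ".html"
            # Remember prefixes in the order they first appear.
            if (!(prefix in seen)) { seen[prefix] = 1; order[++n] = prefix }
            if (lang == l1)      first[prefix]  = $0
            else if (lang == l2) second[prefix] = $0
        }
        END {
            # Print only the prefixes for which both languages were seen.
            for (i = 1; i <= n; i++) {
                p = order[i]
                if ((p in first) && (p in second)) print first[p], second[p]
            }
        }
    ' "$3"
}

# Usage (list.txt is the file name from the original script):
#   pair_langs en de list.txt
```

Because the pairs are collected in arrays and printed in the END block, it does not matter whether the en name appears before or after the de name in the list.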

danielbmartin 12-19-2012 11:59 PM

Try this ...
Code:

join -j2 $InFile $InFile                                        \
|sed 's/\( .._\)\(.*\)\( .._\)\(.*\)/\1 \3 \1\2\3\4/'            \
|awk -F " " '{if ($1==$2 && $3!=$4) print $3 " " $4}'            \
|sed 's/\(.*\)\(\-..\.\)\(.*\)\(\-..\.\)\(.*\)/\1 \2 \3 \4 \5/'  \
|sort -k2,2                                                      \
|awk -F " " '{print $1$2$3 " " $4$5$6}'                          \
> $OutFile

Daniel B. Martin
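
For comparison with the pipeline above, the same pairing can also be sketched in plain sh, working directly on the directory instead of a list file. This is only an illustration. It assumes the names follow the <prefix>-<lang>.html pattern from the first post, and pair_dir is a made-up function name.

```shell
#!/bin/sh
# pair_dir LANG1 LANG2 [DIR]
# Hypothetical helper: for each *-LANG1.html in DIR (default: current
# directory), print it next to its -LANG2 counterpart if that file exists.
pair_dir() {
    dir=${3:-.}
    for f in "$dir"/*-"$1".html; do
        # If the glob matched nothing, $f is the unexpanded pattern.
        if [ ! -e "$f" ]; then
            continue
        fi
        base=${f##*/}                        # strip the directory part
        other=${base%-"$1".html}-$2.html     # swap the language suffix
        if [ -e "$dir/$other" ]; then
            printf '%s %s\n' "$base" "$other"
        fi
    done
}

# Usage:
#   pair_dir en de /path/to/files
```

The shell expands the glob in sorted order, so the pairs come out sorted by prefix rather than in the order shown in the first post.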


All times are GMT -5.