If the 1st column of file f1 and file f2 is the same, export the line with the larger 2nd-column string

weichanghe2000 · 04-26-2018, 12:04 PM

Please help me write an awk command-line program that does the following. Thanks in advance.

Request description:

Compare two files, f1 and f2, and export the result to file f3:

1 Delete duplicate rows in file f1 and file f2

2 If the 1st column of file f1 and file f2 is the same, export the line with the larger string in the 2nd column.

For example:

Code:

  0.1-37    < 0.2-53;
  6.1.4-b.0 < 6.1.5-c.2;
  9.13.2    < 11.5.6; 
  18b-16    > 8c-7;
  D15       < F4;
  1.b5_a    < 1.b12_d
  4c5.8     < 4c12.8
  d18g      < d18j

3 Rule: for the 2nd column of the two files:

> The number of consecutive digits (0-9) may differ, such as 9.13.2 vs 11.5.6, D15 vs F4

> The type, order, and number of the other characters (such as '.' '_' '-' 'A-Z' 'a-z') are the same,

like 6.1.4-b.0 vs 6.1.5-c.2, 1.b5_a vs 1.b12_d, D15 vs F4 ....

> As soon as the comparison finds the first larger part, stop comparing the 2nd column and output that line,

such as 'IO 1.b5_a' of f1 and 'IO 1.b12_d' of f2, which will output 'IO 1.b12_d'
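
One possible reading of rule 3, sketched in awk: split each 2nd-column string into tokens (a whole run of digits, or a single other character), compare digit runs as numbers and everything else as plain characters, and stop at the first token that differs. The helper names `vercmp`, `tok` and `cmp` are made up for this sketch and do not come from the thread. `LC_ALL=C` is an assumption so that letters compare by byte value.

```shell
# vercmp A B: print -1, 0 or 1 depending on whether A is smaller than,
# equal to, or larger than B under the token rule described above.
# (Hypothetical helper, written only to illustrate the rule.)
vercmp() {
    LC_ALL=C awk -v x="$1" -v y="$2" '
    # Leading token of s: a whole digit run, or one non-digit character.
    function tok(s) {
        return (match(s, /^[0-9]+/) ? substr(s, 1, RLENGTH) : substr(s, 1, 1))
    }
    # Walk both strings token by token and stop at the first difference.
    function cmp(a, b,    ta, tb) {
        while (a != "" || b != "") {
            ta = tok(a); tb = tok(b)
            a = substr(a, length(ta) + 1)
            b = substr(b, length(tb) + 1)
            if (ta ~ /^[0-9]+$/ && tb ~ /^[0-9]+$/) {
                # Both tokens are digit runs: compare as numbers.
                if (ta + 0 != tb + 0) return (ta + 0 < tb + 0) ? -1 : 1
            } else if (ta != tb) {
                # Otherwise compare as strings.
                return (ta < tb) ? -1 : 1
            }
        }
        return 0
    }
    BEGIN { print cmp(x, y) }'
}

vercmp 1.b5_a 1.b12_d    # prints -1, because 5 < 12 at the first differing token
```

Comparing digit runs numerically is what makes 9.13.2 smaller than 11.5.6 and 18b-16 larger than 8c-7; a plain string comparison would get both of those wrong.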

4 cat f1:

Code:

PK      0.1-37 
Art     6.1.4-b.0 
Fle     9.13.2    
Uni     18b-16  
STD     D15  
IO      1.b5_a 
FPG     4c5.8
SRA     d18g
....
....

cat f2:

Code:

Uni     8c-7
IO      1.b12_d
Art     6.1.5-c.2
PK      0.2-53
Fle     11.5.6
SRA     d18j
STD     F4
FPG     4c12.8
....
....

desired file f3:

Code:

Art     6.1.5-c.2
Fle     11.5.6
IO      1.b12_d
PK      0.2-53
STD     F4
Uni     18b-16  
FPG     4c12.8
SRA     d18j
...
...
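
A minimal sketch of the whole task, under the same token-by-token reading of rule 3. It is only one way this could be done in awk. The function name `merge_max` and the awk helpers `tok` and `cmp` are hypothetical. The sketch keeps one line per 1st-column key (which also drops duplicate rows), prints keys in order of first appearance, and skips lines with fewer than two fields; all three choices are assumptions, since the post does not say how f3 should be ordered.

```shell
# merge_max FILE...: for each 1st-column key, keep the largest 2nd column.
# (Hypothetical helper; output order is first appearance, an assumption.)
merge_max() {
    LC_ALL=C awk '
    # Leading token of s: a whole digit run, or one non-digit character.
    function tok(s) {
        return (match(s, /^[0-9]+/) ? substr(s, 1, RLENGTH) : substr(s, 1, 1))
    }
    # Compare token by token: digit runs as numbers, the rest as strings.
    function cmp(a, b,    ta, tb) {
        while (a != "" || b != "") {
            ta = tok(a); tb = tok(b)
            a = substr(a, length(ta) + 1)
            b = substr(b, length(tb) + 1)
            if (ta ~ /^[0-9]+$/ && tb ~ /^[0-9]+$/) {
                if (ta + 0 != tb + 0) return (ta + 0 < tb + 0) ? -1 : 1
            } else if (ta != tb) {
                return (ta < tb) ? -1 : 1
            }
        }
        return 0
    }
    NF < 2 { next }                                   # skip blank or incomplete lines
    !($1 in best) { order[++n] = $1; best[$1] = $2; next }
    cmp($2, best[$1]) > 0 { best[$1] = $2 }           # keep the larger value
    END {
        for (i = 1; i <= n; i++)
            printf "%-8s%s\n", order[i], best[order[i]]
    }' "$@"
}
```

It could be run as `merge_max f1 f2 > f3`, or piped through `sort` first if f3 should be ordered by the 1st column.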

rtmistler · 04-26-2018, 12:42 PM

Hi,

Have you tried to solve this yourself?

Have you read the LQ Rules?

Are you aware of the rule which states:

Quote:

Do not post homework assignments verbatim. We're happy to assist if you have specific questions or have hit a stumbling point, however. Let us know what you've already tried and what references you have used (including class notes, books, and searches) and we'll do our best to help. Keep in mind that your instructor might also be an LQ member.

While your question is stated in great detail, it appears to be an assignment of some type. That is not really a problem, but it is recommended not to post it verbatim, out of courtesy to whoever created the question or problem. Meanwhile, LQ members like to know what you've tried to do to solve this, so that they can understand your level of expertise and see where you are stuck.

If your answer is that you're completely stuck, then how did you get to the point of trying to solve this problem without the proper background?