LinuxQuestions.org
Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

09-08-2015, 11:47 AM   #16
Rozak
LQ Newbie
 
Registered: Sep 2015
Posts: 23

Original Poster
Rep: Reputation: Disabled

Quote:
Originally Posted by grail
and this one:
Code:
001 100 020
002 111 025
001 100 001
003 100 111
And if you had left the first column in on the previous example, I would expect quite a different result.
Code:
 
001 100 020
002 111 025
003     001
        111
And for the previous example, I would have had only one row with 080 in the first column.

Last edited by Rozak; 09-08-2015 at 11:55 AM.
 
09-08-2015, 12:25 PM   #17
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,863
Blog Entries: 1

Rep: Reputation: 1869
> please ignore the first column
Not possible. Let's assume this:

Code:
080 001 100 020
    002 111 025
    003
Well, pick up a programming language (script or otherwise, but 64-bit capable), and do the following:

1. read everything into memory; arrange data column-wise
2. for every column: sort and drop duplicates
3. create the output
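The three steps above can be sketched as a single awk program. This is an illustration, not NevemTeve's code, and it rests on assumptions the thread does not state: fields are separated by whitespace, each column is sorted as plain strings (which matches numeric order only for equal-width codes such as 001), and a column that runs out of values leaves an empty cell. The function name uniq_columns is hypothetical.

```shell
#!/bin/bash
# uniq_columns FILE  (hypothetical name): the three steps in one awk run.
uniq_columns() {
  awk '
  {
    # Step 1: keep each distinct value of column j once, in memory.
    for (j = 1; j <= NF; j++)
      if (!((j, $j) in seen)) {
        seen[j, $j] = 1
        col[j, ++n[j]] = $j
      }
    if (NF > maxnf) maxnf = NF
  }
  END {
    for (j = 1; j <= maxnf; j++) {
      # Step 2: insertion-sort column j as strings (duplicates already dropped).
      for (i = 2; i <= n[j]; i++) {
        v = col[j, i]
        for (k = i - 1; k >= 1 && (col[j, k] "") > (v ""); k--)
          col[j, k + 1] = col[j, k]
        col[j, k + 1] = v
      }
      if (n[j] > rows) rows = n[j]
    }
    # Step 3: print the columns side by side; a column that has run out
    # of values contributes an empty cell.
    for (r = 1; r <= rows; r++) {
      line = ""
      for (j = 1; j <= maxnf; j++)
        line = line (j > 1 ? " " : "") (r <= n[j] ? col[j, r] : "")
      print line
    }
  }' "$1"
}
```

For example: uniq_columns input.txt > output.txt. Keeping everything in memory is what allows step 3 to print whole rows at the end, but it also means memory use grows with the number of distinct values per column.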
 
09-08-2015, 12:31 PM   #18
Rozak
LQ Newbie
 
Registered: Sep 2015
Posts: 23

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by NevemTeve
> please ignore the first column
Not possible. Let's assume this:

Code:
080 001 100 020
    002 111 025
    003
Well, pick up a programming language (script or otherwise, but 64-bit capable), and do the following:

1. read everything into memory; arrange data column-wise
2. for every column: sort and drop duplicates
3. create the output
Thank you, but how? I am just a beginner and I do not know how. I used this command in Linux:
awk < input.txt '{print $1}' | sort | uniq > output.txt
But it gave me the answer only for the first column. How can I change this command to get the answers for all columns at the same time?
Can you guide me, please?

Last edited by Rozak; 09-08-2015 at 12:45 PM.
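The one-column pipeline above can be repeated for every column by letting a shell loop hand the column number to awk through -v. A minimal sketch, assuming whitespace-separated fields; the function name per_column_uniq and the output names col1.txt, col2.txt and so on are hypothetical.

```shell
#!/bin/bash
# per_column_uniq FILE [DIR]  (hypothetical name)
# Runs "awk ... | sort | uniq" once per column and writes column N
# to DIR/colN.txt (DIR defaults to the current directory).
per_column_uniq() {
  local infile=$1 outdir=${2:-.} ncols c
  # The number of columns is the largest field count on any line.
  ncols=$(awk 'NF > max { max = NF } END { print max + 0 }' "$infile")
  for ((c = 1; c <= ncols; c++)); do
    # -v hands the loop counter to awk; lines that are too short are skipped.
    awk -v c="$c" 'NF >= c { print $c }' "$infile" | sort | uniq > "$outdir/col$c.txt"
  done
}
```

Afterwards, paste -d ' ' col1.txt col2.txt col3.txt > output.txt would put the lists side by side; where one list is shorter, paste leaves that cell empty. Because of sort, the values come out sorted rather than in their original order.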
 
09-08-2015, 12:47 PM   #19
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,863
Blog Entries: 1

Rep: Reputation: 1869
Sorry, but this is a task for a programmer. Nobody can turn you into a programmer via forum-posts.
 
1 member found this post helpful.
09-08-2015, 01:03 PM   #20
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Quote:
Originally Posted by Rozak
Code:
 
001 100 020
002 111 025
003     001
        111
And for the previous example, I would have had only one row with 080 in the first column.
Would you be able to explain how you got this output?

You have moved 003 from the 4th row up to the 3rd
Left a gap in column 2 / row 3 which you have not done previously
And left 2 gaps in the 4th row

You have previously mentioned that not all rows may have the same number of columns, but you're now saying you also do not wish to lose any columns ... is this correct?
Or in another way, if a row starts with 4 columns it will always have 4 columns but some may now be empty?

You say that you would have only one row with 080 ... so are you now saying that the first column will never be repeated?
Are there other rows that this may also occur with?

As you can see, this is not an easy problem, and it only gets harder the more information you omit.
 
09-08-2015, 01:14 PM   #21
Rozak
LQ Newbie
 
Registered: Sep 2015
Posts: 23

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by grail
Would you be able to explain how you got this output?

You have moved 003 from the 4th row up to the 3rd
Left a gap in column 2 / row 3 which you have not done previously
And left 2 gaps in the 4th row

You have previously mentioned that not all rows may have the same number of columns, but you're now saying you also do not wish to lose any columns ... is this correct?
Or in another way, if a row starts with 4 columns it will always have 4 columns but some may now be empty?

You say that you would have only one row with 080 ... so are you now saying that the first column will never be repeated?
Are there other rows that this may also occur with?

As you can see, this is not an easy problem, and it only gets harder the more information you omit.
I just want to extract the unique values within each column, which means that if a value repeats more than once in a column, I want to see it only once in my new data.

Last edited by Rozak; 09-08-2015 at 01:16 PM.
 
09-08-2015, 02:46 PM   #22
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
With this InFile ...
Code:
123 000 111
232 123 123
123 123 123
123 000 123
... this awk ...
Code:
awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]"\n"$j:0;}
  END{for (j=1;j<=NF;j++) print "\nIn column",j,
 "The unique values are:"b[j]}' $InFile >$OutFile
... produced this OutFile ...
Code:
In column 1 The unique values are:
123
232

In column 2 The unique values are:
000
123

In column 3 The unique values are:
111
123
Daniel B. Martin
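The same logic as this one-liner, spelled out with comments. One detail is changed on purpose: the END loop of the one-liner runs up to NF, which in END still holds the field count of the last line read, so a column that appears only on earlier, longer lines would be skipped. This sketch tracks the widest line instead. The function name list_unique_per_column is hypothetical; the output format is copied from the post.

```shell
#!/bin/bash
# list_unique_per_column FILE  (hypothetical name): Method #1 with comments.
list_unique_per_column() {
  awk '
  {
    # seen[j, value] is 0 (false) the first time a value shows up in
    # column j, and the ++ makes it nonzero afterwards, so each value
    # is appended to that column list exactly once.
    for (j = 1; j <= NF; j++)
      if (!seen[j, $j]++)
        list[j] = list[j] "\n" $j
    # Remember the widest line instead of relying on NF in END.
    if (NF > maxnf) maxnf = NF
  }
  END {
    for (j = 1; j <= maxnf; j++)
      print "\nIn column", j, "The unique values are:" list[j]
  }' "$1"
}
```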
 
09-08-2015, 03:23 PM   #23
Rozak
LQ Newbie
 
Registered: Sep 2015
Posts: 23

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin
With this InFile ...
Code:
123 000 111
232 123 123
123 123 123
123 000 123
... this awk ...
Code:
awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]"\n"$j:0;}
  END{for (j=1;j<=NF;j++) print "\nIn column",j,
 "The unique values are:"b[j]}' $InFile >$OutFile
... produced this OutFile ...
Code:
In column 1 The unique values are:
123
232

In column 2 The unique values are:
000
123

In column 3 The unique values are:
111
123
Daniel B. Martin
Thank you! But I want my new data file to have the same number of columns as the original one (exactly the same structure, but without duplicates). I mean I want it to look like this:
New file:
Code:
123 000 111
232 123 123
I can also send you a part of my original data if you would like to see it.
 
09-08-2015, 04:49 PM   #24
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
With this InFile ...
Code:
123 000 111
232 123 123
123 123 123
123 000 123
... this awk code ...
Code:
 awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]" "$j:0;}
   END{for (j=1;j<=NF;j++) print b[j]}' $InFile   \
|awk '{for (j=1;j<=NF;j++) a[j]=a[j]" "$j} 
  END {j=1; while (j in a) {print a[j];j++}}' >$OutFile
... produced this OutFile ...
Code:
 123 000 111
 232 123 123
Daniel B. Martin

Last edited by danielbmartin; 09-08-2015 at 04:54 PM. Reason: Cosmetic improvement, no code change
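Method #2 rebuilds each output row by position, so its result lines up only when every column has the same number of unique values. With grail's example quoted in post #16 (001 100 020 / 002 111 025 / 001 100 001 / 003 100 111), the columns have three, two and four unique values, and the second awk prints the rows 003 001 and 111 (each with a leading space), so 001 and 111 end up under the wrong columns. That is the situation grail asks about in post #20. A sketch that keeps every value in its own column, assuming first-seen order and a placeholder for empty cells; the function name uniq_columns_aligned and the FILL variable (default -) are hypothetical.

```shell
#!/bin/bash
# uniq_columns_aligned FILE  (hypothetical name)
# Unique values per column, each kept in its own column, in first-seen order.
# Empty cells are printed as $FILL (hypothetical variable, default "-").
uniq_columns_aligned() {
  awk -v fill="${FILL:--}" '
  {
    # Store a value as the next row of column j only the first time it is seen.
    for (j = 1; j <= NF; j++)
      if (!seen[j, $j]++)
        cell[j, ++n[j]] = $j
    if (NF > maxnf) maxnf = NF
  }
  END {
    # The output is as tall as the column with the most unique values.
    for (j = 1; j <= maxnf; j++)
      if (n[j] > rows) rows = n[j]
    for (r = 1; r <= rows; r++) {
      line = ""
      for (j = 1; j <= maxnf; j++) {
        v = (r <= n[j]) ? cell[j, r] : fill
        line = (j == 1) ? v : line " " v
      }
      print line
    }
  }' "$1"
}
```

Setting FILL=' ' would print blanks instead, closer to the layout in post #16, but the output could then no longer be split back into columns on whitespace.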
 
09-08-2015, 05:07 PM   #25
Rozak
LQ Newbie
 
Registered: Sep 2015
Posts: 23

Original Poster
Rep: Reputation: Disabled

Quote:
Originally Posted by danielbmartin
With this InFile ...
Code:
123 000 111
232 123 123
123 123 123
123 000 123
... this awk code ...
Code:
 awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]" "$j:0;}
   END{for (j=1;j<=NF;j++) print b[j]}' $InFile   \
|awk '{for (j=1;j<=NF;j++) a[j]=a[j]" "$j} 
  END {j=1; while (j in a) {print a[j];j++}}' >$OutFile
... produced this OutFile ...
Code:
 123 000 111
 232 123 123
Daniel B. Martin
For this code I got this error when I tried to run it in Linux:
awk: fatal: cannot open file ` ' for reading (No such file or directory)
even though I am in the correct directory and I changed my input file name to hap.txt.
Could you help me solve the problem?
 
09-08-2015, 05:23 PM   #26
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by Rozak
Could you help me solve the problem?
$InFile is the symbolic name for the input file.
$OutFile is the symbolic name for the output file.

This is the way my code reads ...
Code:
# File identification
   Path=${0%.*}
 InFile=$Path"inp.txt"
OutFile=$Path"out.txt"
... but that won't work on your computer because I don't know the names of your input and output files.

My preference is to have the program and data files in the same directory. Many people follow a different convention. Regardless of this distinction the awk code should work if you correctly identify the files.

On my machine the program is named dbm1484.bin; the InFile is dbm1484inp.txt; the OutFile is dbm1484out.txt.

Suggestion: get help from someone at your location.

Daniel B. Martin

Last edited by danielbmartin; 09-08-2015 at 05:25 PM. Reason: Cosmetic improvements
 
09-08-2015, 05:29 PM   #27
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
For what it's worth, this is my program in its entirety.
Code:
#!/bin/bash
# Daniel B. Martin   Sep15
#
# To execute this program, launch a terminal session and enter:
#  bash /home/daniel/Desktop/LQfiles/dbm1484.bin

# This program inspired by ...
#  http://www.linuxquestions.org/questions/programming-9/
#    how-to-find-unique-characters-within-each-column-in-a-txt-file-in-linux-4175552929/

# Keywords: unique within column; ternary operator; transpose matrix

# File identification
   Path=${0%%.*}
 InFile=$Path"inp.txt"
OutFile=$Path"out.txt"

echo; echo "Method #1 of LQ Member danielbmartin."
awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]"\n"$j:0;}
  END{for (j=1;j<=NF;j++) print "\nIn column",j,
 "the unique values are:"b[j]}' $InFile >$OutFile
echo "InFile ...";  cat $InFile;  echo "End Of File ("$(wc -l <$InFile)"  lines)"
echo "OutFile ..."; cat $OutFile; echo "End Of File ("$(wc -l <$OutFile)" lines)"

echo; echo "Method #2 of LQ Member danielbmartin."
 awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]" "$j:0;}
   END{for (j=1;j<=NF;j++) print b[j]}' $InFile   \
|awk '{for (j=1;j<=NF;j++) a[j]=a[j]" "$j} 
  END {j=1; while (j in a) {print a[j];j++}}' >$OutFile
echo "InFile ...";  cat $InFile;  echo "End Of File ("$(wc -l <$InFile)"  lines)"
echo "OutFile ..."; cat $OutFile; echo "End Of File ("$(wc -l <$OutFile)" lines)"

echo; echo "Normal end of job."; echo; exit
Daniel B. Martin
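The excerpt in post #26 uses Path=${0%.*}, while the full program uses Path=${0%%.*}. For the path in this thread both give the same result; they differ when a directory name contains a dot, because % removes the shortest matching suffix and %% the longest. A small demonstration; the helper strip_ext and the second path are made up for illustration.

```shell
#!/bin/bash
# strip_ext PATH  (hypothetical helper): the two expansions used in this thread.
strip_ext() {
  local p=$1
  printf '%s\n' "${p%.*}"    # shortest suffix matching ".*" removed
  printf '%s\n' "${p%%.*}"   # longest suffix matching ".*" removed
}

strip_ext /home/daniel/Desktop/LQfiles/dbm1484.bin
# /home/daniel/Desktop/LQfiles/dbm1484
# /home/daniel/Desktop/LQfiles/dbm1484
strip_ext /home/user/my.scripts/dbm1484.bin
# /home/user/my.scripts/dbm1484
# /home/user/my
```

With either form, InFile=$Path"inp.txt" then yields /home/daniel/Desktop/LQfiles/dbm1484inp.txt, the InFile name given in post #26.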
 
09-08-2015, 06:29 PM   #28
Rozak
LQ Newbie
 
Registered: Sep 2015
Posts: 23

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin
$InFile is the symbolic name for the input file.
$OutFile is the symbolic name for the output file.

This is the way my code reads ...
Code:
# File identification
   Path=${0%.*}
 InFile=$Path"inp.txt"
OutFile=$Path"out.txt"
... but that won't work on your computer because I don't know the names of your input and output files.

My preference is to have the program and data files in the same directory. Many people follow a different convention. Regardless of this distinction the awk code should work if you correctly identify the files.

On my machine the program is named dbm1484.bin; the InFile is dbm1484inp.txt; the OutFile is dbm1484out.txt.

Suggestion: get help from someone at your location.

Daniel B. Martin
My input file is hap.txt and my output file is uniqhap.txt. I changed the names to these, but I get this error:
skarimi@signal[19:20][~]$ cd mkhap
skarimi@signal[19:26][~/mkhap]$ awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]" "$j:0;} END{for (j=1;j<=NF;j++) print b[j]}' hap.txt \ |awk '{for (j=1;j<=NF;j++) a[j]=a[j]" "$j} END {j=1; while (j in a) {print a[j];j++}}' > uniqhap.txt
awk: cmd. line:1: fatal: cannot open file ` ' for reading (No such file or directory)
skarimi@signal[19:27][~/mkhap]$
Should I add a program to my file? Is this the problem? Can you please guide me?
 
09-08-2015, 06:35 PM   #29
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by Rozak
awk: cmd. line:1: fatal: cannot open file ` ' for reading (No such file or directory)
This means you have not correctly identified the InFile, so the awk says "there is no InFile so I can't execute."

Again, you need help from someone at your location.

Daniel B. Martin

Last edited by danielbmartin; 09-08-2015 at 06:35 PM. Reason: Cosmetic improvement
 
09-08-2015, 06:49 PM   #30
Rozak
LQ Newbie
 
Registered: Sep 2015
Posts: 23

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin
This means you have not correctly identified the InFile, so the awk says "there is no InFile so I can't execute."

Again, you need help from someone at your location.

Daniel B. Martin
But the previous command that you wrote works, and it does not give any error. Look:
skarimi@signal[19:41][~/mkhap]$ awk '{for (j=1;j<=NF;j++) !b[j","$j]++?b[j]=b[j]"\n"$j:0;} END{for (j=1;j<=NF;j++) print "\nIn column",j,
> "The unique values are:"b[j]}' hap.txt > uniq.txt
skarimi@signal[19:46][~/mkhap]$
It runs without any error and I can see the result, but the last command you sent me does not work. Are you sure nothing is wrong with the command? I really appreciate your help!
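The one-line command in post #28 contains hap.txt \ |awk, with a space after the backslash. A backslash continues a command onto the next line only when it is the last character on the line; followed by a space, it escapes that space instead, so the first awk receives a second input file whose name is a single space. That would explain the message cannot open file ` ' even though hap.txt exists, and also why the Method #1 command, which has no backslash, runs. A sketch of Method #2 wrapped in a function (the name method2 is hypothetical; the awk code is unchanged from post #24), with the pipe at the end of a line so that no backslash is needed:

```shell
#!/bin/bash
# method2 INFILE OUTFILE  (hypothetical wrapper; the awk code is from post #24)
# e.g.  method2 hap.txt uniqhap.txt
# A line ending in "|" continues the pipeline by itself. Typing
# "hap.txt \ |awk" on one line instead escapes the space, so awk is
# handed a second input file named " ".
method2() {
  awk '{for (j=1;j<=NF;j++) !a[j","$j]++?b[j]=b[j]" "$j:0;}
    END{for (j=1;j<=NF;j++) print b[j]}' "$1" |
  awk '{for (j=1;j<=NF;j++) a[j]=a[j]" "$j}
    END {j=1; while (j in a) {print a[j];j++}}' > "$2"
}
```

Typed directly at the prompt, the equivalent is the command from post #28 with the backslash removed.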
 
  


All times are GMT -5.
