LinuxQuestions.org

Linux - Newbie forum: rows and columns in data(matrix) trimming command (https://www.linuxquestions.org/questions/linux-newbie-8/rows-and-columns-in-data-matrix-trimming-command-4175457375/)

smart1seo 04-08-2013 11:55 AM

rows and columns in data(matrix) trimming command
 
I have a data set in ASCII format.
The data is a 23 x 55 matrix (rows x columns).
I want to trim the first row, the last two rows, the first two columns, and the last three columns, so that it becomes 20 x 50.
For example, from the data below I want to trim away all the zeros,
but the real data is random numbers, not just 1s and 0s:
000000000000
001111111000
001111111000
001111111000
001111111000
000000000000
000000000000

I've used the 'sed' command to delete rows, but I don't know if I can use it to delete columns as well.
Thank you

grail 04-08-2013 01:13 PM

Just tell sed to remove the same character position from every row, and the column is gone.

smart1seo 04-09-2013 10:03 AM

Can you give me an example of which sed options to use?

grail 04-09-2013 10:12 AM

Well, you will probably need to know the number of characters per line, and then delete the character at the position you want. For example, this removes the 11th character from every line:
Code:

sed -r 's/(.{10})./\1/' file
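The same trick extends to a run of adjacent columns: capture the characters before the run and skip a fixed number after it. A minimal sketch, assuming one character per column as in the example grid; the column numbers 5 to 7 and the sample strings are made up for illustration:

```shell
# Delete columns 5-7 (1-based) from every line:
# keep the first 4 characters, drop the next 3, keep the rest.
printf '%s\n' 'abcdefghij' '0123456789' | sed -E 's/^(.{4}).{3}/\1/'
```

Lines shorter than 7 characters don't match the pattern and pass through unchanged.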

John VV 04-09-2013 01:52 PM

Unless you have to do this all the time, there is the old way
(this dates back a very long time; a long time ago in a window far, far away...):
open the data in LibreOffice Calc (or Excel),
import it as CSV using blank spaces as the separator instead of ",",
trim the rows and columns, and export.

David the H. 04-09-2013 06:41 PM

ed is more convenient than sed when it comes to targeting specific rows, particularly the next-to-last one, because its addresses can use line offsets (such as $-1 for the line before the last).

Code:

printf '%s\n' '1d' '$-1,$d' '%s/^..\(.*\)...$/\1/' '%p' | ed -s infile.txt
'%p' will print the modified buffer to stdout. Change it to 'w' if you want to write the changes back to the file.

How to use ed:
http://wiki.bash-hackers.org/howto/edit-ed
http://snap.nlc.dcccd.edu/learn/nlc/ed.html
(also read the info page)

grail 04-10-2013 05:39 AM

Well, if we are going to change tools, I would say awk is the tool for this job.

David the H. 04-11-2013 03:30 PM

I'd say that any good text processing tool could handle the job, really. The only problem with sed (or awk, for that matter) is that there's no simple way to delete the next-to-last line unless you know its line number in advance. I'm sure it could be done with a bit of work, but it wouldn't be pretty; in awk, for example, you could do it with an array.
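Since the array approach comes up here, this is one way it might look: a hypothetical shell function `trim_grid` (the name and argument order are made up for this sketch) that wraps awk and holds the most recent BOTTOM lines in a small ring buffer, so a line is printed only once it is known not to be one of the trailing rows. It assumes one character per column, as in the example grid:

```shell
# Hypothetical helper: trim_grid TOP BOTTOM LEFT RIGHT < file
# Drops TOP leading rows and BOTTOM trailing rows, then LEFT leading and
# RIGHT trailing characters from each remaining row. The line count does
# not need to be known: the most recent BOTTOM rows sit in a ring buffer,
# and a row is printed only after BOTTOM newer rows have been read.
trim_grid() {
    awk -v top="$1" -v bottom="$2" -v left="$3" -v right="$4" '
        function emit(line) {
            line = substr(line, left + 1)                # drop LEFT chars
            print substr(line, 1, length(line) - right)  # drop RIGHT chars
        }
        NR <= top { next }                   # skip the leading rows
        bottom == 0 { emit($0); next }       # nothing to hold back
        {
            n = NR - top                     # index among the kept rows
            slot = n % bottom
            if (n > bottom) emit(buf[slot])  # row n - BOTTOM is now safe
            buf[slot] = $0
        }
    '
}

# The example grid from the first post
printf '%s\n' 000000000000 001111111000 001111111000 001111111000 \
    001111111000 000000000000 000000000000 | trim_grid 1 2 2 3
```

For the original 23 x 55 data this would be `trim_grid 1 2 2 3 < infile.txt`. The buffer never holds more than BOTTOM lines, so memory use stays small regardless of file size.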

Of course, if you do know the exact size of the grid, then it becomes trivial to do with any of them.

Code:

sed -r '1d ; 6d ; 7d ; { s/^..(.*)...$/\1/ }' infile.txt
printf '%s\n' '1d' '5,6d' '%s/^..\(.*\)...$/\1/' '%p' | ed -s infile.txt
awk '( NR>1 && NR<6 ) { sub(/^../,""); sub(/...$/,""); print }' infile.txt
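If you'd rather not hard-code the size, another option (a standard awk idiom, not one of the commands above) is to read the same file twice: on the first pass `NR == FNR` is true and awk only counts lines; on the second pass the total is known, so "the last two rows" becomes `FNR > total - 2`. A sketch, assuming a regular file (a pipe can't be read twice); `trim_file` and `sample.txt` are illustrative names:

```shell
# Hypothetical sample file built from the example grid
printf '%s\n' 000000000000 001111111000 001111111000 001111111000 \
    001111111000 000000000000 000000000000 > sample.txt

# Pass 1 counts lines; pass 2 keeps rows 2 .. total-2 and strips
# 2 leading and 3 trailing characters from each.
trim_file() {
    awk 'NR == FNR { total++; next }
         FNR > 1 && FNR <= total - 2 { sub(/^../, ""); sub(/...$/, ""); print }' \
        "$1" "$1"
}

trim_file sample.txt
```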


Actually, you could even do this entirely in the bash shell, which can be quite fast when working with small files. In a few tests I've run, if the input is under about a kilobyte and the modifications aren't too complex, it can beat any external command.

Code:

$ time {
    mapfile -t grid <infile.txt
    grid=( "${grid[@]:1:${#grid[@]}-3}" )
    grid=( "${grid[@]#??}" )
    printf '%s\n' "${grid[@]%???}"
}
1111111
1111111
1111111
1111111

real    0m0.001s
user    0m0.000s
sys    0m0.000s

$ time printf '%s\n' '1d' '$-1,$d' '%s/^..\(.*\)...$/\1/' '%p' | ed -s infile.txt
1111111
1111111
1111111
1111111

real    0m0.009s
user    0m0.000s
sys    0m0.000s

Although I suppose it's a bit of an exaggeration to call it a big difference. ;)
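The bash version can also be generalized. A rough sketch (bash 4 or later, for `mapfile`); the function name `trim_grid_bash` and its TOP BOTTOM LEFT RIGHT arguments are hypothetical, and it assumes every row is at least LEFT + RIGHT characters long:

```shell
# Hypothetical helper (bash 4+): trim_grid_bash TOP BOTTOM LEFT RIGHT < file
trim_grid_bash() {
    local top=$1 bottom=$2 left=$3 right=$4
    local -a grid
    local keep line
    mapfile -t grid                             # read stdin into an array
    keep=$(( ${#grid[@]} - top - bottom ))      # rows left after trimming
    (( keep > 0 )) || return 0
    for line in "${grid[@]:top:keep}"; do       # slice off top/bottom rows
        line=${line:left}                       # drop LEFT leading chars
        printf '%s\n' "${line:0:${#line}-right}"  # drop RIGHT trailing chars
    done
}

printf '%s\n' 000000000000 001111111000 001111111000 001111111000 \
    001111111000 000000000000 000000000000 | trim_grid_bash 1 2 2 3
```

It does the same three steps as the timed block above (slice the array, strip the prefix, strip the suffix), just with the counts as parameters.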

BruceFerjulian 04-11-2013 04:04 PM

Tail & Cut
 
Tailor to suit your needs.

Example:

tail -n 54 tmp.lis | head -n 52 | cut -b 3-20


BEFORE:

..01XXXXXXXXXXXXXXXX...
..02XXXXXXXXXXXXXXXX...
..03XXXXXXXXXXXXXXXX...
..04XXXXXXXXXXXXXXXX...
..05XXXXXXXXXXXXXXXX...
..06XXXXXXXXXXXXXXXX...
..07XXXXXXXXXXXXXXXX...
..08XXXXXXXXXXXXXXXX...
..09XXXXXXXXXXXXXXXX...
..10XXXXXXXXXXXXXXXX...
..11XXXXXXXXXXXXXXXX...
..12XXXXXXXXXXXXXXXX...
..13XXXXXXXXXXXXXXXX...
..14XXXXXXXXXXXXXXXX...
..15XXXXXXXXXXXXXXXX...
..16XXXXXXXXXXXXXXXX...
..17XXXXXXXXXXXXXXXX...
..18XXXXXXXXXXXXXXXX...
..19XXXXXXXXXXXXXXXX...
..20XXXXXXXXXXXXXXXX...
..21XXXXXXXXXXXXXXXX...
..22XXXXXXXXXXXXXXXX...
..23XXXXXXXXXXXXXXXX...
..24XXXXXXXXXXXXXXXX...
..25XXXXXXXXXXXXXXXX...
..26XXXXXXXXXXXXXXXX...
..27XXXXXXXXXXXXXXXX...
..28XXXXXXXXXXXXXXXX...
..29XXXXXXXXXXXXXXXX...
..30XXXXXXXXXXXXXXXX...
..31XXXXXXXXXXXXXXXX...
..32XXXXXXXXXXXXXXXX...
..33XXXXXXXXXXXXXXXX...
..34XXXXXXXXXXXXXXXX...
..35XXXXXXXXXXXXXXXX...
..36XXXXXXXXXXXXXXXX...
..37XXXXXXXXXXXXXXXX...
..38XXXXXXXXXXXXXXXX...
..39XXXXXXXXXXXXXXXX...
..40XXXXXXXXXXXXXXXX...
..41XXXXXXXXXXXXXXXX...
..42XXXXXXXXXXXXXXXX...
..43XXXXXXXXXXXXXXXX...
..44XXXXXXXXXXXXXXXX...
..45XXXXXXXXXXXXXXXX...
..46XXXXXXXXXXXXXXXX...
..47XXXXXXXXXXXXXXXX...
..48XXXXXXXXXXXXXXXX...
..49XXXXXXXXXXXXXXXX...
..50XXXXXXXXXXXXXXXX...
..51XXXXXXXXXXXXXXXX...
..52XXXXXXXXXXXXXXXX...
..53XXXXXXXXXXXXXXXX...
..54XXXXXXXXXXXXXXXX...
..55XXXXXXXXXXXXXXXX...

AFTER

02XXXXXXXXXXXXXXXX
03XXXXXXXXXXXXXXXX
04XXXXXXXXXXXXXXXX
05XXXXXXXXXXXXXXXX
06XXXXXXXXXXXXXXXX
07XXXXXXXXXXXXXXXX
08XXXXXXXXXXXXXXXX
09XXXXXXXXXXXXXXXX
10XXXXXXXXXXXXXXXX
11XXXXXXXXXXXXXXXX
12XXXXXXXXXXXXXXXX
13XXXXXXXXXXXXXXXX
14XXXXXXXXXXXXXXXX
15XXXXXXXXXXXXXXXX
16XXXXXXXXXXXXXXXX
17XXXXXXXXXXXXXXXX
18XXXXXXXXXXXXXXXX
19XXXXXXXXXXXXXXXX
20XXXXXXXXXXXXXXXX
21XXXXXXXXXXXXXXXX
22XXXXXXXXXXXXXXXX
23XXXXXXXXXXXXXXXX
24XXXXXXXXXXXXXXXX
25XXXXXXXXXXXXXXXX
26XXXXXXXXXXXXXXXX
27XXXXXXXXXXXXXXXX
28XXXXXXXXXXXXXXXX
29XXXXXXXXXXXXXXXX
30XXXXXXXXXXXXXXXX
31XXXXXXXXXXXXXXXX
32XXXXXXXXXXXXXXXX
33XXXXXXXXXXXXXXXX
34XXXXXXXXXXXXXXXX
35XXXXXXXXXXXXXXXX
36XXXXXXXXXXXXXXXX
37XXXXXXXXXXXXXXXX
38XXXXXXXXXXXXXXXX
39XXXXXXXXXXXXXXXX
40XXXXXXXXXXXXXXXX
41XXXXXXXXXXXXXXXX
42XXXXXXXXXXXXXXXX
43XXXXXXXXXXXXXXXX
44XXXXXXXXXXXXXXXX
45XXXXXXXXXXXXXXXX
46XXXXXXXXXXXXXXXX
47XXXXXXXXXXXXXXXX
48XXXXXXXXXXXXXXXX
49XXXXXXXXXXXXXXXX
50XXXXXXXXXXXXXXXX
51XXXXXXXXXXXXXXXX
52XXXXXXXXXXXXXXXX
53XXXXXXXXXXXXXXXX
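With GNU coreutils, the same pipeline can be written without knowing the row count, because `head -n -2` (a GNU extension) prints everything except the last two lines. A sketch for the 23 x 55 grid from the first post, assuming one character per column; `data.txt` and `trim_original` are made-up names, and the file is generated here with sample digits so the commands can run on their own:

```shell
# Generate a hypothetical 23 x 55 sample grid (digits 1..9,0 repeating)
awk 'BEGIN {
    for (r = 1; r <= 23; r++) {
        line = ""
        for (c = 1; c <= 55; c++) line = line (c % 10)
        print line
    }
}' > data.txt

# tail -n +2  : start at line 2, dropping the first row
# head -n -2  : print all but the last two lines (GNU extension)
# cut -c 3-52 : keep characters 3..52, dropping 2 on the left, 3 on the right
trim_original() {
    tail -n +2 "$1" | head -n -2 | cut -c 3-52
}

trim_original data.txt
```

That leaves 23 - 1 - 2 = 20 rows of 55 - 2 - 3 = 50 characters.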


All times are GMT -5.