LinuxQuestions.org
Old 03-17-2011, 01:27 PM   #1
cliffyao
LQ Newbie
 
Registered: Oct 2009
Posts: 27

Rep: Reputation: 15
Any shell scripts for cutting and pasting part of data?


Hi,

I have a tab-delimited txt file as below. It is part of the original file.
Quote:
##Hello
##Welcome
#C1 C2 C3
1 1 1
2 2 2
3 3 3
3 3 3
I want to cut the lines starting with "3" in column 1 and paste them before the lines starting with "1" in column 1, so that I get:
Quote:
##Hello
##Welcome
#C1 C2 C3
3 3 3
3 3 3
1 1 1
2 2 2
Does anyone know of a simple shell script to do that? The original file is too big, so I want to use a shell script to process the data.

Thanks

-C
 
Old 03-17-2011, 02:46 PM   #2
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
http://www.linuxquestions.org/questi...script-737224/

I found this here.
 
Old 03-17-2011, 08:42 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,039

Rep: Reputation: 3203
Depending on how large the file is, you may be able to do it with something like awk. Be warned that it holds every line from the first "1" line up to the first "3" line in memory, so if the file is large you may need an alternative solution:
Code:
awk '$1 == 3{a=0;b=1}$1 == 1{a=1}a{store=(store)?store"\n"$0:$0;next}b && $1 != 3{b=0;print store}1;END{if(b)print store}' file
This should also work for the following scenario:
Code:
##Hello
##Welcome
#C1 C2 C3
1 1 1
2 2 2
3 3 3
3 3 3
4 4 4
And supply output of:
Code:
##Hello
##Welcome
#C1 C2 C3
3 3 3
3 3 3
1 1 1
2 2 2
4 4 4
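If memory really is a concern, one option is to let awk read the file twice instead of collecting lines in a variable. The following is only a sketch under assumptions that are not stated in the thread: the data sits in a regular file (not a pipe) so it can be reopened, and move_threes is a made-up function name. When the main pass reaches the first line whose first column is 1, it rereads the file, prints the later lines whose first column is 3, and then skips those lines when it meets them again. Unlike the one-liner above, it moves every later "3" line, not only the first block of them.

```shell
#!/bin/bash

# move_threes FILE: hypothetical helper, prints the reordered data to stdout.
move_threes() {
    awk '
        # First line whose first field is 1: reread the file and emit the
        # lines further down whose first field is 3.
        $1 == 1 && !done {
            n = 0
            while ((getline line < FILENAME) > 0) {
                n++
                split(line, f)
                if (n > NR && f[1] == 3) print line
            }
            close(FILENAME)
            done = 1
        }
        # These lines were already printed by the block above.
        $1 == 3 && done { next }
        { print }
    ' "$1"
}
```

It would be called as move_threes file > newfile. Nothing is buffered, at the cost of reading the file a second time.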
 
1 members found this post helpful.
Old 03-18-2011, 11:05 AM   #4
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
I concur with grail; this is a job better suited to a more complete programming language such as Awk or Perl. Is that an option?

--- rod.
 
Old 03-18-2011, 11:31 AM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Quote:
Originally Posted by szboardstretcher
Unfortunately, the sed expression I gave in the above thread isn't really suitable for this problem. In that case we only had to concatenate a couple of lines, while here we have to shift whole blocks of lines around. sed just isn't really designed for major multi-line editing, so I agree that awk or perl would be best here.

It might be fun to try writing a bash-only script that can do the same thing, but it would probably end up being too complex to be worth the effort.
 
1 members found this post helpful.
Old 03-18-2011, 11:45 AM   #6
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
Originally Posted by David the H.
I agree that awk or perl would be best here.
In particular, the Perl 'splice' function seems to be ideally suited to most parts of the task.
--- rod.
 
Old 03-18-2011, 07:13 PM   #7
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
Quote:
Originally Posted by theNbomr
In particular, the Perl 'splice' function seems to be ideally suited to most parts of the task.
--- rod.

How, may I ask? Do you read the whole file into memory?
 
Old 03-18-2011, 08:34 PM   #8
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
How, may I ask? Do you read the whole file into memory?
Untested...

Code:
#! /usr/bin/perl -w
use strict;

    # Open data file and swallow whole
    #
    open( DATAFILE, "/your/data/file.name" );
    my @datafile = <DATAFILE>;
    close DATAFILE;

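    my @records3;    # lines whose first column is 3
    my $record1;     # index of the first line whose first column is 1
    for( my $i = 0; $i < @datafile; $i++ ){

        # Only the first of these counts (the data is tab-delimited).
        if( !defined $record1 && $datafile[$i] =~ m/^1\s/ ){

            # remember where to insert the '3 3 3' records.
            $record1 = $i;
        }
        elsif( $datafile[$i] =~ m/^3\s/ ){
            push @records3, $datafile[$i];
            splice( @datafile, $i, 1 );

            # The next line has moved into slot $i, so look at it again.
            $i--;
        }
    }

    # With no '1' line, put the '3' records back at the end.
    $record1 = @datafile unless defined $record1;
    splice @datafile, $record1, 0, @records3;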

    open( DATAFILE, ">/your/data/file.newname" );
    print DATAFILE @datafile;
    close DATAFILE;
    exit 0;
I wasn't going to write the whole thing, but what the heck....

--- rod.
 
Old 03-18-2011, 09:12 PM   #9
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
Quote:
Originally Posted by theNbomr
Untested...
Code:
#! /usr/bin/perl -w
use strict;

    # Open data file and swallow whole
    #
    open( DATAFILE, "/your/data/file.name" );
    my @datafile = <DATAFILE>;
    close DATAFILE;
Well, I do not know whether he meant the file is too big to post here, or whether it is in fact a very large file, but

Quote:
Originally Posted by cliffyao
The original file is too big.......

Last edited by kurumi; 03-18-2011 at 09:21 PM.
 
Old 03-19-2011, 02:55 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,039

Rep: Reputation: 3203
Quote:
It might be fun to try writing a bash-only script that can do the same thing
I will take the challenge
Code:
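#!/bin/bash

testing=true    # still collecting the lines that precede the first "3" line
found3=false    # currently inside the block of "3" lines
insert=         # collected lines, each one ending in a newline

# Column 1 is exactly 3 (so that 30, 31, ... do not match).
three='^3([[:space:]]|$)'

# IFS= keeps leading and trailing tabs; printf leaves backslashes alone.
while IFS= read -r line
do
    if [[ $line =~ ^[0-9] ]] && $testing
    then
        if [[ $line =~ $three ]]
        then
            found3=true
            testing=false
        else
            insert+=$line$'\n'
            continue
        fi
    fi

    # First line after the "3" block: emit the collected lines before it.
    if $found3 && [[ ! $line =~ $three ]]
    then
        printf '%s' "$insert"
        found3=false
    fi

    printf '%s\n' "$line"
done < in_file > out_file

# The "3" block ran to the end of the file, or there was no "3" line at all.
if $found3 || $testing
then
    printf '%s' "$insert" >> out_file
fi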
You could also make 'insert' a temp file, so that if the original file is really large the collected lines are not held in memory.
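The temp-file variant might look something like the sketch below. It is not from the thread: reorder_with_tempfile, src, dst and held are made-up names, mktemp is assumed to be available, and the stricter column-1 pattern is an addition. The lines that have to wait are appended to the temp file, and the file is copied into the output once the "3" block has been passed.

```shell
#!/bin/bash

# reorder_with_tempfile SRC DST: hypothetical helper for this sketch.
reorder_with_tempfile() {
    local src=$1 dst=$2 line held
    local three='^3([[:space:]]|$)'    # column 1 is exactly 3
    local testing=true found3=false

    held=$(mktemp) || return 1

    while IFS= read -r line
    do
        if $testing && [[ $line =~ ^[0-9] ]]
        then
            if [[ $line =~ $three ]]
            then
                found3=true
                testing=false
            else
                # Park the line on disk instead of in a variable.
                printf '%s\n' "$line" >> "$held"
                continue
            fi
        fi

        # First line after the "3" block: copy the parked lines back in.
        if $found3 && [[ ! $line =~ $three ]]
        then
            cat "$held"
            found3=false
        fi

        printf '%s\n' "$line"
    done < "$src" > "$dst"

    # The "3" block ended the file, or no "3" line was ever seen.
    if $found3 || $testing
    then
        cat "$held" >> "$dst"
    fi
    rm -f "$held"
}
```

Called as reorder_with_tempfile in_file out_file. Reopening the temp file for every parked line is slow on big inputs; keeping it open on a spare file descriptor would be the obvious refinement.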
 
1 members found this post helpful.
Old 03-19-2011, 09:58 AM   #11
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
Originally Posted by kurumi
Well, I do not know whether he meant the file is too big to post here, or whether it is in fact a very large file, but
I took it to mean 'too big to do this manually'. That seems to be the usual case in these forums.
--- rod.
 
Old 03-19-2011, 01:59 PM   #12
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
grail, that's superb. I didn't really expect anyone to take up the challenge. I'd just thought about it for a few moments before deciding that I didn't have the time or energy to take it on myself.

I'm still trying to figure out the whole process of what you wrote, but it's already taught me something new:
Code:
found=true
if $found ; then echo "true" ; else echo "false" ; fi
# evaluates as "true"

found=false
if $found ; then echo "true" ; else echo "false" ; fi
# evaluates as "false"
It took me a minute to realize that the "true" and "false" contained in the variable are being evaluated as commands rather than as strings. Interesting usage.

However, I think I'd still just use a regular string test myself, and since the double brackets can handle multiple conditions, I'd change your first if statement, for example, to:
Code:
if [[ $line =~ ^[0-9] && $testing == true ]]
Besides personal preference, I think it makes what's being tested clearer.
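One practical difference between the two styles shows up when the variable is empty or unset. The sketch below uses two made-up function names, as_command and as_string, to compare them. An empty expansion leaves no command to run, and bash treats that as success, so the command form reports "true" while the string test reports "false".

```shell
#!/bin/bash

# Hypothetical helpers: each one reports how it judges its argument.
as_command() {
    local flag=$1
    if $flag ; then echo "true" ; else echo "false" ; fi
}

as_string() {
    local flag=$1
    if [[ $flag == true ]] ; then echo "true" ; else echo "false" ; fi
}

as_command true     # prints "true"
as_command false    # prints "false"
as_command ""       # prints "true": nothing is left to run, exit status 0
as_string ""        # prints "false"
```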
 
Old 03-20-2011, 12:37 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,039

Rep: Reputation: 3203
Quote:
However, I think I'd still just use a regular string test myself, and since the double brackets can handle multiple conditions
I follow where you are coming from, and in a simple case like this I generally agree. In some code, though, I may have a test (using [[) along with a boolean and also an arithmetic expression, so I have gotten into the habit of using the appropriate test for each.
Like so:
Code:
if [[ $string && -d $is_dir ]] || (( max > MAX || min < MIN)) || $start
These can all be handled within the confines of '[[', but (to me) it is clearer which test I am performing based on the nomenclature used.
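For anyone who wants to try the mixed form, here is a small runnable sketch. Everything in it is made up for illustration: the should_run function, its arguments, and the MAX and MIN limits.

```shell
#!/bin/bash

# should_run STRING DIR MAX_VALUE MIN_VALUE START: hypothetical example.
should_run() {
    local string=$1 is_dir=$2 max=$3 min=$4 start=$5
    local MAX=10 MIN=0    # arbitrary limits for the demonstration

    # [[ ]] for the string and file tests, (( )) for the arithmetic,
    # and a bare true/false command for the boolean.
    if [[ $string && -d $is_dir ]] || (( max > MAX || min < MIN )) || $start
    then
        echo "run"
    else
        echo "skip"
    fi
}

should_run "" /nonexistent 5 5 false    # all three tests fail
should_run "x" / 5 5 false              # the [[ ]] test passes
should_run "" / 50 5 false              # the (( )) test passes
should_run "" / 5 5 true                # the boolean passes
```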

Thanks for the feedback, though, as I do also need to work well with others and sometimes forget.
 
Old 03-20-2011, 01:42 AM   #14
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Yes, that's reasonable, when the types of evaluation are clearly different. On the other hand, it could be argued that combining multiple conditions is one of the main purposes of [[..]], so it seems clear enough to me either way.

I think that it's more the subtler tricks, where it may not be obvious just what the code is doing, that should be avoided as much as possible--at least when other people are going to see it. I was expecting to see a string test, so it was a bit of a surprise to discover that you were actually using true/false as commands.
 
  


All times are GMT -5.
