LinuxQuestions.org
Old 11-08-2012, 11:23 AM   #1
piyush128k
Member
 
Registered: Jun 2012
Posts: 61
Blog Entries: 1

Rep: Reputation: Disabled
Split a file into two


I have a file with the first name in the first column and the status, "Active" or "Inactive", in the second column. It is a very large file with 40 thousand users. I need to split the file into two: one named Active containing only active users, and the other named Inactive containing only inactive users.

Help me achieve that
 
Old 11-08-2012, 11:33 AM   #2
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,458

Rep: Reputation: 1941
A simple awk one-liner should do the trick:
Code:
awk '{print > $2}' file
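
For anyone who wants to try this first, here is a minimal sketch of the one-liner on a tiny made-up input. The file name users.txt and the three names are assumptions for illustration only; the value in column two becomes the name of the output file.

```shell
# Work in a scratch directory so the generated files are easy to find.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: name in column 1, status in column 2.
printf '%s\n' 'alice Active' 'bob Inactive' 'carol Active' > users.txt

# Each line is written to the file named by its second field,
# so this creates two files: Active and Inactive.
awk '{ print > $2 }' users.txt

cat Active     # the lines for alice and carol
cat Inactive   # the line for bob
```

If only the names are wanted in the output files, `print $1 > $2` should do the same thing with just the first column.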
 
1 members found this post helpful.
Old 11-08-2012, 11:35 AM   #3
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
I haven't come across anyone who is called Active or Inactive, so if this is a unique field:
Code:
awk -v actives=actives.out -v inactives=inactives.out '/Inactive/ { print $1 >> inactives } /Active/ { print $1 >> actives }' infile
You don't provide an example of the input file so you might need to set a proper separator.
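
As a sketch of what setting a separator could look like: assuming a comma-separated file (the name users.csv and its contents are invented here), -F sets the field separator, and comparing $2 exactly avoids matching the word anywhere else on the line.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical comma-separated input; one user name contains the word "Active".
printf '%s\n' 'alice,Active' 'bob,Inactive' 'Activeman,Inactive' > users.csv

# -F, splits on commas; $2 == "..." tests only the status column.
awk -F, -v actives=actives.out -v inactives=inactives.out '
  $2 == "Inactive" { print $1 > inactives }
  $2 == "Active"   { print $1 > actives }
' users.csv

cat actives.out     # alice
cat inactives.out   # bob and Activeman
```

With the plain /Active/ pattern, the Activeman line would land in both output files, which is why the exact comparison is used in this sketch.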

EDIT: Use colucix' answer, more elegant!

Last edited by druuna; 11-08-2012 at 11:36 AM.
 
1 members found this post helpful.
Old 11-08-2012, 02:00 PM   #4
piyush128k
Member
 
Registered: Jun 2012
Posts: 61
Blog Entries: 1

Original Poster
Rep: Reputation: Disabled
It is kind of complicated. Let me try.
db user list:
--------------
a active
b active
c inactive
d active
e inactive

etc/passwd user list:
----------------
a
c
f
g
h

resultant combined file:
-------------------------
a
a active
b active
c inactive
c
d active
e inactive
f
g
h

What i want is:
-----------------
c inactive user exists in both db and etc/passwd
e inactive user exists only in db
f rogue user
g rogue user
h rogue user

I got confused about how to tackle it. Should I split or not? If I split, it becomes difficult to join the lists again without the words active and inactive to segregate the users. What I want to achieve is the last table above.
How to get this?

---------- Post added 11-08-12 at 02:01 PM ----------

So you see, I am not concerned about active users and don't want them to show up in the final file.
 
Old 11-08-2012, 02:01 PM   #5
piyush128k
Member
 
Registered: Jun 2012
Posts: 61
Blog Entries: 1

Original Poster
Rep: Reputation: Disabled
@colucix, @druuna
 
Old 11-08-2012, 02:09 PM   #6
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-19-live-usb/ aix
Posts: 3,840

Rep: Reputation: 590
would grep -f work here ?

else, maybe a for loop multiplexing /etc/passwd with db-user-list .
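
To sketch the grep -f idea with the sample data from post #4: the file names db.users.list, db.names and passwd.names are assumptions, and passwd.names stands in for the first column of /etc/passwd.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from post #4.
printf '%s\n' 'a active' 'b active' 'c inactive' 'd active' 'e inactive' > db.users.list
printf '%s\n' a c f g h > passwd.names   # stand-in for: cut -d: -f1 /etc/passwd

# All user names known to the database.
cut -d' ' -f1 db.users.list > db.names

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file:
# names in passwd.names that are not in db.names.
grep -Fxv -f db.names passwd.names > users.rogue

# Inactive db users that also appear in passwd.names.
awk '$2 == "inactive" { print $1 }' db.users.list |
  grep -Fx -f passwd.names > users.inactive.in.both

cat users.rogue users.inactive.in.both
```

The -x flag matters here: without it, a pattern such as "a" would also match any longer name that merely contains that letter.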
 
Old 11-08-2012, 02:29 PM   #7
piyush128k
Member
 
Registered: Jun 2012
Posts: 61
Blog Entries: 1

Original Poster
Rep: Reputation: Disabled
Could you kindly elaborate on this, please: "else, maybe a for loop multiplexing /etc/passwd with db-user-list"?
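
One possible reading of the loop suggestion, as a rough sketch only: walk through the passwd user names and look each one up in the db list. The file names are assumptions, and passwd.sample stands in for /etc/passwd.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'a active' 'b active' 'c inactive' 'd active' 'e inactive' > db.users.list
# Stand-in for /etc/passwd; only the first colon-separated field matters here.
for u in a c f g h; do
  printf '%s:x:1000:1000::/home/%s:/bin/sh\n' "$u" "$u"
done > passwd.sample

# For every passwd user, fetch the status from the db list (empty if absent).
while IFS=: read -r name rest; do
  status=$(awk -v n="$name" '$1 == n { print $2 }' db.users.list)
  case $status in
    '')       echo "$name rogue user" ;;
    inactive) echo "$name inactive user exists in both db and etc/passwd" ;;
  esac
done < passwd.sample > report.txt

cat report.txt
```

This starts one awk process per user, so it is likely to be slow on a list of 40 thousand users, and it cannot see users that exist only in the db list; a second pass over db.users.list would be needed for those.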
 
Old 11-08-2012, 02:58 PM   #8
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,396

Rep: Reputation: 814
How about a database:
Code:
CREATE TABLE users_passwd(username);
CREATE TABLE users_active(username, active);

.separator ' '
.import users-passwd.txt users_passwd
.import users-db.txt users_active

SELECT users_passwd.username, 'inactive user exists in both db and etc/passwd'
FROM users_passwd, users_active WHERE users_active.active = 'inactive'
AND users_passwd.username = users_active.username;

SELECT username, 'inactive user exists only in db'
FROM users_active WHERE active = 'inactive'
AND username NOT IN users_passwd;

SELECT username, 'rogue user'
FROM users_passwd
WHERE username NOT IN (SELECT username FROM users_active);
Code:
% sqlite3 < check-users.sql
c inactive user exists in both db and etc/passwd
e inactive user exists only in db
f rogue user
g rogue user
h rogue user
 
Old 11-08-2012, 04:22 PM   #9
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
Have a look at this:
Code:
#!/bin/bash

awk 'BEGIN { 
  FS = "[: ]"
  while ( ( getline < "/etc/passwd" ) > 0 )  _[$1] = $1 
}
{ _[$1] = _[$1]" "$2 } 
END { 
  for ( i in _ ) { 
    if ( _[i] !~ / active/ ) { 
      if ( i    ~ _[i] )         { print i > "users.rogue" } 
      if ( _[i] ~ /^ inactive/ ) { print i > "users.inactive.db.only" } 
      if ( _[i] ~ /. inactive/ ) { print i > "users.inactive.in.both" } 
    }
  }
}' db.users.list
The above creates 3 files that hold the respective users.
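
To check the script without touching the real /etc/passwd, here is a sketch that runs the same logic on the sample data from post #4. The file passwd.sample, the variable pw and the array name u are assumptions introduced for this test run; the rogue test uses a string comparison instead of the regex match, so user names containing regex characters are compared literally.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'a active' 'b active' 'c inactive' 'd active' 'e inactive' > db.users.list
for u in a c f g h; do
  printf '%s:x:1000:1000::/home/%s:/bin/sh\n' "$u" "$u"
done > passwd.sample   # stand-in for /etc/passwd

awk -v pw=passwd.sample 'BEGIN {
  FS = "[: ]"
  while ( ( getline < pw ) > 0 ) u[$1] = $1
}
{ u[$1] = u[$1] " " $2 }
END {
  for ( i in u ) {
    if ( u[i] !~ / active/ ) {
      if ( u[i] == i )           { print i > "users.rogue" }
      if ( u[i] ~ /^ inactive/ ) { print i > "users.inactive.db.only" }
      if ( u[i] ~ /. inactive/ ) { print i > "users.inactive.in.both" }
    }
  }
}' db.users.list

# The order of "for ( i in u )" is unspecified in awk, so sort before comparing.
sort users.rogue
cat users.inactive.db.only users.inactive.in.both
```

On this sample, users.rogue should hold f, g and h, users.inactive.db.only should hold e, and users.inactive.in.both should hold c.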
 
1 members found this post helpful.
Old 11-09-2012, 11:52 AM   #10
piyush128k
Member
 
Registered: Jun 2012
Posts: 61
Blog Entries: 1

Original Poster
Rep: Reputation: Disabled
I appreciate ntubski's answer, but I will go with druuna's because I have to do it in bash. I am only allowed to get one file from the db, and that is a given.

---------- Post added 11-09-12 at 11:53 AM ----------

This is what I had and I was going in circles. I searched and tested a lot.

#! /bin/bash


CMD="use metadata; select usernames.SNo, usernames.DataTelid, usernames.UName, personaldata.ActiveInactive from usernames, personaldata where usernames.DataTelid=personaldata.DataTelid ORDER By usernames.UName, personaldata.ActiveInactive into outfile '/tmp/querydb';"

mysql -u root -pnew-password -e "$CMD"
exec &> /tmp/final
awk -F':' '{print $1}' /etc/passwd | sort > /tmp/userlist




#sed s/Inactive/Passive/ /tmp/querydb >/tmp/parse

#sed '/Active/d' /tmp/parse > /tmp/strpdb

#awk -F"\," 'FILENAME== "/tmp/userlist"{A[$1$2]=$1$2}FILENAME== "/tmp/querydb"{if(A[$1]){print}}' /tmp/userlist / tmp/querydb

#% join -j 1 /tmp/querydb /tmp/userlist


awk '{print $3" " $5}' /tmp/querydb | sort > /tmp/strpdb


#awk '/Active/' /tmp/strpdb | sort > /tmp/parse

#awk '{print $1}' /tmp/parse | sort > /tmp/act

#awk '/Inactive/' /tmp/strpdb | sort > /tmp/indbu

#awk '{print $1}' /tmp/indbu | sort > /tmp/inc

#awk -F "," '{close(f);f=$3} {print > f".txt"}' /tmp/strpdb

#awk '/^Active|^Inactive/{close("add+_"f);f++}{print $1>"add_"f}' /tmp/strpdb

#join -t: /tmp/strpdb /tmp/userlist -1 2 -1 2 -t
#sort -m /tmp/strpdb /tmp/userlist

#paste -d" " /tmp/strpdb /tmp/userlist | sort > /tmp/indbu

#sort -k2 -n /tmp/indbu

#pr -tm /tmp/querydb /tmp/userlist | awk '{$3"$1}'

#join -t: /tmp/querydb /tmp/userlist | more

#echo Inactive Compiltion
echo

#comm -3 /tmp/strpdb /tmp/userlist
echo

#echo Above are the inactive users that exist in etcpasswd list. Intruder alert.
diff /tmp/strpdb /tmp/userlist | grep '{print $1}' | sed "s/^> \(.*\)/\1 ---> Rogue User/;s/^<\(.*\)/\1 <-------Ignore this Inactive user. Only exists in database/"

echo

echo


exit 0
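
The commented-out comm attempt in the script above can probably be made to work once both inputs are reduced to sorted bare user names. A sketch under assumptions: strpdb and userlist here are small stand-ins for /tmp/strpdb and /tmp/userlist, with strpdb assumed to hold "name status" pairs, and the output file names are invented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for /tmp/strpdb (name, status) and /tmp/userlist (names only).
printf '%s\n' 'a Active' 'b Active' 'c Inactive' 'd Active' 'e Inactive' > strpdb
printf '%s\n' a c f g h | sort > userlist

# comm compares whole lines, so reduce the db list to sorted bare names first.
awk '{ print $1 }' strpdb | sort > db.names
awk '$2 == "Inactive" { print $1 }' strpdb | sort > db.inactive

comm -13 db.names    userlist > rogue            # only in userlist
comm -12 db.inactive userlist > inactive.both    # in both files
comm -23 db.inactive userlist > inactive.dbonly  # only in the db list

cat rogue inactive.both inactive.dbonly
```

Comparing "name status" lines against bare names, as diff and comm -3 do in the script, reports nearly every line as different, which may be why the earlier attempts went in circles.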
 
Old 11-09-2012, 11:59 AM   #11
piyush128k
Member
 
Registered: Jun 2012
Posts: 61
Blog Entries: 1

Original Poster
Rep: Reputation: Disabled
Sorry I meant I appreciate both druuna (he shows how to compare with etc/passwd) and ntubski (he gives me the sql files). I need to put them together so they work in order
 
Old 11-09-2012, 12:07 PM   #12
piyush128k
Member
 
Registered: Jun 2012
Posts: 61
Blog Entries: 1

Original Poster
Rep: Reputation: Disabled
@ntubski, now that you see my code and the cloud I am in, I am not too sure if what you suggested fits what I want. I can only decide that a user is rogue if the user exists only in /etc/passwd and not in the db file that I got from the sql query.
@druuna please explain below

#!/bin/bash -> got it

awk 'BEGIN { -> got it
FS = "[: ]" -> got it
while ( ( getline < "/etc/passwd" ) > 0 ) _[$1] = $1 -> are we stripping the file to column $1, meaning just for the usernames???
}
{ _[$1] = _[$1]" "$2 } -> I am not too sure what's happening here
END {
for ( i in _ ) { -> Why _ ?
if ( _[i] !~ / active/ ) { -> Please justify ?
if ( i ~ _[i] ) { print i > "users.rogue" }
if ( _[i] ~ /^ inactive/ ) { print i > "users.inactive.db.only" }
if ( _[i] ~ /. inactive/ ) { print i > "users.inactive.in.both" }
}
}
}' db.users.list
 
Old 11-09-2012, 12:39 PM   #13
schneidz
Senior Member
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-19-live-usb/ aix
Posts: 3,840

Rep: Reputation: 590
Quote:
Originally Posted by druuna
I haven't come across anyone who is called Active or Inactive, so if this is a unique field:
Code:
awk -v actives=actives.out -v inactives=inactives.out '/Inactive/ { print $1 >> inactives } /Active/ { print $1 >> actives }' infile
You don't provide an example of the input file so you might need to set a proper separator.

EDIT: Use colucix' answer, more elegant!
I give both of you a point: colucix's is more elegant, but yours is more self-explanatory.
 
Old 11-09-2012, 12:50 PM   #14
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
Quote:
Originally Posted by piyush128k
@druuna please explain
Code:
awk 'BEGIN { 
  # the BEGIN block is done before reading db.users.list
  # set the 2 needed separators (: or a space)
  FS = "[: ]"
  # read the user names from /etc/passwd and store them in an array. username is also the index.
  while ( ( getline < "/etc/passwd" ) > 0 )  _[$1] = $1 
}
# the main block is done for each line in db.users.list
# store or add field 2 from db.users.list.
# If the entry already exists, a space and field 2 are appended to the username;
# if it doesn't exist, a space and field 2 are stored in a new array entry.
{ _[$1] = _[$1]" "$2 } 
END { 
  # the END block is done when db.users.list is completely read
  # the array now holds all usernames present in /etc/passwd and db.users.list. Some of those have an extra field (active/inactive)
  # for all entries in array
  for ( i in _ ) { 
    # dismiss array entries that contain a space followed by active
    if ( _[i] !~ / active/ ) { 
      # print entries that hold only a username (present in /etc/passwd but not in db.users.list)
      if ( i    ~ _[i] )         { print i > "users.rogue" } 
      # print lines that start with a space followed by inactive
      if ( _[i] ~ /^ inactive/ ) { print i > "users.inactive.db.only" } 
      # print lines that contain any character followed by a space followed by inactive
      if ( _[i] ~ /. inactive/ ) { print i > "users.inactive.in.both" } 
    }
  }
}' db.users.list
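
To make the array contents described in the comments visible, here is a small sketch that only dumps the array after both files are read. The sample data is from post #4; passwd.sample, the variable pw, the array name u and the output file array.dump are assumptions for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 'a active' 'b active' 'c inactive' 'd active' 'e inactive' > db.users.list
for u in a c f g h; do
  printf '%s:x:1000:1000::/home/%s:/bin/sh\n' "$u" "$u"
done > passwd.sample   # stand-in for /etc/passwd

awk -v pw=passwd.sample 'BEGIN {
  FS = "[: ]"
  while ( ( getline < pw ) > 0 ) u[$1] = $1
}
{ u[$1] = u[$1] " " $2 }
END {
  # The brackets make a leading space in the value visible.
  for ( i in u ) printf "%s -> [%s]\n", i, u[i]
}' db.users.list | sort > array.dump

cat array.dump
```

In the dump, a value that starts with a space (for example `e -> [ inactive]`) belongs to a user found only in db.users.list, a value equal to its key (`f -> [f]`) belongs to a user found only in the passwd file, and a value such as `[c inactive]` belongs to a user found in both.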

Last edited by druuna; 11-10-2012 at 07:47 AM. Reason: made it a bit more readable
 
1 members found this post helpful.
Old 11-10-2012, 10:29 AM   #15
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,396

Rep: Reputation: 814
Quote:
Originally Posted by piyush128k
Sorry I meant I appreciate both druuna (he shows how to compare with etc/passwd) and ntubski (he gives me the sql files). I need to put them together so they work in order
My solution happened to use sql (specifically, sqlite), but it takes as input the text files that were extracted from the /etc/passwd (users-passwd.txt) and the database (users-db.txt). It does NOT interact at all with your actual database.

Both solutions do pretty much the same thing (the output format is a bit different); druuna's solution saves you the extra step of reading from /etc/passwd.
 
  

