Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place! |
| Notices |
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
Are you new to LinuxQuestions.org? Visit the following links:
Site Howto |
Site FAQ |
Sitemap |
Register Now
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
 |
GNU/Linux Basic Guide
This 255-page guide will provide you with the keys to understand the philosophy of free software, teach you how to use and handle it, and give you the tools required to move easily in the world of GNU/Linux. Many users and administrators will be taking their first steps with this GNU/Linux Basic guide and it will show you how to approach and solve the problems you encounter.
Click Here to receive this Complete Guide absolutely free. |
|
 |
|
11-08-2012, 11:23 AM
|
#1
|
|
Member
Registered: Jun 2012
Posts: 61
Rep: 
|
Split a file into two
I have a file with first name in first column and their status as "Active" or "Inactive" in second column. It is a very large file with 40 thousands users. I need to split the file into two. One named Active including only active users and the other named Inactive including inactive users.
Help me achieve that
|
|
|
|
11-08-2012, 11:33 AM
|
#2
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,896
|
A simple awk one-liner should do the trick:
Code:
awk '{print > $2}' file
|
|
|
1 members found this post helpful.
|
11-08-2012, 11:35 AM
|
#3
|
|
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,703
|
I haven't come across anyone that is called Active or Inactive, so if this is a unique filed:
Code:
awk -v actives=actives.out -v inactives=inactives.out '/Inactive/ { print $1 >> inactives } /Active/ { print $1 >> actives }' infile
You don't provide an example of the input file so you might need to set a proper separator.
EDIT: Use colucix' answer, more elegant!
Last edited by druuna; 11-08-2012 at 11:36 AM.
|
|
|
1 members found this post helpful.
|
11-08-2012, 02:00 PM
|
#4
|
|
Member
Registered: Jun 2012
Posts: 61
Original Poster
Rep: 
|
It is kind of complicated. Let me try.
db user list:
--------------
a active
b active
c inactive
d active
e inactive
etc/passwd user list:
----------------
a
c
f
g
h
resultant combined file:
-------------------------
a
a active
b active
c inactive
c
d active
e inactive
f
g
h
What i want is:
-----------------
c inactive user exists in both db and etc/paaswd
e inactive user exists only in db
f rogue user
g rogue user
h rogue user
i got confused on how to tackle it. should i split or .....if i split then it becomes difficult to join without the words active and inactive to segregate the users. what i want to achieve is the last table above.
How to get this?
---------- Post added 11-08-12 at 02:01 PM ----------
so you see, I am not concerned about active users and dont want them to reflect in the final file
|
|
|
|
11-08-2012, 02:01 PM
|
#5
|
|
Member
Registered: Jun 2012
Posts: 61
Original Poster
Rep: 
|
@colucix, @druuna
|
|
|
|
11-08-2012, 02:09 PM
|
#6
|
|
Senior Member
Registered: May 2005
Location: boston, usa
Distribution: fc-12/ fc-11-live-usb/ aix
Posts: 2,672
|
would grep -f work here ?
else, maybe a for loop multiplexing /etc/passwd with db-user-list .
|
|
|
|
11-08-2012, 02:29 PM
|
#7
|
|
Member
Registered: Jun 2012
Posts: 61
Original Poster
Rep: 
|
could you kindly elaborate on this please "<<else, maybe a for loop multiplexing /etc/passwd with db-user-list .>>
|
|
|
|
11-08-2012, 02:58 PM
|
#8
|
|
Senior Member
Registered: Nov 2005
Distribution: Debian
Posts: 2,019
|
How about a database:
Code:
CREATE TABLE users_passwd(username);
CREATE TABLE users_active(username, active);
.separator ' '
.import users-passwd.txt users_passwd
.import users-db.txt users_active
SELECT users_passwd.username, 'inactive user exists in both db and etc/passwd'
FROM users_passwd, users_active WHERE users_active.active = 'inactive'
AND users_passwd.username = users_active.username;
SELECT username, 'inactive user exists only in db'
FROM users_active WHERE active = 'inactive'
AND username NOT IN users_passwd;
SELECT username, 'rogue user'
FROM users_passwd
WHERE username NOT IN (SELECT username FROM users_active);
Code:
% sqlite3 < check-users.sql
c inactive user exists in both db and etc/passwd
e inactive user exists only in db
f rogue user
g rogue user
h rogue user
|
|
|
|
11-08-2012, 04:22 PM
|
#9
|
|
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,703
|
Have a look at this:
Code:
#!/bin/bash
awk 'BEGIN {
FS = "[: ]"
while ( ( getline < "/etc/passwd" ) > 0 ) _[$1] = $1
}
{ _[$1] = _[$1]" "$2 }
END {
for ( i in _ ) {
if ( _[i] !~ / active/ ) {
if ( i ~ _[i] ) { print i > "users.rogue" }
if ( _[i] ~ /^ inactive/ ) { print i > "users.inactive.db.only" }
if ( _[i] ~ /. inactive/ ) { print i > "users.inactive.in.both" }
}
}
}' db.users.list
The above creates 3 files that hold the respective users.
|
|
|
1 members found this post helpful.
|
11-09-2012, 11:52 AM
|
#10
|
|
Member
Registered: Jun 2012
Posts: 61
Original Poster
Rep: 
|
I appreciate ntubski but i will go with druuna because I am to do it in bash. I am only allowed to get one file from the db and that is given.
---------- Post added 11-09-12 at 11:53 AM ----------
This is what I had and I was going in circles. I searched and tested a lot.
#! /bin/bash
CMD="use metadata; select usernames.SNo, usernames.DataTelid, usernames.UName, personaldata.ActiveInactive from usernames, personaldata where usernames.DataTelid=personaldata.DataTelid ORDER By usernames.UName, personaldata.ActiveInactive into outfile '/tmp/querydb';"
mysql -u root -pnew-password -e
"$CMD"
exec &> /tmp/final
awk -F':' '{print $1}' /etc/passwd | sort > /tmp/userlist
#sed s/Inactive/Passive/ /tmp/querydb >/tmp/parse
#sed '/Active/d' /tmp/parse > /tmp/strpdb
#awk -F"\," 'FILENAME== "/tmp/userlist"{A[$1$2]=$1$2}FILENAME== "/tmp/querydb"{if(A[$1]){print}}' /tmp/userlist / tmp/querydb
#% join -j 1 /tmp/querydb /tmp/userlist
awk '{print $3" " $5}' /tmp/querydb | sort > /tmp/strpdb
#awk '/Active/' /tmp/strpdb | sort > /tmp/parse
#awk '{print $1}' /tmp/parse | sort > /tmp/act
#awk '/Inactive/' /tmp/strpdb | sort > /tmp/indbu
#awk '{print $1}' /tmp/indbu | sort > /tmp/inc
#awk -F "," '{close(f);f=$3} {print > f".txt"}' /tmp/strpdb
#awk '/^Active|^Inactive/{close("add+_"f);f++}{print $1>"add_"f}' /tmp/strpdb
#join -t: /tmp/strpdb /tmp/userlist -1 2 -1 2 -t
#sort -m /tmp/strpdb /tmp/userlist
#paste -d" " /tmp/strpdb /tmp/userlist | sort > /tmp/indbu
#sort -k2 -n /tmp/indbu
#pr -tm /tmp/querydb /tmp/userlist | awk '{$3"$1}'
#join -t: /tmp/querydb /tmp/userlist | more
#echo Inactive Compiltion
echo
#comm -3 /tmp/strpdb /tmp/userlist
echo
#echo Above are the inactive users that exist in etcpasswd list. Intruder alert.
diff /tmp/strpdb /tmp/userlist | grep '{print $1}' | sed "s/^> \(.*\)/\1 ---> Rogue User/;s/^<\(.*\)/\1 <-------Ignore this Inactive user. Only exists in database/"
echo
echo
exit 0
|
|
|
|
11-09-2012, 11:59 AM
|
#11
|
|
Member
Registered: Jun 2012
Posts: 61
Original Poster
Rep: 
|
Sorry I meant I appreciate both druuna (he shows how to compare with etc/passwd) and ntubski (he gives me the sql files). I need to put them together so they work in order
|
|
|
|
11-09-2012, 12:07 PM
|
#12
|
|
Member
Registered: Jun 2012
Posts: 61
Original Poster
Rep: 
|
@ntubski, now that you see my code and the cloud i am in, i am not to sure if what you suggested fits what i want. I can only decide rogue user if the user only exists in etc/password and does not in my db file that i got from the sql query.
@druuna please explain below
#!/bin/bash -> got it
awk 'BEGIN { -> got it
FS = "[: ]" -> got it
while ( ( getline < "/etc/passwd" ) > 0 ) _[$1] = $1 -> are we stripping the file to column $1, meaning just for the usernames???
}
{ _[$1] = _[$1]" "$2 } -> I am not to sure whats happening here
END {
for ( i in _ ) { -> Why _ ?
if ( _[i] !~ / active/ ) { -> Please justify ?
if ( i ~ _[i] ) { print i > "users.rogue" }
if ( _[i] ~ /^ inactive/ ) { print i > "users.inactive.db.only" }
if ( _[i] ~ /. inactive/ ) { print i > "users.inactive.in.both" }
}
}
}' db.users.list
|
|
|
|
11-09-2012, 12:39 PM
|
#13
|
|
Senior Member
Registered: May 2005
Location: boston, usa
Distribution: fc-12/ fc-11-live-usb/ aix
Posts: 2,672
|
Quote:
Originally Posted by druuna
I haven't come across anyone that is called Active or Inactive, so if this is a unique filed:
Code:
awk -v actives=actives.out -v inactives=inactives.out '/Inactive/ { print $1 >> inactives } /Active/ { print $1 >> actives }' infile
You don't provide an example of the input file so you might need to set a proper separator.
EDIT: Use colucix' answer, more elegant!
|
i give both of you a point, colucixs is more elegant but yours is more self-explanatory.
|
|
|
|
11-09-2012, 12:50 PM
|
#14
|
|
LQ Veteran
Registered: Sep 2003
Location: the Netherlands
Distribution: lfs, debian, rhel
Posts: 8,703
|
Quote:
|
Originally Posted by piyush128k
@druuna please explain
|
Code:
awk 'BEGIN {
# blue part is done before reading db.users.list
# set the 2 needed separators (: or a space)
FS = "[: ]"
# read the user names from /etc/passwd and store them in an array. username is also the index.
while ( ( getline < "/etc/passwd" ) > 0 ) _[$1] = $1
}
# brown part is done for each line in db.users.list
# store or add field 2 from db.users.list.
# If the entry already exist a space and field 2 is added to the username,
# if it doesn't exist a space and field 2 is stored in a new array entry.
{ _[$1] = _[$1]" "$2 }
END {
# green part is done when db.users.list is completely read
# array now holds all usernames present in /etc/passwd and db.users.list. some of those have an extra field (active/inactive)
# for all entries in array
for ( i in _ ) {
# dismiss array entries that contain a space followed by active
if ( _[i] !~ / active/ ) {
# print user only entries
if ( i ~ _[i] ) { print i > "users.rogue" }
# print lines that start with a space followed by inactive
if ( _[i] ~ /^ inactive/ ) { print i > "users.inactive.db.only" }
# print lines that contain any character followed by a space followed by inactive
if ( _[i] ~ /. inactive/ ) { print i > "users.inactive.in.both" }
}
}
}' db.users.list
Last edited by druuna; 11-10-2012 at 07:47 AM.
Reason: made it a bit more readable
|
|
|
1 members found this post helpful.
|
11-10-2012, 10:29 AM
|
#15
|
|
Senior Member
Registered: Nov 2005
Distribution: Debian
Posts: 2,019
|
Quote:
|
Originally Posted by piyush128k
Sorry I meant I appreciate both druuna (he shows how to compare with etc/passwd) and ntubski (he gives me the sql files). I need to put them together so they work in order
|
My solution happened to use sql (specifically, sqlite), but it takes as input the text files that were extracted from the /etc/passwd (users-passwd.txt) and the database (users-db.txt). It does NOT interact at all with your actual database.
Both solutions do pretty much the same thing (the output format is a bit different); druuna's solution saves you the extra step of reading from /etc/passwd.
|
|
|
|
| Thread Tools |
Search this Thread |
|
|
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
All times are GMT -5. The time now is 07:11 AM.
|
|
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.
|
Latest Threads
LQ News
|
|