LinuxQuestions.org


michaeljoser 10-08-2007 05:40 AM

how to sort text file and split into smaller files
 
Hi,

I want to sort a big file, which is in the format below, by IP address, and then split it by IP address so that I end up with multiple files, each containing the records for one IP only. I came across the sort command, but I'm not sure how to split the file using the IP address as the criterion.
Thanks.



RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:22: Team "TERRORIST" triggered "Terrorists_Win" (CT "4") (T "2")
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:22: World triggered "Round_End"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:26: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "aaaa ti p envi coupe oli" (dead)
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:27: "theEnd<5><STEAM_ID_LAN><TERRORIST>" triggered "Spawned_With_The_Bomb"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:29: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "salope vans"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:22: "x3non<2><STEAM_ID_LAN><TERRORIST>" killed "noob<8><STEAM_ID_LAN><CT>" with "galil"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:22: "TaZ<4><STEAM_ID_LAN><CT>" killed "x3non<2><STEAM_ID_LAN><TERRORIST>" with "awp"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:33: World triggered "Round_Start"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:34: "payne<2><STEAM_ID_LAN><TERRORIST>" say " kill oli get 1000 pt XD"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:34: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "ti p atane oli"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:27: "TaZ<4><STEAM_ID_LAN><CT>" killed "NuLL<9><STEAM_ID_LAN><TERRORIST>" with "awp"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: "TaZ<4><STEAM_ID_LAN><CT>" killed "Emo|Jpol<6><STEAM_ID_LAN><TERRORIST>" with "deagle"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: Team "CT" triggered "CTs_Win" (CT "13") (T "5")
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: World triggered "Round_End"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:39: "theEnd<5><STEAM_ID_LAN><TERRORIST>" say "twa vans?"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:32: "x3non<2><STEAM_ID_LAN><TERRORIST>" say_team "ok" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:33: "noob<8><STEAM_ID_LAN><CT>" say "wawa" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:34: "x3non<2><STEAM_ID_LAN><TERRORIST>" say_team "aster" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:35: World triggered "Round_Start"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:45: "oLi<3><STEAM_ID_LAN><TERRORIST>" say "lol lash sa? :P"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:36: "NuLL<9><STEAM_ID_LAN><TERRORIST>" say_team "pa P bon ditou :S"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:51: "la vie | Ishikawa<6><STEAM_ID_LAN><CT>" say "koter?"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:51: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "OLI"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:54: "oLi<3><STEAM_ID_LAN><TERRORIST>" say "wa?"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:54: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "apache dir toi rode to team"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:57: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "li vini dan 5min"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:50: "TaZ<4><STEAM_ID_LAN><CT>" killed "x3non<2><STEAM_ID_LAN

druuna 10-08-2007 06:27 AM

Hi,

Something like this?

Code:

#!/bin/bash

INFILE="$1"

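# Sort the whole file so that lines from the same IP end up adjacent.
sort "${INFILE}" |
awk '
# Split fields on <, > or : so that $2 holds the IP address.
BEGIN { FS = "[<>:]" }
{
 print $0 >> $2
}'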

This sorts the input file first; awk then appends each line to a file named after $2 (which is the IP address). Awk can handle multiple field separators (I used <, > and : in this example).

A sample run looks like this:
Code:

$ cat sort.split.infile
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:22: Team "TERRORIST" triggered "Terrorists_Win" (CT "4") (T "2")
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:22: World triggered "Round_End"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:26: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "aaaa ti p envi coupe oli" (dead)
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:27: "theEnd<5><STEAM_ID_LAN><TERRORIST>" triggered "Spawned_With_The_Bomb"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:29: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "salope vans"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:22: "x3non<2><STEAM_ID_LAN><TERRORIST>" killed "noob<8><STEAM_ID_LAN><CT>" with "galil"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:22: "TaZ<4><STEAM_ID_LAN><CT>" killed "x3non<2><STEAM_ID_LAN><TERRORIST>" with "awp"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:33: World triggered "Round_Start"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:34: "payne<2><STEAM_ID_LAN><TERRORIST>" say " kill oli get 1000 pt XD"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:34: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "ti p atane oli"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:27: "TaZ<4><STEAM_ID_LAN><CT>" killed "NuLL<9><STEAM_ID_LAN><TERRORIST>" with "awp"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: "TaZ<4><STEAM_ID_LAN><CT>" killed "Emo|Jpol<6><STEAM_ID_LAN><TERRORIST>" with "deagle"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: Team "CT" triggered "CTs_Win" (CT "13") (T "5")
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: World triggered "Round_End"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:39: "theEnd<5><STEAM_ID_LAN><TERRORIST>" say "twa vans?"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:32: "x3non<2><STEAM_ID_LAN><TERRORIST>" say_team "ok" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:33: "noob<8><STEAM_ID_LAN><CT>" say "wawa" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:34: "x3non<2><STEAM_ID_LAN><TERRORIST>" say_team "aster" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:35: World triggered "Round_Start"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:45: "oLi<3><STEAM_ID_LAN><TERRORIST>" say "lol lash sa? :P"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:36: "NuLL<9><STEAM_ID_LAN><TERRORIST>" say_team "pa P bon ditou :S"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:51: "la vie | Ishikawa<6><STEAM_ID_LAN><CT>" say "koter?"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:51: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "OLI"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:54: "oLi<3><STEAM_ID_LAN><TERRORIST>" say "wa?"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:54: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "apache dir toi rode to team"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:57: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "li vini dan 5min"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:50: "TaZ<4><STEAM_ID_LAN><CT>" killed "x3non<2><STEAM_ID_LAN


$ ./sort.split.sh sort.split.infile

$ ls -l 4*
-rw-r----- 1 druuna internet 1631 Oct  8 12:24 41.212.144.207
-rw-r----- 1 druuna internet 1370 Oct  8 12:24 41.212.158.233

$ cat 41.212.144.207
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:22: Team "TERRORIST" triggered "Terrorists_Win" (CT "4") (T "2")
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:22: World triggered "Round_End"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:26: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "aaaa ti p envi coupe oli" (dead)
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:27: "theEnd<5><STEAM_ID_LAN><TERRORIST>" triggered "Spawned_With_The_Bomb"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:29: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "salope vans"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:33: World triggered "Round_Start"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:34: "(o.0)_.!.<7><STEAM_ID_LAN><CT>" say "ti p atane oli"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:34: "payne<2><STEAM_ID_LAN><TERRORIST>" say " kill oli get 1000 pt XD"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:39: "theEnd<5><STEAM_ID_LAN><TERRORIST>" say "twa vans?"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:45: "oLi<3><STEAM_ID_LAN><TERRORIST>" say "lol lash sa? :P"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:51: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "OLI"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:51: "la vie | Ishikawa<6><STEAM_ID_LAN><CT>" say "koter?"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:54: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "apache dir toi rode to team"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:54: "oLi<3><STEAM_ID_LAN><TERRORIST>" say "wa?"
RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:57: "- IceBladder -<4><STEAM_ID_LAN><CT>" say "li vini dan 5min"

$ cat 41.212.158.233
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:22: "TaZ<4><STEAM_ID_LAN><CT>" killed "x3non<2><STEAM_ID_LAN><TERRORIST>" with "awp"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:22: "x3non<2><STEAM_ID_LAN><TERRORIST>" killed "noob<8><STEAM_ID_LAN><CT>" with "galil"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:27: "TaZ<4><STEAM_ID_LAN><CT>" killed "NuLL<9><STEAM_ID_LAN><TERRORIST>" with "awp"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: "TaZ<4><STEAM_ID_LAN><CT>" killed "Emo|Jpol<6><STEAM_ID_LAN><TERRORIST>" with "deagle"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: Team "CT" triggered "CTs_Win" (CT "13") (T "5")
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:29: World triggered "Round_End"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:32: "x3non<2><STEAM_ID_LAN><TERRORIST>" say_team "ok" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:33: "noob<8><STEAM_ID_LAN><CT>" say "wawa" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:34: "x3non<2><STEAM_ID_LAN><TERRORIST>" say_team "aster" (dead)
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:35: World triggered "Round_Start"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:36: "NuLL<9><STEAM_ID_LAN><TERRORIST>" say_team "pa P bon ditou :S"
RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:50: "TaZ<4><STEAM_ID_LAN><CT>" killed "x3non<2><STEAM_ID_LAN

Hope this helps.
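
To see what the field separator in the script above does, here is a minimal sketch that prints the first three fields of one line taken from the sample log. The variable names line and fields are only for illustration and are not part of the original script.

```shell
#!/bin/bash
# Sketch: show the fields awk produces when FS is the regex [<>:].
# The sample line is copied from the log posted above.
line='RECV <41.212.158.233:27015>: L 10/07/2007 - 13:23:35: World triggered "Round_Start"'

# -F sets the field separator: any single <, > or : ends a field.
fields=$(printf '%s\n' "$line" |
    awk -F'[<>:]' '{ printf "1=[%s] 2=[%s] 3=[%s]\n", $1, $2, $3 }')

echo "$fields"
```

This prints 1=[RECV ] 2=[41.212.158.233] 3=[27015], so $2 is the IP address and $3 is the port, which is why the script can use $2 directly as the output file name.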

michaeljoser 10-08-2007 09:12 AM

Thanks a lot... wow, that's a nice script.

michaeljoser 10-08-2007 11:26 AM

The script is very nice, but I run into a problem when there is a blank line in the log file. I tried this:
Code:

sed '/^$/d' myFile > tt
without success :(

Any tips on how to remove the blank lines first? Oh, and one last little thing...

How can we remove the "RECV <xx.xxx.xxx.xxx:99999>:" part from the new files being created?

Thanks a lot again.

druuna 10-08-2007 11:53 AM

Hi,

Code:

#!/bin/bash

INFILE="$1"

sort "${INFILE}" |
awk '
BEGIN { FS = "[<>:]" }
 # Only lines containing RECV are written; substr drops the first 29 characters.
 /RECV/ { print substr($0,30) >> $2 }
'

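The awk statement has changed. It now processes only lines that contain RECV and ignores all others, including blank lines. For a matching line, everything from character 30 to the end is printed, which removes the RECV <xx.xxx.xxx.xxx:99999>: part (the offset of 30 assumes a 14-character IP address and a 5-digit port, as in the sample).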

Hope this helps.
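
If the IP addresses in a log do not all have the same length, a fixed offset would cut the line in the wrong place. One possible alternative, shown here only as a sketch, is to delete the prefix by pattern with awk's sub() function. The address 10.0.0.1 and the scratch directory outdir are made up for the demonstration; the blank input line is there to show that it gets skipped.

```shell
#!/bin/bash
# Sketch: strip the "RECV <ip:port>: " prefix by pattern instead of by
# a fixed character offset. 10.0.0.1 is a hypothetical address.
outdir=$(mktemp -d)     # scratch directory so no existing files are touched
cd "$outdir" || exit 1

printf '%s\n' \
  'RECV <41.212.144.207:27015>: L 10/07/2007 - 13:25:22: World triggered "Round_End"' \
  '' \
  'RECV <10.0.0.1:27015>: L 10/07/2007 - 13:25:33: World triggered "Round_Start"' |
sort |
awk '
BEGIN { FS = "[<>:]" }
/^RECV </ {
    ip = $2                       # save the IP before $0 is modified
    sub(/^RECV <[^>]*>: /, "")    # remove the prefix, whatever its length
    print >> ip                   # append what is left to the per-IP file
}'

ls "$outdir"
```

The IP is copied into a variable first because sub() changes $0, and awk then re-splits the fields. Each of the two resulting files holds its line starting at "L 10/07/2007".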

michaeljoser 10-08-2007 01:37 PM

what should i say.... PERFECT!!! :p

thanks a lot!!!

plus the script teaches me a lot more that can be done now. thanks!!!

michaeljoser 10-18-2007 08:11 AM

I modified the script this way:
Code:

#!/bin/bash

INFILE="$1"

sort "${INFILE}" |
awk '
BEGIN { FS = "[<>:]" }
 /RECV/ { print substr($0,16+length($2)) >> ($2 ".log") }
'

But I want to add a number in front of the log file names so that the output files look like this:
10000.111.111.11.11.log
10001.200.111.11.21.log
10002.111.111.11.31.log
10003.111.111.11.41.log
10004.111.111.11.51.log
10005.111.111.11.61.log

anyone?

thanks for the help

colucix 10-18-2007 10:10 AM

You can use an AWK array whose indices are the IP addresses and whose elements are sequential numbers, like this:
Code:

#!/bin/bash

INFILE="$1"

sort "${INFILE}" |
awk '
BEGIN { FS = "[<>:]" ; COUNT = 10000 }
 /RECV/ { if ( ! ($2 in prefix) ) prefix[$2] = COUNT++ ;
          print substr($0,16+length($2)) >> (prefix[$2] "." $2 ".log") }
'

Every time a new IP address is encountered, the next number is assigned to it and used as the prefix of its log file name.
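
The numbering idea can be tried on its own, without writing any files. The following is only a sketch: the addresses are placeholders rather than values from a real log, and the variable name numbered is made up for the demonstration.

```shell
#!/bin/bash
# Sketch: give each IP a number the first time it is seen and reuse
# that number afterwards. The addresses are placeholders.
numbered=$(printf '%s\n' 10.0.0.1 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 |
awk '
BEGIN { count = 10000 }
{
    if (!($1 in prefix))        # first occurrence of this IP
        prefix[$1] = count++    # store the current number, then increment
    print prefix[$1] "." $1 ".log"
}')

echo "$numbered"
```

The test ($1 in prefix) checks whether the IP is already an index of the array without creating an entry for it, so 10.0.0.1 keeps the prefix 10000 on every line, while 10.0.0.2 and 10.0.0.3 get 10001 and 10002.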

michaeljoser 10-19-2007 02:50 AM

thanks a lot :P it did the job very nicely


All times are GMT -5.