LinuxQuestions.org

tt1ect 11-09-2008 09:36 AM

Extracting strings
 
Hello friends,

I captured traffic in Wireshark and saved it to a plain-text file. I am trying to use awk or any other command to extract the time, source IP address and destination IP address. I tried grep, but it gives me the whole line containing the IP address. Can anybody please help me with this?

This is how the information looks when you open the file:

No. Time Source Destination Protocol Info
1617 14.30 10.49.48.95 64.191.203.30 HTTP POST /login HTTP/1.1 (application/x-www-form-urlencoded)

tt1ect

pixellany 11-09-2008 09:52 AM

This looks like a simple task for AWK. Assuming that the format is always the same, you simply have to extract by field. AWK splits each line on whitespace by default, so no delimiter option is needed.

Example: this prints the 2nd, 3rd and 4th fields, separated by tabs:

awk '{print $2"\t"$3"\t"$4}' filename

Really good AWK tutorial (and much more) here:

http://www.grymoire.com/Unix/Awk.html
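
A minimal sketch of the field-based approach above, run against the two lines quoted in the question. The file name capture.txt and the function name extract_fields are placeholders, and the `NR > 1` condition is an addition that skips the header row:

```shell
# Hypothetical sample file built from the two lines quoted in the question.
cat > capture.txt <<'EOF'
No. Time Source Destination Protocol Info
1617 14.30 10.49.48.95 64.191.203.30 HTTP POST /login HTTP/1.1 (application/x-www-form-urlencoded)
EOF

# Skip the header row (NR > 1), then print time, source and destination
# separated by tabs. awk splits on runs of whitespace by default.
extract_fields() {
  awk 'NR > 1 { print $2 "\t" $3 "\t" $4 }' "$1"
}

extract_fields capture.txt
```

If the real export contains other non-packet lines (blank lines, detail sections), the condition would need to be stricter, for example matching only lines that start with a packet number.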

jan61 11-09-2008 10:02 AM

Moin,

You don't need grep in this case. A simple way to get fields 2-4 is cut:
Code:

jan@jack:~/tmp> echo '1617 14.30 10.49.48.95 64.191.203.30 HTTP POST /login HTTP/1.1 (application/x-www-form-urlencoded)' | cut -f2-4 -d' '
14.30 10.49.48.95 64.191.203.30

Another possibility is awk:
Code:

jan@jack:~/tmp> echo '1617 14.30 10.49.48.95 64.191.203.30 HTTP POST /login HTTP/1.1 (application/x-www-form-urlencoded)' | awk ' { print $2, $3, $4 } '
14.30 10.49.48.95 64.191.203.30

Or sed:
Code:

jan@jack:~/tmp> echo '1617 14.30 10.49.48.95 64.191.203.30 HTTP POST /login HTTP/1.1 (application/x-www-form-urlencoded)' | sed 's/^[0-9]* \([^ ]*\) \([^ ]*\) \([^ ]*\) .*/\1 \2 \3/'
14.30 10.49.48.95 64.191.203.30

I would prefer cut for the given input format - it's the simplest tool.

Jan
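
One caveat with cut: with -d' ' every single space counts as a delimiter, so the command above only works when the columns are separated by exactly one space. A sketch of a workaround, assuming the real export pads its columns with runs of spaces (the sample in the question does not show this; the padded line and the function name squeeze_cut are made up for illustration):

```shell
# Hypothetical line with padded columns; the sample in the question uses single spaces.
line='   1617 14.30      10.49.48.95    64.191.203.30   HTTP POST /login'

# Squeeze runs of spaces down to one, drop the leading space if there is one,
# then cut fields 2-4 using a single space as the delimiter.
squeeze_cut() {
  tr -s ' ' | sed 's/^ //' | cut -d' ' -f2-4
}

printf '%s\n' "$line" | squeeze_cut
```

awk does not need this step, because its default field splitting already treats any run of whitespace as one separator.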

pixellany 11-09-2008 10:15 AM

Please don't post the same thing twice----(perhaps it was an error)
I'll ask that the two be merged since both have replies

jan61 11-10-2008 12:26 PM

Moin,

Quote:

Originally Posted by pixellany (Post 3336174)
Please don't post the same thing twice----(perhaps it was an error)
I'll ask that the two be merged since both have replies

Sorry - it really was an error - I think I accidentally double-clicked the submit button.

Jan

Nylex 11-10-2008 12:43 PM

To be honest, I'd just save the data from Wireshark/tcpdump as a binary file and then write a C(++) program using libpcap to get the info you want.

unSpawn 11-10-2008 12:52 PM

Quote:

Originally Posted by Nylex (Post 3337355)
I'd just save the data from Wireshark/tcpdump as a binary file and then write a C(++) program using libpcap to get the info you want.

Nice answer! Could you please post some code then?

Nylex 11-10-2008 12:56 PM

Quote:

Originally Posted by unSpawn (Post 3337365)
Nice answer! Could you please post some code then?

Will do so when I've got a spare minute :).

pixellany 11-10-2008 01:42 PM

Quote:

Originally Posted by jan61 (Post 3337335)
Moin,

Sorry - it really was an error - I think I accidentally double-clicked the submit button.

Jan

My comment was not directed at you!!

OP posted this twice and each one had responses, so the threads were merged.

Nylex 11-10-2008 03:07 PM

When I was doing my MSc project, I had to analyse tcpdump data and initially tried string parsing. It was very messy, so I just went and learned to use libpcap. Makes life easier!

As requested, here's a C++ program making use of libpcap to get source and destination addresses and ports and timestamps:

Code:

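#include <pcap.h>
#include <netinet/ip.h>
#include <netinet/in.h>
#include <netinet/ether.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <iostream>
using namespace std;

void packet_handler(u_char *args, const struct pcap_pkthdr *packet_header, const u_char *packet);

int main(int argc, const char *argv[])
{
  // Use the capture file named on the command line, or fall back to the default name.
  const char *file = (argc > 1) ? argv[1] : "tcpdump_file";
  char error_buffer[PCAP_ERRBUF_SIZE];
  pcap_t *handle = pcap_open_offline(file, error_buffer);
  if (handle == NULL) {
    cerr << "Could not open " << file << ": " << error_buffer << endl;
    return 1;
  }

  pcap_loop(handle, -1, packet_handler, NULL); // Loop until we reach the end of the file
  pcap_close(handle);
  return 0;
}

// Note: this handler assumes the capture was taken on an Ethernet link.
void packet_handler(u_char *args, const struct pcap_pkthdr *packet_header,
                    const u_char *packet)
{
  // Skip frames too short to hold Ethernet and IP headers, and anything that is not IPv4.
  if (packet_header->caplen < sizeof(struct ether_header) + sizeof(struct ip))
    return;
  const struct ether_header *eth = (const struct ether_header*)packet;
  if (ntohs(eth->ether_type) != ETHERTYPE_IP)
    return;

  // The IP packet is the payload of the Ethernet frame, so we need to skip the
  // Ethernet frame's header to get to it.
  const struct ip *ip_packet = (const struct ip*)(packet + sizeof(struct ether_header));
  if (ip_packet->ip_p != IPPROTO_TCP)
    return; // The port fields below only make sense for TCP

  // The TCP segment is the payload of the IP packet. The IP header length is
  // variable (ip_hl counts 32-bit words), so sizeof(struct ip) is not always right.
  size_t ip_header_len = ip_packet->ip_hl * 4;
  if (packet_header->caplen < sizeof(struct ether_header) + ip_header_len + sizeof(struct tcphdr))
    return;
  const struct tcphdr *tcp_packet = (const struct tcphdr*)(packet + sizeof(struct ether_header)
                                                           + ip_header_len);

  cout << "Source address: " << inet_ntoa(ip_packet->ip_src) << endl;
  cout << "Destination address: " << inet_ntoa(ip_packet->ip_dst) << endl;

  cout << "Source port: " << ntohs(tcp_packet->source) << endl;
  cout << "Destination port: " << ntohs(tcp_packet->dest) << endl;

  struct timeval ts = packet_header->ts;
  cout << "Seconds: " << ts.tv_sec << " Microseconds: " << ts.tv_usec << endl;
}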

I suggest reading the pcap.h man page ("man pcap") and possibly man pages for other headers too.

unSpawn 11-10-2008 04:30 PM

Thanks for the -lpcap example.


All times are GMT -5.