LinuxQuestions.org
Forums > Linux Forums > Linux - Newbie
Old 03-01-2012, 05:37 AM   #1
fishy
LQ Newbie
 
Registered: Sep 2003
Posts: 15

Rep: Reputation: 0
Pulling text from a text file and append to CSV


Hi All,

I don't post here often, and scripting really isn't my thing, but I've had an idea that may well save a lot of time!

We have a lot of staff who complete weekly timesheets on paper, and they are all returned here to HQ to process. One of the workers came up with a great idea - send in your timesheet details via SMS on a daily basis. We had a quick chat about formats and I have enough information to get started and prove the concept.

I have installed and configured SMS Server Tools 3 on my SIM-enabled netbook and have tested its ability to receive SMS messages into a specific directory.

What I would like to do is create a script that will check the specified directory regularly, and from each file it finds, extract the information into a CSV file. The incoming format is like this:

Code:
From: 447123456789
Sent: 12-03-01 09:07:55
Received: 12-03-01 09:08:01
Subject: GSM1
Modem: GSM1
Report: no
Alphabet: ISO
Length: 25

1130:010312:SW:0930:1345
The last line is the bit that creates the timesheet entry. The colon is the separator in this example, but I can use anything. The format in this example is:

Code:
EMPLOYEE_NO:DATE:CLIENT:START_TIME:ENDTIME
I would want the From, Sent, Received fields extracted as well.

So could someone point me in the right direction with the tools I would need for a shell script? Ideally this would also create a new CSV every Monday for the next seven days.

I am familiar with the Linux shell; I just haven't had to do any scripting in the past!

Any advice greatly appreciated!
 
Old 03-01-2012, 06:55 AM   #2
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 370
Let me first say this: what you describe is probably best suited for a scripting language like python or perl for readability. I may come back and edit this post or add a response with a python script.

That said, I took this as an opportunity to teach myself a little more about using sed. The command is long, but I managed to have sed do everything you asked (I think) when it comes to saving and formatting the data.

Here's the command:
Code:
sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^From:/ { s@From: (.*)$@\1,@ ; H } ; /^Sent:/ { s@Sent: (.*)$@\1,@ ; H } ; /^Received:/ { s@Received: (.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' filename.txt
Given the sample data you provided (and saved in a file named testdata.txt):
Code:
user@localhost$ sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^From:/ { s@From: (.*)$@\1,@ ; H } ; /^Sent:/ { s@Sent: (.*)$@\1,@ ; H } ; /^Received:/ { s@Received: (.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' testdata.txt
447123456789,12-03-01 09:07:55,12-03-01 09:08:01,1130,010312,SW,0930,1345
So, the first field is the "From," second field is the "Sent," third field is "Received," and the remaining fields match one-to-one the data in the last line with the colons replaced by commas.

If you do not want the colons replaced by commas, delete the "s@:@,@g ; " part in the first block of the command (the /^[0-9]+/ { s@:@,@g ; H } section).

The idea is, you run the command on each file and redirect the output (with >>) to some central CSV file. For instance (I made some sample copies of your data file, modified them a bit, and left them all with a ".txt" extension):
Code:
user@localhost$ sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^From:/ { s@From: (.*)$@\1,@ ; H } ; /^Sent:/ { s@Sent: (.*)$@\1,@ ; H } ; /^Received:/ { s@Received: (.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' *.txt
447123456789,12-03-01 09:07:55,12-03-01 09:08:01,1130,010312,SW,0930,1345
447123456789,01-12-11 09:07:55,01-12-11 09:09:22,1130,010312,SW,0930,1345
447987654321,12-03-01 09:07:55,12-03-01 09:08:01,1130,010312,NE,1212,1345
You'd probably want to alias that command (if you choose to use it), because typing that sucker every time would be... error prone.
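Another option, instead of an alias, is to save the command in a small wrapper script so it never has to be retyped. This is just a sketch; the script name sms2csv.sh is a placeholder:

```shell
#!/bin/sh
# sms2csv.sh - hypothetical wrapper around the sed command above.
# Pass the SMS message files as arguments; CSV lines go to stdout.
sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^From:/ { s@From: (.*)$@\1,@ ; H } ; /^Sent:/ { s@Sent: (.*)$@\1,@ ; H } ; /^Received:/ { s@Received: (.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' "$@"
```

After a chmod +x sms2csv.sh, it could be invoked as ./sms2csv.sh /path/to/sms_files/*.txt >> file.csv.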

Like I said, I might come along and post a python script a little later. Right now: sleep.
 
2 members found this post helpful.
Old 03-01-2012, 07:36 AM   #3
fishy
LQ Newbie
 
Registered: Sep 2003
Posts: 15

Original Poster
Rep: Reputation: 0
Wow!

Thank you so much - that will definitely get me going :-)

Have a great sleep!
 
Old 03-01-2012, 07:56 AM   #4
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
Here's a pure bash solution for the parsing and output:
Code:
#!/bin/bash

while read -r line
do
    case $line in
        'From:'* )
            from=${line#From: }
            ;;  
        'Sent:'* )
            sent=${line#Sent: }
            ;;  
        'Received:'* )
            received=${line#Received: }
            ;;  
        'Subject:'* | 'Modem:'* | 'Report:'* | 'Alphabet:'* | 'Length:'* | '' )
            ;;  
        * ) 
            array=( ${line//:/ } ) 
            employee_no=${array[0]}
            date=${array[1]}
            client=${array[2]}
            start_time=${array[3]}
            endtime=${array[4]}
    esac
done < input.txt

echo "'$from','$sent','$received','$employee_no','$date','$client','$start_time','$endtime'"
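To run the same pure-bash parsing over every message file in the directory rather than a fixed input.txt, a variation along these lines would work (the function name parse_sms and the file paths are illustrative, not anything from the thread):

```shell
#!/bin/bash
# Sketch: the case-based parsing from above, wrapped in a function and
# run once per file named on the command line - one CSV line per message.
parse_sms() {
    local line from sent received entry
    while read -r line; do
        case $line in
            'From:'* )     from=${line#From: } ;;
            'Sent:'* )     sent=${line#Sent: } ;;
            'Received:'* ) received=${line#Received: } ;;
            [0-9]* )       entry=${line//:/,} ;;   # colons -> commas
        esac
    done < "$1"
    echo "$from,$sent,$received,$entry"
}

for file in "$@"; do
    parse_sms "$file"
done
```

Invoked as ./parse_sms.sh /path/to/sms_files/*.txt >> file.csv, it appends one line per message to the central CSV.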
 
1 members found this post helpful.
Old 03-01-2012, 03:07 PM   #5
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 370
Here's a modified form of the command. It still does the same thing, just in fewer characters:

Code:
sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^(From|Sent|Received):/ { s@[^:]*:[[:space:]]*(.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' *.txt
This one might be slightly easier to understand, in that the "(From|Sent|Received)" makes it clear which lines sed is concerned with.

Just as an FYI, the above command triggers on four kinds of lines in the file. Lines that begin with:
1. From:
2. Sent:
3. Received:
4. <one or more digits>

The From, Sent, and Received lines can have any amount (or kind) of whitespace after the colon; whatever follows the whitespace is captured for the output. The line that starts with one or more digits is captured regardless.

For the python script... it's more or less similar to catkin's shell script. This python script would be invoked as:
Code:
/path/to/script.py /path/to/sms_files/*.txt
The script:
Code:
#!/usr/bin/python

import fileinput
import re

lastFilename = ""
outputLine = ""

for inputLine in fileinput.input():
    if( fileinput.filename() != lastFilename ):
        lastFilename = fileinput.filename()
        if( outputLine != "" ):
            print( outputLine[1:]  )
            outputLine = ""
    if( inputLine[:5] == "Sent:" or inputLine[:5] == "From:" ):
        outputLine += "," + inputLine[6:].rstrip()
    elif( inputLine[:9] == "Received:" ):
        outputLine += "," + inputLine[10:].rstrip()
    else:
        reMatches = re.match( r'[0-9]+', inputLine )
        if( reMatches != None ):
            outputLine += "," + ','.join( inputLine.split( ':' ) ).rstrip()

print( outputLine[1:] )
As for the rest of your questions...

Quote:
What I would like to do is create a script that will check the specified directory regularly, and from each file it finds, extract the information into a CSV file.
"check ... regularly" = cron job
You just need to determine how often. Cron can schedule a task once per minute all the way up to once per year.
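For example, a crontab fragment to run the extraction every five minutes might look like this (a sketch only: /usr/local/bin/sms2csv stands for wherever you save the command as an executable script, and the directory paths are placeholders):

```shell
# crontab -e
# m    h   dom mon dow  command
*/5  *   *   *   *    /usr/local/bin/sms2csv /path/to/specific/dir/*.txt >> /path/to/dir/with/file.csv
```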

"extract the information to a CSV" = use the command (or one of the scripts provided) and use redirection (e.g. "sed -srn <pattern> /path/to/specific/dir/*.txt >> /path/to/dir/with/file.csv")

Quote:
Ideally this would also create a new CSV every Monday for the next seven days
Cron can also run a job on a specific day of the week (e.g. "Monday," "Tuesday," etc.). All you would need to do for this task would be to mv the csv file to another name and then touch a new csv file with the same name as previously (so the commands/scripts can refer to the same generic name). For example:
Code:
mv /path/to/dir/with/file.csv /path/to/archive/dir/file_$(date +%Y%m%d).csv ; touch /path/to/dir/with/file.csv
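As a cron entry for early Monday morning, that rotation might be written like this (paths are placeholders; note that % is special in crontabs and must be escaped as \%):

```shell
# Rotate the weekly CSV at 00:05 every Monday (dow field = 1)
5 0 * * 1  mv /path/to/dir/with/file.csv /path/to/archive/dir/file_$(date +\%Y\%m\%d).csv && touch /path/to/dir/with/file.csv
```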

Last edited by Dark_Helmet; 03-01-2012 at 04:07 PM.
 
1 members found this post helpful.
Old 03-02-2012, 08:32 AM   #6
fishy
LQ Newbie
 
Registered: Sep 2003
Posts: 15

Original Poster
Rep: Reputation: 0
Hats off to you chaps! I had already sorted out uploading the CSV to Google Apps spreadsheets and manipulating it in the middle, so all the info here will fill the gaps for me - and, more to the point, it will get me started in scripting, which is what I really want to do to develop other items here.

Really, really appreciate your contributions - extra kudos to you both for donating some of your time to help a charity! :-)

I'll post back here what I end up with!

Regards

Ifan
 
  

