[SOLVED] Pulling text from a text file and append to CSV
Linux - NewbieThis Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
I don't post here often, but scripting really isn't my thing but I have had an idea that may well save a lot of time !
We have a lot of staff who complete weekly timesheets on paper, and they are all returned here to HQ to process. One of the workers came up with a great idea - send in your timesheet details via SMS on a daily basis. We had a quick chat about formats and I have enough information to get started and prove the concept.
I have installed and configured SMS Server Tools3 on my SIM enabled netbook and have tested it's ability to receive SMS messages in a specific directory.
What I would like to do is create a script that will check the specified directory regularly, and from each file it finds, extract the information into a CSV file. The incoming format is like this:
Code:
From: 447123456789
Sent: 12-03-01 09:07:55
Received: 12-03-01 09:08:01
Subject: GSM1
Modem: GSM1
Report: no
Alphabet: ISO
Length: 25
1130:010312:SW:0930:1345
The last line is the bit that creates the timesheet, the colon being the separator in this example, I can use anything. The format in this example is:
Code:
EMPLOYEE_NO:DATE:CLIENT:START_TIME:ENDTIME
I would want the From, Sent, Received fields extracted as well.
So could someone point in the right direction with the tools I would need for a shell script? Ideally this would also create new CSV's every monday for the next seven days.
I am familiar with the linux shell, just not had to do any scripting in the past!
Any advice greatly appreciated!
Click here to see the post LQ members have rated as the most helpful post in this thread.
Let me first say this: what you describe is probably best suited for a scripting language like python or perl for readability. I may come back and edit this post or add a response with a python script.
That said, I took this as an opportunity to teach myself a little more about using sed. The command is long, but I managed to have sed do everything you asked (I think) when it comes to saving and formatting the data.
Here's the command:
Code:
sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^From:/ { s@From: (.*)$@\1,@ ; H } ; /^Sent:/ { s@Sent: (.*)$@\1,@ ; H } ; /^Received:/ { s@Received: (.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' filename.txt
Given the sample data you provided (and saved in a file named testdata.txt:
Code:
user@localhost$ sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^From:/ { s@From: (.*)$@\1,@ ; H } ; /^Sent:/ { s@Sent: (.*)$@\1,@ ; H } ; /^Received:/ { s@Received: (.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' testdata.txt
447123456789,12-03-01 09:07:55,12-03-01 09:08:01,1130,010312,SW,0930,1345
So, the first field is the "From," second field is the "Sent," third field is "Received," and the remaining fields match one-to-one the data in the last line with the colons replaced by commas.
If you do not want the colons replaced by commas, delete the part of the command in blue.
The idea is, you run the command on each file and redirect the output (with >>) to some central CSV file. For instance (I made some sample copies of your data file, modified them a bit, and left them all with a ".txt" extension):
Code:
user@localhost$ sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^From:/ { s@From: (.*)$@\1,@ ; H } ; /^Sent:/ { s@Sent: (.*)$@\1,@ ; H } ; /^Received:/ { s@Received: (.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' *.txt
447123456789,12-03-01 09:07:55,12-03-01 09:08:01,1130,010312,SW,0930,1345
447123456789,01-12-11 09:07:55,01-12-11 09:09:22,1130,010312,SW,0930,1345
447987654321,12-03-01 09:07:55,12-03-01 09:08:01,1130,010312,NE,1212,1345
You'd probably want to alias that command (if you choose to use it), because typing that sucker every time would be... error prone.
Like I said, I might come along and post a python script a little later. Right now: sleep.
Here's a modified form of the command... Still does the same thing. It just does it in fewer characters
Code:
sed -srn '/^[0-9]+/ { s@:@,@g ; H } ; /^(From|Sent|Received):/ { s@[^:]*:[[:space:]]*(.*)$@\1,@ ; H } ; $ { g ; s@\n@@g ; p }' *.txt
This one might be slightly easier to understand in that, it's clear from the "(From|Sent|Received)" used that sed is concerned with those lines
Just as an FYI, the above command triggers on four kinds of lines in the file. Lines that begin with:
1. From:
2. Sent:
3. Received:
4. <one or more digits>
The From, Sent and Received lines can have any number or type of space after the colon. Anything after the space(s) will be printed at the end. The line that starts with one or more digits will be printed regardless.
For the python script... it's more or less similar to catkin's shell script. This python script would be invoked as:
What I would like to do is create a script that will check the specified directory regularly, and from each file it finds, extract the information into a CSV file.
"check ... regularly" = cron job
You just need to determine how often. Cron can schedule a task once per minute all the way up to once per year.
"extract the information to a CSV" = use the command (or one of the scripts provided) and use redirection (e.g. "sed -srn <pattern> /path/to/specific/dir/*.txt >> /path/to/dir/with/file.csv")
Quote:
Ideally this would also create new CSV's every monday for the next seven days
Cron can also run a job on a specific day of the week (e.g. "Monday," Tuesday," etc.). All you would need to do for this task would be to mv the csv file to another name and then touch a new csv file with the same name as previously (so the commands/scripts can refer to the same generic name). For example:
Hats off to you chaps! I had already sorted uploading the csv to Google apps spreadsheets and manipulating in the middle, all the info here will now fill the gaps for me - and more to the point will get me started in scripting which is what I really want to do to develop other items here.
Really, really appreciate your contributions - extra kudos to you both for donating some of your time to help a charity! :-)
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.