LinuxQuestions.org
Latest LQ Deal: Complete CCNA, CCNP & Red Hat Certification Training Bundle
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 11-05-2011, 12:48 PM   #1
totaluser
LQ Newbie
 
Registered: Nov 2011
Posts: 29

Rep: Reputation: Disabled
[Help] format text file for import into SQL table (with sed or similar)


Hello everybody,

I am working on a project where I need to import gathered data into SQLite database table
the textfile I am working with looks like this:
Code:
NAME: "2811 chassis", DESCR: "2811 chassis"
PID: CISCO2811         , VID: V06 , SN: XXXXXXXXXXX

NAME: "WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0", DESCR: "WAN Interface Card - Serial 2A/S"
PID: WIC-2A/S          , VID: V01, SN: XXXXXXXX   

NAME: "WAN Interface Card - Serial 2T on Slot 0 SubSlot 2", DESCR: "WAN Interface Card - Serial 2T"
PID: WIC-2T            , VID: V01, SN: XXXXXXXX
it can be anything from 1 two-line entry to as many as 15-20 two-line entries

what I want it to look like

Code:
2811 chassis|2811 chassis|CISCO2811|V06|XXXXXXXXXXX
WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|WAN Interface Card - Serial 2A/S|WIC-2A/S|V01|XXXXXXXX
WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|WAN Interface Card - Serial 2T|WIC-2T|V01|XXXXXXXX
as for the 1st line of each entry I managed to extract needed information using this sed command
Code:
sed -e "s/^.*\"\(.*\)\".*\"\(.*\)\".*$/\1\|\2\|/" sample.txt
I am struggling with the 2nd line of the entry
it always starts with 'PID: ' then comes the word I want to extract (it never contains any space) then comes variable amount of spaces followed by (,) and another space
after that always comes 'VID: ' followed by 3 characters I want to extract (it never contains any space)
that is followed by ', SN: ' and a word I want to extract (it never contains any space) after that may or may not follow variable amount os spaces. then END-OF-LINE


that is the structure of the file I need your help to process


thanks you very much in advance.
 
Old 11-05-2011, 02:34 PM   #2
totaluser
LQ Newbie
 
Registered: Nov 2011
Posts: 29

Original Poster
Rep: Reputation: Disabled
OK,
I kludged together a very ugly piece of code but it does the job done (at least partly)

sed -e "s/^.*\"\(.*\)\".*\"\(.*\)\".*$/\1\|\2\|/" file.input | sed "s/ */ /g" | sed "s/SN: /|/" | sed "s/ , /,/g" | sed -e "s/^.* \(.*\),.* \(.*\)\,/\1\|\2/" | sed "s/ |/|/g" > file.output

I had some problems because some files had some extra spaces but this one extracts all the data I need

after executing this ugly long command the output looks like this


Code:
2811 chassis|2811 chassis|
CISCO2811|V06|FHXXXXXXX

WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|WAN Interface Card - Serial 2A/S|
WIC-2A/S|V01|35XXXXX 

WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|WAN Interface Card - Serial 2T|
WIC-2T|V01|35XXXXX
now all I need is to concatenate the 2 lines together so I achieve this output
Code:
2811 chassis|2811 chassis|CISCO2811|V06|FHXXXXXXX

WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|WAN Interface Card - Serial 2A/S|WIC-2A/S|V01|35XXXXX 

WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|WAN Interface Card - Serial 2T|WIC-2T|V01|35XXXXX
at the end of each st line there is character 13 (0x0D) if it helps


anybody that did not get a headache from my sed command willing to help?
 
Old 11-05-2011, 02:50 PM   #3
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,516

Rep: Reputation: 1039Reputation: 1039Reputation: 1039Reputation: 1039Reputation: 1039Reputation: 1039Reputation: 1039Reputation: 1039
This looks like more of a job for sed and awk.

First thing I'd do is clean out all the junk in the input file with a file of sed editing instructions:
Code:
cat sedit
s/",/|/g
s/"/|/g
s/,/|/g
s/  */ /g
s/  *$//
s/PID: /PID: |/g
s/VID: /VID: |/g
s/SN: /SN: |/g
s/| /|/g
/^$/d
s/|$//g
s/$/|/g
Yeah, you can have sed instructions in a file; you'd run it
Code:
sed -f sedit sample.txt > junk.txt
Then look at what you got:
Code:
cat junk.txt
NAME: |2811 chassis|DESCR: |2811 chassis|
PID: |CISCO2811 |VID: |V06 |SN: |XXXXXXXXXXX|
NAME: |WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|DESCR: |WAN Interface Card - Serial 2A/S|
PID: |WIC-2A/S |VID: |V01|SN: |XXXXXXXX|
NAME: |WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|DESCR: |WAN Interface Card - Serial 2T|
PID: |WIC-2T |VID: |V01|SN: |XXXXXXXX|
Pretty slick, eh? We got rid of the quotes, the extra spaces, and we've got a field delimiter that we can use in awk.

Here's the awk program to read and format:
Code:
cat sample.awk
BEGIN {
        FS = "|"
}
{
        //      get a NAME line
        if ($1 == "NAME: " && $3 == "DESCR: ") {
                printf ("%s|%s|", $2, $4);
        }
        //      subsequent lines have PID
        if ($1 == "PID: " && $3 == "VID: ") {
                printf ("%s|%s|%s|\n", $2, $4, $6);
        }
}
And, finally, we run it
Code:
awk -f sample.awk junk.txt
2811 chassis|2811 chassis|CISCO2811 |V06 |XXXXXXXXXXX|
WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|WAN Interface Card - Serial 2A/S|WIC-2A/S |V01|XXXXXXXX|
WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|WAN Interface Card - Serial 2T|WIC-2T |V01|XXXXXXXX|
I dunno, looks pretty good to me. You could run it
Code:
awk -f sample.awk junk.txt > sample.unl
if you wanted to.

Hope this helps some.

Last edited by tronayne; 11-05-2011 at 04:29 PM. Reason: Not Informix, dummy, SQLite! Oof.
 
1 members found this post helpful.
Old 11-05-2011, 04:57 PM   #4
totaluser
LQ Newbie
 
Registered: Nov 2011
Posts: 29

Original Poster
Rep: Reputation: Disabled
Thank you very much for your effort

it does the job quite well.
I modified it a bit to also remove spaces before |.
modified awk a bit so it works after this modification, added another variable to print and it does the job excellently.



One huge thanks tronayne,

have a nice day
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
replacement with sed on a large .sql db file duzap Linux - General 3 04-19-2010 05:40 PM
How to import redhat users to ClearOS using CSV file format. alfaedh Linux - Server 1 01-31-2010 04:48 PM
parsing text using sed/awk or similar??? freeindy Programming 5 07-24-2008 05:04 AM
import .sql file through php exec secretlydead Programming 1 08-16-2007 03:12 AM
SED - display text on specific line of text file 3saul Linux - Software 3 12-29-2005 05:32 PM


All times are GMT -5. The time now is 12:45 PM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration