LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   [Help] format text file for import into SQL table (with sed or similar) (https://www.linuxquestions.org/questions/linux-newbie-8/%5Bhelp%5D-format-text-file-for-import-into-sql-table-with-sed-or-similar-912034/)

totaluser 11-05-2011 11:48 AM

[Help] format text file for import into SQL table (with sed or similar)
 
Hello everybody,

I am working on a project where I need to import gathered data into SQLite database table
the textfile I am working with looks like this:
Code:

NAME: "2811 chassis", DESCR: "2811 chassis"
PID: CISCO2811        , VID: V06 , SN: XXXXXXXXXXX

NAME: "WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0", DESCR: "WAN Interface Card - Serial 2A/S"
PID: WIC-2A/S          , VID: V01, SN: XXXXXXXX 

NAME: "WAN Interface Card - Serial 2T on Slot 0 SubSlot 2", DESCR: "WAN Interface Card - Serial 2T"
PID: WIC-2T            , VID: V01, SN: XXXXXXXX

it can be anything from 1 two-line entry to as many as 15-20 two-line entries

what I want it to look like

Code:

2811 chassis|2811 chassis|CISCO2811|V06|XXXXXXXXXXX
WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|WAN Interface Card - Serial 2A/S|WIC-2A/S|V01|XXXXXXXX
WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|WAN Interface Card - Serial 2T|WIC-2T|V01|XXXXXXXX

as for the 1st line of each entry I managed to extract needed information using this sed command
Code:

sed -e "s/^.*\"\(.*\)\".*\"\(.*\)\".*$/\1\|\2\|/" sample.txt
I am struggling with the 2nd line of the entry
it always starts with 'PID: ' then comes the word I want to extract (it never contains any space) then comes variable amount of spaces followed by (,) and another space
after that always comes 'VID: ' followed by 3 characters I want to extract (it never contains any space)
that is followed by ', SN: ' and a word I want to extract (it never contains any space) after that may or may not follow variable amount os spaces. then END-OF-LINE


that is the structure of the file I need your help to process


thanks you very much in advance.

totaluser 11-05-2011 01:34 PM

OK,
I kludged together a very ugly piece of code but it does the job done (at least partly)

sed -e "s/^.*\"\(.*\)\".*\"\(.*\)\".*$/\1\|\2\|/" file.input | sed "s/ */ /g" | sed "s/SN: /|/" | sed "s/ , /,/g" | sed -e "s/^.* \(.*\),.* \(.*\)\,/\1\|\2/" | sed "s/ |/|/g" > file.output

I had some problems because some files had some extra spaces but this one extracts all the data I need

after executing this ugly long command the output looks like this


Code:

2811 chassis|2811 chassis|
CISCO2811|V06|FHXXXXXXX

WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|WAN Interface Card - Serial 2A/S|
WIC-2A/S|V01|35XXXXX

WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|WAN Interface Card - Serial 2T|
WIC-2T|V01|35XXXXX

now all I need is to concatenate the 2 lines together so I achieve this output
Code:

2811 chassis|2811 chassis|CISCO2811|V06|FHXXXXXXX

WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|WAN Interface Card - Serial 2A/S|WIC-2A/S|V01|35XXXXX

WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|WAN Interface Card - Serial 2T|WIC-2T|V01|35XXXXX

at the end of each st line there is character 13 (0x0D) if it helps


anybody that did not get a headache from my sed command willing to help?

tronayne 11-05-2011 01:50 PM

This looks like more of a job for sed and awk.

First thing I'd do is clean out all the junk in the input file with a file of sed editing instructions:
Code:

cat sedit
s/",/|/g
s/"/|/g
s/,/|/g
s/  */ /g
s/  *$//
s/PID: /PID: |/g
s/VID: /VID: |/g
s/SN: /SN: |/g
s/| /|/g
/^$/d
s/|$//g
s/$/|/g

Yeah, you can have sed instructions in a file; you'd run it
Code:

sed -f sedit sample.txt > junk.txt
Then look at what you got:
Code:

cat junk.txt
NAME: |2811 chassis|DESCR: |2811 chassis|
PID: |CISCO2811 |VID: |V06 |SN: |XXXXXXXXXXX|
NAME: |WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|DESCR: |WAN Interface Card - Serial 2A/S|
PID: |WIC-2A/S |VID: |V01|SN: |XXXXXXXX|
NAME: |WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|DESCR: |WAN Interface Card - Serial 2T|
PID: |WIC-2T |VID: |V01|SN: |XXXXXXXX|

Pretty slick, eh? We got rid of the quotes, the extra spaces, and we've got a field delimiter that we can use in awk.

Here's the awk program to read and format:
Code:

cat sample.awk
BEGIN {
        FS = "|"
}
{
        //      get a NAME line
        if ($1 == "NAME: " && $3 == "DESCR: ") {
                printf ("%s|%s|", $2, $4);
        }
        //      subsequent lines have PID
        if ($1 == "PID: " && $3 == "VID: ") {
                printf ("%s|%s|%s|\n", $2, $4, $6);
        }
}

And, finally, we run it
Code:

awk -f sample.awk junk.txt
2811 chassis|2811 chassis|CISCO2811 |V06 |XXXXXXXXXXX|
WAN Interface Card - Serial 2A/S on Slot 0 SubSlot 0|WAN Interface Card - Serial 2A/S|WIC-2A/S |V01|XXXXXXXX|
WAN Interface Card - Serial 2T on Slot 0 SubSlot 2|WAN Interface Card - Serial 2T|WIC-2T |V01|XXXXXXXX|

I dunno, looks pretty good to me. You could run it
Code:

awk -f sample.awk junk.txt > sample.unl
if you wanted to.

Hope this helps some.

totaluser 11-05-2011 03:57 PM

Thank you very much for your effort

it does the job quite well.
I modified it a bit to also remove spaces before |.
modified awk a bit so it works after this modification, added another variable to print and it does the job excellently.



One huge thanks tronayne,

have a nice day


All times are GMT -5. The time now is 10:09 AM.