LinuxQuestions.org


SaintDanBert 10-25-2012 05:46 PM

wanting to process v-card data
 
I have v-card files (VCF) that have contact pictures encoded in them. The spec says that these are "base64" encoded if they are not a URL to the photo file. The original photo files are who knows where.

Can someone tell me how to take these photo-blocks and decode them into images?

Does anyone know of a Linux-based contact manager or address book that can do all or part of the following processing?
  1. Open a VCF file
  2. decode any photo block into its image
  3. save the image into a file
  4. write the VCF fields to comma-separated record
    using "normalized" field names and logging "broken" v-cards (too much missing detail, etc)
  5. name the image as a URL in the record
  6. repeat for all contacts in the file
  7. store the CSV data into a MySQL table
  8. repeat for all VCF files
  9. spindle and mutilate the MySQL table to remove duplicates

Thanks in advance,
~~~ 0;-Dan

schneidz 10-25-2012 06:15 PM

do the usual suspects handle your vcf format (thunderbird, evolution, ...) ?

else you may need to do something custom with dd or write a c-program to grab the bytes you want.

the rest of the stuff seems script-able; what does "normalized" (#4) mean ?

chrism01 10-25-2012 07:48 PM

I'd definitely use Perl for that, using e.g. http://search.cpan.org/~llap/Text-vC...Addressbook.pm, although I'd also ask over at perlmonks.org (it's where the Perl gurus hang out), in case there are even better/easier modules/techniques available.

Perl has the module http://search.cpan.org/~capttofu/DBD...b/DBD/mysql.pm for MySQL, for which you'll also need the DBI module.
If you have installed MySQL, you may already have those available. You should be able to get them from your repo, not CPAN direct.

schneidz 10-26-2012 07:57 AM

i havent used any of these programs but they may be of interest to you:
Code:

[schneidz@hyper ide-34]$ yum search vcard
Loaded plugins: refresh-packagekit
BlueBubble                                                                                                      | 3.6 kB    00:00   
fedora-chromium                                                                                                  | 3.4 kB    00:00   
rpmfusion-free-updates                                                                                          | 3.3 kB    00:00   
rpmfusion-nonfree-updates                                                                                        | 3.3 kB    00:00   
updates/metalink                                                                                                |  17 kB    00:00   
========================================================== N/S Matched: vcard ==========================================================
perl-Text-vCard.noarch : Package to edit and create a single vCard (RFC 2426)
trytond-party-vcarddav.noarch : party-vcarddav module for Tryton
python-vobject.noarch : A python library for manipulating vCard and vCalendar files

  Name and summary matches only, use "search all" for everything.

also, according to wikipedia the pic is prepended by the tag PHOTO so maybe you can manually scrape from that point till you hit the next tag and redirect the result to a file.

good luck.
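
For what it's worth, the scraping idea above can be sketched in Python. This is only a rough sketch, assuming vCard 3.0 style line folding (a line break followed by a space or tab continues the previous line). The names `unfold` and `photo_properties` are made up for illustration, and the sample card is invented rather than taken from a real file.

```python
import re


def unfold(text):
    """Join folded lines: a line break followed by a space or tab
    continues the previous line, so the break and that one character go."""
    return re.sub(r"\r?\n[ \t]", "", text)


def photo_properties(vcf_text):
    """Yield (params, value) for every PHOTO property in the text.

    Simplification: splits on the first ':' and on every ';', so a quoted
    parameter containing either character would be split incorrectly.
    Grouped names such as 'item1.PHOTO' are not handled.
    """
    for line in unfold(vcf_text).splitlines():
        name_and_params, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'name:value' line
        name, *params = name_and_params.split(";")
        if name.upper() == "PHOTO":
            yield params, value


# Invented sample card with a folded PHOTO value.
sample = (
    "BEGIN:VCARD\r\n"
    "VERSION:3.0\r\n"
    "PHOTO;ENCODING=b;TYPE=PNG:iVBORw0KGgo\r\n"
    " AAAANSUhEUgAA\r\n"
    "FN:Example Person\r\n"
    "END:VCARD\r\n"
)
photos = list(photo_properties(sample))
print(photos)
```

The value that comes back is still Base64 text; it has to be decoded before it is written to disk as an image.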

SaintDanBert 10-26-2012 10:25 AM

Quote:

Originally Posted by schneidz (Post 4815017)
do the usual suspects handle your vcf format (thunderbird, evolution, ...) ?

else you may need to do something custom with dd or write a c-program to grab the bytes you want.

the rest of the stuff seems script-able; what does "normalized" (#4) mean ?

When I say, "normalized," I mean that I would take the various different names for the vcard data items and select one set of names. It really is surprising how many different ways the vcards store first and last names, multiple phone numbers, multiple email addresses ... and then there are various extended field variations.

Thanks,
~~~ 0;-Dan
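
A rough Python sketch of that kind of normalization, assuming the card has already been parsed into (property name, value) pairs. The column names, the `FIELD_MAP` table, the `normalize` function, and the " | " separator for repeated values are all hypothetical choices for illustration, not anything defined by the vCard spec.

```python
import csv
import io

# Hypothetical mapping from vCard property names to one chosen set of columns.
FIELD_MAP = {
    "FN": "full_name",
    "N": "structured_name",
    "EMAIL": "email",
    "TEL": "phone",
    "ORG": "organization",
}
COLUMNS = ["full_name", "structured_name", "email", "phone",
           "organization", "photo_url"]


def normalize(properties):
    """Turn (name, value) pairs from one card into a CSV row dict.

    Returns (row, problems); problems lists required columns left empty,
    which can be logged as a "broken" card.
    """
    row = {column: "" for column in COLUMNS}
    for name, value in properties:
        column = FIELD_MAP.get(name.upper())
        if column is None:
            continue  # unmapped property, e.g. an X- extension
        # Repeated properties (several TEL or EMAIL lines) are joined.
        row[column] = value if not row[column] else row[column] + " | " + value
    problems = [c for c in ("full_name",) if not row[c]]
    return row, problems


# Invented example card.
card = [("FN", "Example Person"), ("TEL", "555-0100"),
        ("TEL", "555-0101"), ("X-FOO", "bar")]
row, problems = normalize(card)
row["photo_url"] = "photos/example-person.png"  # step 5: name the image in the record

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Variant names from different address books (the X-something fields, for example) would be added as extra keys in the mapping that point to the same column.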

SaintDanBert 10-26-2012 10:28 AM

Quote:

Originally Posted by schneidz (Post 4815429)
...
Name and summary matches only, use "search all" for everything.
also, according to wikipedia the pic is prepended by the tag PHOTO so maybe you can manually scrape from that point till you hit the next tag and redirect the result to a file.
...

Yes, and there are X-something fields for photos and company logos and such.
My troubles start once I have the base64 block stripped to a file.
What do I do next?

Is it simply 'uudecode' of the block?
Which photo format do I get for the decode results?
If it restores the original format, how do I know what that was?

(laugh) That's why this is called a project.

Thanks,
~~~ 0;-Dan

SaintDanBert 10-26-2012 10:30 AM

Somewhere out there in package land there must be an address book or contact list that will read most of the popular vCard file formats...

Grrr Arrgghh,
~~~ 8d;-< Dan

SaintDanBert 10-26-2012 10:36 AM

Wikipedia offers the following explanation of Base64: http://en.wikipedia.org/wiki/Base64

Quote:

A quote from Thomas Hobbes' Leviathan:
Code:

Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.
represented as a byte sequence of 8-bit-padded ASCII characters is encoded in MIME's Base64 scheme as follows:
Code:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

In the above quote, the encoded value of Man is TWFu. Encoded in ASCII, the characters M, a, and n are stored as the bytes 77, 97, and 110, which are the 8-bit binary values 01001101, 01100001, and 01101110. These three values are joined together into a 24-bit string, producing 010011010110000101101110. Groups of 6 bits (6 bits have a maximum of 2^6 = 64 different binary values) are converted into individual numbers from left to right (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.
Code:

Text content    M         a         n
ASCII           77        97        110
Bit pattern     01001101  01100001  01101110
Index           19        22        5         46
Base64-encoded  T         W         F         u

As this example illustrates, Base64 encoding converts 3 octets into 4 encoded characters.

The Base64 index table:
Code:

Value  Char    Value  Char    Value  Char    Value  Char
0      A       16     Q       32     g       48     w
1      B       17     R       33     h       49     x
2      C       18     S       34     i       50     y
3      D       19     T       35     j       51     z
4      E       20     U       36     k       52     0
5      F       21     V       37     l       53     1
6      G       22     W       38     m       54     2
7      H       23     X       39     n       55     3
8      I       24     Y       40     o       56     4
9      J       25     Z       41     p       57     5
10     K       26     a       42     q       58     6
11     L       27     b       43     r       59     7
12     M       28     c       44     s       60     8
13     N       29     d       45     t       61     9
14     O       30     e       46     u       62     +
15     P       31     f       47     v       63     /

When the number of bytes to encode is not divisible by 3 (that is, if there are only one or two bytes of input for the last block), then the following action is performed: Add extra bytes with value zero so there are three bytes, and perform the conversion to base64. If there was only one significant input byte, only the first two base64 digits are picked, and if there were two significant input bytes, the first three base64 digits are picked. '=' characters might be added to make the last block contain four base64 characters.

As a result: When the last group contains one octet, the four least significant bits of the final 6-bit block are set to zero; and when the last group contains two octets, the two least significant bits of the final 6-bit block are set to zero.
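
The "Man" to "TWFu" walk-through can be reproduced by hand in Python and checked against the standard `base64` module. A minimal sketch; `encode_three_octets` and `BASE64_ALPHABET` are names chosen here for illustration and handle only a full three-byte group, with no padding logic.

```python
import base64

# The index table quoted above, written as one string: position = value.
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_three_octets(chunk):
    """Encode exactly three bytes into four Base64 characters by hand."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles a full 3-byte group")
    # Join the three octets into one 24-bit number.
    bits = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # Split it into four 6-bit indices, most significant first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return indices, "".join(BASE64_ALPHABET[i] for i in indices)


indices, encoded = encode_three_octets(b"Man")
print(indices, encoded)            # [19, 22, 5, 46] TWFu
print(base64.b64decode(encoded))   # b'Man'
```

For real data, `base64.b64decode` does the whole job, padding included.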


SaintDanBert 11-17-2012 12:07 AM

It appears that regardless of which image format (jpg, png, etc) one supplies as the contact photo, its bits get encoded and stored and written into the corresponding vCard file.

When you decode the vCard file, how do you know which image format to use when storing the results?

~~~ 0;-Dan

Wim Sturkenboom 11-17-2012 12:48 AM

Not sure if this is of help

man base64; that should get you going with the decoding part
To determine what kind of file it is, you can use the file command

With some bash scripting, you should be good to go.

I did not research it, but I'm quite sure that you can find a library for the programming language of your choice to do the base64 decoding. The same might apply for functionalities provided by 'file'.

SaintDanBert 11-19-2012 02:26 AM

Quote:

Originally Posted by Wim Sturkenboom (Post 4831186)
Not sure if this is of help

man base64; that should get you going with the decoding part
To determine what kind of file it is, you can use the file command

I regret that I failed to adequately state the question.

A vCard has a photo blob (binary large object) encoded as Base64.
Process the vCard file and extract the blob characters, then decode them from Base64. Now you have a blob that is binary. Write the binary to disk separated from the vCard file.

At this point we have someName.vcf and namePhoto.dat.
(I call it a DAT file, because I do not know which photo image format was supplied during the original encoding process.)
Is there some vCard item that tells me that the photo is JPG vs. PNG vs. GIF vs. ??? Alternately, am I faced with reading the DAT file bits looking for a file format signature and then guessing at the source format supplied to the original encoding? Are you telling me that 'file' will read and interpret those DAT file bits?

Hoping I'm clear this time.
~~~ 0;-/ Dan

Wim Sturkenboom 11-19-2012 03:47 AM

Quote:

Are you telling me that 'file' will read and interpret those DAT file bits?
Yes; you can have an image with a txt extension and file will tell you that it's a jpg

Code:

wim@aa0:~/images$ cp IMGP8955.JPG IMGP8955.txt
wim@aa0:~/images$ file IMGP8955.txt
IMGP8955.txt: JPEG image data, EXIF standard 2.21
wim@aa0:~/images$ file IMGP8955.JPG
IMGP8955.JPG: JPEG image data, EXIF standard 2.21
wim@aa0:~/images$
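
The same idea, checking the first few bytes against known signatures instead of trusting the extension, can be sketched in Python for use inside a script. A minimal sketch covering only three formats; `guess_extension` and the `SIGNATURES` table are illustrative names, and `file` itself recognizes far more formats than this.

```python
# Leading bytes ("magic numbers") of three common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),        # JPEG
    (b"\x89PNG\r\n\x1a\n", "png"),   # PNG
    (b"GIF87a", "gif"),              # GIF, older version
    (b"GIF89a", "gif"),              # GIF, newer version
]


def guess_extension(data, default="dat"):
    """Return a file extension based on the first bytes of data."""
    for magic, extension in SIGNATURES:
        if data.startswith(magic):
            return extension
    return default  # unknown format: keep the neutral .dat name


print(guess_extension(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
print(guess_extension(b"not an image"))
```

The decoded photo bytes can be passed straight in, so the blob never needs to be written out under a placeholder name first.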


schneidz 11-19-2012 07:14 AM

Quote:

Originally Posted by SaintDanBert (Post 4832394)
I regret that I failed to adequately state the question.

A vCard has a photo blob (binary large object) encoded as Base64.
Process the vCard file and extract the blob characters, then decode them from Base64. Now you have a blob that is binary. Write the binary to disk separated from the vCard file.

At this point we have someName.vcf and namePhoto.dat.
(I call it a DAT file, because I do not know which photo image format was supplied during the original encoding process.)
Is there some vCard item that tells me that the photo is JPG vs. PNG vs. GIF vs. ??? Alternately, am I faced with reading the DAT file bits looking for a file format signature and then guessing at the source format supplied to the original encoding? Are you telling me that 'file' will read and interpret those DAT file bits?

Hoping I'm clear this time.
~~~ 0;-/ Dan

where are you stuck ? do you know the byte offset where the picture begins and ends. it seems like a simple c program can scrape off the necessary bytes (fgetc()). not sure how base64 fits into this but if you need to convert something there seems to be a standard program for it. then the file command will tell you the picture format.

regards,
schneidz

SaintDanBert 11-19-2012 10:44 AM

Quote:

Originally Posted by schneidz (Post 4832546)
where are you stuck ? do you know the byte offset where the picture begins and ends. it seems like a simple c program can scrape off the necessary bytes (fgetc()). not sure how base64 fits into this but if you need to convert something there seems to be a standard program for it. then the file command will tell you the picture format.

regards,
schneidz

A vCard, or VCF data file, has a PHOTO attribute:
Code:

BEGIN:VCARD
VERSION:3.0
LABEL;TYPE=HOME:10511 Weller Drive\nAustin\, TX\n78750-2566\nUSA
TEL;TYPE=CELL;X-EVOLUTION-UI-SLOT=2:512-413-5611
TEL;TYPE=HOME,VOICE;X-EVOLUTION-UI-SLOT=1:512-331-8217
X-MOZILLA-HTML:FALSE
X-EVOLUTION-VIDEO-URL:
FBURL:
X-EVOLUTION-BLOG-URL:
NOTE:
X-EVOLUTION-SPOUSE:
X-EVOLUTION-ASSISTANT:
CALURI:
TITLE:General Manager
X-EVOLUTION-MANAGER:
ROLE:Technical Writer\, Educator\, Coach
ORG:The GRILLON Group;;
REV:2010-11-30T20:38:31Z
PHOTO;ENCODING=b;TYPE="X-EVOLUTION-UNKNOWN":iVBORw0KGgoAAAANSUhEUgAAAIoAAAC
 WCAIAAACO8YfTAAAAA3NCSVQICAjb4U/gAAAgAElEQVR42uy9d5hcx3Unek5V3Xv7dk4zPXkwy
 DkDBAkmMQdRVCQpybJl2U/Bb71eB63t7+2zbD/Zlr226WdbWmXJkkiJokhKVGAmwQAGAETOAww
 mx57p3DdV1dk/ejAcUaSlJUGIsllf48M3+Xb96qTf+VUVaq3hZwYiKqWCIPA8Lx6Pw1vjlzTw1
 eAhIkRsfEhEb83Umwiet8abZLBXxQ3xLaN5y3reGq/Jet4ab8Hz1ngLnrfgeWu8Bc9b8Lw13oL
...
 LiOj7fiVKDQCdToeZPc+r1WpRFJVl2el0lFJhGCZJ0u12G41GkiQVOVIcx9baKKytHDk6n8+A/
 Ga743keo4hbi+vHjkkpR/1ngbnV6QAgCJnO58bYWnthOB7XgqhZb5j6pF6vW2ezLO2ErtPpWGs
 X26Lm14NuOJ3OgHQY+Mu9pXma+mHUarWIZKu9EASBp9RSb9nzPERsNpsV3eUD7HU9zGxm/9f4V
 znGoii2trbiOO71el8Uevf/15PML/BHJ6oKoyqzaDQa8LAuuX3q638CQGJndxFrAZ8AAAAASUV
 ORK5CYII=

UID:pas-id-4F4EBB80000001A3
N:St.Andre;Dan;;;
FN:Dan St.Andre
NICKNAME:saint
X-EVOLUTION-FILE-AS:Saint-Andre\, Dan
ADR;TYPE=HOME:;;10511 Weller Drive;Austin;TX;78750-2566;USA
URL:
EMAIL;TYPE=HOME:saint@grillongroup.org
EMAIL;TYPE=WORK:dan.st.andre@grillongroup.org
END:VCARD

When you supply an image file (JPG, PNG, GIF, etc.) to your address book program, it encodes the image
and stores it. When you save that contact as a vCard, the encoded image data gets written into the PHOTO attribute.

I can decode the PHOTO attribute back to its original binary. However, I have yet to discover
how to programmatically learn with photo image format to declare when I name the saved binary
after the decode.

~~~ 0;-Dan
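
A short Python sketch of the two places the format can come from, using the PHOTO line of the card above. In vCard 3.0 (RFC 2426) the TYPE parameter is meant to carry the image type, but here it holds "X-EVOLUTION-UNKNOWN", so the decoded bytes are the only reliable clue. The helper name `parse_params` is made up for illustration, and only the first 24 Base64 characters of the PHOTO value are decoded.

```python
import base64


def parse_params(prop):
    """Split 'PHOTO;ENCODING=b;TYPE="X"' into a name and a dict of parameters.

    Simplification: assumes no ';' inside quoted parameter values.
    """
    name, *raw = prop.split(";")
    params = {}
    for item in raw:
        key, _, value = item.partition("=")
        params[key.upper()] = value.strip('"')
    return name.upper(), params


name, params = parse_params('PHOTO;ENCODING=b;TYPE="X-EVOLUTION-UNKNOWN"')

# First 24 Base64 characters of the PHOTO value shown in the card above.
head = base64.b64decode("iVBORw0KGgoAAAANSUhEUgAA")
is_png = head.startswith(b"\x89PNG\r\n\x1a\n")

print(params["TYPE"], is_png)  # X-EVOLUTION-UNKNOWN True
```

So this particular photo is a PNG: the decoded data begins with the PNG signature followed by an IHDR chunk, whatever the TYPE parameter says.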

schneidz 11-19-2012 12:11 PM

Quote:

Originally Posted by SaintDanBert (Post 4832667)
...
I can decode the PHOTO attribute back to its original binary. However, I have yet to discover
how to programmatically learn with (sic: did you mean what) photo image format to declare when I name the saved binary
after the decode.

~~~ 0;-Dan

^huh ? use either file or identify (if imagemagick is installed) to identify the file type.

also, is that your real address ?


All times are GMT -5.