LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (http://www.linuxquestions.org/questions/programming-9/)
-   -   How to translate data from a CSV file to a MySQL Database? (http://www.linuxquestions.org/questions/programming-9/how-to-translate-data-from-a-csv-file-to-a-mysql-database-828718/)

wowsignal 08-26-2010 10:14 PM

How to translate data from a CSV file to a MySQL Database?
 
I have a CSV called source1 with data like:

Date 12:00:00 AM 12:30:00 AM 01:00:00 AM 01:30:00 AM 02:00:00 AM 02:30:00 AM 03:00:00 AM ...
07/31/10 51.52 52.48 52.64 52.8 63.36 63.84 45.6 ...
08/01/10 49.44 49.92 61.28 60.16 52.64 49.12 47.52 ...
08/02/10 49.12 48.96 48.96 50.72 60.16 58.08 51.04 ...
08/03/10 60.32 60.64 59.2 60.96 37.92 33.28 32.8 ...

I need to get this data into a MySQL database table with columns source, timestamp and data, the translated data looking like:

source timestamp data
source1 10/07/31 12:00:00 51.52
source1 10/07/31 12:30:00 52.48
source1 10/07/31 01:00:00 52.64
source1 10/07/31 01:30:00 52.8
source1 10/07/31 02:00:00 63.36
source1 10/07/31 02:30:00 63.84
source1 10/07/31 03:00:00 45.6
.
.
.
source1 10/08/01 03:30:00 49.44
..
..
..

Also, the CSV file is updated every few hours, and I need to load any new data from the file into the database without creating duplicates of data that has already been loaded.
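For what it's worth, the wide-to-long reshaping described above could be sketched in Python along these lines. This is only a sketch: the sample data, the assumption that the file is comma-separated, and the "source1" label are all illustrative, and the timestamp is normalized to 24-hour MySQL DATETIME format rather than the AM/PM layout shown above.

```python
import csv
import io
from datetime import datetime

# Illustrative sample mirroring the layout described above (comma-separated).
SAMPLE = """Date,12:00:00 AM,12:30:00 AM,01:00:00 AM
07/31/10,51.52,52.48,52.64
08/01/10,49.44,49.92,61.28
08/02/10,49.12,48.96,48.96
"""

def reshape(csv_text, source):
    """Turn one wide row per date into one (source, timestamp, value) row per reading."""
    reader = csv.reader(io.StringIO(csv_text))
    times = next(reader)[1:]              # header row: first cell is "Date"
    rows = []
    for record in reader:
        date, values = record[0], record[1:]
        for t, v in zip(times, values):
            # Combine the row's date with the column's time-of-day
            ts = datetime.strptime(f"{date} {t}", "%m/%d/%y %I:%M:%S %p")
            rows.append((source, ts.strftime("%Y-%m-%d %H:%M:%S"), float(v)))
    return rows

rows = reshape(SAMPLE, "source1")
print(rows[0])   # ('source1', '2010-07-31 00:00:00', 51.52)
```

Each tuple is then ready to feed into an INSERT statement or to write out as a one-reading-per-line file for mysqlimport.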

I was trying this using a shell script, but it was starting to get messy. I would appreciate advice about what language/approaches to use to get an elegant solution.

This is my first post, and any help is greatly appreciated in advance.

14moose 08-26-2010 11:55 PM

Hi -

1. Data import

There are actually many different approaches you could take, including writing a quick'n'dirty Perl or Python script.

I would recommend you take a look at "mysqlimport":
http://dev.mysql.com/doc/refman/5.0/en/mysqlimport.html

2. Avoiding duplicate data

Again, there are many different approaches you could take.

I would recommend a "two pronged" approach:

a) write a shell script that periodically renames the original datafile (e.g. "myfile.csv") to some other name (e.g. "myfile.001.csv"). The new data will be written to a (new) "myfile.csv"; your "import program" will only read from the renamed files (myfile.001.csv, myfile.002.csv, etc). You'll probably want a second script that will do the import, then roll off the processed file (for example, move it into a "done/" folder).
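The rotation idea in (a) could be sketched roughly like this. All paths and file names below are illustrative (a temp directory stands in for the real data directory), and the import step itself is left as a comment:

```shell
#!/bin/sh
# Sketch: freeze the live file under a numbered name so the importer
# only ever reads frozen copies, then roll processed files into done/.
DATA_DIR=$(mktemp -d)          # stand-in for the real data directory
DONE_DIR="$DATA_DIR/done"
mkdir -p "$DONE_DIR"

echo "07/31/10,51.52" > "$DATA_DIR/myfile.csv"   # pretend the feed wrote this

# 1) Rename the current file to the next free numbered name
n=1
while [ -e "$DATA_DIR/myfile.$(printf '%03d' "$n").csv" ]; do
    n=$((n + 1))
done
frozen="$DATA_DIR/myfile.$(printf '%03d' "$n").csv"
mv "$DATA_DIR/myfile.csv" "$frozen"

# 2) ... the import program would run here, reading only "$frozen" ...

# 3) Roll the processed file into done/
mv "$frozen" "$DONE_DIR/"
ls "$DONE_DIR"                 # prints: myfile.001.csv
```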

b) Another way to avoid "duplicates" is to make the combination of date/time and sequence number (if your data has one) the primary key. Primary keys, by definition, won't allow duplicates; if you try, you'll get a SQL error on "insert".
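To illustrate the composite-key approach in (b), here's a small runnable demo using Python's built-in sqlite3 module (SQLite's INSERT OR IGNORE; in MySQL the equivalent statement is INSERT IGNORE, or ON DUPLICATE KEY UPDATE if you want to overwrite). The table and column names are just placeholders:

```python
import sqlite3

# In-memory demo of rejecting duplicates via a composite primary key
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        source    TEXT,
        timestamp TEXT,
        data      REAL,
        PRIMARY KEY (source, timestamp)
    )
""")

rows = [("source1", "2010-07-31 00:00:00", 51.52),
        ("source1", "2010-07-31 00:30:00", 52.48),
        ("source1", "2010-07-31 00:00:00", 51.52)]  # duplicate from a re-download

# OR IGNORE silently skips rows whose (source, timestamp) key already exists
conn.executemany("INSERT OR IGNORE INTO readings VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(count)  # 2 -- the duplicate row was skipped, not inserted twice
```

With plain INSERT (no IGNORE clause) the duplicate would instead raise an error, which is the behavior described above.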

Again - you've got many different options. Take my suggestions as a starting point.

Good luck!

wowsignal 08-27-2010 06:50 AM

Thanks 14moose

Never came across mysqlimport before, but it looks extremely useful.

Each file that I download into the application (approximately every hour) will have a copy of the previous data, and some new entries at the bottom, hence the need for duplication avoidance. The idea of using MySQL to reject duplicates sounds appealing, as this will save me programming time. I will have a sourceid to identify the source of the data, and a date/time to identify when the data is from. There should not be two rows with the same "sourceid" and "date/time" entry in the table. I will attempt to set up the combination of these columns as the primary key.

Never used perl, but I have some PHP experience. I'll try and give it a go with that unless anyone can think of a good reason not to do it in PHP.

TB0ne 08-27-2010 10:34 AM

Quote:

Originally Posted by wowsignal (Post 4079419)
Thanks 14moose

Never came across mysqlimport before, but it looks extremely useful.

Each file that I download into the application (approximately every hour) will have a copy of the previous data, and some new entries at the bottom, hence the need for duplication avoidance. The idea of using MySQL to reject duplicates sounds appealing, as this will save me programming time. I will have a sourceid to identify the source of the data, and a date/time to identify when the data is from. There should not be two rows with the same "sourceid" and "date/time" entry in the table. I will attempt to set up the combination of these columns as the primary key.

Never used perl, but I have some PHP experience. I'll try and give it a go with that unless anyone can think of a good reason not to do it in PHP.

No, PHP will work just fine. However, here's a little Perl snippet to get you going, in case you want to take that path:
Code:

#!/usr/bin/perl
use strict;
use warnings;

# The DataBaseInterface module.
use DBI;

# Set database variables here
my $dbh = DBI->connect("DBI:mysql:DatabaseName", "DB User ID", "DB Password",
    { RaiseError => 1 })      # Open the database
  or die "Could not connect to database! $DBI::errstr";

open (INFILE, '<', '/path/to/your.csv') or die "Can't open file! $!";
while (<INFILE>)
  {
  chomp;
  my @words = split(',', $_);
  my $var1 = $words[0];  # First data field in CSV
  my $var2 = $words[1];  # Second, etc....
  # Placeholders handle the quoting for us
  $dbh->do("INSERT INTO TABLENAME VALUES (?, ?)", undef, $var1, $var2);
  }
close (INFILE);
$dbh->disconnect;

Pretty straightforward, and very generic. All you've got to change are the table/db names and the number of variables, and you can import ANY csv to any MySQL table.

Wim Sturkenboom 08-27-2010 11:22 AM

You can also look at the MySQL statement LOAD DATA INFILE if you want to do it using MySQL statements (e.g. from within PHP).

stevephillips79 09-02-2010 12:57 AM

Hello,
You have to make sure that you provide structure information along with your CSV data. The first line of the CSV file should contain the list of field names in order to supply the necessary database structure information. Here are two samples of valid CSV data. Thank you.

[crabboy] removed spam [/crabboy]

crabboy 09-06-2010 10:34 PM

stevephillips79: The forum rules do not permit advertising. Please visit http://www.linuxquestions.org/advertising/ for more information on advertising. Feel free to contact the forum admin if you have any questions about this policy.


All times are GMT -5. The time now is 11:36 PM.