
Data Migration – From Access 2007 to MySQL

Recently, I was assigned to a project that required one-way synchronization of data, from MS Access to MySQL. The task had to be fully automated so that it ran once a day. It had already been implemented by someone else, using PHP as the programming language and the SOAP protocol as the underlying technology. That implementation had several issues, so I was asked to take a different approach.

So, roughly, my plan was to dump the Access database into text, upload the dump file to the server, and then import it into MySQL. So how to do that?

Microsoft Office Access 2007 provides its own methods to export table content as CSV, and conveniently, at the MySQL end the CSV files can be read directly into the database. Roughly, my approach looked like the following,

Synchronize necessary tables into MySQL
  • Using VBA code, export necessary tables into CSV
  • Compress and upload the CSV files, preferably in a single file
  • At the MySQL end, import the data once the upload completes
Make the whole task fully automated
  • Using a MS Access 2007 macro, trigger the VBA code
  • Set up Windows task scheduler
Now, let us explore step-by-step... :-)


Synchronization of Necessary Tables into MySQL

I remember I was doing well with Microsoft Visual Basic 6.0 six or seven years ago. Having almost forgotten it, I started working with VBA. The key is the TransferText method provided by the DoCmd object in MS Access. In brief, the code follows.

I initially wrote this part of the code as a Sub, but later I understood that, to be automated with a macro, I have to make it a Function. (A Sub does not return a value, but a Function does.)

Function ExportData()
 Dim tables(1 To 3) As String
 Dim table As Variant ' For Each requires a Variant loop variable
 Dim exportpath As String

 exportpath = "C:\temp\"

 tables(1) = "tblStudents"
 tables(2) = "tblMarks"
 tables(3) = "tblSubjects"

 For Each table In tables
  DoCmd.TransferText acExportDelim, , table, exportpath & table & ".csv"
 Next table

 ExportData = True ' Return True to indicate (in case it's necessary) that everything completed without breaking in the middle
End Function

In my work I had to export several tables. However, it's that simple. Now, the next task is to upload the exported CSV files to the server, through FTP. As far as I know, VBA does not provide any native/built-in methods to handle FTP. One solution is the Windows API. I have programmed with the Windows API from classic Visual Basic before, so I know how hard it can be without proper documentation. There are plenty of nice articles on the Internet on how to do it, but since I did not know how it would behave on later Windows releases such as Vista and 7, or how Internet security software would handle it, I decided to use the PHP command line instead. By installing the XAMPP package on Windows, you get a working PHP CLI. So, the latter part of the above VBA code goes like this,

Dim cmdline As String
cmdline = "C:\xampp\php\php.exe c:\uploadcsv.php "
For Each table In tables
 DoCmd.TransferText acExportDelim, , table, exportpath & table & ".csv"
 cmdline = cmdline & Chr$(34) & exportpath & table & ".csv" & Chr$(34) & " "
Next table

Shell cmdline, vbNormalFocus

As you can see, all the exported file paths are passed to a PHP script called uploadcsv.php. The work to be done with VBA ends here. Now, the next task is to handle the file upload. PHP provides simple, convenient functions for that.

First, let us examine the command line, the cmdline variable in our VBA code. It will look like the following. Note that Chr$ (34) returns a double quotation mark (ASCII 34).

C:\xampp\php\php.exe c:\uploadcsv.php "C:\temp\tblStudents.csv" "C:\temp\tblMarks.csv" "C:\temp\tblSubjects.csv"

So, the contents of uploadcsv.php go here...
I'll explain the PHP code wherever necessary, in PHP comments.

<?php
/* 
$argv handles command line input
The first element of the array is the php script filename itself.
So we'll remove it. 
*/
array_shift ($argv);

/* 
We're going to upload several tables in different files.
Uploading three files separately would need three FTP transfers.
Archiving makes the three into one, so in the end we need
only a single FTP transfer, which reduces the time spent.
For archiving I used the Archive_Tar library,
which is available at http://pear.php.net/package/Archive_Tar/ 
*/
require_once ("Tar.php");
$tar = "exportcsv.tar";
$tarfile = new Archive_Tar ($tar);
$tarfile->create ($argv);
unset ($tarfile);

/*
Now we can delete the temporary CSV files
*/
foreach ($argv as $file)
{
 unlink ($file);
}

/*
Archiving alone isn't enough when the file size becomes quite large to upload.
We have several compression methods available
that can drastically reduce the file size.
PHP provides functions to compress data into the bzip2 format,
where very high compression ratios are achievable.

When you compress a set of files into one on your computer,
it performs the two steps of archiving and
compressing as a single step.

In my work, archive was 19.xx MB and after compression
it took just 3.xx MB
*/
$finale = $tar . ".bz2";
$bz2 = fopen ($finale, "w");
$bz2data = bzcompress (file_get_contents ($tar), 9);
fwrite ($bz2, $bz2data);
fclose ($bz2);

unlink ($tar);

$upload_md5 = md5_file ($finale);
/*
As we are uploading this file,
we need to verify its integrity at the 
remote end before processing. I used MD5 to
verify the uploaded file's integrity.
Please read http://en.wikipedia.org/wiki/MD5 for more information.
*/

/*
Our data is ready now.
Next is to upload it using FTP
*/
$remote_path = "/home/example/ftp/" . $finale;
/*this is the remote directory where we're going to upload the file.
For security reasons this should be outside of the website's document root (/home/user/public_html) */
$ftp_host = "example.com";
$ftp_user = "example";
$ftp_pass = "p4ssw0r!";

$conn = ftp_connect ($ftp_host);
ftp_login ($conn, $ftp_user, $ftp_pass);
ftp_pasv ($conn, true);

if (!ftp_put ($conn, $remote_path, $finale, FTP_BINARY))
{
 die ("Upload failed!"); // Abort if upload failed.
}

ftp_close ($conn);

/*
After the upload has finished, we need to tell
the website that the uploaded file is ready to be 
processed.

We could do this with an HTTP GET, but some part of the actual
system inspired me to use SOAP.
For the simplicity of this article I use a plain HTTP request here.
*/

$final_result = file_get_contents ("http://www.example.com/importer.php?file=" . $finale . "&checksum=" . $upload_md5);
/* where www.example.com is the remote location where we keep the script to import data from CSV files */

if ($final_result == "Success!")
{
 echo "One way sync successful!";
}
else
{
 echo "Synchronization failed!\n";
 echo $final_result;
}

Alright. Finally, one more step ahead... the data is ready to be imported into MySQL.
It's that importer.php that we called in the earlier script. We need to keep this file inside the document root of the website so it can be called over HTTP.
I'll put the code here, explaining each piece, in the same way as above.

<?php

require_once ('Tar.php');
 
/*
MySQL Database configuration
*/
$db_server = 'localhost';
$db_user  = 'example';
$db_pass  = 'p4ssw0r!';
$db     = 'example_com';

/*
Directory path (on the remote server) where the uploaded file is kept
*/
$basedir = "/home/example/ftp/";

/*
Connect to the MySQL database
*/
$conn = mysql_connect ($db_server, $db_user, $db_pass);
if (!$conn) { die ("MySQL database connection failed! Can't continue."); }
mysql_select_db ($db, $conn);


/*
Validate the uploaded file against the MD5 checksum
*/
$filename = trim ($_GET['file']);
$checksum = trim ($_GET['checksum']);

$md5 = md5_file ($basedir . $filename);
if ($md5 != $checksum)
{
 die ("File checksum mismatch. Please try again!");
}


/*
Decompress the uploaded file and extract the archive.
Whenever a file becomes no longer necessary,
it is deleted using unlink ()
*/
$arch = $basedir . substr ($filename, 0, -4);
$bz2data = file_get_contents ($basedir . $filename);
$data = bzdecompress ($bz2data, true);
$fp = fopen ($arch, "w");
fwrite ($fp, $data);
fclose ($fp);
unlink ($basedir . $filename);

$tar = new Archive_Tar ($arch);
$tar->extract ($basedir);
unlink ($arch);


/*
Now we have CSV files back, in the server space.
Next task is to find them, and import to MySQL
using a LOAD DATA LOCAL query.

When a directory is scanned with scandir () for files,
the returned array contains the current directory (.) and the
parent directory (..) as its first two elements (assuming the
default alphabetical sort and no other dotfiles). We have to
skip these two before using the return value.
$fl = scandir ($basedir);
array_shift ($fl);
array_shift ($fl); 

/*
A simple loop is used to import each table.
Here each filename (without the extension) is equivalent to the
table name in MySQL. That means the table names are identical
to the table names in the original Access database.
Otherwise we need to do a small workaround to fix any mismatch.
*/
foreach ($fl as $file)
{
 if ($file != $filename) {
  $src = realpath ($basedir . $file);

  /*
  Strip '.csv' from the filename to get the exact table name
  */
  $tblname = str_replace (".csv", "", $file);
  $tblname = strtolower ($tblname);

  /*
  Empty the table before importing
  to prevent duplicate records.
  */
  $q = "TRUNCATE TABLE $tblname";
  mysql_query ($q);
  /*
  This query handles importing CSV to MySQL directly.
  If this functionality was not there with MySQL
  then it would be a certain workaround to import the tables.
  */
  $q = "LOAD DATA LOCAL INFILE '$src' REPLACE INTO TABLE $tblname FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\r\n'";
  $ret = mysql_query ($q);  
  unlink ($src);

  if ($ret != true)
  {
   echo "MySQL query failed! " . mysql_error ();
  }
 }
}

/*
Everything is complete.
Close the MySQL database connection,
and report success to the caller
(the uploader script checks for this exact string).
*/
mysql_close ($conn);

echo "Success!";

Finished! Now, additionally, we can automate the whole task so that the one-way synchronization happens periodically.


Making the whole task fully automated

This part doesn't actually involve any coding. We need to set up one macro in the Access database, and a scheduled task on Windows.

Although I have prepared a clean user manual for our client, I do not wish to go into detail here, as I am writing this note for technical people. The macro in the Access database should go like this,


Action     Arguments        Comment
RunCode    ExportData ()    -
Quit       Prompt           -

Give the macro a meaningful name. Here I use Sync.

What we're going to do is periodically open the database file with Microsoft Access and automatically execute the macro so it can trigger the function inside the VBA module. But, by default, Microsoft Office blocks macros.

We need to prevent Microsoft Access from blocking the macro; otherwise we can't automate the task. To do this, create a trusted location and place the database file there.

Finally, we need to set up a scheduled task on Windows, and the program/ command line should go like this,

msaccess.exe "C:\Database.accdb" /ro /nostartup /x Sync
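The scheduled task itself can also be created from the command line with schtasks. This is just a sketch: the task name and start time are placeholders, and the path to msaccess.exe is an assumption (it varies with the Office version and install location), so adjust them to suit your machine.

```
schtasks /Create /SC DAILY /ST 02:00 /TN "AccessMySQLSync" /TR "\"C:\Program Files\Microsoft Office\Office12\msaccess.exe\" \"C:\Database.accdb\" /ro /nostartup /x Sync"
```

The quotes around the msaccess.exe path are escaped because the whole /TR argument is itself quoted.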


This is my approach. Now, the leftover work is to apply security if necessary, and then sit back, relax, and watch it go! :-)



Security:
  • This implementation is not secure if the tables carry sensitive data. Someone could sniff the data in transit and study how the system works. For additional security we can use SFTP instead of an FTP upload. To do this, we can use the SSH2 functions for PHP.
  • Also, the call to the 'importer' script can be made over SSL. It is also possible to attach this functionality to the index.php file of the website.
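As a rough sketch of the SFTP alternative, here is what the upload could look like using PHP's SSH2 extension. This assumes the ssh2 PECL extension is installed; the host, credentials, and paths are the same placeholders used earlier in this article, not real values.

```php
<?php
// Hypothetical SFTP-style upload using the SSH2 extension (ext-ssh2).
// Host, user, password, and paths are placeholders.
$conn = ssh2_connect ('example.com', 22);
if (!$conn || !ssh2_auth_password ($conn, 'example', 'p4ssw0r!'))
{
 die ("SSH connection/authentication failed!");
}

// ssh2_scp_send copies a local file to the remote host over the
// encrypted SSH channel, so the data cannot be sniffed in transit.
if (!ssh2_scp_send ($conn, 'exportcsv.tar.bz2',
                    '/home/example/ftp/exportcsv.tar.bz2', 0644))
{
 die ("Upload failed!");
}
```

This would replace the ftp_connect/ftp_put section of uploadcsv.php; the rest of the script stays the same.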

Gotchas:
  • If the target MySQL tables have DATE fields, the dates will be imported incorrectly, because MySQL expects dates exactly in four-digit-year, two-digit-month, two-digit-day format (e.g. "2010-12-25").
  • I haven't tested how this handles primary keys and relationships in the target database. In my case the data in the MySQL database was for display only, so it didn't have any keys.
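For the DATE issue, one possible workaround is to load the raw value into a user variable and convert it in a SET clause with STR_TO_DATE. This is only a sketch: the table, the column names, and the '%m/%d/%Y' format string are assumptions, so match them to what your Access export actually produces.

```sql
LOAD DATA LOCAL INFILE '/home/example/ftp/tblmarks.csv' REPLACE INTO TABLE tblmarks
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n'
(student_id, subject_id, marks, @exam_date)
SET exam_date = STR_TO_DATE(@exam_date, '%m/%d/%Y');
```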

---

This was my approach. If you have a different approach, or any comments, there's space below to write. Comments and expert advice are warmly welcome. :-)

Thanks for reading!



PS:
This wouldn't have needed such a workaround if the source tables had timestamps. But the business client was not willing to change his existing database structure. So this is what I had, and this is how I solved it. They ask, and we've got to shut up and do.
