Ok, let me preface this thread by saying that I cannot provide a real sample of the data that I am working with. The reason is, because it contains a lot of private, confidential information (for work) that I cannot divulge. But, I can provide the details of the database coding that I have been working on. I have been racking my brains the last few days with this and I am at a point that I believe I need some guidance.
I am working on a project for work that will be a database of all of our client information. It is quite an ambitious project, but it is something that is desperately needed by my group. We have a couple thousand clients (currently) and that is growing by leaps and bounds at the moment, so getting this project completed sooner rather than later is a bit critical.
The initial load of client data was provided to me in an excel spreadsheet. It has 39 columns of information and almost 2000 rows, 1 row of information per client. In order to work with the data, I initially started using the Spreadsheet::Pa rseExcel module, but after attempting to pull out the data, noticed there were only 1300 lined of data in the file, instead of the almost 2000 lines that exist in the file. So, I changed my tactics. I saved the spreadsheet out to a file, but chose .csv as the format. I chose a ":" as the field delimiter and it seemed to export fine, showing the correct number of lines when I checked.
I then wrote the below code to go through the file, line by line, put each value in a variable, prepare the INSERT statement, and then execute. Here is the code I have:
Now, I have done some tweaking here and there, but at this point, when I run the above script, I get:
The data itself, by the way, has some issues that I have to find a way to work around. There are a log of names in there that are in the format of "lastname, firstname middlename". Notice the comma after the last name, it is a reason I did not use a comma as a delimiter. I know that there are also names that have an apostrophe in them, such as those Irish names in the format of "O'Somethin g". (sorry, cannot give real names, but you get the idea).
So, the data has characters in it that I am wondering if they are messing up the data load. There are fields that contain the phrase "#EMPTY" and some that are literally empty and don't even have that. They are empty fields all together.
I really need some help as this data really needs to get loaded, but is for some reason really causing a problem.
As an FYI, line 67 that is mentioned in the error is the prepare statement in the above code. When I went into MySQL and played with a "problem line" of data, I found that the statement would not work with the empty fields unless I put something in them. I need it to put nothing in the entry.
I did notice that about 5 entries our of the almost 2000, did, for some reason, load, but that is a far cry from being finished. The entries even had some entry fields, so I am not sure what the problem is. Any ideas are absolutely welcome.
Regards,
Jeff
I am working on a project for work that will be a database of all of our client information. It is quite an ambitious project, but it is something that is desperately needed by my group. We have a couple thousand clients (currently) and that is growing by leaps and bounds at the moment, so getting this project completed sooner rather than later is a bit critical.
The initial load of client data was provided to me in an excel spreadsheet. It has 39 columns of information and almost 2000 rows, 1 row of information per client. In order to work with the data, I initially started using the Spreadsheet::Pa rseExcel module, but after attempting to pull out the data, noticed there were only 1300 lined of data in the file, instead of the almost 2000 lines that exist in the file. So, I changed my tactics. I saved the spreadsheet out to a file, but chose .csv as the format. I chose a ":" as the field delimiter and it seemed to export fine, showing the correct number of lines when I checked.
I then wrote the below code to go through the file, line by line, put each value in a variable, prepare the INSERT statement, and then execute. Here is the code I have:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
# Connect to database
my $dbh = DBI->connect('DBI:mysql:dsmdb', 'username', 'password') or die "Connection failed: $DBI::errstr";
# Open data file
open(DATAFILE, "<file.csv") or die "Canont open data file: $!";
my $sth;
while(<DATAFILE>){
# Take the current line of data and split it out into variables
my ($PartnerName, $CorporationName, $ContactGivenName, $ContactPhone, $ContactEmailID, $CommunicationProtocol, $SecurityType, $ECID, $Source, $GEID, $GEMatchType, $CrossoverWithGTI, $ECIDv1, $CustomerName, $UltimateECID, $UltimateName, $CostCenter, $DomicileRegion, $SalesRegion, $SalesHead, $SalesManager, $TMO1SID, $TMO1Name, $TMO2SID, $TMO2Name, $BankerSID, $BankerName, $CSOMgrSID, $CSOMgrName, $CSOSID, $CSOName, $CSPSID, $CSPName, $LOB, $SubLOB, $Segment, $Market, $Industry, $ClientType, $ClientStatus) = split(/:/, $_);
# Strip leading white space from the values in all the variables
$PartnerName =~ s/^\s*//;
$CorporationName =~ s/^\s*//;
$ContactGivenName =~ s/^\s*//;
$ContactPhone =~ s/^\s*//;
$ContactEmailID =~ s/^\s*//;
$CommunicationProtocol =~ s/^\s*//;
$SecurityType =~ s/^\s*//;
$ECID =~ s/^\s*//;
$Source =~ s/^\s*//;
$GEID =~ s/^\s*//;
$GEMatchType =~ s/^\s*//;
$CrossoverWithGTI =~ s/^\s*//;
$ECIDv1 =~ s/^\s*//;
$CustomerName =~ s/^\s*//;
$UltimateECID =~ s/^\s*//;
$UltimateName =~ s/^\s*//;
$CostCenter =~ s/^\s*//;
$DomicileRegion =~ s/^\s*//;
$SalesRegion =~ s/^\s*//;
$SalesHead =~ s/^\s*//;
$SalesManager =~ s/^\s*//;
$TMO1SID =~ s/^\s*//;
$TMO1Name =~ s/^\s*//;
$TMO2SID =~ s/^\s*//;
$TMO2Name =~ s/^\s*//;
$BankerSID =~ s/^\s*//;
$BankerName =~ s/^\s*//;
$CSOMgrSID =~ s/^\s*//;
$CSOMgrName =~ s/^\s*//;
$CSOSID =~ s/^\s*//;
$CSOName =~ s/^\s*//;
$CSPSID =~ s/^\s*//;
$CSPName =~ s/^\s*//;
$LOB =~ s/^\s*//;
$SubLOB =~ s/^\s*//;
$Segment =~ s/^\s*//;
$Market =~ s/^\s*//;
$Industry =~ s/^\s*//;
$ClientType =~ s/^\s*//;
$ClientStatus =~ s/^\s*//;
$sth = $dbh->prepare qw(INSERT INTO client_info (PartnerName CorporationName ContactGivenName ContactPhone ContactEmailID CommunicationProtocol SecurityType ECID Source GEID GEMatchType CrossoverWithGTI ECIDv1 CustomerName UltimateECID UltimateName CostCenter DomicileRegion SalesRegion SalesHead SalesManager TMO1SID TMO1Name TMO2SID TMO2Name BankerSID BankerName CSOMgrSID CSOMgrName CSOSID CSOName CSPSID CSPName LOB SubLOB Segment Market Industry ClientType ClientStatus) VALUES (?)) or die "Prepare failed: " . $dbh->errstr();
$sth->execute qw($PartnerName $CorporationName $ContactGivenName $ContactPhone $ContactEmailID $CommunicationProtocol $SecurityType $ECID $Source $GEID $GEMatchType $CrossoverWithGTI $ECIDv1 $CustomerName $UltimateECID $UltimateName $CostCenter $DomicileRegion $SalesRegion $SalesHead $SalesManager $TMO1SID $TMO1Name $TMO2SID $TMO2Name $BankerSID $BankerName $CSOMgrSID $CSOMgrName $CSOSID $CSOName $CSPSID $CSPName $LOB $SubLOB $Segment $Market $Industry $ClientType $ClientStatus) or die "Execute failed: " . $dbh->errstr();
}
$ ./parse_csv_to_db .pl
DBI prepare: invalid number of arguments: got handle + 45, expected handle + between 1 and 2
Usage: $h->prepare($state ment [, \%attr]) at ./parse_csv_to_db .pl line 67, <DATAFILE> line 1.
DBI prepare: invalid number of arguments: got handle + 45, expected handle + between 1 and 2
Usage: $h->prepare($state ment [, \%attr]) at ./parse_csv_to_db .pl line 67, <DATAFILE> line 1.
So, the data has characters in it that I am wondering if they are messing up the data load. There are fields that contain the phrase "#EMPTY" and some that are literally empty and don't even have that. They are empty fields all together.
I really need some help as this data really needs to get loaded, but is for some reason really causing a problem.
As an FYI, line 67 that is mentioned in the error is the prepare statement in the above code. When I went into MySQL and played with a "problem line" of data, I found that the statement would not work with the empty fields unless I put something in them. I need it to put nothing in the entry.
I did notice that about 5 entries our of the almost 2000, did, for some reason, load, but that is a far cry from being finished. The entries even had some entry fields, so I am not sure what the problem is. Any ideas are absolutely welcome.
Regards,
Jeff
Comment