Working a database with Perl
Working with a database enabled programs in Perl requires little extra work thanls to the DBI module. This well written and stable module makes database connectivity easy. You should have no problem installing or finding instructions on how to install the module. For streamlining purposes, I'll not cover that here.
To make sure the DBI module is installed and functioning correctly, and also to see what database drivers are available on your system, let's execute the following script:
#!/usr/bin/perl -w # connect_test.pl use strict; use DBI; print "\nAvailable Database Drivers:\n"; print "*" x40, "\n"; print join("\n", DBI->available_drivers()), "\n\n";
We are primarily interested in the presence of mysql database driver. Got it? Let's move on then.
Connecting to a Database
To read or write a plain text file using Perl, you need to first open a filehandle. In this same way, to access a DBI database in Perl, you need to create a 'database' handle. A database handle is what you use to maintain a connection to a database. It is how you reference each particular connection to a database (you can have more than one open at a time) when you call functions or make SQL queries. To connect to multiple databases as mentioned, you will obviously create multiple database handles. Database handles don't just provide connectivity to 'local' databases, but 'remote' as well. The database handle is normally of the form $dbh or to allow for multiples, $dbh_sports, $dbh_news, etc. Let's take a look at a program that connects to some data source, grabs a bit of data, prints it and disconnects.
#!/usr/bin/perl -w # # dbh.pl use strict; use DBI; my $dbh_sport = DBI->connect("DBI:mysql:host=localhost;database=sportwire", "user", "password") or die("Cannot connect:$DBI::errstr"); my $sql = qq/select * from baseball/; my $sth_sport = $dbh_sport->prepare($sql); $sth_sport->execute; while( my $p = $sth_sport->fetch) { print "MySQL: @$p\n"; } $dbh_sport->disconnect;
In the first line of code we declare a database handle named $dbh_sport to connect to the sport database. We then call the DBI's connect() method and pass in some arguments. All of the connect arguments are passed by the Data Source Name (DSN). This string passes the following to the database interface: DBI, the particular database driver 'mysql', the hostname of the server, and the particular database which we'll connect to. Lastly, it passes the username and password combination.
The second line of code in the program creates a variable named $sql and sets it to the string on the right. The variable stores the SQL statement we are going to run against the database. The statement tells the database server to select all data from the baseball table.
The third line of code declares a new scalar variable that will be used as a handle to the statement being prepared on the right side. The prepare() method takes the SQL statement and stores it in an internal, compiled format that the database can work with. The $sth handle is similar to the database handle and stands for statement handle.
In the fourth line of code we call the execute method on the statement handle. This takes the prepared SQL statement and runs it on the database engine.
The fifth line of code begins a while loop. This while loop fetches the data that results from the executed statement on the line above; $sth_sport->execute;. The fetch method grabs one row (aka record) or returned data and returns a reference (aka pointer) to an array containing the data. The pointer to this array is stored in the @$p variable. We'll talk more about the fetch statement througout this document. The print statement just prints the fetched record. Working with Database Tables Database tables represent a two dimensional view of some data. It is very similar to a spreadsheet in that it contains rows and columns. The columns make up the database-table definition and the rows make up each record in the table. We'll take a look at three basic table operations in the following paragraphs; creating a table, modifying a table and deleting a table.
What is a Table
As just mentioned, a database table is a two dimensional view of a portion of database data. For smaller applications, having data stored in a single table is quite common. There is a problem with this, however, when the database starts to get larger we get redundant data. For database applications that have more than a few fields of redundant data, it is wiser to store the data in several different tables, each holding a specific set of information. Breaking the data down into tables to minimize redundant data is called normalization.
+---------+-------+---------+-------+----------+----------------+--------------+ | Item | Color | Stock # | Price | Vendor | Vendor Addr. | Vendor Phone | +---------+-------+---------+-------+----------+----------------+--------------+ |Snarplet |Blue |4529 |$12.50 |Bob's Inc.|123 New Street |123-987-3030 | +---------+-------+---------+-------+----------+----------------+--------------+ |Tiblet |Grey |4333 |$8.42 |Hansel's |439 Old Street |324-324-3232 | +---------+-------+---------+-------+----------+----------------+--------------+ |Hitnitt |Yellow |4334 |$7.42 |Hansel's |439 Old Street |324-324-3232 | +---------+-------+---------+-------+----------+----------------+--------------+ |Opertoop |Green |4335 |$6.42 |Hansel's |439 Old Street |324-324-3232 | +---------+-------+---------+-------+----------+----------------+--------------+
Consider the above table of products that contains the product item number, color, stock #, price, vendor, vendor address and vendor phone. Even a table this small can contain alot of redundant data. Let's assume this table has 100 products, with 10 different vendors. If we put all the data in one table, we would have the same vendor information in several records. An average of 10 times for each vendor, which a very inefficient manner to store and retrieve information. A more sensible approach is to split the vendor information into a different table and just have a vendor id number stored in the products table. The vendor id will serve as a key to relate the two tables together. Let's see the above table reconfigured as discussed.
+---------+-------+---------+-------+----------+ | Item | Color | Stock # | Price | VendorDI | +---------+-------+---------+-------+----------+ |Snarplet |Blue |4529 |$12.50 |48838 |-----------+ +---------+-------+---------+-------+----------+ | |Tiblet |Grey |4333 |$8.42 |98009 |-----------------+ +---------+-------+---------+-------+----------+ | | |Hitnitt |Yellow |4334 |$7.42 |98009 |-----------------+ +---------+-------+---------+-------+----------+ | | |Opertoop |Green |4335 |$6.42 |98009 |-----------------+ +---------+-------+---------+-------+----------+ | | | | | | +----------+----------+----------------+--------------+ | | | VendorID | Vendor | Vendor Addr. | Vendor Phone |----+ | +----------+----------+----------------+--------------+ | |48838 |Bob's Inc.|123 New Street |123-987-3030 | | +----------+----------+----------------+--------------+ | |98009 |Hansel's |439 Old Street |324-324-3232 |----------+ +----------+----------+----------------+--------------+
The above example demonstrate the use of two separate tables to greatly reduce redundant data. The larger each table becomes, the more redundancy is reduced.
Data Types
In programming languages, you have data types such as strings, integers, floating-point numbers, big integers, characters and so on. The same is true of database data types: There are different data descriptors for different types of data. Every column in the database is for storing a particular kind of data: a column for storing last names would be of type "varchar", a column for storing birthdates should be of type date, and so on. This helps the database store the information more effeciently and helps programmers in understanding the various data types to expect.
Creating Tables
A database must be created before a table is added. One way to perform database creation is from the command line with:
# mysqladmin create databasename -u username -p password
or from within the SQL control console:
mysql> create database databasename
After changing to the new database with use databasename; we're ready to make some tables, let's look at the commands are needed. Later, we'll write a short Perl script to do this, but in the meantime, we'll use SQL:
CREATE TABLE products ( Pid INT NOT NULL, Item VARCHAR(50), Descr VARCHAR(255), Price REAL, Vid INT NOT NULL );
The above SQL statement creates a table named products with five items in it. Next, we need to create a vendor table, so we can relate a product to a vendor.
CREATE TABLE vendors ( Vid INT NOT NULL, Name VARCHAR(100), Address VARCHAR(100), City VARCHAR(50), State CHAR(2), Zip VARCHAR(10), Phone VARCHAR(25) );
The above SQL statement creates a table named vendors with seven items in it. Both of the above SQL statements are all that is needed to create two tables that can be used for an item database with relations between each item and its vendor.
Altering Tables
Let's say we need to make some changes to the tables we created in the above discussion. We need to add a size field to the products table, add an Address2 field to the vendors table, and change the Item field in the products database to 100 characters instead of 50. Here's the SQL statement used to affect the change:
ALTER TABLE products ADD Size VARCHAR(20); ALTER TABLE vendors ADD Address2 VARCHAR(100); ALTER TABLE products MODIFY Item VARCHAR(100);
Try to modify tables only when you must, as this operation is always a risk in terms of losing data. If the database contains significant data, be sure to back it up first.
Program Example
Let's turn our attention to a simple program that adds a table to an existing database.
#!/usr/bin/perl -w # addtable.pl use strict; use DBI; my $dbh = DBI->connect("DBI:mysql:widgets", "root", "easytocrack") or die( "Cannot connect: $DBI::errstr"); my $sql = qq/CREATE TABLE products ( Pid INT NOT NULL, Item VARCHAR(255), Descr VARCHAR(255), Price REAL, Vid INT NOT NULL) )/; # when you are not expecting data to be returned # the do method replaces the DBI prepare and execute commands my $return = $dbh->do($sql); if ( $return ) { print "Table addition successful!\n"; } else { print ""\n\nERROR! $DBI::errstr\n"; }
The above discussion has provided a simple overview of how to create tables using both MySQL command prompt and Perl. Next, we'll be learning about the SQL SELECT keyword.
Fetching Data from the Database
All queries you perform on a database will begin with the SELECT keyword. Another important keyword is FROM. SELECT tells the DBMS what information you are looking for, and FROM tells the DBMS where to look for the data. When you create a query using the SQL keywords and table fields, the finished product is called a SELECT statement.
Simple Select
Let's look at a simple program that fetches some data from a table called players from our mariners database. The players table is filled with players stats; name, position, bat_avg, etc.
#!/usr/bin/perl -w # select.pl use strict; use DBI; my $dbh = DBI->connect("DBI:mysql:mariners", "root","easytocrack") or die("Can't connect to database: $DBI::errstr"); my $sql = qq/SELECT * FROM players/; my $sth = $dbh->prepare($sql); $sth->execute; my $counter = 1; while(my $record = $sth->fetch){ print "$counter: "; for my $field (@$record){ print "$field - "; } $counter++; print "\n\n"; }
Hopefully by now you are familiar with what a database handle is ($dbh) and what a data source name is (everything inside the connect method). Initially we create a database handle to the mariners database. We use the $sql variable to encapsulate our SELECT statement for further use. The prepare statement compiles our query for faster execution by the database engine. Execution occurs by calling the execute method, which runs the statement in compiled form on the database. The results are stored as a reference in the statement handle ($sth). Inside the while loop, we declare a variable $record to be used as a pointer. Using $sth->fetch, each iteration of the while loop produces a reference to an array (record) and stores this in the $record variable.
This reference busines may be somewhat hard to grasp at the moment. Just try to rememeber that $record stores a reference to an array. This is like saying $record = \@foo; when storing a reference to an array in regular Perl code. The for loop goes through each element of the array @$record. Which could have been written as @{record}. If this is Chinese to you, consider reading the perlreftut man page for background.
Lastly, we print the current field value ($field) from the loop and a dash with a space to separate the record data for easier reading. Each time through the for loop, the current record's field data gets stored in the $field variable.
SELECT with WHERE
Now we know how to display (to stdout) every record in a database, but what if you only want to display a certain few records of interest. We can slightly modify the previous example and enable it to look for only a particular record. The only portion modified is the SQL query, so that's all we'll show here:
my $sql = qq/SELECT * FROM players WHERE position = 'LF'/;
The output should contain only those records with a position of LF.
SELECT with ORDER
The DBMS can do more than just fetch the data; it can perform operations on the data so that it is displayed exactly how you want it. Again, all code is exact with exception of the SELECT statent, which now uses the ORDER BY statement:
my $sql = qq/SELECT * FROM players ORDER BY name/;
The above code will output the results in alphabetical order using the name column data.
Other DBI Retieval Functions
With Perl and the DBI, you are not limited to using only the fetch function ($sth->fetch) to retrieve data. There are also; fetchrow_array, fetchrow_arrayref, fetchrow_hashref, and fetchall_arrayref. All are available for you to use of course, but it is probably best to settle on one that suits you best. Just for background, fetch and fetchrow_arrayref perform the exact same operation. As a result, we'll skip discussion of this method.
fetchrow_array
Since we've been working with array references up to this point, let's look at an actual array without the reference. The following code shows the fetchrow_array function:
#!/usr/bin/perl -w # fetcharray.pl use strict; use DBI; my $dbh = DBI->connect("DBI:mysql:mariners","root","easytocrack") or die("Cannot connect: $DBI::errstr"); my $sql = qq/SELECT player, position, weight FROM players/; my $sth = $dbh->prepare($sql); $sth->execute; while(my @array = $sth->fetchrow_array) { no warnings; my $index = 0; for my $data(@array) { print "\$array[$index] contains $data\n"; $index++; } print "\n"; }
Each time through the while loop, we fetch one row of data. Each field of the current gets stored in the array. If you look at the output of this example, you can see that element 0 contains the player name, element 1 contains the player position, element 2 contains the player weight. Remember that fetchrow_array returns an actual array as opposed to a reference or pointer to an array.
fetchall_arrayref
This function is a bit trickier to work with than the fetchrow_array, since you end up with a reference to an array. Each row of the referenced array is a reference to another array containing the data. Let's attempt to clear this up a bit with the following:
#!/usr/bin/perl -w # fetchallarrayref.pl use strict; use DBI; my $dbh = DBI->connect("DBI:mysql:mariners","root","easytocrack") or die("Cannot connect: $DBI::errstr"); my $sql = qq/SELECT * FROM players/; my $sth = $dbh->prepare($sql); $sth->execute; my $ref = $sth->fetchall_arrayref; my $record_idx = 0; for my $rec (@$ref) { no warnings; my $field_idx = 0; for my $field (@$rec) { print "ref[$record_idx][$field_idx] = $field\n"; $field_idx++; } $record_idx++; print "\n"; } print "\n"; print "$ref->[57][1]\n\n";
We'll pickup where the code differs from previous examples, namely at the fetchall_arrayref portion. Notice that the call to fetchall_arrayref doesn't use a loop to call fetch several times. The $record_idx variable is our record index and is the first of two counters for the program. The initial for loop iterates through the array referenced in $ref. Each time through the loop, $rec gets set to a reference to an array that holds one record, or row of data. At this point, we have data that is like the data produced from the fetch method. The $field_idx varialbe is the second counter, used to keep track of each element in the array. The second for loop iterates through the current array reference that we got from the initial for loop (oboy!). Finally we print the current data along with where it is referenced at.
fetchrow_hashref
This particular method is actually one of the best. Instead of having to deal with arrays and array indexing, you can access the data via a hash; the keys of the hash are the field names of the data.
#!/usr/bin/perl -w # fetchrow_hashref.pl use strict; use DBI; my $dbh = DBI->connect("DBI:mysql:mariners","root","easytocrack") or die("Cannot connect: $DBI::errstr"); my $sql = qq/SELECT * FROM players); my $sth = $dbh->prepare($sql); $sth->execute; my $counter = 1; while(my $record = $sth->fetchrow_hashref) { no warnings; print "Book #$counter\n"; print "Player: $record->{name}\n"; print "Position: $record->{position}\n"; print "Height: $record->{height}\n"; print "Weight: $record->{weight}\n"; print "\n"; $counter++; }
Using ADD
Let's try adding data to our database. We'll use an interface which allows a user to enter one item into the database. Not very functional, but we'll progress to more features soon enough.
#!/usr/bin/perl -w # addrecord.pl use strict; use DBI; my $conn = DBI->connect("DBI:mysql:mariners","root","easytocrack") or die("Cannot connect: $DBI::errstr"); my ($sql, %mariners, @keys); Get_Data(); Execute_Transaction(); sub Get_Data { $mariners{'name'} = Get_Input("Enter Player Name"); $mariners{'position'} = Get_Input("Enter Player Position"); $mariners{'height'} = Get_Input("Enter Player Height"); $mariners{'weight'} = Get_Input("Enter Player Weight"); return 1; } sub Get_Input { print $_[0], "\n"; return scalar ; } sub Execute_Transaction { @keys = keys %mariners; @vals = values %mariners; chomp(@vals); @vals = map{$conn->quote($_)} @vals; $sql = "INSERT INTO library (". join(", ", @keys). ") VALUES (". join(", ", @vals). ")"; my $query = $conn->prepare($sql); $query->execute or die("\nError executing SQL statement! $DBI::errstr"); print "Record added to the database.\n"; return 1; }
Let's briefly run through the above code starting with what is new to us. Get_Data and Execute_Transaction are calls to the subroutines which are define as sub Get_Data and sub Execute_Transaction. The Get_Data subroutine gathers the player information from the user. It calls Get_Input which will both prompt user and returns a value to be stored in the %mariners hash. return 1 is a programmer's way to verify the subroutine has ended correctly (1 = true).