Skip to content

Monthly Archives: June 2010

MySQL – Should you put an index on a Boolean field to help query performance?

I am often asked if it makes sense to place an index on a Boolean field in order to improve query performance.  In general, because a boolean value can only have three values (True, False, Null), this low cardinality would suggest that adding an index will not help performance, as the query optimizer will still usually perform a table-scan if you have an even distribution of values within your DB.

One situation in which an index on a boolean field (or other low cardinality field) might be useful is if there are relatively few of one of the values, for example 5 True values in a table of millions of records and you are searching for those few values on a regular basis.

However, you might index a boolean value on a combination of fields. Indexing on a single Boolean might be pointless, because there’s only 2 (or 3) values.  However, indexing on 16 boolean values has the potential of  2^16 values.  It might help to make a combined index but you should understand how the combined index can and cannot be used and be aware that the order of the columns matters.

In general, you should always profile your system to see if there are queries that are too slow and consider adding another index to handle those queries. Sometimes a single combined index can be used for multiple queries and others time you will need to make an index for each type of query. Remember that adding indexes slows down modifications to the data so it is possible to have too many indexes. There is always a  trade-off when creating multiple indexes.

MySQL – DateTime vs. TimeStamp – When to use?

This question often comes up, and while there is often a heated debate, these are my thoughts on the subject.

I typically use a timestamp when I need to record a fixed point in time. For example when a record was inserted into the database or when some useraction took place that resulted in a row being updated.   The Timestamp data type has various features that allow it to automatically perform this function.   The default value for the Timestamp data type results in it being set to the current_date when a new row is added.  If you as the On Update syntax, this value will also be updated whenever the row is updated.

I use a datetime field when the date/time can be set and changed arbitrarily. For example when the field can be updated based on specific events or actions, besides a simple row Update.  In addition, a TimeStamp field can only store dates since 1970, so if you need to store dates in the past, such as a birthday, you must use DateTime.

A couple other things to keep in mind, TimeStamp fields support the TimeZone setting on your server.  For example, if I have a database in Europe, and take a dump of that database to syncronize/populate a database in America, then the timestamp would update to reflect the real time of the event in the new time zone, while datetime would still reflect the time of the event in the European timezone.  By default, the current time zone for each connection is the server’s time, however the time zone can be set on a per-connection basis.

Lastly, TIMESTAMP stores its value in 4 bytes, while DATETIME uses 8 bytes.  This is the main reason why TimeStamp has a lower limit of 1970.

MySQL – SQL Injection, and how to Prevent it

Stop SQL Injection

Bobby Tables and the lesson he teaches

SQL injection vulnerabilities are often been described as the most serious threat for Web applications, regardless of what language they are written in . Web applications that are vulnerable to SQL injection may allow an attacker to gain complete access to their underlying databases, regardless of if they are using MySQL Server, SQL Server, or Oracle.

SQL-injection attacks are those in which data provided by the user is included in an SQL query in such a way that part of the user’s input is treated as SQL code that is executed on the server.   By using this technique, an attacker can submit SQL commands directly to the database.   These attacks are a serious threat to any Web application that receives input from users and passes it into SQL queries to an underlying database server.  If  user input is not santised properly, web applications may result in SQL Injection attacks that allow hackers to view information from the database and/or even wipe it out.

To defend against SQL Injection attacks, user input must not directly be embedded into SQL statements that are executed on the server.  Instead, you must use parameterized statements, and Escaping functions to check user input.

Various resource for addressing SQL Injection are as follows:

Bobby Tables provides real-world practical code for addressing SQL Injection - http://bobby-tables.com/
Michal Daw’s Blog Page outlines various SQL Injection vectors - http://michaeldaw.org/sql-injection-cheat-sheet


MySQL – Disabling Binary Logging for INSERT and UPDATE

I have run into many situations where there would be a benefit to not replicate a large INSERT or UPDATE from the MASTER to SLAVE in order to not block replication for a long period of time. Luckily MySQL provides a means to disable binary logging for your current session. Simply execute SET SQL_BIN_LOG=0 before the SQL Statement that you do not want logged to the BINLOG. This is a session variable, meaning it will be re-enabled when you close the session, or you can set it back to 1.

In order to keep the Slave in-sync, you will need to execute the same query on the Slave. You can also use mk-table-sync from the Maatkit toolkit to re-sync your table data at a later time.

MySQL – Constraints on Hierarchical Data in a Self-Referential Table

I recently had a client who was using a table structure that utilized a self-referential structure using an ID field and Parent_ID fields. After upgrading to Innodb, his DBA was trying to utilize a foreign key constraint to perform cascading UPDATE and DELETE statements. After much heart-ache, they called me in to find out why it was not working. Unfortunately, I had to share with them the fact that MySQL does not support this type of use of constraints on self-referential tables.

This deviation from SQL standards results affects an ON UPDATE CASCADE or ON UPDATE SET NULL that recurses to update the same table it has previously updated during the cascade. Instead of cascading, it acts like RESTRICT. This means that you cannot use self-referential ON UPDATE CASCADE or ON UPDATE SET NULL operations in MySQL. According to the Innodb developers, this is to prevent infinite loops resulting from cascaded updates, although I would think this could be addressed in future versions. A self-referential ON DELETE SET NULL, on the other hand, is still possible, as is a self-referential ON DELETE CASCADE.