Monthly Archives: June 2013

MySQL : Schema Optimization

Good logical and physical database design is essential for high performance of a system. In this post, I will list down some of the best practices that you may use while designing your schema.

Choose the right data type

Choosing the right data type goes a long way in improving the performance of a website. The following guidelines may help you in making better choices:

  • Try to use the smallest data types that can correctly store and represent the data. The reason is obvious. Smaller data types use less disk space, memory and CPU cache.
  • Use simpler data types. For example, use integers where you can instead of strings.
  • Avoid Null if possible. Most of the times we do not make a column as NOT NULL, even though there is no NULL value in that column. This happens because columns are null-able by default. We should avoid null-able columns, because they makes optimization harder for database.

Guidelines for choosing right data type

It is not hard to choose a data type on a broad level. For example, it is not hard to decide whether to use String, Date, Numeric type. However, you may follow the guidelines mentioned below to choose a specific data type.

Whole Numbers

MySQL supports TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT requiring 1,2,3,4 and 8 bytes of storage space, respectively. While using these data types, if you add an UNSIGNED attribute, the storage capacity of these datatypes, effectively doubles. Both, signed and unsigned attributes use the same amount of storage.

Real Numbers

Numbers having fractional part are real numbers. MySQL support FLOAT and DOUBLE for approximate floating point math and DECIMAL to store exact fractional numbers. DECIMAL can also be use to store numbers that are so large that they do not fit into BIGINT.

FLOAT and DOUBLE use lesser amount of space than DECIMAL. Use DECIMAL only in case when exact calculations are required. However, before using DECIMAL, explore the option to storing the values as multiple of the smallest fraction of currency you need to handle. For example, if you want to store the data up to two decimal places, you can  store the value multiplying it my 100 and use BIGINT. It will result in better performance

String Types

VARCHAR and CHAR

VARCHAR stores variable-length string and is the most common string data type. It requires less storage space because it uses as much space as needed.  VARCHAR also uses 1 or 2 extra bytes to record the value’s length. For example, VARCHAR(11) will use up to 11 bytes of storage space.

VARCHAR helps performance because it uses less space. However, it may lead to fragmentation. So, use VARCHAR when the average column size is much smaller than the maximum column length, or when updates to a particular column is rare. MySQL preserves the trailing spaces when you store and retrieve values.

CHAR is fixed-length. CHAR is useful if you want to store very short strings or if the values are nearly the same length. For example, CHAR is a good choice for MD5 values of user passwords, which are always the same length. CHAR is also not prone to fragmentation.

BINARY and VARBINARY

These datatypes store binary strings. These are useful to store binary data and want MySQL to compare the values as bytes instead of characters. Byte comparisons are faster than string comparisons.

BLOB and Text

These datatypes are designed to store large amounts of data as either binary or character strings. Characeter types haves TINYTEXT, SMALLTEXT, TEXT, MEDIUMTEXT, and LONGTEXT, and the binary types are TINYBLOB, SMALLBLOB, BLOB, MEDIUMBLOB and LONGBLOB. Interestingly, TEXT is a synonym for SMALLTEXT and BLOB is a synonym for SMALLBLOB. The difference between binary and character types is that binary types stores binary data with no collation or charecter set, but TEXT types have a charecter set and collation.

One important point to note about regarding sorting BLOB and TEXT columns is that MySQL instead of sorting the full length of the string, it sorts only the first max_sort_length of such columns.

ENUM

ENUM columns can be used to store values which are predefined for a particular type. For example, if you have a table with a column name status, then ENUM can be used to store values for that column. The benefit of doing this because MySQL stores ENUM columns very efficiently. It used just one or two bytes as internally it stores the values as integers.

Please note that by default MySQL sorts fields by internal integer values, not by the string themselves. So, it is advisable to specify ENUM members in the order in which you want them to sort.

One downside of using the ENUM is that the list of strings is fixed, and adding or removing strings requires the use of ALTER table. One more downside of using ENUM is that it is slower to join CHAR or VARCHAR with ENUM than with another CHAR or VARCHAR.

Date and Time types

MySQL supports various kinds of date and time values, such as YEAR and DATE and most of the times there is not two choices about which one to choose. The question of choice arise when you need to store both the date and the time. MySQL offers two very similar data types for this purpose: DATETIME and TIMESTAMP.

The main difference between the DATETIME and TIMESTAMP types are:

  • DATETIME uses 8 bytes of storage space and TIMESTAMP uses only 4 bytes of space.
  • DATETIME store the date and time into an integer in YYYYMMDDHHMMSS format and TIMESTAMP stores the number of seconds elapsed since Midnight, Jan 1, 1970.
  • TIMESTAMP preserves the timezone.
  • TIMESTAMP columns are not null be default.
  • MySQL sets the values of first TIMESTAMP column during insertion and updation if it’s value is not specified.

By default one should you TIMESTAMP to store date and time. Also, when you need to store time upto subsecond resolution, you can use a BIGINT column and the store then values as a timestamp in microseconds.