Thursday, October 2, 2008

Multivalue / Multivalued Fields vs Subtabling

A coworker of mine recently asked me how I would redesign a part of a our product's database if I were given the opportunity to start from scratch. Currently we have a security field that is a column in everyone of our user's records. It contains a string of Y's, N's, and R's. Y's stand for read and write, R's for read only, and N's for no rights at all. Each character's position is linked to a right for a particular section of our product. For example let's say our product was designed to manage a Burger Shop and there were there sections of the product to which we wanted to assign separate rights. Let's call these three sections burger inventory, payroll, and income. The way our product is currently designed we'd assign inventory to position 1 in the string, payroll to position two, and income to position three. The string would end up looking something like this:

RYN
(In this case this string would mean that the user had read rights to inventory, read and write rights to payroll, and no rights at all to income)

This design works nicely for some things and quite miserably for others. I'll start with the good:

1. When checking a right it is very easy to write a SQL statement to grab the contents of one column in a user's record.

2. It is very easy to update just one column in a user's record.

The Bad:

1. When writing code a programmer will often have to isolate one of the characters in this security string, which means they'll have to do something like so Substring(securityString, 2, 1). There are two problems with this:

a. For future programmers looking at this code, there's nothing there that tells the user which right the programmer was originally trying to examine. In fact, even if the programmer has access to the database and looks at the securityString column in the user's record they still won't be able to tell which right has been assigned to area of the product. This means that a separate list has to be maintained somewhere that the programmer can reference, so that he/she knows which string position is assigned to which right. This approach is cumbersome though and requires that all programmers and personnel update the list whenever a new right is added, even though the list is nonessential for the structure to function (it is essential for readability though).

b. Secondly, when even one right is update/modified the whole string must be updated which leaves room for error (i.e. accidentally changing one of the other string positions).

This leads me to the alternative design. In this design we subtable out the rights like so. First we create a master list of all the rights in a table called Rights. The rights table has a key field, a description field. In our case our rights table might look something like this:


Rights:

Key     |Description       

INCOME  |Income area right

PAYROLL |Payroll area right

INV     |Inventory area right






Then we create another table called USER_RIGHTS which has three fields. One with a foreign key to USER table's primary key, one with a value of Y, R, or N, and one with foreign key to the RIGHTS table's primary key. This table would look as follows:



L_USER |VALUE |RIGHT   

1      |R     |INV
1      |Y     |PAYROLL
1      |N     |PAYROLL



The above example gives us the same results as having a multivalue field populated with RYN. The great part about this design is that now it becomes very obvious as to what each right represents. It also becomes very easy to add new rights and to access existing rights. As a programmer, if I can't remember which code a particular area uses for it's right I can just look in the RIGHTS table and check the descriptions and hopefully I can find it there (assuming that good descriptions were used).



Drawbacks:



1. For every right added that another record for every user. For instance if you have 100 users and a 100 rights that's 10000 records in the USER_RIGHTS table. However this could be cut down considerable by not including the records that a user has no rights to, in the USER_RIGHTS table. For instance in the above example we stated the user had no rights to the income area by adding a record that said N for INCOME. Instead we could just take the absence of a record with the INC key in it, as meaning that the user has no rights to that particular area. This makes it even easier to add new rights to the RIGHTS table because not every USER_RIGHTS table would have to be updated.



2. When updating the USER_RIGHTS table if you are updating more than one right at a time, it would require more than one update statement. In fact, it would require one update statement for Y, R, and N (or a delete statement for N, if you decided that the absence of a record means that the user has no rights). In the multivalue design only one update is ever required because it is all stored in one field. The upshot to having to right individual update statements is that it becomes much more difficult to accidentally update rights for other areas.



All in all the subtable method is much more readable and much more flexible. At the end of our conversation we both decided that it was the best choice.