Enumeration Maps (C#)
As many who know me will attest, I am a big fan of type safety. That might seem passé nowadays, and I risk seeming a little dated by saying so, but I learned how to write code by reading such seminal works as Scott Meyers’s “Effective C++” et al. The design guidelines carefully prescribed in this text and in others of similar ilk became something like commandments back in the day, and one ran the risk of appearing gauche by transgressing any of them. Regarding downcasts, Meyers even draws a biblical parallel, stating “Casts are to C++ programmers what the apple was to Eve.” [1]
The reason downcasts are to be avoided in a type-safe language like C++, C# or Java is that they become a maintenance nightmare. The developer who uses a downcast is really taking on the work of the compiler. After all, a primary reason we create object-oriented designs with virtual functions is to offload onto the compiler the work of figuring out which function to call, relieving the developer of this burden. Downcasts undo that.
Now, what does all this talk about type safety have to do with mapping enumeration fields to values? As it turns out, in C# enums can be considered types derived from their underlying representation. That’s not really true, though. Under the covers, all enumerations inherit from the pseudo (or special) class System.Enum.† Enumerations are types which exist to enhance the maintainability of one’s code, to increase portability and to reduce the number of potential defects. They exist as types so that the compiler can flag a problem if a developer should mistype the value of a string or number being used to make a decision. Consider the following code:
if (databaseResponse == "SUCCESS") { /*Do something*/}
Here, databaseResponse should only be allowed to contain a limited number of values, arguably no more than three: undefined, success or failure. Enumerations were created to ensure that the developer doesn’t inadvertently mistype the string “SUCCESS”, and to ensure that if the string representation ever changes, the code that makes the decision does not have to change with it. That’s why this code is better:
enum DbResponse { Undefined, Success, Failure };
/* ... */
if (databaseResponse == DbResponse.Success) { /*Do Something*/}
The Problem With Enumerations
The problem with using enumerations, however, has always been that code consuming data from external sources, including databases, remote APIs and external files, must unmarshal the external representation of the enumerated value, a string or a number, into the enumerated type.
C# *is* able to take a string value and parse it to return the enumeration, but this is a very limited approach. First, the string value must match the name of the enumerated value. Oftentimes, the string values are defined by external organizations, including standards bodies, that are little concerned about whether the string value can be coerced into a legitimate C# enumeration.
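To make that limitation concrete, here is a minimal sketch built on the framework's Enum.TryParse and Enum.IsDefined. The helper name ParseOrUndefined and the choice to fall back to Undefined are my own assumptions for illustration, not part of any library.

```csharp
using System;

public enum DbResponse { Undefined, Success, Failure }

public static class DbResponseParsing
{
    // Hypothetical helper: returns the matching field, or Undefined when
    // the text is not the name of a DbResponse field.
    public static DbResponse ParseOrUndefined(string text, bool ignoreCase)
    {
        DbResponse parsed;
        // Enum.TryParse also accepts numeric text such as "7", so the
        // result is checked with Enum.IsDefined before it is trusted.
        if (Enum.TryParse(text, ignoreCase, out parsed)
            && Enum.IsDefined(typeof(DbResponse), parsed))
        {
            return parsed;
        }
        return DbResponse.Undefined;
    }
}
```

With a case-sensitive parse, “Success” yields DbResponse.Success while “SUCCESS” falls through to Undefined. Passing true for ignoreCase rescues “SUCCESS”, but nothing rescues a string that is not a field name to begin with.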
An example of this issue emerged while I was writing some code to deal with the North American Industry Classification System, which defines codes for various industries. These codes are used to communicate the context of a credit transaction, so that a business rule might be applied to determine whether or not the transaction is legitimate. In this standard, each code has a name and an associated number. “Bituminous Coal and Lignite Surface Mining” is an industry that is enumerated by the integer value 212111. If one were to create a system by which the string “Bituminous Coal and Lignite Surface Mining” could be made into an enumeration field name, one might substitute underscores for the spaces to get “Bituminous_Coal_and_Lignite_Surface_Mining” and write code using this name instead of the number 212111, which is not terribly descriptive.
enum Naic {Undefined, Bituminous_Coal_and_Lignite_Surface_Mining = 212111, /*...*/}
if (naic == Naic.Bituminous_Coal_and_Lignite_Surface_Mining) { /*...*/}
This seems to work out all right until one comes to some of the other industry names, like “Flower, Nursery Stock, and Florists’ Supplies Merchant Wholesalers US”. The issue here is that commas and apostrophes are not legal in the name of an enumeration field. What typically happens is that the developer comes up with some mapping from these real-world values to code. In the process, information is lost or, worse, duplicate values are encountered.
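The loss of information is easy to demonstrate. The sketch below is one plausible ad hoc mapping, not anything the standard prescribes: the class IdentifierSketch and its method ToIdentifier are hypothetical names, and the rule (collapse every run of characters that are not letters or digits into a single underscore) is simply an assumption about what a developer might reach for.

```csharp
using System;
using System.Text;

public static class IdentifierSketch
{
    // Hypothetical mapping: each run of characters that are not letters
    // or digits becomes a single underscore; edge underscores are trimmed.
    public static string ToIdentifier(string industryName)
    {
        var builder = new StringBuilder();
        bool lastWasUnderscore = false;
        foreach (char c in industryName)
        {
            if (char.IsLetterOrDigit(c))
            {
                builder.Append(c);
                lastWasUnderscore = false;
            }
            else if (!lastWasUnderscore)
            {
                builder.Append('_');
                lastWasUnderscore = true;
            }
        }
        return builder.ToString().Trim('_');
    }
}
```

Under this rule, “Flower, Nursery Stock, and Florists’ Supplies Merchant Wholesalers US” and a made-up variant without the punctuation, “Flower Nursery Stock and Florists Supplies Merchant Wholesalers US”, both become Flower_Nursery_Stock_and_Florists_Supplies_Merchant_Wholesalers_US. The original spelling cannot be recovered from the identifier, and two distinct external names would now claim the same field.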
The other problem, not apparent in this particular example, is that the enumerations might not be enumerations of integer values like 212111. Oftentimes, the enumerated values are strings. Let’s go back to the database stored procedure example for a moment. It might be the case that one is tasked with maintaining a legacy application that calls stored procedures which return more than one string representing success or failure. As ghastly as this seems, it happens. “Success”, rather than being represented by a boolean true, might instead be represented by numerous strings which need to be parsed, like “Ok”, “OK”, “Success: 03103”, “SUCCESS” or “YIPPEE!”. Never mind that this is an awful way to write stored procedures; it’s legacy code and one must deal with it. Alas, these values do not make for very good enumerations. One cannot simply generate an instance of an enumeration from the responses and apply conditional logic:
DatabaseResponse dbResponse = (DatabaseResponse)(Enum.Parse(typeof(DatabaseResponse), callStoredProcedure()));
The DatabaseResponse enum cannot define “Success: 03103” or “YIPPEE!”. Both of those values contain characters that are not legal in the name of an enumeration field. Also, there is a downcast here. For the reasons already mentioned above, downcasts should be avoided. The downcast exists because Enum.Parse(Type, string) has to serve every enumeration type and is therefore declared to return object. This breaks the type safety of the design.
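One common workaround is an explicit lookup table from the external strings to the enumerated type. The sketch below is only an illustration under my own assumptions: the class LegacyResponseMap and the method FromLegacy are hypothetical names, the table holds just the success strings listed above, and anything unrecognized is treated as Undefined.

```csharp
using System;
using System.Collections.Generic;

public enum DatabaseResponse { Undefined, Success, Failure }

public static class LegacyResponseMap
{
    // Hypothetical table of legacy strings. The comparer ignores case,
    // so the single "Ok" entry also covers "OK".
    private static readonly Dictionary<string, DatabaseResponse> Map =
        new Dictionary<string, DatabaseResponse>(StringComparer.OrdinalIgnoreCase)
        {
            { "Ok", DatabaseResponse.Success },
            { "Success: 03103", DatabaseResponse.Success },
            { "SUCCESS", DatabaseResponse.Success },
            { "YIPPEE!", DatabaseResponse.Success },
        };

    // Returns the mapped value, or Undefined for null or unknown text.
    // No cast is needed: the dictionary is typed on DatabaseResponse.
    public static DatabaseResponse FromLegacy(string raw)
    {
        DatabaseResponse response;
        if (raw != null && Map.TryGetValue(raw.Trim(), out response))
        {
            return response;
        }
        return DatabaseResponse.Undefined;
    }
}
```

Here LegacyResponseMap.FromLegacy("YIPPEE!") returns DatabaseResponse.Success without any cast, and strings that are not legal field names pose no difficulty because they are only ever dictionary keys.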
Finally, the language definition of enumerations is broken with regard to how it permits a single enumerated value to represent more than one numeric value. It’s not that an enumeration shouldn’t be mapped to more than one numeric value; it’s that the mechanism is defined very poorly indeed. Relying only on the mechanics of enumerations, one is able to map an enumerated field to more than one numeric value. The following example uses the Flags attribute to allow the destination enumeration to contain more than one value. When rules are applied, the other values can be filtered out by applying a bit-wise “&” operation.
[Flags]
enum Destination
{
    Undefined = 0,
    Europe = 1,
    Africa = 2,
    America = 4
}
...
if ((destination & Destination.Africa) == Destination.Africa) { /*...*/}
This is just bad. Nasty bad. Here, in the modern age, one is expected to know the bit-wise representation of an enumerated value! One is asked to apply binary arithmetic to enumerations to perform conditional checks! Further, this “solution” only works for enumerations that have a very limited number of values, since each field consumes one bit of the underlying integer type (32 bits by default, 64 at most) and the available bits run out rather rapidly. Our NAIC example above certainly wouldn’t fit with its 440+ fields.
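For readers who have not had the pleasure, the mechanics of the check can be sketched as follows. The Destination values are those from the example above; the class DestinationChecks and its method Includes are hypothetical names of mine that merely wrap the same “&” comparison.

```csharp
using System;

[Flags]
public enum Destination
{
    Undefined = 0,
    Europe = 1,   // binary 001
    Africa = 2,   // binary 010
    America = 4   // binary 100
}

public static class DestinationChecks
{
    // True when every bit of 'flag' is also set in 'value'.
    // Note that a flag of Undefined (0) trivially passes this test.
    public static bool Includes(Destination value, Destination flag)
    {
        return (value & flag) == flag;
    }
}
```

Combining fields is done with “|”: Destination.Europe | Destination.Africa has the numeric value 3, for which Includes reports true for Africa and false for America. The framework’s own Enum.HasFlag method performs an equivalent test, but the caller still has to assign the powers of two by hand.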