Sunday, February 15, 2009

The Value of Commenting Code

Never Changing Constants

Our legacy product started life about 10 years ago, about a year before I started here. Not including “designer” generated stuff, the entire code base is about 500,000 lines of VB6 code, spread across 35 projects. Considering that those of us who worked on the project from the beginning did not have much combined experience making a Windows application of this size and scope, you can imagine that a lot of mistakes were made. I don’t think there is anything in the code base today that is worthy of The Daily WTF, but that wasn’t always the case.

I’ve seen many things in the code that made me laugh, cringe, or cry, but my favorite is probably this comment:

	'Never changing constants.
	Public Const AUTO_ID_NAME As String = "RecordID"

I guess the commenter wanted to distinguish the above from those constants that do change...

What Is Your Opinion about Code Comments?

I remember being asked about code comments in my interview to work here.

Interviewer: What is your opinion about code comments?

Me: Uh…they’re good?

I guess that’s what they wanted to hear. Since then my thinking around code commenting has become a lot more nuanced. I’ll summarize it this way:

  • Good comments are good.
  • Pointless comments are bad.
  • Bad comments are worse.
  • Most comments (in my experience) are not “good”.

It makes me wonder: What would have been the interviewer’s opinion of my nuanced answer?

At Least It Has Comments

I recently came across this post on The Daily WTF. Here’s the Reader’s Digest version: The code has been thoroughly commented, but the comments are pretty much the code translated to English. Helpful, I suppose, if you’re not familiar with the programming language being used. Oh well, we’ve all seen worse. Then I started reading the comments (the comments responding to the post, not the code comments). There was this one:

“H*ll, at least it has comments...”

And this one:

“More comments the better, even if they are "obvious" ones.”

And a bunch more along the same lines. This was very enlightening for me. Apparently trying to write good comments about the intent of the code has been a waste of my time. I’m going to start littering my code with personal anecdotes in the comments. No…even better, I’m going to create a “Comment-ifier” that will automatically improve code quality through randomly generated comments. Patent pending, so don’t even think about it.

The Problem with the Software Development Industry…

I see a link between this attitude about code comments and the recent kerfuffle about some heretical statements uttered by one Joel Spolsky on Hanselminutes. From the sounds of things, Joel doesn’t think much of the SOLID principles. Gallingly, Joel’s cohort, Jeff Atwood, was right there to lend Joel some moral support.

I’m not ready to get on Jeff and Joel’s bandwagon (I do not wish to be excommunicated), but I think they have a point. Programmers (or rather, human beings) have a strong tendency to latch on to Ideas to make themselves better programmers (or rather, better human beings). I put this down to intellectual laziness. Being better at anything requires continuous effort. Ideas are much easier to implement.

Learning coding “best practices” will (hopefully) make you a better coder, but that is not the key to being a good coder. For one thing, I think it is vital to understand the “why” of those “best practices”. Thinking back to that WTF post, maybe if the person who wrote that code had ever thought about why commented code is important, he or she would’ve come up with some better comments.

The Secret to Good Code

So there are probably at least as many coding “best practices” or methodologies out there as there are fad diets. Do any of them work? All the time? Everywhere? Obviously not, otherwise why are we still talking about it? For what it’s worth, here’s what’s been working for me so far:

  1. Care
  2. Cheat

Care means care about the quality of your code. Care about the next guy who has to use or maintain what you did. Care about whether your code still works after you’ve retired. Care about your future self, who will probably have to do the same thing again in a different project.

Cheat means copy from the smart kids. Someone somewhere has succeeded at doing something very similar to what you’re working on right now. You just need to find it and copy it as best you can. Of course, to be a good cheat, you’ll need to get your eyes on as much good code or concepts as you can. Even if you don’t need it today, you may at some point in the future. A case in point: One of my all-time favorite books in my library is Writing Compilers and Interpreters by Ronald Mak. I read* it so much, the book is now literally in pieces. Have I ever written a compiler or interpreter? No. Have the techniques outlined in the book helped me with other tasks? Absolutely yes, from simple things like how to properly handle CSV formats with embedded quotes to bigger ideas like how to take a relatively massive problem and break it down into manageable pieces.

Honestly, I don’t think I can say enough about the value of being exposed to as many programming languages, frameworks, architectures, or projects as possible. This is good because:

  1. You have a much better chance of knowing the right tool for the job.
  2. Seeing the same problem solved in multiple different ways will expand your mind.
  3. All those ideas bouncing around in your brain will eventually collide and stick together to make new ideas (not all of them will be good, of course).

* “read” here is intended as past tense. Don’t you just love the English language?

Sunday, January 18, 2009

Comparing Functional to Imperative, Part 2: Economies of Syntax

This is part 2 of a series (consisting of at least two posts) comparing imperative programming languages with functional programming languages. Here is Part 1: Context determines meaning. By the way, Haskell is my prototypical functional language because it’s the purest functional language about which I know anything. C# is my prototypical imperative language because I know it pretty well.

It seems that functional algorithms and programs are quite a bit shorter than their imperative counterparts. For example, here is a possible definition of a function to calculate square values in Haskell:

	square x = x * x

For the sake of comparison, here is an equivalent function defined in C#:

	static double Square (double x) { return (x * x); }

The Haskell version is less than a third as long as the C# version. As far as I’m concerned, this is a win for Haskell. I much prefer a syntax that does not get in the way of what a program actually does. This is mostly why I gravitated to C# even after several years of nothing but VB. VB syntax may be easier for the beginner to learn, but I find it slows me down because I have to read (or worse, write) past a bunch of words before I get to the meaningful part. Of course, C# may be concise compared to VB, but it’s hard to imagine a more concise syntax than the one Haskell affords.

How can a functional language like Haskell get by with such an economy of syntax? One reason is that Haskell was designed to create functions, first and foremost. Do one thing and do it well, as the saying goes. Another reason is that Haskell is very good about inferring the types of things from other things. You can give things an explicit type in Haskell, but often the only reason to do so is to help the compiler help you when you screw up. For example, Haskell knows that ‘square’ is a function that returns a numeric value. It knows this because it knows that the ‘*’ operator returns a numeric value. It also knows that parameter ‘x’ must be a numeric value, likewise from the definition of ‘*’. In other words, Haskell will not let you do something like this:

	y = square "foo"
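To see the inference at work, here is a minimal sketch that can be compiled with GHC. The `main` function and the sample values are my own additions for illustration. The signature in the comment is what GHC infers, which you can check with `:type square` in GHCi.

```haskell
-- No type signature is written here; GHC infers
--   square :: Num a => a -> a
-- from the type of the (*) operator.
square x = x * x

main :: IO ()
main = do
  print (square (3 :: Int))       -- prints 9
  print (square (2.5 :: Double))  -- prints 6.25
  -- Uncommenting the next line makes compilation fail, because
  -- String has no Num instance:
  -- print (square "foo")
```

Note that the same definition works for both `Int` and `Double` without any extra syntax.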

Of course, it is possible to define a function in Haskell that does not, in its definition, imply a particular return type at all. In such a case, Haskell will define the function in a generic fashion. Think C# generics without all the syntactic fluff. Consider Haskell’s ‘foldl’ (fold left) function:

	foldl f z [] = z
	foldl f z (x:xs) = foldl f (f z x) xs

Basically, ‘foldl’ applies an operation to each item in a list, accumulating the result of the operation along the way. The accumulated value is the return value of ‘foldl’. As of version 3.5, the .NET Framework has a function that serves the same purpose: Enumerable.Aggregate. What does it look like? Hang on to your hat…

	public static TAccumulate Aggregate<TSource, TAccumulate>(
		this IEnumerable<TSource> source,
		TAccumulate seed,
		Func<TAccumulate, TSource, TAccumulate> func)
	{
		if (source == null)
		{
			throw Error.ArgumentNull("source");
		}
		if (func == null)
		{
			throw Error.ArgumentNull("func");
		}
		TAccumulate local = seed;
		foreach (TSource local2 in source)
		{
			local = func(local, local2);
		}
		return local;
	}

Why is Haskell’s version so much shorter? One obvious answer is Haskell’s penchant for exceedingly short parameter names (something I don’t care for, actually). It’s much more than that, though. In Haskell’s version, there are no explicit types, no explicit generic parameters, and much less ‘punctuation’. Does this mean that Haskell is weakly-typed? Not at all. Haskell’s version is at least as strongly-typed as the .NET version. In fact, considering that Haskell will not let you pass invalid parameters to ‘foldl’, it can be considered to be more strongly typed than the .NET function, and also eliminates the need for parameter validation in the body of the function.
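As a rough illustration of the typing claim, the sketch below repeats the ‘foldl’ definition from above in a compilable form. Hiding the Prelude’s own ‘foldl’ and the two sample calls in `main` are my additions. The type in the comment is what GHC infers (it may print different type variable names).

```haskell
import Prelude hiding (foldl)  -- hide the library version so we can define our own

-- No signature is written; GHC infers a type equivalent to
--   foldl :: (b -> a -> b) -> b -> [a] -> b
-- where 'b' plays the role of TAccumulate and 'a' the role of TSource.
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs

main :: IO ()
main = do
  -- Accumulator and elements share a type: sum a list of integers.
  print (foldl (+) 0 [1, 2, 3, 4 :: Int])           -- prints 10
  -- Accumulator (Int) differs from the element type (Char): count characters.
  print (foldl (\n _ -> n + 1) (0 :: Int) "hello")  -- prints 5
```

There is no null check in the body because there is no null to pass: a call with a non-function or a non-list argument simply fails to compile.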

I’m not sure what part 3 of the series will be (or even if there will be a part 3), but I’m considering a post about why IO is a challenge for pure functional languages and how Monads make it much better.

Comparing Functional and Imperative, Part 1: Context determines meaning

Over the past year, I’ve taken a strong interest in functional programming. I’m not sure where it started, but it may have been Brian Beckman’s Don’t fear the Monads on Channel 9. It went pretty much completely over my head, but my interest was definitely piqued. It has been in my mind to do a series of posts on functional programming once I became an expert on the subject. Well, I am so not an expert at this point, but I’ve decided to consider that an advantage. Once something makes perfect sense to you, it’s often hard to remember why it didn’t make perfect sense before.

Functional programming is an alternate universe of programming where hard things (e.g. concurrency) become easier and easy things (e.g. writing to a file) become at least a bit harder. Besides the fact that functional programming is just plain different from imperative programming, another challenge for an imperative programmer learning functional programming is the fact that the same or similar syntax is used to represent quite different things. Consider:

	a = 3
	b = a + 2

In an imperative language context, this would probably mean “assign the value 3 to variable ‘a’, then evaluate a + 2 and assign the resulting value to variable ‘b’”. In a functional programming language such as Haskell, ‘a’ and ‘b’ can be thought of as being functions. This may not technically be true, particularly in the case of ‘a’, but I think it works to think of them that way. Just to drive the point home, here is an equivalent definition in C#:

	int a() { return 3; }
	int b() { return a() + 2; }

One concept of functional programming that is quite different from imperative programming is the idea that once you define ‘a’, you can’t change it. Of course, if ‘a’ is actually a function, then this is to be expected. Even in imperative programming languages, the definitions of functions generally don’t change in a compiled program. The fact that ‘a’ is a function also explains something else that is at first peculiar about functional programming, which is the order in which statements get evaluated. Consider:

	a = 3
	b = a + 2
	c = b + 1

In an imperative program, the first line would be evaluated first, followed by the second line and so on. But what if ‘a’, ‘b’ and ‘c’ are actually functions? The equivalent C# code might look something like this:

	int a() { return 3; }
	int b() { return a() + 2; }
	int c() { return b() + 1; }

If there were some code that referenced the ‘c’ function, program execution would jump to function ‘c’, then to function ‘b’, then to function ‘a’. Furthermore, if no other code in the program referenced functions ‘b’ or ‘c’, then neither of those functions would ever execute. So it is in functional programming. The point is that if we think of syntax like “foo = ...” as defining a function, it’s easier to grasp than if we think of it as a statement with special rules.
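Here is a small Haskell sketch of the same idea. The reversed ordering, the `unused` definition, and `main` are my own additions, meant only to show that definitions are evaluated on demand rather than top to bottom.

```haskell
-- Definitions deliberately listed "backwards": their order does not matter.
c :: Int
c = b + 1

b :: Int
b = a + 2

a :: Int
a = 3

-- Nothing below refers to this definition, so it is never evaluated
-- and the error is never raised.
unused :: Int
unused = error "this definition is never evaluated"

main :: IO ()
main = print c  -- prints 6
```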

When I learned that pure functional programming has no mutable variables and no “side-effects”, my first thought was “How can I actually do anything?”. Well, Haskell (a pure functional language if there ever was one) does have side-effects, depending on how you choose to define them. For example, there are functions that perform IO (reading and writing to file streams). On the other hand, these side-effects are definitely not incidental, as they are in imperative languages. In Haskell, you know when a function may have side-effects. Generally, Haskell treats side-effecting code rather like nuclear waste, encapsulating it in a special container called a “Monad” and posting lots of scary warning signs around it.
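A minimal sketch of how those warning signs show up in practice: the function names here are hypothetical, but `IO`, `putStrLn`, and `return` are standard. The only difference a caller sees is in the type.

```haskell
-- A pure function: the type promises there are no side effects.
double :: Int -> Int
double n = n * 2

-- The IO in the result type is the warning sign: calling this may
-- perform side effects (here, writing a line to standard output).
doubleAndLog :: Int -> IO Int
doubleAndLog n = do
  putStrLn ("doubling " ++ show n)
  return (double n)

main :: IO ()
main = do
  result <- doubleAndLog 21  -- prints "doubling 21"
  print result               -- prints 42
```

A function typed `Int -> Int` cannot call `doubleAndLog` and use its result without `IO` appearing in its own type, which is how the side effects stay contained.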

Can a pure functional programming language be useful without side-effects? I think the answer is yes. I work for a company that creates LOBs (Line of Business apps) first and foremost. When you take all the layered architecture and technology concepts out of the picture, an LOB looks something like this:

[Diagram: data flowing from the user interface, through a yellow arrow, to the database]

That yellow arrow in the middle of the diagram is a transformation. It transforms the shape of the data as it comes from the user interface into the shape it has in the database. You could imagine that it’s a pipeline, of sorts, constructed of all manner of logical loops and branches. There is no reason that yellow arrow needs to have any mutable state within itself. In fact, for reasons of scalability and consistency, it would be better if that yellow arrow didn’t have any mutable state inside it. As such, functional programs are uniquely suited to constructing this sort of pipeline.
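To make the yellow arrow a little more concrete, here is a toy sketch in Haskell. The `FormInput` and `OrderRow` types, their fields, and the validation rule are all invented for illustration; a real LOB transformation would be far larger, but it could have the same stateless shape.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical shape of the data as it arrives from the user interface.
data FormInput = FormInput
  { rawName     :: String
  , rawQuantity :: String
  }

-- Hypothetical shape of the data as it is stored in the database.
data OrderRow = OrderRow
  { customerName :: String
  , quantity     :: Int
  } deriving (Show)

-- Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- The "yellow arrow": a pure transformation with no mutable state.
-- Returns Nothing when the quantity is not a positive whole number.
toOrderRow :: FormInput -> Maybe OrderRow
toOrderRow input =
  case reads (trim (rawQuantity input)) of
    [(n, "")] | n > 0 -> Just (OrderRow (trim (rawName input)) n)
    _                 -> Nothing

main :: IO ()
main = do
  print (toOrderRow (FormInput "  Alice " " 3 "))  -- a valid row
  print (toOrderRow (FormInput "Bob" "three"))     -- prints Nothing
```

Because `toOrderRow` depends only on its argument, any number of requests can run through it at once without interfering with each other.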

Continue to Part 2: Economies of Syntax.

Thursday, January 1, 2009

A Confession: I don’t have a degree

I ran into a blogo-sphere discussion recently about the merits of having a CS degree. I found this particularly interesting because, well, I don’t have a degree. Shocking, I know, coming from someone who claims to be a “Software Architect” (well, that is my job title). I’ll give you a minute to recover.

To be very cathartically honest, I wish it were not true. I have a full-time+ job and a full-time+ family, but if I had an opportunity to pursue a degree, I’d seriously consider it. So there you have it. There is at least one non-degree-possessing person in this world who thinks having a degree is a good thing.

On the other hand, a lot of what I read about the purported merits of having a degree is…just crap, in my opinion. I believe and accept the fact that a lot of people have learned a lot of things in a college or university setting, things that they would never learn in “real” life. I believe and accept the fact that for most people, there isn’t a better way to get a well-rounded education. I do not believe those things are true for me.

It has been well established that there are at least a few different ways that people learn best. I think it is also reasonable to conclude that what any person has learned or been exposed to in his or her life is going to be unique to that person. For these and other reasons, concept “A” might really “click” with Alice, while Bob just doesn’t get it. For concept “B”, it may be just the opposite. In a classroom setting, the extra time it takes to teach concept “A” to Bob is a waste of time for Alice. The extra time it takes to teach concept “B” to Alice is a waste of time for Bob. This leads me to conclude that classroom learning is not particularly time-efficient.

I think the primary benefit of learning in a college/university classroom setting is the fact that most people do not have the motivation or, in some cases, the ability to educate themselves.

I love to learn. It might be more correct to say that I’m addicted to information…of any sort, really. Truly, I’m not sure I’ve ever encountered any subject matter or concept that didn’t interest me, at least a little bit. People have told me that they are intimidated to have a conversation with a subject matter expert concerning said subject matter. They’re afraid of asking an ignorant question. I can’t relate to that. I’ll happily make a fool of myself, asking that person a thousand stupid questions, just for the chance to learn something.

One idea I often encounter is the idea that a self-educated person is going to have a “narrow” education. Maybe they can become competent in their field, but they’ll be weak in other areas, which might hamper their overall performance. I think this idea “rings true” with a lot of folks. It seems logical and fair-minded. On the other hand, I’ve never encountered “hard numbers” to support this idea.

Of course, I can’t give you any hard numbers, either. In fact I can only offer one data-point: myself. If I were to be given a general knowledge test covering various aspects of Math, Science, History, English, etc., I feel confident that my score would compare very favorably with scores obtained by administering the same test to a random selection of college educated people.

Having said all that, why do I wish I had gotten a degree when I had the chance? Two reasons: First, I did attend college for three semesters. It did not go well. I spent too much time in the computer lab when I should have been in the classroom or completing assignments. Basically, I feel that I failed at college and that bothers me. Second, “Perception Is Reality”. The vast majority of people I work with and, I would guess, the vast majority of people in my field have a degree. I would venture that the majority of these people “perceive” that having a degree is better than not having one (the example set by Bill Gates notwithstanding).

I know of at least one case where I was not even considered for a position on the basis of not having a degree. I doubt that was an isolated case. If you’re dealing with an overwhelming glut of resumes, you may very well decide to trash all the resumes that don’t appear to “measure up” in all respects.

This reminds me of the recent experience I had doing the initial interview of several candidates for a Software Developer position in my division. This was a first for me, and the experience was…fascinating. I am thinking of one candidate in particular. This person came with all the right credentials. Not only did this person know everything, but this person had been working for several years at a “big name” company, and before that another “big name” company (we verified this person’s employment history). This person had an MBA.

On that basis, I was shocked, truly, at what this person did not know about Software Development. The point in the interview that I began to understand the situation was when I asked “It says on your resume that you have experience with technology ‘X’. Can you tell me more about how you used that technology?” This person’s answer was something along the lines of “Well, I didn’t use technology ‘X’ directly. My job was to review and approve other people’s work and send it up the chain.” Imagine several more responses along the same lines. Finally I asked this person to demonstrate how to solve a particular (fairly simple) problem using the language of their choice. Imagine a blank stare. I found myself describing the problem in simpler and simpler terms. The blank stare remained. I eventually realized that this person could not tell me what a “string” was. Yikes!

Friday, December 19, 2008

Oxite: A good example after all?

I haven’t been doing much web development these days, but I’m anticipating that will change in the near future. Consequently, I’ve been keeping an eye on the progress of the various web development frameworks, including ASP.NET MVC. When I heard about the Oxite project, I was very interested. Most example projects I find are just that: examples. Oxite is running a production website. I immediately downloaded the project sources and…still haven’t even looked at them.

In the meantime, the community response to Oxite has been…interesting. The general consensus seems to be that they got it all wrong. The last thing I’ve read on the topic (so far) is Glenn Block’s On Oxite post. Ouch! Right now I’m just picturing him running into the Oxite guys at the Microsoft Christmas party…

Once upon a time, I was a minor presence in the MS NNTP forums. This situation reminds me of something I noticed back then: The surest way to get a relevant and timely answer to your question would be to post a completely ridiculous solution, don your asbestos underwear, and wait for the responses to roll in.

Which gets me back to the matter at hand. These days, you can expect that any framework you’d consider using is going to come with plenty of samples, examples, and documentation. General guidance, especially in the area of “what not to do”, is harder to come by. This is understandable. I’m sure most of us have had the experience of a customer (or co-worker) taking something we’ve done and using it in a manner that makes us cringe. It’s just very difficult to anticipate all the wrong directions that someone might go.

I think this makes the Oxite project almost invaluable as an example. I’ve bookmarked this post and you can be sure that if and when I get around to actually using ASP.NET MVC, I’ll read it again. Of course, this is no consolation for the Oxite folks, who probably feel a bit chastised at this point. No one wants to be thought of as the Gigli of…anything. Well guys, all I can say is thanks for sticking your neck out there on behalf of the rest of us.


Thursday, December 18, 2008

Why LINQ is better than SQL, Part 3: Query Composition

A LINQ query is like a SQL view in at least two ways:

  1. No data is retrieved until you really need it.
  2. LINQ queries can be composed with other LINQ queries.

Now I will show you that LINQ queries are better than SQL views because LINQ queries are more modular than views.

The Problem

Suppose that you are creating a dashboard for your business app (which happens to use the same database schema as Microsoft’s AdventureWorks sample database) and one of the data points you need is the sum of total sales by year and month.

A Solution Using SQL Views

If you implemented a SQL view for this purpose, it might look like this:

create view vSalesByYearAndMonth
as
    select
        SUM(so.SubTotal) SubTotal
        , DATEPART(yyyy, so.OrderDate) [Year]
        , DATEPART(m, so.OrderDate) [Month]
    from
        [Sales].[SalesOrderHeader] so
    group by
        DATEPART(yyyy, so.OrderDate), DATEPART(m, so.OrderDate)

Now we can take our view and compose it with other views or queries like this:

create view vSalesByMonth2004
as
    select
        SubTotal
        , [Month]
    from
        vSalesByYearAndMonth
    where
        [Year] = 2004

This is all very good, but now imagine that you’d really like to filter the results by some other aspect of a sales order: the region, the sales person, a product category, etc…You’d need a view for each scenario! Your best bet (assuming you’re using SQL Server) is probably to make a function. That way you can create a bunch of parameters for everything you might conceivably want to filter by and then make the query itself a lot more complicated with a whack of “…where (isnull(@x, x) = x) and (isnull(@y, y) = y) and…”. Whee!

A Solution Using LINQ

Now I’m going to do the same thing using LINQ and Entity Framework (EF). The source of my queries is an ADO.NET Entity Data Model that I generated directly from the AdventureWorks database (if you’re not sure how to do this, here is an example). If you generate a model in this way and never change it, you’re probably missing out on some of the goodness EF has to offer, but for this example I’m sticking with what the wizard gives me.

The first thing I’m going to do is define a class to contain the result of the query (thanks to anonymous types, this step isn’t always necessary):

        class YearMonthTotal
        {
            public int Year { get; set; }
            public int Month { get; set; }
            public decimal Total { get; set; }
        }

Now I’m going to write the LINQ query and encapsulate it in a function (all C# functions must have an explicit return type, which is why the previous step is necessary):

        IEnumerable<YearMonthTotal> GetSalesByYearAndMonth()
        {
            return
            (
                from so in context.SalesOrderHeader
                group so by new { so.OrderDate.Year, so.OrderDate.Month } into sog
                select new YearMonthTotal { Year = sog.Key.Year, Month = sog.Key.Month, Total = sog.Sum(so => so.SubTotal) }
            );
        }

And for the sake of completeness, here’s what the second view looks like in LINQ:

        IEnumerable<YearMonthTotal> GetSalesByMonth2004()
        {
            return (from sbm in GetSalesByYearAndMonth() where sbm.Year == 2004 select sbm);
        }

Fantastic! But, uh, I haven’t done anything in LINQ that I couldn’t easily do with SQL…yet.

Making It Modular

Essentially, I want to be able to pass a “where” clause as a parameter of my function. The new parameter will have the same type as the predicate parameter from Queryable.Where, but I’ll specify SalesOrderHeader as the generic parameter because that is what I’m filtering. Here is the revised function:

        IEnumerable<YearMonthTotal> GetSalesByYearAndMonth(Expression<Func<SalesOrderHeader, bool>> salesOrderFilter)
        {
            var filteredSalesOrders = context.SalesOrderHeader.Where(salesOrderFilter);

            return
            (
                from so in filteredSalesOrders
                group so by new { so.OrderDate.Year, so.OrderDate.Month } into sog
                select new YearMonthTotal { Year = sog.Key.Year, Month = sog.Key.Month, Total = sog.Sum(so => so.SubTotal) }
            );
        }

Wonderfully strange, is it not? How does one use such a beast? Here’s one example (sales for the Northeast):

            var neSalesSummary =
                GetSalesByYearAndMonth(so => so.SalesTerritory.Name == "Northeast");

And something more convoluted (sales for the great state of Maine):

            var maineSalesSummary =
                GetSalesByYearAndMonth(so => so.SalesTerritory.StateProvince.Any(state => state.StateProvinceCode == "ME"));
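For readers following the functional programming posts, the same "pass the filter in as a parameter" shape can be sketched in Haskell over an in-memory list. This is only an analogy: the `SalesOrder` type, its fields, and the sample data are invented, and nothing here is translated to SQL the way the LINQ version is.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical, simplified stand-in for a SalesOrderHeader row.
data SalesOrder = SalesOrder
  { orderYear  :: Int
  , orderMonth :: Int
  , territory  :: String
  , subTotal   :: Double
  }

-- The filter is an ordinary function argument, much like the
-- salesOrderFilter parameter in the C# version.
salesByYearAndMonth :: (SalesOrder -> Bool) -> [SalesOrder] -> [((Int, Int), Double)]
salesByYearAndMonth keep orders =
  [ (key o, sum (map subTotal grp))
  | grp@(o : _) <- groupBy ((==) `on` key) (sortOn key (filter keep orders))
  ]
  where
    key order = (orderYear order, orderMonth order)

-- Invented sample data.
sampleOrders :: [SalesOrder]
sampleOrders =
  [ SalesOrder 2004 1 "Northeast" 100
  , SalesOrder 2004 2 "Northeast" 25
  , SalesOrder 2004 1 "Southwest" 40
  , SalesOrder 2004 1 "Northeast" 50
  ]

main :: IO ()
main =
  -- Totals per (year, month) for Northeast orders only.
  mapM_ print (salesByYearAndMonth ((== "Northeast") . territory) sampleOrders)
```

Any predicate of type `SalesOrder -> Bool` can be supplied, so there is no need for a separate function per filtering scenario.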

The Result

Retrieving the results of that last query causes the entire composed LINQ query to be translated into a single SQL query that looks like this:

    SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[K2] AS [C3], 
    [GroupBy1].[A1] AS [C4]
    FROM ( SELECT 
        [Filter2].[K1] AS [K1], 
        [Filter2].[K2] AS [K2], 
        SUM([Filter2].[A1]) AS [A1]
        FROM ( SELECT 
            DATEPART (year, [Extent1].[OrderDate]) AS [K1], 
            DATEPART (month, [Extent1].[OrderDate]) AS [K2], 
            [Extent1].[SubTotal] AS [A1]
            FROM [Sales].[SalesOrderHeader] AS [Extent1]
            WHERE  EXISTS (SELECT 
                cast(1 as bit) AS [C1]
                FROM [Person].[StateProvince] AS [Extent2]
                WHERE ([Extent1].[TerritoryID] = [Extent2].[TerritoryID]) AND (N'ME' = [Extent2].[StateProvinceCode])
            )
        )  AS [Filter2]
        GROUP BY [K1], [K2]
    )  AS [GroupBy1]

It may not be the most readable query, but I can't find any obvious flaws in its logic.


Tuesday, December 16, 2008

Why LINQ is better than SQL, Part 2: From comes first

[Note: I have another post in this “series” on stand-by, but it’s long and slightly complicated, so it made sense to me to post this one first.]

I’ve recently upgraded my developer workstation to SQL Server 2008 (from 2005). For me the killer feature, hands-down, is IntelliSense. Of course, there’s still room for improvement:

[Image: screenshot of an IntelliSense tooltip]

Great. How about telling me what I can never remember without firing up the documentation, namely which “expression” is the thing I’m searching for and which is what I’m searching in? Oh well, they’ll probably address this sooner or later. Then there is this: How can IntelliSense be provided for field names when you haven’t specified the source? It can’t, of course. If you want IntelliSense for field names you can type something like “select from tableA as a [join tableB as b]…” then go back and complete the “select” clause. I know I should be grateful, but I find this mildly annoying.

LINQ doesn’t have this problem because…“from” comes first.

IntelliSense is not the only reason I prefer that “from” come first. The biggest reason is the way that I think about queries. Basically, I think of them as a pipeline, with data flowing from one end to the other, getting transformed and filtered along the way:

[Diagram: select statement flow]

Coincidentally, this is pretty much the same order of elements in a LINQ query.

Granted, putting the “select” clause first does put SQL slightly closer to natural English sentence construction. Does that make SQL easier to understand? I think not.