Friday, December 19, 2008

Oxite: A good example after all?

I haven’t been doing much web development these days, but I’m anticipating that will change in the near future. Consequently, I’ve been keeping an eye on the progress of the various web development frameworks, including ASP.NET MVC. When I heard about the Oxite project, I was very interested. Most example projects I find are just that: examples. Oxite is running a production website. I immediately downloaded the project sources and…still haven’t even looked at them.

In the meantime, the community response to Oxite has been…interesting. The general consensus seems to be that they got it all wrong. The last thing I’ve read on the topic (so far) is Glenn Block’s On Oxite post. Ouch! Right now I’m just picturing him running into the Oxite guys at the Microsoft Christmas party…

Once upon a time, I was a minor presence in the MS NNTP forums. This situation reminds me of something I noticed back then: The surest way to get a relevant and timely answer to your question would be to post a completely ridiculous solution, don your asbestos underwear, and wait for the responses to roll in.

Which gets me back to the matter at hand. These days, you can expect that any framework you’d consider using is going to come with with plenty of samples, examples, and documentation. General guidance, especially in the area of “what not to do”, is harder to come by. This is understandable. I’m sure most of us have had the experience of a customer (or co-worker) taking something we’ve done and use it in a manner that makes us cringe. It’s just very difficult to anticipate all the wrong directions that someone might go.

I think this makes the Oxite project almost invaluable as an example. I’ve bookmarked this post and you can be sure that if and when I get around to actually using ASP.NET MVC, I’ll read it again. Of course, this is no consolation for the Oxite folks, who probably feel a bit chastised at this point. No one wants to be thought of as the Gigli of…anything. Well guys, all I can say is thanks for sticking your neck out there on behalf of the rest of us.

kick it on DotNetKicks.com

Thursday, December 18, 2008

Why LINQ is better than SQL, Part 3: Query Composition

A LINQ query is like a SQL view in at least two ways:

  1. No data is retrieved until you really need it.
  2. LINQ queries can be composed with other LINQ queries.

Now I will show you that LINQ queries are better than SQL views because LINQ queries are more modular than views.

The Problem

Suppose that you are creating a dashboard for your business app (which happens to use the same database schema as Microsoft’s AdventureWorks sample database) and one of the data points you need is the sum of total sales by year and month.

A Solution Using SQL Views

If you implemented a SQL view for this purpose, it might look like this:

create view vSalesByYearAndMonth
as
    select
        SUM(so.SubTotal) SubTotal
        , DATEPART(yyyy, so.OrderDate) [Year]
        , DATEPART(m, so.OrderDate) [Month]
    from
        [Sales].[SalesOrderHeader] so
    group by
        DATEPART(yyyy, so.OrderDate), DATEPART(m, so.OrderDate)

Now we can take our view and compose it with other views or queries like this:

create view vSalesByMonth2004
as
    select
        SubTotal
        , [Month]
    from
        vSalesByYearAndMonth
    where
        [Year] = 2004

This is all very good, but now imagine that you’d really like to filter the results by some other aspect of a sales order: the region, the sales person, a product category, etc…You’d need a view for each scenario! Your best bet (assuming you’re using SQL Server) is probably to make a function. That way you can create a bunch of parameters for everything you might conceivably want to filter by and then make the query itself a lot more complicated with a whack of “…where (isnull(@x, x) = x) and (isnull(@y, y) = y) and…”. Whee!

A Solution Using LINQ

Now I’m going to do the same thing using LINQ and Entity Framework (EF). The source of my queries is an ADO.NET Entity Data Model that I generated directly from the AdventureWorks database (if you’re not sure how to do this, here is an example). If you generate a model in this way and never change it, you’re probably missing out on some of the goodness EF has to offer, but for this example I’m sticking with what the wizard gives me.

The first thing I’m going to do is define a class to contain the result of the query (thanks to anonymous types, this step isn’t always necessary):

        class YearMonthTotal
        {
            public int Year { get; set; }
            public int Month { get; set; }
            public decimal Total { get; set; }
        }

Now I’m going to write the LINQ query and encapsulate it in a function (all C# functions must have an explicit return type, which is why the previous step is necessary):

        IEnumerable<YearMonthTotal> GetSalesByYearAndMonth()
        {
            return
            (
                from so in context.SalesOrderHeader
                group so by new { so.OrderDate.Year, so.OrderDate.Month } into sog
                select new YearMonthTotal { Year = sog.Key.Year, Month = sog.Key.Month, Total = sog.Sum(so => so.SubTotal) }
            );
        }

And for the sake of completeness, here’s what the second view looks like in LINQ:

        IEnumerable<YearMonthTotal> GetSalesByMonth2004()
        {
            return (from sbm in GetSalesByYearAndMonth() where sbm.Year == 2004 select sbm);
        }

Fantastic! But, uh, I haven’t done anything in LINQ that I couldn’t easily do with SQL…yet.

Making It Modular

Essentially, I want to be able to pass a “where” clause as a parameter of my function. The new parameter will have the same type as the predicate parameter from Queryable.Where, but I’ll specify SalesOrderHeader as the generic parameter because that is what I’m filtering. Here is the revised function:

        IEnumerable<YearMonthTotal> GetSalesByYearAndMonth(Expression<Func<SalesOrderHeader, bool>> salesOrderFilter)
        {
            var filteredSalesOrders = context.SalesOrderHeader.Where(salesOrderFilter);

            return
            (
                from so in filteredSalesOrders
                group so by new { so.OrderDate.Year, so.OrderDate.Month } into sog
                select new YearMonthTotal { Year = sog.Key.Year, Month = sog.Key.Month, Total = sog.Sum(so => so.SubTotal) }
            );
        }

Wonderfully strange, is it not? How does one use such a beast? Here’s one example (sales for the Northeast):

            var neSalesSummary =
                GetSalesByYearAndMonth(so => so.SalesTerritory.Name == "Northeast");

And something more convoluted (sales for the great state of Maine):

            var maineSalesSummary =
                GetSalesByYearAndMonth(so => so.SalesTerritory.StateProvince.Any(state => state.StateProvinceCode == "ME"));

The Result

Retrieving the results of that last query causes the entire composed LINQ query to be translated into a single SQL query that looks like this:

    SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[K2] AS [C3], 
    [GroupBy1].[A1] AS [C4]
    FROM ( SELECT 
        [Filter2].[K1] AS [K1], 
        [Filter2].[K2] AS [K2], 
        SUM([Filter2].[A1]) AS [A1]
        FROM ( SELECT 
            DATEPART (year, [Extent1].[OrderDate]) AS [K1], 
            DATEPART (month, [Extent1].[OrderDate]) AS [K2], 
            [Extent1].[SubTotal] AS [A1]
            FROM [Sales].[SalesOrderHeader] AS [Extent1]
            WHERE  EXISTS (SELECT 
                cast(1 as bit) AS [C1]
                FROM [Person].[StateProvince] AS [Extent2]
                WHERE ([Extent1].[TerritoryID] = [Extent2].[TerritoryID]) AND (N'ME' = [Extent2].[StateProvinceCode])
            )
        )  AS [Filter2]
        GROUP BY [K1], [K2]
    )  AS [GroupBy1]

It may not be the most readable query, but I can't find any obvious flaws in its logic.

kick it on DotNetKicks.com

Tuesday, December 16, 2008

Why LINQ is better than SQL, Part 2: From comes first

[Note: I have another of this “series” on stand-by, but it’s long and slightly complicated, so it made sense to me to post this one first.]

I’ve recently upgraded my developer workstation to SQL Server 2008 (from 2005). For me the killer feature, hands-down, is IntelliSense. Of course, there’s still room for improvement:

image

Great. How about telling me what I can never remember without firing up the documentation, namely which “expression” is the thing I’m searching for and which is what I’m searching in? Oh well, they’ll probably address this sooner or later. Then there is this: How can IntelliSense be provided for field names when you haven’t specified the source? It can’t, of course. If you want IntelliSense for field names you can type something like “select from tableA as a [join tableB as b]…” then go back and complete the “select” clause. I know I should be grateful, but I find this mildly annoying.

LINQ doesn’t have this problem because…“from” comes first.

IntelliSense is not the only reason I prefer that “from” come first. The biggest reason is the way that I think about queries. Basically, I think of them as a pipeline, with data flowing from one end to the other, getting transformed and filtered along the way:

select statement flow

Coincidentally, this is pretty much the same order of elements in a LINQ query.

Granted, putting the “select” clause first does put SQL slightly closer to natural English sentence construction. Does that make SQL easier to understand? I think not.

Tuesday, December 9, 2008

Overriding LINQ

As I’ve said previously, I love LINQ. I love the fact that so much interesting and useful functionality has been built up around a very simple interface (that being IEnumerable, of course). But, as the saying goes, not everything that can be done should be done.

Take for example Enumerable.Count (in case you hadn’t noticed, the LINQ-to-objects extension methods are hiding in the “System.Linq.Enumerable” class). Given an IEnumerable<T>, it returns the number of items in the sequence. How does it do that when IEnumerator<T> does not have a Count property or method? One way would be to enumerate over the entire sequence, incrementing a counter along the way. Of course, that’s potentially very inefficient and fortunately Enumerable.Count will only take that approach as a last resort. If the source container implements ICollection<T>, then Enumerable.Count will simply return ICollection<T>.Count.

The strategy LINQ uses makes a lot of sense, I’m sure you’ll agree, but I think there is room for improvement. For one thing, the relationships between LINQ and interfaces like ICollection<T> are not documented (probably because they are tenuous). If you were doing something “weird”, LINQ operations might not behave as you expect. Furthermore, the primary reason we have so many data containers to choose from is that different data containers are optimized for different operations. As it is currently implemented, the onus is on LINQ itself to know and choose the best implementation for each LINQ operation. Of course, if the data container in question did not come “in the box”, this is not merely difficult, but impossible.

I think there should be some way to tell LINQ that your data container knows how to do operation X in the most efficient manner possible, and that LINQ should defer its implementation of X to the container. Perhaps the simplest way to accomplish that would be to organize LINQ operations into some number of logical groups and then define interfaces which, in-turn, define the operations in each group. If a container implements one or more of these interfaces, LINQ will defer its implementation of the corresponding operation(s) to the container. Such an interface might look like this:

namespace System.Linq
{
    interface ICountable<TSource> : IEnumerable<TSource>
    {
        int Count<TSource>(Func<TSource, bool> predicate);
        int Count<TSource>();
    }
}

The implementation for Enumerable.Count could then look like this:

        public static int Count<TSource>(this IEnumerable<TSource> source)
        {
            if (source == null) throw Error.ArgumentNull("source");
            var countable = source as ICountable<TSource>;
            if (countable != null) return countable.Count();

            // business as usual...
        }

A system like this would also allow a data container to “opt-out” of a particular operation by immediately throwing an exception. For example, suppose that you wrapped IEnumerable around some kind of stream. You wouldn’t want someone to try to “Count” that, would you?

Ideally, there would be a way to only partly implement an interface and otherwise fall back to the default LINQ implementation, but it would be hard to avoid a “stack overflow” scenario without making it much more complicated (or perhaps I’m missing something obvious).

Saturday, December 6, 2008

How about some syntactic sugar for IEnumerable<>?

Recently, my attitude has changed a bit toward IEnumerable<>. It has always served a very good purpose (foreach is so nice), but it has seemed to me to be an interesting, but mostly hidden implementation detail of the framework. And then there was LINQ.

LINQ is a ton of functionality built around that simple little interface. Which makes me think about using it, exposing it directly in my own API. Why would I choose to expose IEnumerable<>, instead of IList<> or something else? First of all, what other interface is there for exposing a read-only set of “things”? Second, I adhere to the philosophy APIs should expose only the necessary functionality and no more, because:

  1. It makes it easier to understand how to use the API (and how not to use it).
  2. It makes it easier to change or completely re-implement the original API.
  3. The less you can do, the less there is to go wrong.

Of course, IEnumerable<> may be useful, but it certainly isn’t pretty. Instead of IEnumerable<Foo>, I would rather see Foo* or Foo+ or…well, you get the idea.

This is not a new idea (I’m sorry, but you probably won’t find those on this blog). There is, of course, Foo?, which means Nullable<Foo>. And then there is the experimental language (thank you, cut-and-paste), which proves how unoriginal this idea is: In Cω, Foo* is a “stream” of Foo. According to the Cω Overview, “Streams in Cω are closely related to IEnumerable<>…”. Well, who’s going to argue with the brains inside MS Research? Not me!

Friday, December 5, 2008

Why LINQ is better than SQL (Part 1?)

To be very honest, LINQ warms the cockles of my heart. The mere fact that I can write a compiler-checked query in C# that references my C# entity class and that query, through some magic incantations, gets executed as a SQL statement against my database...it makes me happy.

But maybe you're more pragmatic than I am. Maybe you think all this LINQ hoo-ha is pure sophistry. Take my word for it, I could go on and on about the wonders of LINQ, but then "I don't have time for this" do I?..Well, maybe one tiny example (since you asked so nicely):

    var r = (
        from e in context.Employee
        where e.Manager.Contact.LastName == "Word"
        select new { e.Contact.FirstName, e.Contact.LastName });

What's the big deal? This query traverses four logical tables. I didn't have to specify how those tables get joined together. Maybe this is kind of obvious, but it surprises me that standard SQL does not have a similar capability. Databases have these things called foreign keys that formalize the relationship between tables. That being the case, why do I have to spell out the relationship between every table in every query.

Don't get me wrong, I love SQL joins. Like the other day, when I needed to identify value gaps in a set of rows. Once upon a time, I probably would've resorted to some process involving cursors, but a simple self-join does the trick. The problem is that the chance to do something "clever" with a join doesn't come up very often. Most of the time (for me, at least), a join is identical to the relationship defined by a foreign key.

Thursday, December 4, 2008

Visual Studio Team System 2008 Database Edition GDR - RTM

Last week, Visual Studio Team System 2008 Database Edition GDR was released to the developer public. I haven't had a lot of time to evaluate it, but my initial impression is positive.

Of course, I had similar feelings about the VSTS 2005 edition of "Data Dude", but my initial excitement quickly turned to disappointment when I realized it just wasn't going to work for us. I couldn't really blame the product...let's just say our db schema is rather sadistic. The SQL parser didn't stand a chance and died in a flurry of spurious error messages.

The parser in the new version is obviously much improved. Once I got the projects/references set up, I was only left with a manageable number of legitimate errors. On top of that, they've managed to remove the dependency on the "Design DB", which makes the product feel a lot less cumbersome. Add to that a lot of other compelling improvements (including basic code analysis and refactoring tools), and I'm a happy customer.

I can't say that all is roses, yet. When I attempt to "deploy" my changes, targeting the original database, the generated script file is blank. The deployment process generates a ton of warnings, but no errors. Curiously, if I instead do a "schema compare" between the project and the aforementioned database, it works.

Data Dude : Visual Studio Team System 2008 Database Edition GDR - RTM

January '09 BAND Gig

I've agreed to be the presenter for the January '09 BAND (Bangor Area .NET Developers) gig. I'm not exactly sure what day it is (hopefully I'll show up), but I do know the topic: Microsoft's Entity Framework (EF). Here's the blurb I came up with to describe the talk (for the BAND site):

Using entity classes has many advantages over more generic data access methods. On the other hand, entity classes can be time-consuming and surprisingly difficult to implement correctly. In this talk, Daniel will show how Microsoft's new Entity Framework does the heavy lifting to make entity data access simple and powerful.

I'm not sure I'll ever get used to referring to myself in the third person.

Obligatory Introduction Post

Hello Blog World. My name is Daniel. I live in Eastern Maine with my beautiful wife and four affable children. My passion and profession is software development. Presently, I am employed as a Software Architect with Tyler Technologies. I heard about this "blog" thing (invented by some guy by the name of Dave Winer, I think) and I had to give it a go. Wish me luck, my imaginary audience!