Thursday, December 18, 2008

Why LINQ is better than SQL, Part 3: Query Composition

A LINQ query is like a SQL view in at least two ways:

  1. No data is retrieved until you really need it.
  2. LINQ queries can be composed with other LINQ queries.

Now I will show you that LINQ queries are better than SQL views because LINQ queries are more modular than views.

The Problem

Suppose that you are creating a dashboard for your business app (which happens to use the same database schema as Microsoft’s AdventureWorks sample database) and one of the data points you need is the sum of total sales by year and month.

A Solution Using SQL Views

If you implemented a SQL view for this purpose, it might look like this:

create view vSalesByYearAndMonth
as
    select
        SUM(so.SubTotal) SubTotal
        , DATEPART(yyyy, so.OrderDate) [Year]
        , DATEPART(m, so.OrderDate) [Month]
    from
        [Sales].[SalesOrderHeader] so
    group by
        DATEPART(yyyy, so.OrderDate), DATEPART(m, so.OrderDate)

Now we can take our view and compose it with other views or queries like this:

create view vSalesByMonth2004
as
    select
        SubTotal
        , [Month]
    from
        vSalesByYearAndMonth
    where
        [Year] = 2004

This is all very good, but now imagine that you’d really like to filter the results by some other aspect of a sales order: the region, the sales person, a product category, etc…You’d need a view for each scenario! Your best bet (assuming you’re using SQL Server) is probably to make a function. That way you can create a bunch of parameters for everything you might conceivably want to filter by and then make the query itself a lot more complicated with a whack of “…where (isnull(@x, x) = x) and (isnull(@y, y) = y) and…”. Whee!

A Solution Using LINQ

Now I’m going to do the same thing using LINQ and Entity Framework (EF). The source of my queries is an ADO.NET Entity Data Model that I generated directly from the AdventureWorks database (if you’re not sure how to do this, here is an example). If you generate a model in this way and never change it, you’re probably missing out on some of the goodness EF has to offer, but for this example I’m sticking with what the wizard gives me.

The first thing I’m going to do is define a class to contain the result of the query (thanks to anonymous types, this step isn’t always necessary):

        class YearMonthTotal
        {
            public int Year { get; set; }
            public int Month { get; set; }
            public decimal Total { get; set; }
        }

Now I’m going to write the LINQ query and encapsulate it in a function (all C# functions must have an explicit return type, which is why the previous step is necessary):

        IEnumerable<YearMonthTotal> GetSalesByYearAndMonth()
        {
            return
            (
                from so in context.SalesOrderHeader
                group so by new { so.OrderDate.Year, so.OrderDate.Month } into sog
                select new YearMonthTotal { Year = sog.Key.Year, Month = sog.Key.Month, Total = sog.Sum(so => so.SubTotal) }
            );
        }

And for the sake of completeness, here’s what the second view looks like in LINQ:

        IEnumerable<YearMonthTotal> GetSalesByMonth2004()
        {
            return (from sbm in GetSalesByYearAndMonth() where sbm.Year == 2004 select sbm);
        }

Fantastic! But, uh, I haven’t done anything in LINQ that I couldn’t easily do with SQL…yet.

Making It Modular

Essentially, I want to be able to pass a “where” clause as a parameter of my function. The new parameter will have the same type as the predicate parameter from Queryable.Where, but I’ll specify SalesOrderHeader as the generic parameter because that is what I’m filtering. Here is the revised function:

        IEnumerable<YearMonthTotal> GetSalesByYearAndMonth(Expression<Func<SalesOrderHeader, bool>> salesOrderFilter)
        {
            var filteredSalesOrders = context.SalesOrderHeader.Where(salesOrderFilter);

            return
            (
                from so in filteredSalesOrders
                group so by new { so.OrderDate.Year, so.OrderDate.Month } into sog
                select new YearMonthTotal { Year = sog.Key.Year, Month = sog.Key.Month, Total = sog.Sum(so => so.SubTotal) }
            );
        }

Wonderfully strange, is it not? How does one use such a beast? Here’s one example (sales for the Northeast):

            var neSalesSummary =
                GetSalesByYearAndMonth(so => so.SalesTerritory.Name == "Northeast");

And something more convoluted (sales for the great state of Maine):

            var maineSalesSummary =
                GetSalesByYearAndMonth(so => so.SalesTerritory.StateProvince.Any(state => state.StateProvinceCode == "ME"));

The Result

Retrieving the results of that last query causes the entire composed LINQ query to be translated into a single SQL query that looks like this:

    SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[K2] AS [C3], 
    [GroupBy1].[A1] AS [C4]
    FROM ( SELECT 
        [Filter2].[K1] AS [K1], 
        [Filter2].[K2] AS [K2], 
        SUM([Filter2].[A1]) AS [A1]
        FROM ( SELECT 
            DATEPART (year, [Extent1].[OrderDate]) AS [K1], 
            DATEPART (month, [Extent1].[OrderDate]) AS [K2], 
            [Extent1].[SubTotal] AS [A1]
            FROM [Sales].[SalesOrderHeader] AS [Extent1]
            WHERE  EXISTS (SELECT 
                cast(1 as bit) AS [C1]
                FROM [Person].[StateProvince] AS [Extent2]
                WHERE ([Extent1].[TerritoryID] = [Extent2].[TerritoryID]) AND (N'ME' = [Extent2].[StateProvinceCode])
            )
        )  AS [Filter2]
        GROUP BY [K1], [K2]
    )  AS [GroupBy1]

It may not be the most readable query, but I can't find any obvious flaws in its logic.

kick it on DotNetKicks.com

0 comments: