Tuesday, December 9, 2008

Overriding LINQ

As I’ve said previously, I love LINQ. I love the fact that so much interesting and useful functionality has been built up around a very simple interface (that being IEnumerable, of course). But, as the saying goes, not everything that can be done should be done.

Take for example Enumerable.Count (in case you hadn’t noticed, the LINQ-to-objects extension methods are hiding in the “System.Linq.Enumerable” class). Given an IEnumerable<T>, it returns the number of items in the sequence. How does it do that when IEnumerator<T> does not have a Count property or method? One way would be to enumerate over the entire sequence, incrementing a counter along the way. Of course, that’s potentially very inefficient and fortunately Enumerable.Count will only take that approach as a last resort. If the source container implements ICollection<T>, then Enumerable.Count will simply return ICollection<T>.Count.

The strategy LINQ uses makes a lot of sense, I’m sure you’ll agree, but I think there is room for improvement. For one thing, the relationships between LINQ and interfaces like ICollection<T> are not documented (probably because they are tenuous). If you were doing something “weird”, LINQ operations might not behave as you expect. Furthermore, the primary reason we have so many data containers to choose from is that different data containers are optimized for different operations. As it is currently implemented, the onus is on LINQ itself to know and choose the best implementation for each LINQ operation. Of course, if the data container in question did not come “in the box”, this is not merely difficult, but impossible.

I think there should be some way to tell LINQ that your data container knows how to do operation X in the most efficient manner possible, and that LINQ should defer its implementation of X to the container. Perhaps the simplest way to accomplish that would be to organize LINQ operations into some number of logical groups and then define interfaces which, in-turn, define the operations in each group. If a container implements one or more of these interfaces, LINQ will defer its implementation of the corresponding operation(s) to the container. Such an interface might look like this:

namespace System.Linq
{
    interface ICountable<TSource> : IEnumerable<TSource>
    {
        int Count<TSource>(Func<TSource, bool> predicate);
        int Count<TSource>();
    }
}

The implementation for Enumerable.Count could then look like this:

        public static int Count<TSource>(this IEnumerable<TSource> source)
        {
            if (source == null) throw Error.ArgumentNull("source");
            var countable = source as ICountable<TSource>;
            if (countable != null) return countable.Count();

            // business as usual...
        }

A system like this would also allow a data container to “opt-out” of a particular operation by immediately throwing an exception. For example, suppose that you wrapped IEnumerable around some kind of stream. You wouldn’t want someone to try to “Count” that, would you?

Ideally, there would be a way to only partly implement an interface and otherwise fall back to the default LINQ implementation, but it would be hard to avoid a “stack overflow” scenario without making it much more complicated (or perhaps I’m missing something obvious).

0 comments: