The Evolution Of LINQ And Its Impact On The Design Of C#

This article discusses:

C# and LINQ
The evolution of LINQ
SQL querying from code

This article uses the following technologies:
LINQ, C#

Lambda Expressions
Extension Methods
Anonymous Types
Implicitly Typed Local Variables
Object Initializers
Query Expressions

I was a huge fan of the Connections series, hosted by James Burke, when it aired on the Discovery Channel. Its basic premise: how seemingly unrelated discoveries influenced other discoveries, which ultimately led to some modern-day convenience. The moral, if you will, is that no advancement is made in isolation. Not surprisingly, the same is true for Language Integrated Query (LINQ).

Figure 1 LINQ Architecture

var overdrawnQuery = from account in db.Accounts
                     where account.Balance < 0
                     select new { account.Name, account.Address };

When the results of this query are iterated over using foreach, each element returned would consist of a name and address of an account that has a balance less than 0.

sequence<Customer> locals = customers.where(ZipCode == 98112);

Let’s suppose that the example above would be the ideal syntax for a query in C#. What would this query look like in C# 2.0, without any language extensions?

IEnumerable<Customer> locals = EnumerableExtensions.Where(customers,
    delegate(Customer c)
    {
        return c.ZipCode == 98112;
    });

This code is frightfully verbose, and worse, it requires significant digging to find the relevant filter (ZipCode == 98112). And this example is simple; imagine how much more unreadable this would be with several filters, projections, and so forth. The root of the verbosity is the syntax required for anonymous methods. In the ideal query, the filter would require nothing but the expression to be evaluated. The compiler would then attempt to infer the context; for example, that ZipCode really refers to the ZipCode defined on Customer.

How to fix this problem? Hardcoding the knowledge of specific operators into the language didn’t sit well with the language design team, so they started looking for an alternate syntax for anonymous methods. They wanted it to be extremely concise, and yet not necessarily require more knowledge than the compiler currently needed for anonymous methods. Ultimately they devised lambda expressions.

Lambda Expressions

Lambda expressions are a language feature that is similar in many ways to anonymous methods. In fact, if lambda expressions had been put into the language first, there would have been no need for anonymous methods. The basic idea is that you can treat code as data. In C# 1.0, it is common to pass strings, integers, reference types, and so on to methods so that the methods can act on those values. Anonymous methods and lambda expressions extend the range of the values to include code blocks. This concept is common in functional programming.

Let’s take the example above and replace the anonymous method with a lambda expression:

IEnumerable<Customer> locals =
    EnumerableExtensions.Where(customers, c => c.ZipCode == 98112);

There are several things to notice. For starters, the brevity of the lambda expression can be attributed to a number of factors. First, the delegate keyword isn’t used to introduce the construct. Instead, there is a new operator, =>, which tells the compiler that this isn’t a normal expression. Second, the Customer type is inferred from the usage. In this case, the signature of the Where method looks something like:

public static IEnumerable<T> Where<T>(
    IEnumerable<T> items, Func<T, bool> predicate)
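
As a rough illustration of how a method with this signature could work, the sketch below implements Where with a C# 2.0 iterator and calls it with both an anonymous method and a lambda expression. The method body and the Customer class (including its two-argument constructor) are assumptions made for the example; the article does not show the real implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's Customer type.
public class Customer
{
    public string Name;
    public int ZipCode;

    public Customer(string name, int zipCode)
    {
        Name = name;
        ZipCode = zipCode;
    }
}

public static class EnumerableExtensions
{
    // Assumed body: lazily yields each item the predicate accepts.
    public static IEnumerable<T> Where<T>(
        IEnumerable<T> items, Func<T, bool> predicate)
    {
        foreach (T item in items)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        List<Customer> customers = new List<Customer>();
        customers.Add(new Customer("Ann", 98112));
        customers.Add(new Customer("Bob", 10001));

        // C# 2.0 anonymous method: the parameter type is spelled out.
        IEnumerable<Customer> viaDelegate = EnumerableExtensions.Where(
            customers, delegate(Customer c) { return c.ZipCode == 98112; });

        // Lambda expression: T is inferred as Customer from the first argument,
        // so c needs no type annotation.
        IEnumerable<Customer> viaLambda = EnumerableExtensions.Where(
            customers, c => c.ZipCode == 98112);

        foreach (Customer c in viaDelegate) Console.WriteLine(c.Name); // Ann
        foreach (Customer c in viaLambda) Console.WriteLine(c.Name);   // Ann
    }
}
```

Both calls bind to the same generic method; only the way the predicate is written differs.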

Lambda expressions, like anonymous methods, also support variable capture. For example, it’s possible to refer to the parameters or locals of the method that contains the lambda expression within the lambda expression’s body:

public IEnumerable<Customer> LocalCusts(
    IEnumerable<Customer> customers, int zipCode)
{
    return EnumerableExtensions.Where(customers,
        c => c.ZipCode == zipCode);
}
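
One detail of variable capture is easy to miss: a lambda captures the variable itself, not a snapshot of its value. The following minimal sketch shows both a captured parameter and a captured local that changes after the lambda is created. The names CaptureDemo, MakeZipFilter, and threshold are hypothetical and exist only for this illustration.

```csharp
using System;

public static class CaptureDemo
{
    // The returned lambda keeps using the zipCode parameter
    // even after MakeZipFilter has returned.
    public static Func<int, bool> MakeZipFilter(int zipCode)
    {
        return candidate => candidate == zipCode;
    }

    public static void Main()
    {
        Func<int, bool> isLocal = MakeZipFilter(98112);
        Console.WriteLine(isLocal(98112)); // True
        Console.WriteLine(isLocal(10001)); // False

        // The lambda sees later assignments to a captured local.
        int threshold = 0;
        Func<int, bool> below = balance => balance < threshold;
        threshold = 100;
        Console.WriteLine(below(50)); // True, because threshold is now 100
    }
}
```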

Finally, lambda expressions support a more verbose syntax that allows you to specify the types explicitly, as well as execute multiple statements. For example:

return EnumerableExtensions.Where(customers,
    (Customer c) => { int zip = zipCode; return c.ZipCode == zip; });

The good news is that we’re much closer to the ideal syntax proposed in the original paper, and we were able to get there with a language feature that is generally useful outside of query operators. Let’s take a look at where we are again:

IEnumerable<Customer> locals =
    EnumerableExtensions.Where(customers, c => c.ZipCode == 98112);

There is an obvious problem here. Instead of thinking about the operations that can be performed on Customer, the consumer currently has to know about this EnumerableExtensions class. In addition, in the case of multiple operators, the consumer has to invert his thinking to write the correct syntax. For example:

IEnumerable<string> locals =
    EnumerableExtensions.Select(
        EnumerableExtensions.Where(customers, c => c.ZipCode == 98112),
        c => c.Name);

Notice that the Select is the outer method, even though it operates on the result of the Where method. The ideal syntax would look more like the following:

sequence<string> locals =
    customers.where(ZipCode == 98112).select(Name);

So, would it be possible to move closer to the ideal syntax with another language feature?

Extension Methods

Let’s suppose we were to write the Where method as an extension method instead. The query could then be rewritten as:

IEnumerable<Customer> locals =
    customers.Where(c => c.ZipCode == 98112);

The only change to the declaration of Where is the this modifier on its first parameter:

public static IEnumerable<T> Where<T>(
    this IEnumerable<T> items, Func<T, bool> predicate)

It’s clear that extension methods help simplify our example query, but are they a generally useful language feature outside of that scenario? It turns out that there are many uses for extension methods. One of the most common will probably be to provide shared interface implementations. For example, suppose you have the following interface:

interface IDog
{
    // Barks for 2 seconds
    void Bark();
    void Bark(int seconds);
}

This interface requires that every implementer write an implementation for both overloads. With the "Orcas" version of C#, the interface could simply be:

interface IDog
{
    void Bark(int seconds);
}

An extension method could be added in another class:

static class DogExtensions
{
    // Barks for 2 seconds
    public static void Bark(this IDog dog)
    {
        dog.Bark(2);
    }
}

Now the implementer of the interface need only implement a single method, but the clients of the interface may freely call either overload.
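
To see the two calls resolve differently, here is a small runnable sketch built around the interface and extension class above. RecordingDog is a hypothetical implementer added only so the calls leave something observable behind.

```csharp
using System;
using System.Collections.Generic;

public interface IDog
{
    void Bark(int seconds);
}

public static class DogExtensions
{
    // Shared "default" overload: barks for 2 seconds.
    public static void Bark(this IDog dog)
    {
        dog.Bark(2);
    }
}

// Hypothetical implementer that records each requested duration.
public class RecordingDog : IDog
{
    public readonly List<int> Durations = new List<int>();

    public void Bark(int seconds)
    {
        Durations.Add(seconds);
    }
}

public static class Program
{
    public static void Main()
    {
        RecordingDog dog = new RecordingDog();
        dog.Bark();   // no applicable instance method, so DogExtensions.Bark is used
        dog.Bark(5);  // the instance method declared on RecordingDog
        Console.WriteLine(string.Join(", ", dog.Durations)); // 2, 5
    }
}
```

The compiler considers extension methods only when no applicable instance method exists, so an implementer that later adds its own parameterless Bark would silently take precedence.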

sequence<string> locals =
    customers.where(ZipCode == 98112).select(Name);

With just the language extensions we’ve discussed, lambda expressions and extension methods, this could be rewritten as:

IEnumerable<string> locals =
    customers.Where(c => c.ZipCode == 98112).Select(c => c.Name);

Notice that the return type is different for this query: IEnumerable<string> instead of IEnumerable<Customer>. This happens because the select operation returns only the name of the customer.
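
For readers who want to run the chained form, the sketch below supplies hand-written Where and Select extension methods. The method bodies, the LinqSketch namespace, the Customer constructor, and the LocalNames helper are all assumptions for illustration; they are not the operators that ship with LINQ.

```csharp
using System;
using System.Collections.Generic;

// A dedicated namespace means these extension methods are found first,
// ahead of same-named methods brought in by file-level using directives.
namespace LinqSketch
{
    // Hypothetical stand-in for the article's Customer type.
    public class Customer
    {
        public string Name;
        public int ZipCode;

        public Customer(string name, int zipCode)
        {
            Name = name;
            ZipCode = zipCode;
        }
    }

    public static class EnumerableExtensions
    {
        // Assumed body: yields the items the predicate accepts.
        public static IEnumerable<T> Where<T>(
            this IEnumerable<T> items, Func<T, bool> predicate)
        {
            foreach (T item in items)
            {
                if (predicate(item)) yield return item;
            }
        }

        // Assumed body: yields the projection of each item.
        public static IEnumerable<TResult> Select<T, TResult>(
            this IEnumerable<T> items, Func<T, TResult> selector)
        {
            foreach (T item in items)
            {
                yield return selector(item);
            }
        }
    }

    public static class Program
    {
        public static List<string> LocalNames(IEnumerable<Customer> customers)
        {
            // Reads left to right: filter first, then project.
            IEnumerable<string> locals =
                customers.Where(c => c.ZipCode == 98112).Select(c => c.Name);
            return new List<string>(locals);
        }

        public static void Main()
        {
            List<Customer> customers = new List<Customer>();
            customers.Add(new Customer("Ann", 98112));
            customers.Add(new Customer("Bob", 10001));
            customers.Add(new Customer("Cy", 98112));

            foreach (string name in LocalNames(customers))
            {
                Console.WriteLine(name); // Ann, then Cy
            }
        }
    }
}
```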

That works really well when the projection is only a single field. However, suppose that instead of just the Name of the customer, we also want to return the customer’s address. The ideal syntax might look like this:

locals = customers.where(ZipCode == 98112).select(Name, Address);

Anonymous Types

If we were to continue using our existing syntax to return the name and address, we’d quickly run into the problem that there is no type that contains only a Name and Address. We could still write this query, however, by introducing that type:

class CustomerTuple
{
    public string Name;
    public string Address;
    public CustomerTuple(string name, string address)
    {
        this.Name = name;
        this.Address = address;
    }
}

We could then use that type, here CustomerTuple, to construct the result of our query:

IEnumerable<CustomerTuple> locals =
    customers.Where(c => c.ZipCode == 98112)
             .Select(c => new CustomerTuple(c.Name, c.Address));

That sure seems like a lot of boilerplate code to project out a subset of the fields. It’s also often unclear what to name such a type. Is CustomerTuple really a good name? What if we had projected out Name and Age instead? That could also be a CustomerTuple. So, the problems are that we have boilerplate code and it doesn’t seem that there are any good names for the types that we create. Plus, there could also be many different types required, and managing them could quickly become a headache.

This is exactly what anonymous types are for. This feature basically allows the creation of structural types without specifying the name. If we rewrite the query above using anonymous types, here’s what it looks like:

locals = customers.Where(c => c.ZipCode == 98112)
                  .Select(c => new { c.Name, c.Address });

This code implicitly creates a type that has the fields Name and Address:

class
{
    public string Name;
    public string Address;
}

This type can’t be referenced by name, since it has none. The names of the fields can be explicitly declared in the anonymous type creation. For example, if the field being created is derived from a complicated expression, or the name simply isn’t desirable, it’s possible to change the name:

locals = customers.Where(c => c.ZipCode == 98112)
                  .Select(c => new { FullName = c.FirstName + " " + c.LastName,
                                     HomeAddress = c.Address });

In this case, the type that is generated has fields named FullName and HomeAddress.

This gets us closer to the ideal, but there is a problem. You’ll notice that I strategically omitted the type of locals in any place where I used an anonymous type. Obviously we can’t state the name of anonymous types, so how do we use them?

Implicitly Typed Local Variables

There’s another language feature known as implicitly typed local variables (or var for short) that instructs the compiler to infer the type of a local variable. For example:

var integer = 1;

In this case, integer has the type int. It’s important to understand that this is still strongly typed. In a dynamic language, integer’s type could change later. To illustrate this, the following code does not compile:

var integer = 1;
integer = "hello";

The C# compiler will report an error on the second line, stating that it can’t implicitly convert a string to an int.

In the case of the query above, we can now write the full assignment as shown here:

var locals =
    customers
        .Where(c => c.ZipCode == 98112)
        .Select(c => new { FullName = c.FirstName + " " + c.LastName,
                           HomeAddress = c.Address });
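
The sketch below runs this query end to end and pokes at the anonymous type it produces. It uses the Where and Select operators from System.Linq as they eventually shipped, which is an assumption relative to the pre-release design the article describes; the Customer class, its constructor, and the helper names are hypothetical. In the shipped compiler, the members of an anonymous type are read-only properties rather than public fields, and two anonymous types with the same member names and types in the same order are the same type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's Customer type.
public class Customer
{
    public string FirstName;
    public string LastName;
    public string Address;
    public int ZipCode;

    public Customer(string first, string last, string address, int zipCode)
    {
        FirstName = first;
        LastName = last;
        Address = address;
        ZipCode = zipCode;
    }
}

public static class Program
{
    public static List<string> Labels(IEnumerable<Customer> customers)
    {
        // var is required here: the element type has no name we can write.
        var locals = customers
            .Where(c => c.ZipCode == 98112)
            .Select(c => new { FullName = c.FirstName + " " + c.LastName,
                               HomeAddress = c.Address });

        List<string> labels = new List<string>();
        foreach (var local in locals)
        {
            // local.FullName = "x"; // would not compile: the member is read-only
            labels.Add(local.FullName + " @ " + local.HomeAddress);
        }
        return labels;
    }

    public static bool SameShapeSameType()
    {
        var a = new { Name = "Ann", Zip = 98112 };
        var b = new { Name = "Ann", Zip = 98112 };
        // Same member names, types, and order: one compiler-generated type,
        // with Equals comparing member values.
        return a.GetType() == b.GetType() && a.Equals(b);
    }

    public static void Main()
    {
        List<Customer> customers = new List<Customer>();
        customers.Add(new Customer("Ann", "Lee", "1 Main St", 98112));
        customers.Add(new Customer("Bob", "Ray", "2 Oak Ave", 10001));

        foreach (string label in Labels(customers))
        {
            Console.WriteLine(label); // Ann Lee @ 1 Main St
        }
        Console.WriteLine(SameShapeSameType()); // True
    }
}
```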

Implicitly typed locals turn out to be convenient outside of the context of a query. For example, they help simplify complicated generic instantiations:

var customerListLookup = new Dictionary<string, List<Customer>>();

We’re now in a good place with our query; we’re close to the ideal syntax and we’ve gotten there with general-purpose language features.

Interestingly, we found that as more people worked with this syntax, there was often a need to allow a projection to escape the boundaries of a method. As we saw earlier, this is possible by constructing an object by calling its constructor from within Select. However, what happens if there is no constructor that takes exactly the values you need to set?

Object Initializers

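Object initializers address this case. Consider the usual way of creating an object and then setting its properties:

Customer customer = new Customer();
customer.Name = "Roger";
customer.Address = "1 Wilco Way";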

In this case, there is no constructor of Customer that takes a name and address; however, there are two properties, Name and Address, that can be set once an instance is created. Object initializers allow the same creation with the following syntax:

Customer customer = new Customer()
    { Name = "Roger", Address = "1 Wilco Way" };

In our earlier CustomerTuple example, we created the CustomerTuple class by calling its constructor. We can achieve the same result via object initializers:

var locals =
    customers
        .Where(c => c.ZipCode == 98112)
        .Select(c =>
            new CustomerTuple { Name = c.Name, Address = c.Address });

Notice that object initializers allow the parentheses of the constructor to be omitted. In addition, both fields and settable properties can be assigned within the body of the object initializer.
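
As a quick check that the initializer form is only shorthand for construct-then-assign, the sketch below builds the same object both ways. The Customer class here is hypothetical: it deliberately mixes one settable property and one public field to show that both can appear in an initializer.

```csharp
using System;

// Hypothetical Customer with no constructor taking a name and address.
public class Customer
{
    public string Name { get; set; }  // settable property
    public string Address;            // public field
}

public static class Program
{
    // Object initializer: parentheses omitted, members assigned inline.
    public static Customer Build(string name, string address)
    {
        return new Customer { Name = name, Address = address };
    }

    // The equivalent long form.
    public static Customer BuildByHand(string name, string address)
    {
        Customer customer = new Customer();
        customer.Name = name;
        customer.Address = address;
        return customer;
    }

    public static void Main()
    {
        Customer a = Build("Roger", "1 Wilco Way");
        Customer b = BuildByHand("Roger", "1 Wilco Way");
        Console.WriteLine(a.Name == b.Name && a.Address == b.Address); // True
    }
}
```

Because the initializer is a single expression, it can appear inside a lambda body such as the Select call above, where a sequence of statements could not without the block syntax.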

We now have a succinct syntax for creating queries in C#. We also have an extensible way to add new operators (Distinct, OrderBy, Sum, and so on) through extension methods, and a distinct set of language features that are useful in their own right.

The language design team now had several prototypes to get feedback on. So we organized a usability study with many participants who had experience with both C# and SQL. The feedback was almost universally positive, but it was clear there was something missing. In particular, it was difficult for the developers to apply their knowledge of SQL because the syntax we thought was ideal didn’t map very well to their domain expertise.

Query Expressions

The language design team then designed a syntax that is closer to SQL, known as query expressions. For example, a query expression for our example might look like this:

var locals = from c in customers
             where c.ZipCode == 98112
             select new { FullName = c.FirstName + " " +
                          c.LastName, HomeAddress = c.Address };

Query expressions are built on the language features described above. The compiler translates them purely syntactically into the underlying syntax that we’ve already seen. For example, the query above is translated directly into:

var locals =
    customers
        .Where(c => c.ZipCode == 98112)
        .Select(c => new { FullName = c.FirstName + " " + c.LastName,
                           HomeAddress = c.Address });
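
One consequence of a purely syntactic translation is that a query expression works against any type that offers suitably shaped Where and Select methods, not just IEnumerable<T>. The sketch below demonstrates this with a hypothetical TracedSource<T> class that records which methods the compiler ends up calling; the class and the BigOnesTimesTen helper are invented for this example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical source type. It implements no interface; it merely has
// methods named Where and Select with suitable signatures.
public class TracedSource<T>
{
    private readonly List<T> items;
    public readonly List<string> Calls;

    public TracedSource(List<T> items, List<string> calls)
    {
        this.items = items;
        this.Calls = calls;
    }

    public List<T> Items
    {
        get { return items; }
    }

    public TracedSource<T> Where(Func<T, bool> predicate)
    {
        Calls.Add("Where");
        List<T> kept = new List<T>();
        foreach (T item in items)
        {
            if (predicate(item)) kept.Add(item);
        }
        return new TracedSource<T>(kept, Calls);
    }

    public TracedSource<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        Calls.Add("Select");
        List<TResult> mapped = new List<TResult>();
        foreach (T item in items)
        {
            mapped.Add(selector(item));
        }
        return new TracedSource<TResult>(mapped, Calls);
    }
}

public static class Program
{
    public static TracedSource<int> BigOnesTimesTen(TracedSource<int> source)
    {
        // Translated to: source.Where(n => n > 2).Select(n => n * 10)
        return from n in source
               where n > 2
               select n * 10;
    }

    public static void Main()
    {
        List<int> numbers = new List<int>();
        numbers.Add(1);
        numbers.Add(2);
        numbers.Add(3);
        numbers.Add(4);

        TracedSource<int> source =
            new TracedSource<int>(numbers, new List<string>());
        TracedSource<int> result = BigOnesTimesTen(source);

        Console.WriteLine(string.Join(", ", result.Calls)); // Where, Select
        Console.WriteLine(string.Join(", ", result.Items)); // 30, 40
    }
}
```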

Because a query expression is itself an expression, further operators can be applied to its result. For example:

var locals = (from c in customers
              where c.ZipCode == 98112
              select new { FullName = c.FirstName + " " +
                           c.LastName, HomeAddress = c.Address })
             .Count();

In this case the query now returns the number of customers who live in the 98112 ZIP Code area.

And with that, we’ve managed to end just about where we started (which I always find rather satisfying). The syntax of the next version of C# evolved over the past few years through several new language features to ultimately arrive very close to the original syntax proposed in the winter of 2004. The addition of query expressions builds on the foundations provided by the other language features in the upcoming version of C# and makes many query scenarios easier to read and understand for developers with a background in SQL.