Home All Groups Group Topic Archive Search About

Are Linq-SQL methods commutative

Author
25 Nov 2007 1:16 PM
Andrus
Are Linq-SQL methods commutative ?
Should the following queries return same or different results ?

var query = query.Skip(n).Take(m);

var query = query.Take(m).Skip(n);

Andrus.

Author
25 Nov 2007 2:29 PM
Nicholas Paldino [.NET/C# MVP]
Andrus,

    Absolutely different results.

    Assuming the ordering is the same on each of them (because Skip and Take
make no sense without ordering, LINQ to SQL will create an order for you,
which irritates me to no end, but that's a separate thread), they will
produce different results.

    Say your query will produce the ordered set {1, 2, 3}.  Let n = 1, m =
2.

    The first query:

var query = query.Skip(n).Take(m);

    Will return the ordered set {2, 3}, while the second query:

var query = query.Take(m).Skip(n);

    Will return the ordered set {2}.

    The reason for this is that in the first query, the Skip method skips
one element, then takes the remaining two, while in the second query, the
first two elements are taken, and then the first one is skipped.


--
          - Nicholas Paldino [.NET/C# MVP]
          - mvp@spam.guard.caspershouse.com


Show quote
"Andrus" <kobrule***@hot.ee> wrote in message
news:%234a%23iY2LIHA.5140@TK2MSFTNGP05.phx.gbl...
> Are Linq-SQL methods commutative ?
> Should the following queries return same or different results ?
>
> var query = query.Skip(n).Take(m);
>
> var query = query.Take(m).Skip(n);
>
> Andrus.
>
Author
26 Nov 2007 9:04 AM
Frans Bouma [C# MVP]
Nicholas Paldino [.NET/C# MVP] wrote:

> Andrus,
>
>    Absolutely different results.

    Now that's unexpected. In database land, you can't get 2 different
results.

> Assuming the ordering is the same on each of them (because Skip and
> Take make no sense without ordering, LINQ to SQL will create an order
> for you, which irritates me to no end, but that's a separate thread),

    Why? SELECT * FROM Table without ordering has no defined ordering, so
using a limit + skip operator on that set results in a set of undefined
rows, you'll never know what rows will be returned.


Show quote
> they will produce different results.
>
>   Say your query will produce the ordered set {1, 2, 3}.  Let n = 1,
> m = 2.
>
>    The first query:
>
> var query = query.Skip(n).Take(m);
>
>    Will return the ordered set {2, 3}, while the second query:
>
> var query = query.Take(m).Skip(n);
>
>    Will return the ordered set {2}.
>
>  The reason for this is that in the first query, the Skip method
> skips one element, then takes the remaining two, while in the second
> query, the first two elements are taken, and then the first one is
> skipped.

    I tried it out to be sure, and indeed Linq to Sql generates two
different queries (which really hurt my eyes but that's another story).
The thing is though that on databases, people will use the take/skip
combination to page through a bigger resultset. However consulting the
manual for both shows no word whatsoever about the order in which these
two statements have to be specified.

    I do understand the order, but it's a bit strange as well. For
example, Linq apparently has no problem with an Order by placed in
front of a where, however there IS an order in take/skip which are used
combined as a paging mechanism. Of course, this follows from the specs
of both, but semantically, the intention of what people want to do,
e.g. to page, shouldn't require an order in the statements for paging,
if other elements in the query also don't really require an order (they
do, but that's whiped under the rug)

        FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Author
26 Nov 2007 9:14 AM
Jon Skeet [C# MVP]
On Nov 26, 9:04 am, "Frans Bouma [C# MVP]"
<perseus.usenetNOS***@xs4all.nl> wrote:

<snip>

>         I tried it out to be sure, and indeed Linq to Sql generates two
> different queries (which really hurt my eyes but that's another story).
> The thing is though that on databases, people will use the take/skip
> combination to page through a bigger resultset. However consulting the
> manual for both shows no word whatsoever about the order in which these
> two statements have to be specified.

I don't see why there'd be any explicit *need* for documentation on
this - it seems pretty obvious to me. If you want to skip some results
and then take a certain number of the rest, you do Skip and then Take.
If you want to take some results and then skip within that (which is a
slightly odd thing to do, admittedly) you do Skip and then Take.

How is it not just intuitive?

>         I do understand the order, but it's a bit strange as well. For
> example, Linq apparently has no problem with an Order by placed in
> front of a where, however there IS an order in take/skip which are used
> combined as a paging mechanism.

Filtering and ordering are effectively orthogonal, unless you can
build a filter which depends on the order. Skipping and taking are
clearly *not* orthogonal.

> Of course, this follows from the specs
> of both, but semantically, the intention of what people want to do,
> e.g. to page, shouldn't require an order in the statements for paging,
> if other elements in the query also don't really require an order (they
> do, but that's whiped under the rug)

Elements in the query which interact are affected by the order in
which they're specified.
Elements in the query which don't interact aren't affected by the
order in which they're specified.

That seems pretty sensible to me.

Jon
Author
26 Nov 2007 11:35 AM
Andrus
> How is it not just intuitive?

In PostgreSQL order of limit and offset clauses does not matter.

So when moving from PostgreSQL direct SQL to Linq-PostgreSQL this is not
intuitive.

The major issue is that I have no idea how to implement orthogonal Skip()
and Take() effectively (so that minimal possible number of rows are returned
from server) in
Linq-PostgreSQL driver.
In standard SQL Postgres uses LIMIT and OFFSET are orthogonal and there is
no row number function.
http://www.postgresql.org/docs/8.2/interactive/queries-limit.html

Andrus.
Author
26 Nov 2007 11:49 AM
Jon Skeet [C# MVP]
On Nov 26, 11:35 am, "Andrus" <kobrule***@hot.ee> wrote:
> > How is it not just intuitive?
>
> In PostgreSQL order of limit and offset clauses does not matter.

But they're not limit and offset - they're "take the next N results"
and "skip the next N results".

> So when moving from PostgreSQL direct SQL to Linq-PostgreSQL this is not
> intuitive.

Did you read the description of the Skip and Take methods before using
them? If you just assume they mean "offset" and "limit" then you'll
get the wrong results, but if you read the description I believe it
*is* intuitive to use them.

> The major issue is that I have no idea how to implement orthogonal Skip()
> and Take() effectively (so that minimal possible number of rows are returned
> from server) in
> Linq-PostgreSQL driver.
> In standard SQL Postgres uses LIMIT and OFFSET are orthogonal and there is
> no row number function.http://www.postgresql.org/docs/8.2/interactive/queries-limit.html

You can't and shouldn't implement Skip and Take orthogonally - they're
*meant* to interact. The developer should specify them in the
appropriate order, which is almost always going to be Skip and then
Take.

Skip X, Take Y should translate to
OFFSET X LIMIT Y

Take Y, Skip X should translate to
OFFSET X LIMIT min(Y-X, 0)

What are you trying to achieve which is causing you difficulties?

Jon
Author
27 Nov 2007 10:15 AM
Frans Bouma [C# MVP]
Jon Skeet [C# MVP] wrote:

> On Nov 26, 11:35 am, "Andrus" <kobrule***@hot.ee> wrote:
> > > How is it not just intuitive?
> >
> > In PostgreSQL order of limit and offset clauses does not matter.
>
> But they're not limit and offset - they're "take the next N results"
> and "skip the next N results".

    Look at it from the other side, you then might understand it.

    Developer D wants to write a SELECT statement with LIMIT n and OFFSET
m. He has to use Linq. That's the problem area the developer is in: how
is a given Linq query translated to SQL? The developer knows what SQL
statement he wants to get, he just wants to write the Linq query to get
that SQL statement.

    However, it can be the developer makes a mistake because LIMIT/OFFSET
have no order in the SELECT, though Take/Skip do.

    According to the description of Take and Skip individually, it's
logical.

    However they're part of a bigger query. 'Take' for example results in
TOP on sqlserver, and if Skip is present, it can be translated to a
paging query with a CTE. Therefore, it's not irrelevant to see the
query as a whole with skip/take being part of that big query, instead
of seeing them as separate elements because that's not going to work
out.

    THe problem is that the Linq query isn't translatable to SQL in a 1:1
basis. Therefore the intention of what was meant has to be understood
before the SQL can be produced. As:
set.Skip(n).Take(m) is equal to set.Take(n+m).Skip(n), it doesn't
matter in theory which order is used for skip/take, both orders are
usable to fetch the same set of rows.

    Jon, you can keep living at the level of Linq and dream that
everything below it is just solved by default, but that's not the case.
People want to express something in the Linq query in such a way that
the Linq query is translated to SQL which expresses that same thing
EXACTLY how they have foreseen it.

    Some people start at the Linq level and expect the provider to cough
up the objects as stated in the manual, like you do. That's not always
possible. Some people start at the SQL level and wonder how to write
that query in Linq. That's also not always possible. Both sides have a
big gap to cross and by stating what's in the spec or what's in MSDN
isn't going to help either side one bit.

> > So when moving from PostgreSQL direct SQL to Linq-PostgreSQL this
> > is not intuitive.
>
> Did you read the description of the Skip and Take methods before using
> them? If you just assume they mean "offset" and "limit" then you'll
> get the wrong results, but if you read the description I believe it
> *is* intuitive to use them.

    Take IS going to be translated to LIMIT, if Skip isn't there.

    So how is specifying Skip all of a sudden not making Take being
translated to LIMIT ?

Show quote
> > The major issue is that I have no idea how to implement orthogonal
> > Skip() and Take() effectively (so that minimal possible number of
> > rows are returned from server) in
> > Linq-PostgreSQL driver.
> > In standard SQL Postgres uses LIMIT and OFFSET are orthogonal and
> > there is no row number
> > function.http://www.postgresql.org/docs/8.2/interactive/queries-limi
> > t.html
>
> You can't and shouldn't implement Skip and Take orthogonally - they're
> *meant* to interact. The developer should specify them in the
> appropriate order, which is almost always going to be Skip and then
> Take.
>
> Skip X, Take Y should translate to
> OFFSET X LIMIT Y
>
> Take Y, Skip X should translate to
> OFFSET X LIMIT min(Y-X, 0)

    It depends on how difficult you want to make it for yourself, you can
specify skip after take by specifying different values for skip/take
(see above) and it should result in the same query as you're requesting
the same resultset in that case.

        FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Author
27 Nov 2007 10:45 AM
Jon Skeet [C# MVP]
On Nov 27, 10:15 am, "Frans Bouma [C# MVP]"
<perseus.usenetNOS***@xs4all.nl> wrote:

<snip>

>         Some people start at the Linq level and expect the provider to cough
> up the objects as stated in the manual, like you do. That's not always
> possible. Some people start at the SQL level and wonder how to write
> that query in Linq. That's also not always possible. Both sides have a
> big gap to cross and by stating what's in the spec or what's in MSDN
> isn't going to help either side one bit.

If you really think that explaining why things work the way they do
isn't going to help anyone, then I think we disagree on so much, so
fundamentally, that I don't think there's much point in us debating
things.

Personally, I think that explaining to "SQL heads" the way that LINQ
is designed to work - including referring to the documentation - will
help them write better LINQ queries. Just like explaining to "LINQ
heads" how SQL works will also help those people write better SQL, and
indeed will help them understand the SQL generated by their LINQ
queries better.

I will continue to suggest to people who are using LINQ that they
should actually learn about LINQ and its design rather than just
assuming it'll work like SQL. I think that will make them a lot more
productive. However, I think I'll give up on trying to persuade you of
that notion.

Jon
Author
27 Nov 2007 3:31 PM
Frans Bouma [C# MVP]
Jon Skeet [C# MVP] wrote:

Show quote
> On Nov 27, 10:15 am, "Frans Bouma [C# MVP]"
> <perseus.usenetNOS***@xs4all.nl> wrote:
>
> <snip>
>
> >         Some people start at the Linq level and expect the provider
> > to cough up the objects as stated in the manual, like you do.
> > That's not always possible. Some people start at the SQL level and
> > wonder how to write that query in Linq. That's also not always
> > possible. Both sides have a big gap to cross and by stating what's
> > in the spec or what's in MSDN isn't going to help either side one
> > bit.
>
> If you really think that explaining why things work the way they do
> isn't going to help anyone, then I think we disagree on so much, so
> fundamentally, that I don't think there's much point in us debating
> things.

    It will answer some questions perhaps, but as there's no correlation
between the nature of how linq is based on sequences and how the
corresponding SQL query looks like, it will IMHO likely only increase
confusion instead of solving it.

    Just take a look at the initial question of this thread. The root
cause of that question is how it's done in SQL. Of course, operating on
the Linq level makes you THINK you can forget all about the SQL, but
that's just a facade.

    My point is that if you explain just the Linq side of things, people
will never be able to bridge over to the SQL they want to produce with
the Linq query they have in mind.

> Personally, I think that explaining to "SQL heads" the way that LINQ
> is designed to work - including referring to the documentation - will
> help them write better LINQ queries. Just like explaining to "LINQ
> heads" how SQL works will also help those people write better SQL, and
> indeed will help them understand the SQL generated by their LINQ
> queries better.

    aren't you forgetting one thing, namely how linq element X is
resulting in SQL element Y and vice versa, how you can achieve SQL
element Y by writing Linq element X ?

    IMHO the thing that will help is what to write to achieve ABC in SQL.
I.e. to get a subquery instead of a join, to get a join instead of a
subquery, to get a scalar query in a derived table instead of in the
projection (which is MUCH slower) etc.

> I will continue to suggest to people who are using LINQ that they
> should actually learn about LINQ and its design rather than just
> assuming it'll work like SQL. I think that will make them a lot more
> productive. However, I think I'll give up on trying to persuade you of
> that notion.

    I'm not saying it should work like SQL... :/ I'm saying that it's
unclear what SQL it will produce so it's not determinable when writing
the Linq query what it will return, at least not always, because
there's a projection involved towards SQL.

    Sure, the SQL might work in a lot of cases, but is it the SQL which is
determined to be more efficient? Unclear, you have to examine every
linq query's SQL statements.

    Then tell me, what's the advantage over specifying some queries in
text in an ibatis.net config file?

        FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Author
27 Nov 2007 9:33 AM
Frans Bouma [C# MVP]
Jon Skeet [C# MVP] wrote:

Show quote
> On Nov 26, 9:04 am, "Frans Bouma [C# MVP]"
> <perseus.usenetNOS***@xs4all.nl> wrote:
>
> <snip>
>
> >         I tried it out to be sure, and indeed Linq to Sql generates
> > two different queries (which really hurt my eyes but that's another
> > story).  The thing is though that on databases, people will use the
> > take/skip combination to page through a bigger resultset. However
> > consulting the manual for both shows no word whatsoever about the
> > order in which these two statements have to be specified.
>
> I don't see why there'd be any explicit need for documentation on
> this - it seems pretty obvious to me. If you want to skip some results
> and then take a certain number of the rest, you do Skip and then Take.
> If you want to take some results and then skip within that (which is a
> slightly odd thing to do, admittedly) you do Skip and then Take.
>
> How is it not just intuitive?

    Because skip/take equivalents in SQL work on the same projection
result, while in Linq they don't.

    You should look less at specs and more at what users WANT TO DO. Time
and time again you post a reply with how the syntaxis is, how the specs
says how it works etc. etc. All fine, but that's completely not hte
point here: the point is that the user wants to PAGE through a
resultset. In Firebird, Mysql, Postgresql, people have 2 keywords in
the SELECT statement to do that, similar to skip/take, but the order
isn't important, because both work at the same projection result. THe
intention of the USER is to page. Messing the order up has no effect at
compiletime, but at runtime it does.

    How do you PREVENT developers to make this error? Exactly, by
specifying a pagenumber and a page size instead of skip/take OR by
specifying that the order doesn't matter.

> >         I do understand the order, but it's a bit strange as well.
> > For example, Linq apparently has no problem with an Order by placed
> > in front of a where, however there IS an order in take/skip which
> > are used combined as a paging mechanism.
>
> Filtering and ordering are effectively orthogonal, unless you can
> build a filter which depends on the order. Skipping and taking are
> clearly not orthogonal.

    In SQL, order by is done on a projected resultset. In Linq, this isn't
always clear, because you can place an order by in a new projection
scope, and you don't know IF that results in a subquery (projection
scope) or not. My point was that MS should have made the same
limitation to linq as ISO did to SQL: Order By is the last statement in
the query. While order is important for extension methods like
skip/take specified directly, it's not for order by apparently.

> > Of course, this follows from the specs
> > of both, but semantically, the intention of what people want to do,
> > e.g. to page, shouldn't require an order in the statements for
> > paging, if other elements in the query also don't really require an
> > order (they do, but that's whiped under the rug)
>
> Elements in the query which interact are affected by the order in
> which they're specified.
> Elements in the query which don't interact aren't affected by the
> order in which they're specified.
>
> That seems pretty sensible to me.

    Yes, because you don't understand what the word 'intention' means when
a developer writes code. You don't seem to grasp the little fact that a
developer who makes a mistake by writing ACB, while the intention was
ABC, should be WARNED with an error or warning that ACB isn't going to
work probably. Like I said: I understand the ordering of skip/take in
the context of the situation where you first want to take and then want
to skip (however I can't find a use case for that). What I don't
understand is that 'skip' and 'take' aren't placed TOGETHER in such a
way that there's no way to make a mistake when one tries to PAGE
through a resultset.

    You seem to find it 'sensible' that people can do that. That's what I
find odd. But I disgress, according to the syntax/specs everything
seems logical, because it is in the spec, however precisely that isn't
the point at all here.

        FB


--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Author
27 Nov 2007 10:27 AM
Jon Skeet [C# MVP]
On Nov 27, 9:33 am, "Frans Bouma [C# MVP]"
<perseus.usenetNOS***@xs4all.nl> wrote:
> > I don't see why there'd be any explicit need for documentation on
> > this - it seems pretty obvious to me. If you want to skip some results
> > and then take a certain number of the rest, you do Skip and then Take.
> > If you want to take some results and then skip within that (which is a
> > slightly odd thing to do, admittedly) you do Skip and then Take.
>
> > How is it not just intuitive?
>
>         Because skip/take equivalents in SQL work on the same projection
> result, while in Linq they don't.

No - in LINQ each step effectively works off the result of the
previous step. That's one of the fundamentals of LINQ, and people
should know it before they write too much code which assumes that
LINQ=SQL.

>         You should look less at specs and more at what users WANT TO DO. Time
> and time again you post a reply with how the syntaxis is, how the specs
> says how it works etc. etc. All fine, but that's completely not hte
> point here: the point is that the user wants to PAGE through a
> resultset.

Okay, so create a Page extension method which does a Skip then a Take.
It's really easy to do - and I could certainly agree that it would
have made logical sense to include that in .NET 3.5. However, it's
*also* nice to be able to have those two operations separately.

> In Firebird, Mysql, Postgresql, people have 2 keywords in
> the SELECT statement to do that, similar to skip/take, but the order
> isn't important, because both work at the same projection result. THe
> intention of the USER is to page. Messing the order up has no effect at
> compiletime, but at runtime it does.

Indeed.

>         How do you PREVENT developers to make this error? Exactly, by
> specifying a pagenumber and a page size instead of skip/take OR by
> specifying that the order doesn't matter.

So write that method, and use that everywhere. Create an FXCop rule
suggesting that Skip/Take should be avoided. Just don't take Skip/Take
away from people who want them :)

> > Filtering and ordering are effectively orthogonal, unless you can
> > build a filter which depends on the order. Skipping and taking are
> > clearly not orthogonal.
>
>         In SQL, order by is done on a projected resultset. In Linq, this isn't
> always clear, because you can place an order by in a new projection
> scope, and you don't know IF that results in a subquery (projection
> scope) or not. My point was that MS should have made the same
> limitation to linq as ISO did to SQL: Order By is the last statement in
> the query. While order is important for extension methods like
> skip/take specified directly, it's not for order by apparently.

There's no logical reason in LINQ *in general* to force ordering to be
the last operation. (In fact, it never is - you always want to either
group or select at the end.)

You always come at LINQ as if it should be the same as SQL - LINQ is
more than SQL, and if that occasionally makes the SQL side less than
ideal, that's a hit I'm certainly willing to take.

> > That seems pretty sensible to me.
>
>         Yes, because you don't understand what the word 'intention' means when
> a developer writes code. You don't seem to grasp the little fact that a
> developer who makes a mistake by writing ACB, while the intention was
> ABC, should be WARNED with an error or warning that ACB isn't going to
> work probably.

I think it would be nice if LINQ to SQL barfed at runtime rather than
executing the probably-incorrect query, but I don't think it should be
a compile-time error: that would require the compiler to know *way*
too much about what's going on.

Any time you fancy dispensing with the ad hominem attacks, that would
be fine too, by the way.

> Like I said: I understand the ordering of skip/take in
> the context of the situation where you first want to take and then want
> to skip (however I can't find a use case for that). What I don't
> understand is that 'skip' and 'take' aren't placed TOGETHER in such a
> way that there's no way to make a mistake when one tries to PAGE
> through a resultset.

Again, do it yourself and publish it to the world. It's about three
lines of code.

>         You seem to find it 'sensible' that people can do that. That's what I
> find odd. But I disgress, according to the syntax/specs everything
> seems logical, because it is in the spec, however precisely that isn't
> the point at all here.

I think we're coming back to our fundamental disagreement: I don't
think people should expect to be able to use LINQ without making
mistakes if they don't take a little bit of time to learn about it
first. I don't expect that about *any* technology - why should LINQ be
different?

Taking just a short time to understand the overall nature of LINQ
(rather than just focusing on LINQ to SQL) makes the behaviour of Skip/
Take obvious *and* enriches the developer's understanding of the
bigger picture, so they can decide where LINQ is appropriate *aside*
from LINQ to SQL.

Jon
Author
27 Nov 2007 12:06 PM
Lasse_Vågsæther_Karlsen
Jon Skeet [C# MVP] wrote:
> On Nov 27, 9:33 am, "Frans Bouma [C# MVP]"
<snip>
>> In Firebird, Mysql, Postgresql, people have 2 keywords in
>> the SELECT statement to do that, similar to skip/take, but the order
>> isn't important, because both work at the same projection result. THe
>> intention of the USER is to page. Messing the order up has no effect at
>> compiletime, but at runtime it does.

Actually, the order of the operations involved does matter, it's just
that the SQL syntax allows both ways and then internally reorders the
operation to perform one before the other.

The postgresql link posted earlier explicity states this:
"If both OFFSET and LIMIT appear, then OFFSET rows are skipped before
starting to count the LIMIT rows that are returned."

And in LINQ, you do not write one query, you have syntax that allows you
to chain together individual operations that, as Jon pointed out, works
sequentially on the previous results. SQL, however, is generally seen as
one big statement and not a chain of individual operations, although
internally there is some sequencing involved.

As such, you can argue that OFFSET/LIMIT is basically two keywords that
specify one operation, the Page operation suggested.

--
Lasse Vågsæther Karlsen
mailto:la***@vkarlsen.no
http://presentationmode.blogspot.com/
Author
27 Nov 2007 3:01 PM
Frans Bouma [C# MVP]
Lasse Vågsæther Karlsen wrote:

Show quote
> Jon Skeet [C# MVP] wrote:
> > On Nov 27, 9:33 am, "Frans Bouma [C# MVP]"
> <snip>
> > > In Firebird, Mysql, Postgresql, people have 2 keywords in
> > > the SELECT statement to do that, similar to skip/take, but the
> > > order isn't important, because both work at the same projection
> > > result. THe intention of the USER is to page. Messing the order
> > > up has no effect at compiletime, but at runtime it does.
>
> Actually, the order of the operations involved does matter, it's just
> that the SQL syntax allows both ways and then internally reorders the
> operation to perform one before the other.
>
> The postgresql link posted earlier explicity states this:
> "If both OFFSET and LIMIT appear, then OFFSET rows are skipped before
> starting to count the LIMIT rows that are returned."

    true, as it otherwise wouldn't make any sense as there's just 1
resultset the operators are applied on.

> And in LINQ, you do not write one query, you have syntax that allows
> you to chain together individual operations that, as Jon pointed out,
> works sequentially on the previous results. SQL, however, is
> generally seen as one big statement and not a chain of individual
> operations, although internally there is some sequencing involved.
> As such, you can argue that OFFSET/LIMIT is basically two keywords
> that specify one operation, the Page operation suggested.

    Yes, and as skip/take are used in that exact same fashion to achieve
exactly that same thing, you can't see them as separate elements IMHO.
The thing is that you could argue that in linq it's all about
sequences, but that's just for linq to objects. For linq to <database
provider> it's different: as a projection of what linq describes is
made onto SQL, and as that projection isn't a 1:1 projection, there's a
lot of room for interpretating things the wrong way, or assuming things
will turn out differently, simply, because the sequence-usage of linq
is not present in an RDBMS in such a way that it's similar to linq.

    This makes it so hard to talk about linq queries which are meant for
running onto a DB, simply because although it's perfectly described
what the linq query is SUPPOSED to be returning, it's not said it is
possible to do so.

        FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Author
27 Nov 2007 3:21 PM
Frans Bouma [C# MVP]
Jon Skeet [C# MVP] wrote:

Show quote
> On Nov 27, 9:33 am, "Frans Bouma [C# MVP]"
> <perseus.usenetNOS***@xs4all.nl> wrote:
> > > I don't see why there'd be any explicit need for documentation on
> > > this - it seems pretty obvious to me. If you want to skip some
> > > results and then take a certain number of the rest, you do Skip
> > > and then Take.  If you want to take some results and then skip
> > > within that (which is a slightly odd thing to do, admittedly) you
> > > do Skip and then Take.
> >
> > > How is it not just intuitive?
> >
> >         Because skip/take equivalents in SQL work on the same
> > projection result, while in Linq they don't.
>
> No - in LINQ each step effectively works off the result of the
> previous step. That's one of the fundamentals of LINQ, and people
> should know it before they write too much code which assumes that
> LINQ=SQL.

    With that knowledge they can't do a single thing if they are writing a
linq query which has to meet a SQL query they have in mind.

> >         You should look less at specs and more at what users WANT
> > TO DO. Time and time again you post a reply with how the syntaxis
> > is, how the specs says how it works etc. etc. All fine, but that's
> > completely not hte point here: the point is that the user wants to
> > PAGE through a resultset.
>
> Okay, so create a Page extension method which does a Skip then a Take.
> It's really easy to do - and I could certainly agree that it would
> have made logical sense to include that in .NET 3.5. However, it's
> *also* nice to be able to have those two operations separately.

    You mean to be able to just skip or just take? Sure. The sad thing is
that when they're used together in the same projection scope, they are
resulting in a different SQL query than used separately. IMHO that
sounds like they should have been tied together in a single statement.

> >         How do you PREVENT developers to make this error? Exactly,
> > by specifying a pagenumber and a page size instead of skip/take OR
> > by specifying that the order doesn't matter.
>
> So write that method, and use that everywhere. Create an FXCop rule
> suggesting that Skip/Take should be avoided. Just don't take Skip/Take
> away from people who want them :)

    I'm not saying they should be dropped :). I suggest that if they are
used together in a query, people should not be bothered with an order,
OR, if they are used in the wrong order, they should be warned. Similar
to use ThenBy first, which is also not correct.

    As you can fetch any row set with skip first following by take, they
should have forced it to have that order to avoid problems.

Show quote
> > > Filtering and ordering are effectively orthogonal, unless you can
> > > build a filter which depends on the order. Skipping and taking are
> > > clearly not orthogonal.
> >
> >         In SQL, order by is done on a projected resultset. In Linq,
> > this isn't always clear, because you can place an order by in a new
> > projection scope, and you don't know IF that results in a subquery
> > (projection scope) or not. My point was that MS should have made
> > the same limitation to linq as ISO did to SQL: Order By is the last
> > statement in the query. While order is important for extension
> > methods like skip/take specified directly, it's not for order by
> > apparently.
>
> There's no logical reason in LINQ *in general* to force ordering to be
> the last operation. (In fact, it never is - you always want to either
> group or select at the end.)

    that's what the spec says. Ordering is always done after projection,
as otherwise there's no way you CAN order, you can only order
resultsets. Check Warren's blog, I believe it's the last entry where he
introduces a complex piece of code to move up a wrongly placed order by
statement.

> You always come at LINQ as if it should be the same as SQL - LINQ is
> more than SQL, and if that occasionally makes the SQL side less than
> ideal, that's a hit I'm certainly willing to take.

    It might look that way, but I'm just hammering on the essential
element that when using Linq for DATABASE (!) queries, you have to
realize that the target is a very expensive resource which can be
brought down to its knees with the wrong queries, and this isn't
obvious.

    When I started LLBLGen Pro I too found it logical that people would
simply learn our query api and be done with it. However we quickly
learned that people often think from the other side: they have a stored
proc which has to be migrated to C# code (and in this context thus to a
linq query) or are simply thinking in SQL as that's what they know and
are wrestling with the query api because it doesn't give them the
freedom a SQL statement gives them: the SQL statement is the one
executed on the DB, a linq query is an abstraction layer above that, so
to reach what you want, namely the SQL query, you have to know how that
linq query translates to that SQL query so if you have that SQL query
in your head, you can write the Linq query :)

    Trust me, in the newsgroups here we'll see a lot of people having a
SQL statement which they want to convert to Linq queries and have a
hard time.

    That's why I find it a bit of a dissapointment that Linq lacks the
fine grained control elements to make that conversion so much more
easier. Because that's what people want: they want to write a Linq
query that mimics what they have in mind in SQL.

> > > That seems pretty sensible to me.
> >
> >         Yes, because you don't understand what the word 'intention'
> > means when a developer writes code. You don't seem to grasp the
> > little fact that a developer who makes a mistake by writing ACB,
> > while the intention was ABC, should be WARNED with an error or
> > warning that ACB isn't going to work probably.
>
> I think it would be nice if LINQ to SQL barfed at runtime rather than
> executing the probably-incorrect query, but I don't think it should be
> a compile-time error: that would require the compiler to know way
> too much about what's going on.

    Don't you agree that an exception at runtime is simply too late?
What's the use of a runtime exception, when code analysis could have
shown a warning up front. ?

> >         You seem to find it 'sensible' that people can do that.
> > That's what I find odd. But I disgress, according to the
> > syntax/specs everything seems logical, because it is in the spec,
> > however precisely that isn't the point at all here.
>
> I think we're coming back to our fundamental disagreement: I don't
> think people should expect to be able to use LINQ without making
> mistakes if they don't take a little bit of time to learn about it
> first. I don't expect that about any technology - why should LINQ be
> different?

    Because it would help a lot of people out, that's why. :) I was that
stubborn with our own API as well: why do they need sql examples, if
it's understandable by reading the specs/documentation? But keeping the
eyes closed for that doesn't make the problem go away, only the people
will go away and be dissapointed in the o/r mapping technology.

    There's a lot of energy to be spend on converting people over to use
Linq instead of stored procedures or inline SQL strings. If it's very
hard to do the more complex SQL statements, it's going to be an uphill
battle.

> Taking just a short time to understand the overall nature of LINQ
> (rather than just focusing on LINQ to SQL) makes the behaviour of
> Skip/ Take obvious and enriches the developer's understanding of the
> bigger picture, so they can decide where LINQ is appropriate aside
> from LINQ to SQL.

    Ok, here's one lesson for you: although Linq is a general purpose
technique, it doesn't matter how it works for linq to objects etc., if
you are going to execute the linq query on a database, as the linq
query isn't the one that's being executed, the generated SQL query from
that linq query is. I.o.w.: it's not as simple as you make it out to
be: living on the level of Linq is nice and if you can afford to ignore
the rest, why bother, but for a lot of people the database they're
using in their app is a resource they can't affort to waste a lot of
time on: queries have to be fast, predictable and therefore scalable.

    It's nice for you that you can determine what the SQL will be when you
are handed a random Linq query, but most of the C# developers won't be
able to. That's the main problem here, which is IMHO completely ignored
by MS.

        FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Author
26 Nov 2007 4:56 PM
Nicholas Paldino [.NET/C# MVP]
Frans,

    See inline:

>> Andrus,
>>
>>    Absolutely different results.
>
> Now that's unexpected. In database land, you can't get 2 different
> results.

    Not really, because to translate what is being done in LINQ to database
land, you would actually perform a query on the result set of the previous
query.  The operations are not performed at the same time.

>> Assuming the ordering is the same on each of them (because Skip and
>> Take make no sense without ordering, LINQ to SQL will create an order
>> for you, which irritates me to no end, but that's a separate thread),
>
> Why? SELECT * FROM Table without ordering has no defined ordering, so
> using a limit + skip operator on that set results in a set of undefined
> rows, you'll never know what rows will be returned.

    Well, that's not completely true.  If you are using a clustered index in
SQL Server, then the ordering of the result set will be on that index,
however, without that index, the order is undefined (basically, however it
is stored in the file).

    But I agree with you, in database land, it makes no sense, because you
have to define an ordering.  LINQ to SQL gets around this by ordering by all
the columns in the table if another order is not found (I believe, I have to
test that part, but without a specific ordering, it orders on every column
in the table).

    This is why I said what I said in parenthesis.  I hate the fact that
Take and Skip are exposed, and assume some kind of order for you when it
doesn't make sense to not have ordering.  It should be mandatory to provide
an order when using any of these extensions.  It's also why I said "assuming
the ordering is the same on each of them", because LINQ to SQL will always
produce a default ordering.

Show quote
>
>
>> they will produce different results.
>>
>>   Say your query will produce the ordered set {1, 2, 3}.  Let n = 1,
>> m = 2.
>>
>>    The first query:
>>
>> var query = query.Skip(n).Take(m);
>>
>>    Will return the ordered set {2, 3}, while the second query:
>>
>> var query = query.Take(m).Skip(n);
>>
>>    Will return the ordered set {2}.
>>
>>  The reason for this is that in the first query, the Skip method
>> skips one element, then takes the remaining two, while in the second
>> query, the first two elements are taken, and then the first one is
>> skipped.
>
> I tried it out to be sure, and indeed Linq to Sql generates two
> different queries (which really hurt my eyes but that's another story).
> The thing is though that on databases, people will use the take/skip
> combination to page through a bigger resultset. However consulting the
> manual for both shows no word whatsoever about the order in which these
> two statements have to be specified.

    Well, there is no order that they HAVE to be specified in.  Yes, they
are usually used in conjunction, but it is not a requirement that they be
used in conjunction.  They can be specified in any order, which is what the
OP's question was arising from.  He wanted to know if the order in which the
operations were applied would affect the outcome, which it definitely will.

> I do understand the order, but it's a bit strange as well. For
> example, Linq apparently has no problem with an Order by placed in
> front of a where, however there IS an order in take/skip which are used
> combined as a paging mechanism. Of course, this follows from the specs
> of both, but semantically, the intention of what people want to do,
> e.g. to page, shouldn't require an order in the statements for paging,
> if other elements in the query also don't really require an order (they
> do, but that's whiped under the rug)

    Well, it depends here on what operations we are talking about.  You are
making the error of assuming that Take and Skip are both sub parts of a
larger page operation, and assuming that because other elements of the
operation that can be applied (i.e. where) don't require order, these don't.

    But that's wrong.  Like you said, in db-land, not applying an order
results in an undefined order in the result set.  If you are going to page
off that, then the DB is completely within its rights to return a
differently ordered result set between subsequent queries, and your paging
results will simply be wrong.  You would be depending on an implementation
detail that the DB would consistently return the undefined order to you.

    The reason Take and Skip (should) require an order is because of this
fact.  Think of it.  The definition of Take is "take the FIRST n elements".
Skip is defined as "skip the NEXT n elements".  Because of FIRST and NEXT,
you require order.  Without knowing the order, the concepts of FIRST and
NEXT do not exist.


--
          - Nicholas Paldino [.NET/C# MVP]
          - mvp@spam.guard.caspershouse.com

Show quote
>
> FB
> --
> ------------------------------------------------------------------------
> Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
> LLBLGen Pro website: http://www.llblgen.com
> My .NET blog: http://weblogs.asp.net/fbouma
> Microsoft MVP (C#)
> ------------------------------------------------------------------------
Author
27 Nov 2007 10:03 AM
Frans Bouma [C# MVP]
Nicholas Paldino [.NET/C# MVP] wrote:

Show quote
> Frans,
>
>    See inline:
>
> > > Andrus,
> > >
> >>   Absolutely different results.
> >
> > Now that's unexpected. In database land, you can't get 2 different
> > results.
>
>  Not really, because to translate what is being done in LINQ to
> database land, you would actually perform a query on the result set
> of the previous query.  The operations are not performed at the same
> time.

    That's not defined. There's no ruleset how to translate a Linq query
to SQL, on the contrary, it's often a struggle how to completely grasp
the intention of the linq query with SQL.

    So, in the case of take->skip, true, you need multiple projections and
in the case of skip -> take, you need less, perhaps none. However, in
the case of a select statement which specifies BOTH at once, you can't
get 2 different resultsets by specifying them in a different order, see
Firebird's and PostgreSql's SELECT syntax for details.
Firebird:
SELECT FIRST m SKIP n * FROM ...

PostgreSQL:
SELECT * FROM ... LIMIT m OFFSET n

MySql actually does have an order, but it has just 1 clear intention:
you skip n rows and take m. You can swap them, but that doesn't matter
for the statement, you can't specify take m, skip n from that m

MySQL:
SELECT * FROM ... LIMIT n, m


    Oh yes, I know... this is SQL and not linq. Let's talk about that for
a second. A developer writing a linq query which is executed on the
database has to understand the query WILL become SQL. The translation
is far from trivial, so the intention of the linq query has to be
represented by the SQL query as much as possible.

    The developer isn't helped by fuzzy linq syntaxis, as predicting what
SQL is going to be generated is then even more difficult than it is
now. Take is used for TOP if Skip isn't there.

    Is take used for TOP IF skip is there?

    No, it's not. Well, not always. If Skip is there and Take is there, it
should be translated into a paging query (paging is skipping rows and
taking the top n rows from what's left). If paging is performed 'TOP'
makes no sense.

    So if Take is specified first, and then Skip, the query has to use a
temp table to store the taken rows and then skip the first n rows from
that. Otherwise it can't always be done: TOP can't always be added to a
subquery: if you sort on a column not in the subquery, you can run into
errors if TOP / DISTINCT is used in the subquery.

> > > Assuming the ordering is the same on each of them (because Skip
> > > and Take make no sense without ordering, LINQ to SQL will create
> > > an order for you, which irritates me to no end, but that's a
> > > separate thread),
> >
> > Why? SELECT * FROM Table without ordering has no defined ordering,
> > so using a limit + skip operator on that set results in a set of
> > undefined rows, you'll never know what rows will be returned.
>
> Well, that's not completely true.  If you are using a clustered index
> in SQL Server, then the ordering of the result set will be on that
> index, however, without that index, the order is undefined
> (basically, however it is stored in the file).

    That's just luck, if I recall correctly, but I have to check BOL for
semantics on this. The SQL standard (not obeyed by anyone in full,
admitted) describes this explicitly. I don't like to hammer on standard
specs that much, but this one thing is essential to understand so
people don't run into problems with unexpected results.

    The RDBMS can for example decide to return a part of the rowset before
another part because it has that one already in memory.

> But I agree with you, in database land, it makes no sense, because
> you have to define an ordering.  LINQ to SQL gets around this by
> ordering by all the columns in the table if another order is not
> found (I believe, I have to test that part, but without a specific
> ordering, it orders on every column in the table).

    I think it orders on every column in the projection. If it would order
on every column in the source of the projection, it can run into issues
with DISTINCT where not every column of the source is in the projection.

Show quote
> > > they will produce different results.
> > >
> >>  Say your query will produce the ordered set {1, 2, 3}.  Let n = 1,
> > > m = 2.
> > >
> >>   The first query:
> > >
> > > var query = query.Skip(n).Take(m);
> > >
> >>   Will return the ordered set {2, 3}, while the second query:
> > >
> > > var query = query.Take(m).Skip(n);
> > >
> >>   Will return the ordered set {2}.
> > >
> >> The reason for this is that in the first query, the Skip method
> > > skips one element, then takes the remaining two, while in the
> > > second query, the first two elements are taken, and then the
> > > first one is skipped.
> >
> > I tried it out to be sure, and indeed Linq to Sql generates two
> > different queries (which really hurt my eyes but that's another
> > story).  The thing is though that on databases, people will use the
> > take/skip combination to page through a bigger resultset. However
> > consulting the manual for both shows no word whatsoever about the
> > order in which these two statements have to be specified.
>
> Well, there is no order that they HAVE to be specified in.  Yes, they
> are usually used in conjunction, but it is not a requirement that
> they be used in conjunction.  They can be specified in any order,
> which is what the OP's question was arising from.  He wanted to know
> if the order in which the operations were applied would affect the
> outcome, which it definitely will.

    Though the intention is to page, i.e. first skip, then take. When
would you first take and then skip? With Skip(n) and Take(m), isn't:
set.Skip(n).Take(m) the same as set.Take(m+n).Skip(n) ?

    the net result is that you get a set of rows which is limited to a
size x from a potentially bigger set with size y. In other words:
specifying Take first and then Skip has no real value over specifying
Skip first and then Take. Mind you: the intention is to get a resultset
which has a fixed size and is a subset from a bigger set.

    As some databases offer you this in the SELECT statement, one could
argue (and there are no rules for this, so why not!) that Skip and
Take, IF specified both in the same projection scope, are translated
into the parameters for the LIMIT/OFFSET etc. keywords of the SELECT
statement of the projection scope they're specified on.

Show quote
> > I do understand the order, but it's a bit strange as well. For
> > example, Linq apparently has no problem with an Order by placed in
> > front of a where, however there IS an order in take/skip which are
> > used combined as a paging mechanism. Of course, this follows from
> > the specs of both, but semantically, the intention of what people
> > want to do, e.g. to page, shouldn't require an order in the
> > statements for paging, if other elements in the query also don't
> > really require an order (they do, but that's whiped under the rug)
>
> Well, it depends here on what operations we are talking about.  You
> are making the error of assuming that Take and Skip are both sub
> parts of a larger page operation, and assuming that because other
> elements of the operation that can be applied (i.e. where) don't
> require order, these don't.

    Why else would you want to use take and skip in the same projection
scope? Their sole purpose is to limit the resultset to a fixed set and
to specify where to start in the overall resultset for the subset to
return.

> But that's wrong.  Like you said, in db-land, not applying an order
> results in an undefined order in the result set.  If you are going to
> page off that, then the DB is completely within its rights to return
> a differently ordered result set between subsequent queries, and your
> paging results will simply be wrong.  You would be depending on an
> implementation detail that the DB would consistently return the
> undefined order to you.

    'Order' in my remark above was about skip first/take after that, not
about an ORDER BY added to the query by the linq provider. Sorry about
that confusion.

> The reason Take and Skip (should) require an order is because of this
> fact.  Think of it.  The definition of Take is "take the FIRST n
> elements". Skip is defined as "skip the NEXT n elements".  Because of
> FIRST and NEXT, you require order.  Without knowing the order, the
> concepts of FIRST and NEXT do not exist.

    Though I've shown above that you can use Take and Skip in any order to
obtain the same rows. The INTENTION of the developer is to get row n
till n+m from a set of rows of size m+k, agreed? Why else would Take
and Skip show up in the same projection scope.

        FB   

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
Author
25 Nov 2007 4:37 PM
Jon Skeet [C# MVP]
Andrus <kobrule***@hot.ee> wrote:
> Are Linq-SQL methods commutative ?
> Should the following queries return same or different results ?
>
> var query = query.Skip(n).Take(m);
>
> var query = query.Take(m).Skip(n);

Different results - the first gives results (zero-based) n to n+m-1.

The second gives results 0 to Min(n-1, m-1).

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk

AddThis Social Bookmark Button