Search This Blog

Wednesday, December 1, 2010

Lesson: Subquery

A subquery is a SELECT statement inside another statement. 
The main advantages of subqueries are:
  • They allow queries that are structured so that it's possible to isolate each part of a statement.
  • They provide alternative ways to perform operations that would otherwise require complex joins and unions.
  • They are, in many people's opinion, readable. Indeed, it was the innovation of subqueries that gave people the original idea of calling the early SQL ``Structured Query Language.''

The most common use of a subquery is in the form:
non_subquery_operand comparison_operator (subquery)
Where comparison_operator is one of:
=  >  <  >=  <=  <>
For example:
... 'a' = (SELECT column1 FROM t1)
At one time the only legal place for a subquery was on the right side of a comparison, and you might still find some old DBMSs that insist on this.
Here is an example of a common-form subquery comparison that you cannot do with a join. It finds all the values in table t1 that are equal to a maximum value in table t2:
SELECT column1 FROM t1
       WHERE column1 = (SELECT MAX(column2) FROM t2);
Here is another example, which again is impossible with a join because it involves aggregating for one of the tables. It finds all rows in table t1 containing a value that occurs twice:
SELECT * FROM t1
       WHERE 2 = (SELECT COUNT(column1) FROM t1);

Syntax:
operand comparison_operator ANY (subquery)
operand IN (subquery)
operand comparison_operator SOME (subquery)
The ANY keyword, which must follow a comparison operator, means ``return TRUE if the comparison is TRUE for ANY of the rows that the subquery returns.'' For example:
SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);
Suppose that there is a row in table t1 containing (10). The expression is TRUE if table t2 contains (21,14,7) because there is a value 7 in t2 that is less than 10. The expression is FALSE if table t2 contains (20,10), or if table t2 is empty. The expression is UNKNOWN if table t2 contains (NULL,NULL,NULL).
The word IN is an alias for = ANY. Thus these two statements are the same:
SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 IN    (SELECT s1 FROM t2);
However, NOT IN is not an alias for <> ANY, but for <> ALL. See section 14.1.8.4 Subqueries with ALL.
The word SOME is an alias for ANY. Thus these two statements are the same:
SELECT s1 FROM t1 WHERE s1 <> ANY  (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 <> SOME (SELECT s1 FROM t2);
Use of the word SOME is rare, but this example shows why it might be useful. To most people's ears, the English phrase ``a is not equal to any b'' means ``there is no b which is equal to a,'' but that isn't what is meant by the SQL syntax. Using <> SOME instead helps ensure that everyone understands the true meaning of the query. 

Syntax:
operand comparison_operator ALL (subquery)
The word ALL, which must follow a comparison operator, means ``return TRUE if the comparison is TRUE for ALL of the rows that the subquery returns.'' For example:
SELECT s1 FROM t1 WHERE s1 > ALL (SELECT s1 FROM t2);
Suppose that there is a row in table t1 containing (10). The expression is TRUE if table t2 contains (-5,0,+5) because 10 is greater than all three values in t2. The expression is FALSE if table t2 contains (12,6,NULL,-100) because there is a single value 12 in table t2 that is greater than 10. The expression is UNKNOWN if table t2 contains (0,NULL,1).
Finally, if table t2 is empty, the result is TRUE. You might think the result should be UNKNOWN, but sorry, it's TRUE. So, rather oddly, the following statement is TRUE when table t2 is empty:
SELECT * FROM t1 WHERE 1 > ALL (SELECT s1 FROM t2);
But this statement is UNKNOWN when table t2 is empty:
SELECT * FROM t1 WHERE 1 > (SELECT s1 FROM t2);
In addition, the following statement is UNKNOWN when table t2 is empty:
SELECT * FROM t1 WHERE 1 > ALL (SELECT MAX(s1) FROM t2);
In general, tables with NULL values and empty tables are edge cases. When writing subquery code, always consider whether you have taken those two possibilities into account.
NOT IN is an alias for <> ALL. Thus these two statements are the same:
SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);

Reference: 
http://dev.mysql.com/doc/refman/5.0/en/subqueries.html


No comments:

Post a Comment