Advanced Queries and SQL

Presented by Beth Chapman

Queries and SQL

During this session, we will discuss the use of advanced query techniques. These techniques include the use of subqueries, self-joins, and unions. We will also discuss in depth updateable queries.

The examples included in this paper are SQL code rather than screen shots of the QBE. Microsoft Access allows me to enter this standard version of SQL syntax in the SQL View window. You can bring up the QBE window, insert the SQL syntax into the SQL view and Access will create a visual equivalent and convert the ANSI/ISO syntax to Access-specific SQL.

Subqueries

A subquery is literally a query within a query. Subqueries occur when the result of a query is needed to determine which rows or groups should be included as part of the query result. Subqueries can be used within SELECT, SELECT...INTO, INSERT...INTO, DELETE, or UPDATE statements, or inside another subquery.

A subquery is different from a nested query in that the query itself actually exists as part of a filter criteria rather than as an external data source. In other words, subqueries (often called inner queries) are found as a row (WHERE) or group (HAVING) filter conditions.

Restrictions

The following are restrictions to using subqueries whether they appear in SELECT or Action queries:

The query can typically contain only one field name in its SELECT fields list. The exception to this rule is the use of the wildcard (*) when performing an existence subquery.
You may not use UNION and ORDER BY clauses
WHERE field names within the subquery may reference fields from the main query (often called the outer reference). These subqueries are typically called correlated queries and may take a longer time to process as each row in the main query needs to be correlated with the subquery result. This means that the inner query has to be executed each time a new row in the outer query is encountered rather than just once.
You must enclose subqueries within parentheses.

SELECT Queries

Subqueries may appear in several formats as part of the filter criteria within a SELECT query. First, a subquery may represent the right side of a standard comparison operator (i.e. =). Second, subqueries may be used to create membership lists (IN). Subqueries can also be used to determine if a given value exists within another table (EXISTS). Finally, a subquery can be used to determine whether any (ANY/SOME) or all (ALL) rows in another table meet the criteria used to include the current row in the outer query in the final result. We will describe each of these types of subqueries separately.

When using the QBE, subqueries can be created by entering the code into the appropriate column criteria section.

Comparison Operators

Subqueries may only be used with the standard comparison operators (=,<=,>=,<,>,<>) and the set membership test (IN). Remember that a subquery is required when we aren't sure of the information needed to perform the query.

For example, if I want to get a listing of all orders taken by George Bush, but I don't know his salesrep ID, I could create this query:

SELECT DISTINCT cu_company FROM Customers, OrderHeaders WHERE oh_custid = cu_custid AND oh_repid = (SELECT sr_repid FROM Salesreps WHERE UCASE(sr_last) = "BUSH") AND UCASE(sr_first) = "GEORGE" AND UCASE(sr_init) = "W";

The inner, or subquery is processed first, providing the outer query with the proper salesrep ID value. Since this type of subquery is only executed the first time the query is evaluated, additional performance time is minimal.

The result from a subquery must be a single row return. Therefore, if more than one row matches the filter criteria, an Access error message will be encountered.

Set membership

There are three sets of membership subqueries: IN, existence, and quantification.

IN

You should already be familiar with the standard set membership test: IN. When used within a simple query, the IN operator is followed by a hand-entered list of possible values:

SELECT Cu_Company FROM Customers WHERE Cu_State IN("PA","AZ","NY");

If the appropriate field in the record currently being evaluated matches an element in the list (i.e. the field value is a member of the set), the record is included in the result.

There are times, however, when we don't know what values belong in the list. For instance, suppose I want to know the customer name and order numbers for all orders that included the purchase of a Microsoft mouse? The easiest way to get this information is through the use of a subquery which provides a list of order numbers for any order having a MSMOUS line item:

SELECT CU_COMPANY, OH_ORDERNO FROM Customer, OrderHeaders WHERE CU_CUSTID = OH_CUSTID AND OH_ORDERNO In (Select ol_orderno FROM LineItems WHERE ol_item = "MSMOUS");

Since inner queries are executed first, SQL "replaces" my subquery with a list of order numbers. As each row in the outer query is processed, the order number is checked against the list. If a match is found, the row is included in the result set. Otherwise, the current row is not a member of the set and is excluded from the result.

Existence tests

The existence test checks to see if the specified value from the current record exists in the subquery result. Unlike the earlier subqueries, existence tests return only a true or false. If the result returned is true, then the current record is included in the final result. If the result is false, then the current record is excluded from the final result.

Since the result can only be true or false, the * (asterisk) wildcard is allowed. In fact, using the wildcard is more common than using a column name.

Let's take a different look at our query for finding customers and orders containing a MS mouse:

SELECT cu_company, oh_orderno FROM customer, orderheaders WHERE cu_custid = oh_custid AND exists(Select * FROM lineitems WHERE ol_orderno = oh_orderno AND ol_item = "MSMOUS");

Notice that this example has a reference to a field in the inner query (i.e., the existence test) from the outer query. That is, the order number field (i.e., oh_orderno) from the order header table is referenced in the inner query, even though the table itself is from the outer query (i.e. ol_orderno = oh_orderno). This outer reference means that SQL must run the inner query for each record in the outer query.

Given this behavior, if we have 200 customer records, the existence test must run 200 times. This type of query is often called a correlated query. In other words, the inner query result must correlate to the current outer query record. Correlated queries are rather slow to execute when compared to non-correlated queries.

In fact, few times occur when an existence test cannot be replaced by the more efficient IN subquery, which we have just seen. We get the same result with better performance. You can, of course, use the NOT predicate with the IN or the EXISTS to negate the test.

Quantification tests

The final subquery type is the quantification tests. These extensions of the IN set membership test compare a value (usually the outer query request) to a set of values that result from the inner query. The following sections describe the two types of quantification tests: ANY/SOME and ALL.

ANY/SOME Quantification Test

The ANY (or SOME) test is used to include those records from the main query for which some of the records in the inner, or subquery, return a true. In other words, if I want to know which sales reps have an order (ANY) that exceeds ten percent of their sales quota, I can write a query such as this:

SELECT sr_name FROM Salesreps WHERE (0.1*sr_quota) < ANY(SELECT SUM(ol_price*ol_qty) FROM orderheaders,lineitems WHERE oh_repid = sr_repid AND oh_orderno = ol_orderno);

What probably appears confusing is the direction of the comparison. The query I wanted was the names of sales reps who have a sale worth more than ten percent of their sales quota. Yet I used the less-than operator because of the way the query has been phrased.

In essence, I need to know whether any instances occur in which ten percent of the quota is less than the sale itself. If this case is true, than the sale must be greater than ten percent of the quota.

For example, if a sales rep has a quota of 1000.00, then ten percent is 100.00. This rep also has a sales order for 110.00. Given this information, the sample query evaluates to 100.00 < 110.00, which is true. The sale is therefore greater than ten percent of the rep's quota, so processing stops and the sales rep is included in the result.

ALL quantification test

The ALL test is similar to the ANY/SOME except that it is more stringent. For the record in the main query to be included in the result set, all the records in the subquery must meet the criteria. In our sales rep example, therefore, all the sales orders must be greater than ten percent of the sales rep's quota for the sales rep to be included in the result set.

Subqueries and Action Queries

So far we have discussed subqueries only within the context of a SELECT queries. However, we frequently use subqueries within action queries such as INSERT, DELETE, and UPDATE.

For example, we may make periodic backups, or archives, of old invoices to keep performance levels high in the main order entry system. This task requires a batch INSERT INTO another table. We can write a query such as this:

.INSERT INTO OldInvoices (Oh_Custid,Oh_Orderno,Oh_Date,Oh_RepID,Oh_Voided) SELECT * FROM OrderHeaders WHERE Oh_Balance = 0 OR Oh_Voided Is Not Null;

We also can use this technique to update records. For instance, sales rep George Bush is no longer with the company; therefore, we need to delete all his open orders, because he isn't going to be around to take care of them. Of course, we do not disturb any of the completed orders because we would be erasing order history.

We can therefore write the following DELETE query:

DELETE FROM OrderHeaders WHERE Oh_Shipped Is Null AND Oh_Voided Is Null AND Oh_RepID = (SELECT Sr_RepID FROM SalesReps WHERE UCASE(Sr_First) = "GEORGE" AND UCASE(Sr_Last) = "Bush");

However, this query may not be appropriate. After all, chances are that the orders must still be completed. The only difference is that George will not be doing any of the tracking. Instead of doing a DELETE action then, we would probably choose to do an UPDATE:

UPDATE OrderHeaders SET Oh_RepID = Null WHERE Oh_Shipped = Null AND Oh_Voided = Null AND Oh_RepID = (SELECT Sr_RepID FROM SalesReps WHERE UCASE(Sr_First) = "GEORGE" AND UCASE(Sr_Last) = "BUSH");

Remember that you can place subqueries only within WHERE or HAVING clauses; therefore, you need to know the RepID of the replacement.

Self Joins

There are at least three special circumstance in which we find that we need to join records in the same table: timeline data, many-to-many relationships, and tables in which rows are related to other rows in the same table (i.e., an employee table in which there are supervisors and the people who work for them).

Let's look at the second instance first, as it is easier to solve. When we have tables involved in a many-to-many relationship, we create a specialized table known as a resolver, link or join. This table typically contains only the unique identifier for each record involved in the relationship.

For instance, in a legal database, any given case may have many lawyers and any given lawyer can work on many cases. To determine which lawyers have worked on which cases, we create a table that contains only the lawyer and case IDs. This resolver table creates a one-to-many relationship between itself and the primary tables (lawyers and cases).

When we need to know which cases two specific lawyers (i.e., Larry and Laura) have worked on together, we need to make two passes through the resolver. The first pass finds all the cases for Larry and the second finds all the cases for Laura.

However, this query merely gives us all the cases for Larry and Laura. What we really want are only those cases on which both Larry and Laura participated. We must therefore make a third pass through the result of the first two passes and eliminate any records that do not have the same case ID for both Laura and Larry.

This process sounds rather complicated because SQL does not make multiple passes through a table in a single query. So we need to make SQL think that we have two tables that happen to contain the same information. We then could rejoin these two tables on the interaction ID, which eliminates any records that are not matches.

Unfortunately, this method also gives us a result with lawyers other than Larry and Laura. We must therefore further restrict the query result by using two search conditions. The first criteria allows only Larry's records to be selected from one of the tables, and the other criteria allows only Laura's records to be selected from the other table.

To perform this step, just add the same table twice in the QBE. Adjust the criteria so that one instance of the table reflects only Larry's records and the other instance of the table reflects only Laura's records. Then make sure that the tables are joined on the case ID. Access produces a query that looks like this:

SELECT Case1.LK_CASE FROM BKLINK AS Case1 INNER JOIN BKLINK AS Case2 ON Case1.LK_CASE = Case2.LK_CASE WHERE ((Case1.LK_ATTY="LCOHE") AND (Case2.LK_ATTY="LWECH"));

Notice that the SQL code reflects the fact that two versions of the table are required. In fact, Access automatically assigns an alias (Bklink AS Bklink_1) to the second copy by appending _1 to the name (I manually changed this to be Case1 and Case2).

Further, notice that after the alias is assigned, you must use it when referencing a field from that table anywhere within the SQL statement. That is why the alias appears in the fields list (i.e. Case1.LK_CASE) even though the alias is not set prior to this point.

Now for the timeline example. In this instance, we have a table that contains information collected for the same cases over time (for example, a meetings table). Because any given attendee may attend more than one meeting, many records may exist for the same person. Let's suppose for marketing purposes, we want to know who attended a meeting last year, but not this year. This query requires that we make two passes through the same table (a self join) looking for attendees for two different years.

However, this example is a more complicated self join because we are not looking for the records residing in the intersection, as we were in the interactions example. In fact, this query requires that we return those records that are the difference of the join. In other words, we want people who are not part of the intersection between 1993 and 1994 (people who attended meetings in both years), and we don't want people who are in the 1994 circle but not the 1993 (people who attended meetings this year but not last year). The group of names we want resides only in the 1993 circle outside the intersection (the difference).

To perform this self join, we need to combine the techniques of a simple self join and subqueries.

SELECT meet1.mt_name FROM meeting AS meet1 WHERE meet1.mt_year = "1993" AND meet1.mt_name NOT IN(SELECT meet2.mt_name FROM meeting meet2 WHERE meet2.mt_year = "1994" AND meet1.mt_name = meet2.mt_name);

Unions

A UNION is not technically a subquery at all. A UNION is a specialized SQL operation that combines the results from two or more independent queries or tables into a single result.

Typically, you use a UNION when you need to combine information from tables that are not directly related or when a table is related to more than one other table required for the query. For example, suppose we want to know the balance owed for all of our customers in an order entry system. In order to get this information we need to know the total dollar amount ordered and the total number of payments made. I know that this sounds rather straightforward, but it actually isn't.

The problem lies in the fact that the order header table is the parent table to both line item details and payments. SQL prevents you from relating a parent table to multiple children within the same query. What happens is that the query result will return one row for every occurrence of the parent value in the child with the most occurrences.

This means, for instance, that if an order has three line items and a single payment for the entire order amount, the query result will show the payment amount three times. Based upon this result, you owe the customer money! The query itself looks like this:

SELECT oh_orderno,oh_custid,oh_date,SUM(ol_price*ol_qty) As InvoiceTotal, 0.00 As Payments From OrderHeaders,LineItems WHERE orderheaders.oh_orderno = lineitems.ol_orderno GROUP BY oh_custid,oh_orderno,oh_date UNION SELECT py_orderno,oh_custid,oh_date,0.00, SUM(py_amount) FROM orderheaders,payments WHERE orderheaders.oh_orderno = payments.py_orderno GROUP BY oh_custid,py_orderno,oh_date ORDER BY oh_date;

The purpose of a UNION is to take several dissimilar tables, or query results, and create a single unified result. To perform this step, SQL needs to know how you want the result table to look. You use the first SELECT fields list for this determination. Therefore, the fields list in the first SELECT must allow for all the fields that will be needed, including those fields that are not in the first statement but will appear later.

You accomplish this task through the use of placeholders. Note in our preceding example the alias fields type and amount. These names do not actually reference real fields. The rule is that each SELECT statement must contain the same number of fields, although the fields in subsequent queries need not be the same size or even data type.

In our example, however, I represented empty field positions with the same size and data type as I expect to store there. You can, of course, use anything at all to indicate that the field is empty for the current record.

The next point of interest with UNIONs is the concept of DISTINCT. When we issue a standard SELECT statement, the default is to include all the records from the involved tables. In fact, if we want to eliminate duplicate records we need to include the DISTINCT or DISTINCTROW clause.

Notice, however, that in the UNION we have explicitly issued the ALL clause. The reason is that UNIONs are assumed to have a high probability for duplicate records. Therefore, DISTINCT becomes the default and the ALL optional.

Remember that creating a DISTINCT query result can be time consuming. Based upon your knowledge of the data, if you are fairly sure you will have few to no duplicates, use the ALL option when declaring a UNION. Further, the output of a query that uses DISTINCT isn't updateable and doesn't reflect subsequent changes made by other users.

The last rule for unions deals with those SQL SELECT clauses that affect the entire result table. In the case of our UNION, the clause is the ORDER BY. Because ORDER BY sorts the entire result table, this clause can be issued only one time and only from the last SELECT in the UNION.

However, you can use a GROUP BY and/or HAVING clause in each of the UNION queries to group the returned data. These clauses act upon an intermediate table rather than the final table.

Finally, a UNION is one of the query types not directly accessible through the QBE. Instead, you must choose the SQL Specific option from the Query menu. This selection brings you into the SQL View window. In other words, you need to know how to write your own queries by hand to write a UNION.

You may want to test the individual queries and make sure that you are getting the required result before creating a UNION. Remember that you are allowed to combine the results of separate queries that already exist in the database, as well as write the UNION as a single query.

RecordSource Queries

As you already know, the RecordSource property of a report or form is more likely to reference a query than a table. This may be due to the fact that the Recordset must combine the fields from more than one table, or perhaps we just want the data to be ordered in some fashion, such as company name.

RecordSource queries, however, need not be precompiled queries. In other words, we don't have to define the query in the QBE and store it in the Jet query container. Instead, we can define the query directly into the RecordSource property of the form's property sheet. One of the main advantages of this technique is the very fact that by not having the query stored in the query container, we prevent the user from being able to access and manipulate the query.

Further, we can actually change the RecordSource property on-the-fly through Access Basic. For instance, suppose that I want to allow the user to choose the order in which they would like to have the data sorted. Sometimes by company name, sometimes by order date within the company name, etc. In order to change the appearance of the data, I must either apply a filter or change the underlying RecordSource property and requery the Recordset.

Since it is usually faster to execute a SQL statement than to apply a filter, I typically choose to change the RecordSource property. Here's how I do this. I first present the user with a modal form from which they can choose the new sorted order. When the user clicks on the OK command button, the click event evaluates their sort selections, creates a new ORDER BY string stored in the variable g_sortstring, and executes the following lines of Access Basic code:

Application.Echo False l_select = "SELECT OrderHeaders.OH_ORDERNO,oh_voided, Oh_custid,Customers.CU_CUSTID, Customers.CU_COMPANY, Customers.CU_ADDR1, Customers.CU_ADDR2, Customers.CU_CITY, Customers.CU_STATE, Customers.CU_ZIP, Customers.CU_CONTACT, OrderHeaders.OH_DATE, OrderHeaders.OH_SHIPVIA, OrderHeaders.OH_PONO,
OrderHeaders.OH_COMMENT, OrderHeaders.OH_MESSAGE, OrderHeaders.OH_REPID, OrderHeaders.Oh_shiptoaddress, OrderHeaders.Oh_shiptocity, OrderHeaders.Oh_shiptostate, OrderHeaders.Oh_shiptozip, Customers.CU_FAX, Customers.CU_PHONE,Sr_repid, OrderHeaders.Oh_shipped, OrderHeaders.OH_SHIPTOName,oh_printed FROM SalesReps INNER JOIN (Customers INNER JOIN OrderHeaders ON Customers.CU_CUSTID = OrderHeaders.OH_CUSTID) ON SalesReps.SR_REPID = OrderHeaders.OH_REPID ORDER BY " & g_sortstring & ";"
Forms!FrmOrderEntry.RecordSource = l_select Forms!FrmOrderEntry.Requery Application.Echo True

This code does several things. First, it "turns off" screen repainting by setting the Echo property of the Application object to False. This insures that the user does not see the form "flash" as the values are refreshed during the requery. Second, we have to make sure that we have the same SELECT query as the original RecordSource property, the only difference being the contents of the ORDER BY clause.

The next two steps occur against the target form, in this case the order entry form. The first step is to reset the RecordSource property. The second is to requery the underlying Recordset using the new RecordSource.

Notice that we have to use the formal naming of the form (Forms!FrmOrderEntry) rather than the ME property. This is because we are still in the sort selection form. Therefore, ME would refer to the incorrect form.

Being able to directly use a SQL statement is only possible when referencing a property, such as the RecordSource, which supports queries. There are other times, in the course of an application, in which you might want to execute a SQL statement.

For example, when the user adds a new customer, we need to create a record in the purchase totals table. This requires an INSERT statement. We cannot execute the INSERT directly. Instead, we need to create an object variable that represents a query. To accomplish this task, we need to understand how Access Basic works with query definitions.

Access Basic and Queries

The QueryDef collection of the JET data engine references all of the defined, or compiled, queries. A defined query is one that can be seen from the Query window of the Database Container. In order to use one of the defined queries for an action such as creating a Recordset, you can either use a direct reference or create a QueryDef variable.

To directly reference a query, you merely need to create a recordset variable using the name of the query:

Set f_rs = l_db.OpenRecordset("customerinvoices")

If you want to use the second method, you need to create a QueryDef variable which contains a reference to the appropriate query:

Dim l_query As QueryDef Set l_query = l_db.QueryDefs("CustomerInvoices")

And then just open the new Recordset:

Set f_rs = l_query.OpenRecordset()

An alternate third method would be to place the actual SELECT statement into the OpenRecordset method:

Set f_rs = l_db.OpenRecordset("SELECT * FROM Customers")

The advantage of this method lies in the fact that by not using a predefined query, you can modify the statement at anytime from within the code. On the other hand, using a predefined query is typically slightly faster to execute. The reason being that queries stored in the JET are already compiled and merely need to be executed. Queries defined from within code, must be compiled before they can be executed. This takes a small amount of additional processing time.

Creating New/Modifying Old Queries

There might be situations where you may want to add queries to the JET's QueryDefs collection. For example, we earlier discussed changing the RecordSource property on-the-fly based upon the user's selection of sort orders. Suppose we want to take this a step further by allowing the user to permanently save the chosen sort order. We can now take one of two approaches, we can modify the stored query or we can create a new one.

If you want to add a query to the QueryDefs collection, you can do so using the DBEngine.CreateQueryDef() method:

Dim l_db As Database, l_query As QueryDef Set l_db = DBEngine.Workspaces(0).Databases(0) 'Create the query definition Set l_query = DBEngine.CreateQueryDef ("QrySortByCompany", _ "SELECT * FROM Customers ORDER BY Cu_Company")

As soon as Access interprets the CreateQueryDef method, the new query, "QrySortByCompany," is added to those queries already stored in the database Tables/Queries container.

Whenever you use one of the Createx methods for creating new objects such as tables, queries, etc., you need to refresh the view in the Database window before the object name becomes visible if you are already viewing the documents in that container. For instance, if you are viewing the stored queries in the query container of the Database window and then use CreateQueryDef, the name of the new query does not appear until you switch to another container view (such as forms) and then switch back.

To change the stored SQL statement, you merely need to reference the appropriate QueryDef and use the SQL property to affect the transformation:

Dim l_db As Database, l_query As QueryDef, l_rs As Recordset Set l_db = DBEngine.Workspaces(0).Databases(0) Set l_query = l_db.QueryDefs ("QryOrderEntry")
l_query.SQL = "SELECT * FROM Customers ORDER BY Cu_Company;" Set l_rs = l_db.OpenRecordSet("QryOrderEntry")

Finally, you can even change the functionality of a stored query definition by using the QueryDef SQL property to run a query against the Recordset created by the stored query:

Dim l_db As Database, l_query As QueryDef Set l_db = DBEngine.Workspaces(0).Databases(0)
Set l_query = l_db.QueryDefs ("QryPhones")
'update the area code for 215 to 610 l_query.SQL = "UPDATE Phones SET ph_areacode = ""610"" _ where ph_areacode = ""215"";" 'run the update query l_query.Execute 'close the query l_query.Close

Since this example is using an Update query, creating a Recordset is not necessary. Therefore, I have used the Execute method of the QueryDef variable to run the SQL statement. Also note the use of the Close method. An important habit to develop is closing objects that are no longer in use. This method applies to table as well as query Recordsets.

Creating Temporary Queries

Our final topic involves the creation of temporary queries. We don't always want a query to stored in the Query/Table container, yet we need to use the CreateQueryDef method in order to execute any non-control source query. We can create a query which is not stored in the QueryDefs collection by using this slight modification to the CreateQueryDef() method:

Set l_db = DBEngine.Workspaces(0).Databases(0) Set l_query = l_db.CreateQueryDef("", "INSERT INTO totals _ VALUES(" + Me!Cu_Custid + ",0,0,0);") l_query.Execute

The difference between this CreateQueryDef() statement and the ones used previously, lies in the name assigned to the newly created query. In previous versions of the method statement, we always referenced a query name, whereas we used an empty string as the name here.

When Access encounters a CreateQueryDef() without a valid query name, it cannot store query. In fact, Access will assign the query the name #Temporary QueryDef#. After the query has been executed, it will be eliminated. This technique saves us the trouble of having to use the Delete method to remove temporary queries from the Query/Tables container.

While this has been a very brief tour into the use of queries and QueryDefs through Access Basic, it has hopefully pointed out the importance of having a solid understanding of how to create and manipulate SQL statements in Access.

Updateable Queries

Updateable Queries or Updateable Views is one of the most powerful features of the Jet query engine, but is also one of the least understood. This remainder of this paper will help describe how updateable queries work and how you can use them to your advantage in applications.

Much of the text of this document comes from Chapter 4, "Queries," in the Microsoft Press book, Jet Database Engine Programmer's Guide.

Introduction

Since version 1.0, the Microsoft Jet database engine has provided the ability to update the result of a multi-table join without writing any code. This is a feature that is found in few, if any other, desktop database systems.

In the first version of Jet the following rules applied to query updateability:

Only the many-side of a one-to-many query is updateable
Either side of a one-to-one query is updateable
Neither side of a many-to-many query is updateable

While this seems like a simple set of rules, it turned out to cause much confusion amongst users of Jet. In order to simplify this confusion, we dramatically increased the updateability rules in Jet 2.0. However, while the flexibility of updateable queries was increased in Jet 2.0 to allow either side of a multi-table join to be updated, the complexity of the rules increased as well. Because of this, many developers do not take full advantage of this powerful feature.

Terminology

Before describing the rules behind updateable queries it is necessary understand some basic terminology.

Primary key
One or more fields whose value or values uniquely identify each record in a table. You can only have one primary key. An Employees table, for example, could use the Social Security number as a primary key.
Foreign key
One or more table fields that refer to the primary key field or fields in another table. For example, a list of valid part numbers would contain a foreign key to an inventory table containing references to valid part numbers.
One-to-many relationship
An association between two tables in which:
- The primary key value of each record in the primary table corresponds to the value in the matching field or fields of many records in the related table.
- The foreign key value of each record in the related table corresponds to the value in the matching field or fields of many records in the primary table.
One-to-one relationship
An association between two tables in which:
- The primary key value of each record in the primary table corresponds to the value in the matching field or fields of one and only one record in the related table.
- The foreign key value of each record in the related table corresponds to the value in the matching field or fields of one and only one record in the primary table.
Inner Join
A join in which related records from two tables are combined and added to a query's result set only if the values of the joined fields meet a specified condition. For example, the default join between tables in Microsoft Access query design is an inner join that selects records from both tables only when the values of the joined fields are equal.
Left Outer Join
An outer join in which all of the records in the left-hand side of the LEFT JOIN operation in the query's SQL statement are added to the result set, even if there are no matching values in the joined field from the right-hand table. Records from the right-hand table are combined with those from the left-hand table only when there are matching values in the joined fields. When a left-side record has no match, a row of NULL values is joined on the right side.
Right Outer Join
An outer join in which all of the records in the right-hand side of the RIGHT JOIN operation in the query's SQL statement are added to the result set, even if there are no matching values in the joined field from the left-hand table. Records from the left-hand table are combined with those from the right-hand table only when there are matching values in the joined fields. When a right-side record has no match, a row of NULL values is joined on the left side.
One-side table
In a multi-table query, every table is either on the one-side of the query or on the many-side of the query. The one-side is defined as a table whose join key is a primary key or unique index. The one-side table is sometimes also called the primary table.
Many-side table
In a multi-table query, every table is either on the one-side of the query or on the many-side of the query. The many-side is defined as a table whose join key is a non-unique index or has no index.

Referential Integrity

Referential integrity is a system of rules that the Jet database engine will use to ensure that relationships between records in related tables are valid, and that you don't accidentally delete or change related data. You can set referential integrity when all of the following conditions are met:

The matching field from the primary table is a primary key or has a unique index.
The related fields have the same data type.
Both tables belong to the same Jet database. If the tables are linked tables, they must be tables in Jet database (MDB) format, and you must open the database in which they are stored to set referential integrity. Referential integrity can't be enforced for linked tables from databases in other formats.

When referential integrity is enforced, you must observe the following rules:

You can't enter a value in the foreign key field of the related table that doesn't exist in the primary key of the primary table. However, you can enter a null in the foreign key, specifying that the records are unrelated. For example, you can't have an order that is assigned to a customer that doesn't exist, but you can have an order that is assigned to no one by entering a null in the CustomerID field.
You can't delete a record from a primary table, if matching records exist in a related table. For example, you can't delete an employee record from the Employees table if there are orders assigned to the employee in the Orders table.
You can't change a primary key value in the primary table, if that record has related records. For example, you can't change an employee's ID in the Employees table if there are orders assigned to that employee in the Orders table.

If you want the Jet database engine to enforce these rules for a relationship, select the Enforce Referential Integrity check box when you create the relationship using the Relationships window in the Microsoft Access user interface. If referential integrity is enforced and you break one of the rules with related tables, Microsoft Access displays a message and doesn't allow the change.

You can override the restrictions against deleting or changing related records and still preserve referential integrity by setting the Cascade Update Related Fields and Cascade Delete Related Records check boxes. When the Cascade Update Related Fields check box is set, changing a primary key value in the primary table automatically updates the matching value in all related records. When the Cascade Delete Related Records check box is set, deleting a record in the primary table deletes any related records in the related table.

Updating Your Data with Queries

This section describes the rules governing query updateability. You can use the dynamic selecting and sorting capabilities of the query engine for more than just static searches: Once a result set is retrieved, the data can also be updated through a program such as Microsoft Access, or a custom Microsoft Visual Basic program.

Single-table query updateability

A query based on a single table can select some or all records, and some or all columns, from that table. In the simplest case, the query returns all rows and all columns:

SELECT * FROM Customers;

Every row retrieved, and every column in every row, is accessible and subject to change (except calculated columns based on expressions.) Every row can be deleted, and new rows can be inserted.

If you were to use Microsoft Access to open a datasheet based on this query, you would be able to freely browse forward and backward, making any changes you like (subject to security restrictions, validation rules, and referential integrity requirements). Any changes made to the result set created by this query would automatically be reflected in the underlying table.

A query can also select and sort rows from the original table and remain updateable:

SELECT *
FROM Customers
WHERE (((Customers.ContactTitle)="Owner"))
ORDER BY Customers.CompanyName;

An updateable single-table query like this can be useful in a data entry situation in which you would like to present the rows to the user in a certain order, or hide certain rows based on the selection criteria.

One thing to be aware of, however, is that the Jet database engine does not prevent the user from adding a new row through a query's result set that would not have met the original selection criteria. For example, the above query selects only rows in which the ContactTitle is "Owner." The user could add a new row and specify a value other than "Owner" as the ContactTitle, in which case the user would add a row that the query, if subsequently re-run, would no longer select. It is up to your application to enforce insert restrictions such as this.

Single-table query updateability restrictions

A query can select specific columns rather than selecting all columns with the asterisk (*) character:

SELECT 
  Customers.ContactName, 
  Customers.ContactTitle, 
  Customers.Address, 
  Customers.City, 
  Customers.Region, 
  Customers.PostalCode
FROM Customers;

This can be a useful technique to hide certain columns from users, while still allowing them access to the information they need. The columns made available through a query like this are fully updatable. It may, however, be impossible to add a new row through this query, either because the fields not included in the output are specified as "required" at the table level, or because the primary key or foreign keys cannot be created with default values.

Another restriction is on non-native tables, such as Borland's Paradox. Microsoft Jet requires that a primary key be defined for Paradox tables. Btrieve and xBase data sources do not require a unique index, but ODBC tables do. Any queries based on ODBC tables lacking a unique index are not updatable.

Multi-table query updatability

In addition to allowing updates to single-table queries, Microsoft Jet supports updateable multi-table joins. This is extremely useful, because it enables you to combine data from the main table with lookup information from other tables (a many-to-one join), or to join a master table with a related detail table (a one-to-many join) and still have an updateable result set.

Many-to-One Joins

This example joins the Products table (the focus of the query) with the Suppliers table, which provides lookup information, including the Supplier CompanyName and City:

SELECT DISTINCTROW 
  Products.*, 
  Suppliers.CompanyName, 
  Suppliers.City
FROM Suppliers INNER JOIN Products 
ON Suppliers.SupplierID = Products.SupplierID;

The user can change data in any column from the Products table, including the SupplierID that links the Products table to the Suppliers table. If you change the value of the linking field through a datasheet or a form in a Microsoft Access application, you will also see that the corresponding "lookup" information from the "one" table is automatically retrieved and redisplayed. This technique is known as Row Fix-up or AutoLookup.

Although the focus of this many-to-one query is the Products table, it is also possible (though perhaps not desirable) to change the values in columns retrieved from the lookup table, such as the CompanyName or City field in Suppliers. The user may attempt to alter the CompanyName value on one row of the result set created from this query, under the impression that the change will affect only the current row. However, since the value is actually stored back in the Suppliers lookup table, the value is changed for every row in the result set.

One-to-many joins

There is no real logical difference between a "many-to-one" join, as described above, and a "one-to-many" join, except from the point of view of the user. A one-to-many join is sometimes referred to as a master-detail relationship. A single row in the "one" or "master" table is related to one or more rows in the "many" or "detail" table. Updateable multi-table joins are especially useful with these types of one-to-many relationships.

The following query joins the Orders table to the Order Details table in a classic one-to-many relationship:

SELECT DISTINCTROW 
  Orders.*, 
  [Order Details].*
FROM Orders INNER JOIN [Order Details] 
ON Orders.OrderID = [Order Details].OrderID;

Columns in the Orders table focus on the order itself: the Customer who placed the order, the Employee who took the order, the date the order was taken, and so on. Columns derived from the Order Details table specify the actual items that were ordered: the Product ID and pricing details. Just as with the many-to-one example shown above, changes to columns from the "one" table on any given row are automatically made for all other rows based on the same value in the "one" table. For example, if the user changes the Customer ID field, which is drawn from the "master" Orders table, on any given row, the change is automatically reflected in all other rows for this same Order.

Updateable multi-table joins are not limited to a single-level hierarchy. For example, the following query links from Employees to Orders to Order Details:

SELECT DISTINCTROW 
  Employees.EmployeeID, 
  Employees.LastName, 
  Orders.OrderID, 
  Orders.OrderDate, 
  [Order Details].ProductID
FROM (Employees INNER JOIN Orders 
     ON Employees.EmployeeID = Orders.EmployeeID) 
  INNER JOIN [Order Details] 
    ON Orders.OrderID = [Order Details].OrderID;

Columns from all three of these joined tables are updateable in the resulting result set.

Inserting rows into a multi-table query

When inserting a row into a result set based on a multi-table join, rows can be added to one or more, or all of the tables included in the join. It is easy to see that rows from the "many" side of the join are simply appended to the "many" table as needed. Rows from the "one" side of the join can be added as well, if they do not violate referential integrity constraints.

The following query joins the SalesReps (many) table to the Division (one) table:

SELECT DISTINCTROW 
  SalesReps.EmployeeID, 
  SalesReps.LastName, 
  SalesReps.FirstName, 
  SalesReps.DivID, 
  Division.DivID, 
  Division.Description
FROM SalesReps INNER JOIN Division 
ON SalesReps.DivID = Division.DivID;

If, while adding a new SalesReps entry, the user specifies a DivID value that is already present in the Division table, then only the SalesRep row is added. AutoLookup occurs: Values from the corresponding Division record are retrieved.

It is possible to add a new DivID value to the SalesRep table and have that new value automatically added to the Division table if the primary key from the "one" table is included in the join, as in the example above.

Updateable one-to-many outer joins

As described above in the terminology section, an outer join selects all rows from one table in a multi-table join, while only selecting rows with matching values from another table. For rows in which there is no match in one table, artificial NULL column values are supplied. The Jet database engine allows you to "fill in the blanks" in these artificial NULL rows.

Consider the following OUTER JOIN and its result:

SELECT 
  Cust.CustName,
  Cust.CustID,
  Orders.CustID,
  Orders.OrderID
FROM Customers AS Cust LEFT JOIN Orders
ON Cust.CustID = Orders.CustID;

Cust.CustName	Cust.CustID	Orders.CustID	Orders.OrderID
Johnson	1000	1000	1
Johnson	1000	1000	2
Smith	1001	1001	3
Smith	1001	1001	4
Blair	1002	1002	5
Anderson	1003

In this example, the Customers table is outer joined with the Orders table to show all customers regardless of whether or not they have placed an order. Customer "Anderson" (with CustID 1003) has not placed an order. The "fill in the blank with key propagation" rule states that the user can add an order for this customer by filling in the Orders.OrderID field (and any other Order information except OrderID). The value in Cust.CustID is automatically propagated into Orders.CustID. In this update scenario, the child key (Orders.CustID) is read-only. The act of "filling in the blank" locks the new Order row to Customer 1003.

Multi-table Query Updateability Restrictions

There are several requirements that the query must meet in order for it to be fully updateable.

JOIN clauses must be used

You must specify an explicit INNER or OUTER JOIN between the tables. Joins created implicitly in the WHERE clause of the SELECT statement are not updateable. For example, the following join is not updateable:

SELECT 
  Products.ProductID, 
  Products.ProductName, 
  Categories.CategoryID, 
  Categories.CategoryName
FROM Categories, Products
WHERE Products.CategoryID = Categories.CategoryID;

Queries with aggregated data are not updateable

Summary (GROUP BY), UNION, DISTINCT, and cross-tab queries are never updateable. Queries joined to one or more summary queries are not updateable, even if you do not attempt to modify fields from an otherwise updateable table.

Subqueries with aggregated data are updatable

However, a query may be updateable if it refers to a summary query in a sub-SELECT statement, as in the following example:

SELECT DISTINCTROW Orders.*
FROM Orders
WHERE (((Orders.Freight)>
  (SELECT DISTINCTROW Avg(Orders.Freight) AS AvgOfFreight
   FROM Orders;)));

Columns from the Orders table are updateable.

All joined fields must be output in order to insert new records

In order to be able to insert new rows into a table in any query, all primary key columns must be present.

Many-side fields may be non-updateable during updates to the one-side

While updating a single query row, changes to certain fields may render certain other fields non-updateable until the row edit is either saved or canceled. As soon as the user edits data on the "one" side of a query, then the join key on the "many" side can no longer be modified. Usually, the "many" side join key is updateable. However, because data on the "one" side was modified first, this column is temporarily rendered unmodifiable because AutoLookup would discard your changes to the "one" side data. As soon as the change to the "one" side of the query is committed or canceled, the "many" side join key becomes updateable again.

Many-side records cannot be orphaned

A change to a multi-table query must not create "orphaned" records. You can change the join key in the "many" table to a value already present in the "one" table, but you cannot specify a nonexistent value, except in the case of outer joins.