Paul Robertson's words, punctuated

Thoughts on development, user-centered design, code, etc. by Paul Robertson

Multi-table INSERT using one SQL statement in AIR SQLite

This article describes a way that you can use a single INSERT statement to add data to multiple tables in the SQL dialect supported by the SQLite engine in Adobe AIR.

Normally in SQL, including in AIR’s built-in SQLite database engine, you can only add data to one table at a time using an INSERT statement. In some cases, particularly when you’re doing a “bulk loading” operation such as importing data from a text file into a database, it’s convenient to be able to insert data into multiple tables using a single INSERT statement.

For example, suppose you have XML data to import into a database such as the following:

<?xml version="1.0"?>
<employees>
    <employee firstName="Bob" lastName="Smith"
        location="San Francisco" country="USA"/>
    <employee firstName="Harold" lastName="Jones"
        location="San Francisco" country="USA"/>
    <employee firstName="Tom" lastName="Donovan"
        location="Boston" country="USA"/>
    <employee firstName="Mike" lastName="Wilson"
        location="Calgary" country="Canada"/>
    <employee firstName="Steve" lastName="Thomas"
        location="London" country="UK"/>
    <employee firstName="Joe" lastName="Nelson"
        location="London" country="UK"/>
    <employee firstName="Juan" lastName="Varga"
        location="Buenos Aires" country="Argentina"/>
    ...
</employees>

The XML data isn’t normalized, so there is duplicate data between the various records. We will import it into a database with the following (normalized) structure:

Data model for the database

Assuming the data is going to be imported as a single user operation, it would be painful to need to prompt the user or throw errors for every duplicate entry.

Using the technique described here, you can use a single SQL statement to add an employee and if necessary any related data including office location and country. (You would still loop over the data and execute one INSERT statement per employee record – but you wouldn’t need to execute three INSERT statements per employee record, or need to check for duplicate office locations and countries for each employee record to be inserted.)

As mentioned above, this technique is probably only appropriate when you’re doing “bulk importing” of data. If you’re just adding a single conceptual record (even if it includes data in multiple tables) you’ll most likely want to use a series of INSERT statements to add the data, so that you can have more precise error checking and handling.

How to do it

In an attempt to “cut to the chase” I’m going to put the necessary code here. For more detailed explanations on how this works and why it’s necessary, see the ”details” section below.

This technique for inserting data into multiple tables using a single SQL statement consists of three elements:

  1. A view in the database that groups the data to be inserted (from all the tables) into one “table”
  2. An INSERT statement to add the data, using the view as the destination “table” in the INSERT statement. This is the INSERT statement that you’ll run from your application while importing the data
  3. A trigger defined on the view, which runs when any INSERT statement is executed against the view. This trigger does the actual work of checking for existing data and inserting data into individual tables.

Here’s the code for each part:

A view grouping the data to insert

This combines all the tables that potentially need data inserted into a single “table” for the INSERT statement. You run this statement once to create the view in the database.

CREATE VIEW employees_for_insert AS
SELECT e.firstName,
    e.lastName,
    l.name AS locationName,
    c.name AS countryName
FROM employees e
    INNER JOIN locations l ON e.locationId = l.locationId
    INNER JOIN countries c ON l.countryId = c.countryId

An INSERT statement that “inserts” the data into the view

This code isn’t actually run, but it passes all the data to the database engine for use in the trigger. You execute this SQL statement once for each record to add to the tables.

INSERT INTO employees_for_insert
(
    firstName,
    lastName,
    locationName,
    countryName
)
VALUES
(
    :firstName,
    :lastName,
    :locationName,
    :countryName
)

An INSTEAD OF trigger defined on the view

This is the code that actually runs when the INSERT statement above is executed. You run this code once to create the trigger. Then the database runs the code in the trigger body automatically. If a location or country doesn’t exist it is inserted. However, if they do exist, nothing happens (no duplicate record is created). Then the main employee record is inserted.

CREATE TRIGGER insert_employees_locations_countries
INSTEAD OF INSERT
ON employees_for_insert
    
BEGIN
    
INSERT INTO countries (name)
SELECT NEW.countryName
WHERE NOT EXISTS
    (SELECT 1 FROM countries
     WHERE name = NEW.countryName);
        
INSERT INTO locations (name, countryId)
SELECT NEW.locationName, countries.countryId
FROM countries
WHERE countries.name = NEW.countryName
AND NOT EXISTS
    (SELECT locationId
     FROM locations
     WHERE name = NEW.locationName);
    
INSERT INTO employees (firstName, lastName, locationId)
SELECT NEW.firstName,
    NEW.lastName,
    locations.locationId
FROM locations
WHERE locations.name = NEW.locationName;
    
END

Details

The SQL language is designed for working with relational databases, so in a SQL database you usually use multiple tables to represent a single piece of data. That means that in a normal scenario, if you want to add a new record to a table, you may need to add a new row of data to additional tables that the main data is related to.

In this example, we are using a database with the following structure:

Data model for the database

This data represents employees in a large company that has multiple office locations, identified by records in the locations table. In fact, this company is an international company that has offices in different countries, including multiple offices in some countries (represented by the countries table.

Each employee is associated with their primary office location by the locationId field in the employees table, and each location is defined as being in a certain country by the countryId column in the locations table.

In order to add a new employee you would generally have to perform several steps:

  1. Check whether the country where the employee’s office is located exists in the countries table
  2. If not, add it; if so, get its id to create the relationship with the locations table
  3. Make sure the office location exists in the locations table
  4. If necessary add the location record, and get its id to use in the employee record
  5. Add the employee record to the employees table

This is a fairly complicated process because a single SQL INSERT statement can only operate on one table at a time. In a simple case where you are adding a single employee record, this complexity isn’t unbearably difficult. You can execute the series of SQL statements in sequence in a single transaction. If an error occurs, you can break out of the sequence and display an error message or handle the error as desired.

However, if you’re importing a large set of data it’s not convenient to throw errors or display dialogs to the user. Instead, it would be nice to be able to just add any dependent data where appropriate, and insert all the data in one step.

The technique that’s described in this article makes use of database views and a SQLite feature known as INSTEAD OF triggers.

A view is a predefined SELECT statement that’s saved in a database so it can be used in queries as though it was a table. Because it usually includes data from multiple tables, and doesn’t necessarily include all the data from any given table, a view is generally read-only – you can use a SELECT statement to retrieve data from a view, but you can’t use an INSERT, UPDATE, or DELETE statement to modify the view data.

However, in SQLite (and consequently in AIR) you can define a special type of trigger that’s associated with a view known as an INSTEAD OF trigger. (A trigger is a set of SQL code that’s associated with a table. The code is executed when a data manipulation operation is performed on that table.) When a SQL statement attempts to perform an INSERT, DELETE, or UPDATE operation on the view that has an INSTEAD OF trigger defined for that particular operation, the trigger is executed instead of the specified operation. You can only define INSTEAD OF triggers on views (it wouldn’t make sense to use one for a table). For more information about triggers, see the Adobe AIR SQL reference documentation for the CREATE TRIGGER statement.

In this example, a view is defined that includes all the data in all the tables that potentially need data inserted. When the INSERT statement is executed the database runs the trigger instead. The trigger contains code that checks whether related data in the locations and countries tables already exists, and adds it if necessary. Then it adds the employee record with the related key values.

Comments