Member-only story

How to Load Dimension Tables

Data Saint Consulting Inc

--

The Basics: SQL MERGE, UPDATE and DELETE.

These SQL features are the foundation for keeping data up-to-date in Hadoop, so let’s take a quick look at them.

MERGE was standardized in SQL 2008 and is a powerful SQL statement that allows inserting, updating and deleting data in a single statement. MERGE makes it easy to keep two systems consistent. Let’s look at the SQL specification for MERGE (slightly simplified):

MERGE INTO <target table>
USING <table reference>
ON <search condition>
<merge when clause>...
WHEN MATCHED [ AND <search condition> ]
THEN <merge update or delete specification>
WHEN NOT MATCHED [ AND <search condition> ]
THEN <merge insert specification>

In this blog we’ll also use the more familiar UPDATE statement, which looks like this:

UPDATE <target table>
SET <set clause list>
[ WHERE <search condition> ]

Update comes into play when you don’t need to mix inserts and updates together in the same statement.

Get Ready to Keep Data Fresh

Your table must be a transactional table. That means the table must be clustered, stored as ORCFile data and have a table property that says transactional = true. Here’s an example:

--

--

Responses (3)