Remember the scene from The Wizard of Oz where Dorothy and friends first start out for the Emerald City? The Munchkins tell Dorothy how to find her way by singing:
Follow the primary keys, follow the primary keys, follow, follow, follow, follow, follow the primary keys!
Maybe I remember it wrong, there might have been some yellow bricks in there, but we should follow this advice when writing queries from relational databases, and particularly MEDITECH’s Data Repository: if you want the most efficient, best-performing reports, you should learn about and use the primary keys built into the DR databases.
Primary keys: the basics
What are primary keys, you ask? Simply put, a primary key is what makes a record in a database table unique. Imagine a table of customers for a small business, each of whom has a customer number - a number that can only be used once and is different for each of them. This is the primary key.
Structurally, a primary key is defined for a table on the column (or more often, columns) whose values make the record unique. When defined, this is known as a primary key constraint, whose columns can’t have any NULL (empty) values. This primary key also becomes the table’s one and only clustered index, the physical structure that determines what order the data are stored in. The clustered index is also used to quickly retrieve data when called for by a query, stored procedure or application. (This last point will become important a little later).
Another way to think of a primary key is that it helps you understand the level of detail contained in a table. The DR table LabSpecimens, for example, has one record for each unique specimen, but not more detailed data about test results for that specimen. Since our specimen is likely to have more than one test result associated with it, we use LabSpecimenTests to find data about the tests – one record per test, per specimen.
Primary key examples
Let’s see what this looks like when you use SQL Management Studio to browse the tables from a typical live DR database (these examples are based on simplified versions of actual DR tables):
Guess what the key symbol indicates? You’re right, the primary key columns. Notice too that the SourceID column is always the first column in a table, and always part of the primary key – this is universally true for all DR tables. So for our LabSpecimens table on the left, the level of available detail is the SpecimenID – a single lab specimen. Like most DR primary keys, the “ID” portion of this column name indicates this is an internal identifier, not something displayed to the user. For LabSpecimens, you can also see what the user-friendly value is, the SpecimenNumber column toward the bottom of the list.
You’ll notice the LabSpecimenTests table on the right has three primary key columns: the first two identical to LabSpecimens (SourceID and SpecimenID) and a third column, TestPrintNumberID. We can infer a couple of things from this:
- LabSpecimentTests has more detailed data than LabSpecimens; and
- If I want to write a query that shows information about specimens and all their associated tests, I’m going to need both tables.
Using primary keys in joins
It’s not a coincidence that those first two primary key columns are the same. Not only are they named identically in both tables, but the data values they contain are the same too – that is, every unique combination of SourceID and SpecimenID from LabSpecimenTests has to have a related record in LabSpecimens. (How can you have a lab test if there is no specimen?) This is of course driven by how the data are stored natively in MEDITECH – what we see in DR as “SpecimenID” is stored in the LAB.L.SPEC.urn DPM/element for you NPR fans.
This type of relationship between tables can be described as parent-child. In a true relational database design, these relationships are structurally defined and enforced using “foreign keys.” However since the Data Repository doesn’t have a true relational design (which is on purpose, but that’s a story for another time), there are no foreign keys – when you need data from more than one table, you have to tell the server how to do it, using T-SQL join clauses in your queries and stored procedures.
The SQL programming language uses joins to combine fields from two tables when they share common data values. Using our lab examples from above, the relationship between these tables would be written like this:
ON LabSpecimens.SourceID = LabSpecimenTests.SourceID
AND LabSpecimens.SpecimenID = LabSpecimenTests.SpecimenID
If we wanted a report that showed all lab tests by their specimen number, test name and test result, this simplified example shows how we join using both of the primary key columns in our SQL to retrieve rows from both tables.
Using the primary keys in SQL joins not only correctly tells SQL Server how the tables are related to each other, but is also critical for optimal query performance. Many poor performing queries or applications are the result of incomplete or incorrect join statements. You can avoid this pitfall by understanding what the primary keys are for the tables you’re using.
Modules and primary keys
Once you get in the habit of looking at primary keys, you’ll quickly find that in most cases, they are unique to each MEDITECH application or module. So while the keys we use within lab when working with specimens (including blood bank and microbiology) are common to each other, they are different from what you find in admissions, order entry or pharmacy. Let’s see a few examples, again with greatly simplified versions of DR tables:
In each case, our primary key has the SourceID and one other “ID” column, with data values that are specific to that module. So for ADM, we have a patient visit (represented with the user-friendly value of account number). For BAR, we have a patient bill (also using account number). In OE, we have an order number, in PHA we have a prescription (RxNumber). And so on. When you know what a module’s primary keys are, you can very quickly understand how to go from “top to bottom” within a module’s data; that is, from a summary level to more detail. Imagine we want order information for OeOrders, but also want associated query information. If we follow the SourceID and OrderID columns from the first table, we’ll find they also exist in OeOrderQueries, along with QueryID as part of the primary key. Know we know how to join these tables correctly for fast and accurate data presentation.
A note about VisitID
In the table examples above, you’ll notice the VisitID in each table and that it’s only the primary key in AdmVisits. This is arguably the most important field in the data repository – it’s the internal identifier of a patient account. The VisitID column allows you to show patient-specific information on your reports - think clinical documentation, medication lists, scheduled appointments, billing information, etc. While it’s the primary in ADM and EDM, it also exists in just about every other module, at least in “top level” tables like BarVisits, LabSpecimens, PhaRx, OeOrders, etc.
What initially prompted me to write this blog was a Meditech L post with an example query that joined two tables from BAR using VisitID. While it may have returned data correctly, my guess is that it was likely very slow to do so, since BillingID is the primary key within many BAR tables, not VisitID. Adding non-clustered indexes on VisitID on all the tables from the query will solve the speed problem, but adds unnecessary overhead too. A simpler solution is to follow the primary keys within BAR, which will be both fast and accurate.
Wrapping it all up
Knowing which tables to use is one of the biggest challenges when writing DR reports. Understanding those tables, and the relationships between them is part of that challenge. Primary keys are a critical part of any database, but especially so in the DR, since they are the only definition we have for how to correctly and quickly get information. So just think of the munchkins every time you write your SQL code: follow the primary keys, follow the primary keys…
By Ian Proffer at 24 Feb 2015, 11:39 AM