SQL Naming Conventions
In this article, I will survey the common naming conventions for SQL objects in IBM, Microsoft, and Oracle databases. The point is not to prescribe or recommend — there are reasons for each convention, and I'll only try to indicate what those reasons are. I'll look hardest at conventions that are popular, portable, or consistent. When I need to appeal to authority, my sources are:
- Vendor manuals for IBM DB2® 7.2, Microsoft SQL Server 2000, and Oracle9i.
- The documents for the SQL Standard (ISO 9075) and another standard that touches on naming (ISO 11179).
- Books that contain sections about naming conventions; e.g., Koch and Loney's Oracle reference.
- Articles about naming conventions; e.g., Kondreddi (for Microsoft), Sheppard (for Oracle), Mullins (for IBM), or Celko (for standard SQL).
Let's start with a definition for "name." A name is a sequence of characters that should tell both the DBMS and users what an object is. There is actually a distinction between a name and an identifier, to wit: In the string "SELECT * FROM Schema1.Table1" the words Schema1 and Table1 are identifiers. The optional portion [Schema1.] is a qualifier; thus, Table1 is a qualified identifier. The whole thing together -- Schema1.Table1 is a name. When there are no qualifiers, the words name and identifier are interchangeable.
ISO 11179 distinguishes between a name's semantics — its meaning — and its lexical attributes — length, character set, use of abbreviations, and anything to do with spelling. I care most about the lexical attributes.
The following chart shows the maximum length for names of the most important
SQL objects according to ISO and the DBMSs:
The numbers in the chart are either bytes or characters. A maximum character length can be smaller than a maximum byte length if you use a multibyte character set.
The compromise figure appears to be 30 — big enough for a verbose name. But there is an argument for keeping table and column names below the limit: There may be object names that include the table or column name and have suffixes or prefixes; for example, an index named Table1_Index1 or a procedure named UpdateEmployeeName. Such object names would be impossible if the table or column name already had the maximum size. I'll be looking later at the question of whether names should include names.
I want to end most sections of this article by quoting an authority. The quotation is not an endorsement, merely an acknowledgment that someone who knows something has an opinion. Here is the first quote:
"Unless a compelling reason exists, ensure that your standards allow for the length of database object names [except index names] to utilize every character available."
-- Craig Mullins
The next chart shows the characters allowed in names.
|First Character||Letter||Letter, $#@||Letter||Letter, #@|
|Later Characters||Letter, Digit, _||Letter, Digit, $#@_||Letter, Digit, $#_||Letter, Digit, #@_|
|Term||ordinary identifier||nonquoted identifier||regular identifier|
Generally, the first character of a name must be a letter, while subsequent characters may be letters, digits, or _ (underscore). It is true that any DBMS might also allow $, #, @ -- but no DBMS allows all three, and in any case the special characters are not usable everywhere (Microsoft attaches special meaning to names that begin with @ or # and Oracle discourages special characters in the names of certain objects).
But what is a letter? In the original SQL, all letters had to be uppercase Latin so there were only 26 choices. Nowadays the repertoire is more extensive. But be wary of characters outside the Latin-1 character set because:
- IBM cannot always recognize a letter. It just accepts that any multibyte character except space is a letter and will not attempt to determine whether it's uppercase or lowercase.
- IBM and Oracle use the database's character set and so could have a migration problem with exotic letters. (Microsoft uses Unicode and so does not have this problem.)
"The following restrictions apply for Intermediate SQL: a) No <identifier body> shall end in an <underscore>."
-- ISO SQL-92 document
This chart shows the characteristics of delimited identifiers.
|Delimiting Characters||""||""||"" or ||""|
|Term||delimited identifier||delimited identifier||delimited identifier||quoted identifier|
If you find the character-set restrictions of names onerous, you can avoid them by putting identifiers inside double quotes. The result is a delimited identifier (or quoted identifier in Oracle terminology). Delimited identifiers may start with, and contain, any character. (It is a bit uncertain how one can include the " character itself. The standard way is to double it, as in "Empl""oyees" but that's not always documented.)
Support for delimited names is nearly universal, with only two major exceptions: IBM won't allow non-alphanumeric characters for labels and variable names inside stored procedures, and Microsoft won't allow quoted identifiers if the QUOTED_IDENTIFIER switch is off. The reason for the first exception is, perhaps, that IBM converts SQL procedures into another computer language before compilation.
Suppose you make a table with a delimited identifier, e.g.:
CREATE TABLE "t" ("column1" INTEGER);
Now try to get that table with a regular identifier; thus:
SELECT column1 FROM t;
Will this work? Well, according to the SQL Standard, it shouldn't, but with Microsoft, it might. The reason is case sensitivity.
"If you name a schema object using a quoted identifier, then you must use the double quotation marks whenever you refer to that object."
-- Oracle Corporation
I'm pulling two lines out of the charts that came before, and presenting them again, for emphasis:
[From the chart for regular identifiers]
[From the chart for delimited identifiers]
Standard SQL, IBM, and Oracle will convert regular identifiers to uppercase, but will not convert delimited identifiers to uppercase. That is why, for the above example, this statement shouldn't work:
SELECT column1 FROM t;
The reason: the stored name is t but the sought-for name is T.
For Microsoft, the case-sensitivity rule has nothing to do with whether the name is regular or delimited. Instead, identifiers depend on the default collation. If the default collation is case-insensitive, then t equals T. If it's case-sensitive, then t does not equal T.
To sum up, there are two case-sensitivity problems. The first is that the delimited identifier "t" and the regular identifier t differ if one follows the SQL Standard. The second is that Microsoft does not follow the SQL Standard. These problems make it hard to make one naming convention to fit everyone. The possible conventions are:
1) Avoid delimited identifiers. This seems to be what most folks do.
2) Use uppercase always. This is what IBM recommends, and does consistently for the examples in its documentation.
2) Use lowercase except where it would look odd. This is what Microsoft and Oracle do consistently for examples in their documentation.
2) Use lowercase but initial-capitalize. This is what several authoritative sources recommend.
"The <identifier body> of a <regular identifier> is equivalent to an <identifier body> in which every letter that is a lower-case letter is replaced by the equivalent upper-case letter or letters."
-- ISO (SQL-92 Entry Level requirement)
"Identifiers of objects within a database ... are assigned the default collation of the database."
-- Microsoft Corporation
Prefixes and Suffixes
It's very common to see multipart names. The parts are distinguishable by underscore separators or by capitalization changes. Examples:
EMPLOYEES_TAB /* underscore_separator */ TblEmployees /* CapitalizationChange */ T_Employees /* Both */
That's what prefixes or suffixes look like. But what are they for?
There are two prefix/suffix conventions: type indicators and hierarchy indicators.
You probably recognized that in all three of the preceding examples the suffix or prefix meant: This is a table. There is a common rule that a type indicator is either a prefix or a suffix and will always be a contraction of the name of the object type. Beyond that, there is no agreement, but these prefixes or suffixes are shown in the vendors' documentation:
- FUNCTION NAMES: fn_ (Microsoft)
- PROCEDURE NAMES: sp_ (Microsoft, but this isn't recommended)
- INDEX NAMES: _ind (Microsoft), _ix or _idx (Oracle), _bm_ix (Oracle bitmap)
- SEQUENCES: _SEQ (IBM), _seq (Oracle)
- TABLE NAMES: Tbl (Access), _t (Oracle)
- TRIGGER NAMES: trig (Microsoft)
- TYPE NAMES: _t (IBM), _typ (Oracle)
- COLUMN NAMES: _yn (to hint that the data type is Boolean)
Microsoft C programmers know that Hungarian Notation requires type indicators, and VBA programmers use a Hungarian-style convention (Leszynski/Reddick), so tblEmployees is a normal name. SQL programmers have less need for Hungarian Notation because they can get the type information from the metadata tables. Data in relational databases -including metadata - should be atomic and shouldn't be redundant.
The other reason for type indicators is namespace sharing. A legacy example: In Oracle5, tables and indexes were in the same namespace, which meant that you couldn't have both a table named Employees and an index named Employees. The simple solution was to add a suffix to all index names. That particular problem has gone away, but Oracle still has namespaces that cover multiple object types (e.g., sequence names can't equal view names).
"This is the year 2000; the '60s are over! Things like "str_firstname" or "tblPayroll" are redundant."
-- Joe Celko
If object X is part of object Y, then object X's name can contain an abbreviation of object Y's name. For example, the columns in table Employees could be named emp_id and emp_name, or the tables in schema George could be named GeoSales and GeoMedia.
In general, when two objects that are different share a namespace, a hierarchy indicator will help distinguish. It might even help when two objects are in different namespaces but might be found on the same path, as in:
- Two columns are named X and their tables are joined.
- Two schemas are named X and their databases are merged.
In the DBMSs we're examining, a hierarchy indicator is redundant because (a) the information they convey is available in the metadata tables, (b) there is no chance of a namespace conflict, and (c) one could use qualifiers instead. I found no examples of hierarchy indicators in the vendor manuals. I think they're becoming rarer.
"Use the same names to describe the same things across tables. For example, the department number columns of the sample employees and departments tables are both named deptno."
-- Oracle Corporation
Should Table Names be Plural?
Which is correct: Employee or Employees?
Koch and Loney represent the tables-are-singular argument best. They give examples like address book and phone book and car club and restaurant list — notice that these are all singular. Very fairly, they give some counter-examples too. But they find that the singular is more common in typical English. Besides, we're naming a set, and there's only one set.
I have also seen — in Usenet discussions — a suggestion that ISO 11179 says names should be singular. I think, however, that this is a misinterpretation of the standard.
And now the counterattack.
First, when we label a container, we label its contents. So although bean jar is a good English term, the name we paste on the jar says Beans. Second, the use of a singular can give the (presumably false) impression that the set contains only one row.
Looking at vendor manuals, I find that IBM and Microsoft appear to follow no fixed rule, but Oracle examples of table names are consistently plural. The tiebreaker is the ISO document 9075-11, which lists the views in the metadata schema INFORMATION_SCHEMA. These views mostly have plural names.
Naming Conventions for Other Objects
I pass on a pastiche of advice from my sources.
(Constraints) Name every constraint yourself. Otherwise you'll see funny-looking system-generated names in error messages.
Frequently, a constraint name ends with _PK or _FK or _UK to show that it's for a Primary Key, Foreign Key, or Unique Key. It's particularly important to name key constraints because the DBMS might make an index whose name will be the same as the constraint name. The sources that I consider authoritative do not use _PK, _FK, or _UK suffixes on columns, however. If the foreign-key and primary-key column names differ, then NATURAL JOIN expressions won't work and REFERENCES clauses are troublesome.
(Procedures) Use a verb plus object form, e.g., UPDATE_EMPLOYEES. Be wary of the prefix SP_.
(Schemas) Make schema names short and avoid special characters. Don't start with SYS or end with _SCHEMA.
(Sequences) If the sequence is for a particular column, name it <column name> plus _seq.
(Views, triggers, indexes) These objects should begin with the name of the table they depend on. If abbreviation is necessary, use the first two or three letters of the table name or the same abbreviation you'll use for aliases. These are not hierarchy indicators because the objects in question are not "part of" tables.
We all live in fear that whatever name we choose will become a reserved word in the next release of the DBMS, or already is a reserved word in the DBMS we're about to migrate to. Here are some ways to reduce the risk.
- End names with _ (underscore) characters. To date, no reserved words end with _ and ISO sort of promises that none will ever be introduced. There have been troubles with _ use in the past though, so this idea has never caught on.
- Put names inside "" delimiters. This is an easy out - so easy that some programs will automatically convert regular identifiers to delimited identifiers in order to aid portability. As we've seen, such a practice aids portability only among Microsoft products.
- Check reserved-word lists. This is easy to do online with an ODBC call, but far from foolproof because every vendor has a different list. That's why I'm giving you my handy amalgamated list of reserved words in SQL:2003 (the upcoming version of the SQL Standard), IBM 7.2, ODBC 3.5, Oracle9i, Microsoft 2000 (current), and Microsoft (future). If reading 607 random words sounds tedious, please skip ahead to the final sentences of this article.
Amalgamated List of Reserved Words
ABS ABSOLUTE ACCESS ACQUIRE ACTION ADA ADD ADMIN AFTER AGGREGATE ALIAS
ALL ALLOCATE ALLOW ALTER AND ANY ARE ARRAY AS ASC ASENSITIVE ASSERTION
ASUTIME ASYMMETRIC AT ATOMIC AUDIT AUTHORIZATION AUX AUXILIARY AVG
BACKUP BEFORE BEGIN BETWEEN BIGINT BINARY BIT BIT_LENGTH BLOB BOOLEAN
BOTH BREADTH BREAK BROWSE BUFFERPOOL BULK BY
CALL CALLED CAPTURE CARDINALITY CASCADE CASCADED CASE CAST CATALOG
CCSID CEIL CEILING CHAR CHAR_LENGTH CHARACTER CHARACTER_LENGTH CHECK
CHECKPOINT CLASS CLOB CLOSE CLUSTER CLUSTERED COALESCE COLLATE
COLLATION COLLECT COLLECTION COLLID COLUMN COMMENT COMMIT COMPLETION
COMPRESS COMPUTE CONCAT CONDITION CONNECT CONNECTION CONSTRAINT
CONSTRAINTS CONSTRUCTOR CONTAINS CONTAINSTABLE CONTINUE CONVERT CORR
CORRESPONDING COUNT COUNT_BIG COVAR_POP COVAR_SAMP CREATE CROSS CUBE
CUME_DIST CURRENT CURRENT_COLLATION CURRENT_DATE
CURRENT_DEFAULT_TRANSFORM_GROUP CURRENT_LC_PATH CURRENT_PATH
CURRENT_ROLE CURRENT_SERVER CURRENT_TIME CURRENT_TIMESTAMP
CURRENT_TIMEZONE CURRENT_TRANSFORM_GROUP_FOR_TYPE CURRENT_USER CURSOR
DATA DATABASE DATALINK DATE DAY DAYS DB2GENERAL DB2SQL DBA DBCC DBINFO DBSPACE
DEALLOCATE DEC DECIMAL DECLARE DEFAULT DEFERRABLE DEFERRED DELETE
DENSE_RANK DENY DEPTH DEREF DESC DESCRIBE DESCRIPTOR DESTROY
DESTRUCTOR DETERMINISTIC DIAGNOSTICS DICTIONARY DISALLOW DISCONNECT
DISK DISTINCT DISTRIBUTED DLNEWCOPY DLPREVIOUSCOPY DLURLCOMPLETE
DLURLCOMPLETEONLY DLURLCOMPLETEWRITE DLURLPATH DLURLPATHONLY
DLURLPATHWRITE DLURLSCHEME DLURLSERVER DLVALUE DO DOMAIN DOUBLE DROP
DSSIZE DUMMY DUMP DYNAMIC
EACH EDITPROC ELEMENT ELSE ELSEIF END END-EXEC EQUALS ERASE ERRLVL
ESCAPE EVERY EXCEPT EXCEPTION EXCLUSIVE EXEC EXECUTE EXISTS EXIT EXP
EXPLAIN EXTERNAL EXTRACT
FALSE FENCED FETCH FIELDPROC FILE FILLFACTOR FILTER FINAL FIRST FLOAT
FLOOR FOR FOREIGN FORTRAN FOUND FREE FREETEXT FREETEXTTABLE FROM FULL
GENERAL GENERATED GET GLOBAL GO GOTO GRANT GRAPHIC GROUP GROUPING
HANDLER HAVING HOLD HOLDLOCK HOST HOUR HOURS
IDENTIFIED IDENTITY IDENTITY_INSERT IDENTITYCOL IF
IGNORE IMMEDIATE IMPORT IN INCLUDE INCREMENT INDEX INDICATOR INITIAL
INITIALIZE INITIALLY INNER INOUT INPUT INSENSITIVE INSERT INT INTEGER
INTEGRITY INTERSECT INTERSECTION INTERVAL INTO IS ISOBID ISOLATION
JAR JAVA JOIN
LABEL LANGUAGE LARGE LAST LATERAL LC_CTYPE LEADING LEAVE LEFT LESS
LEVEL LIKE LIMIT LINENO LINKTYPE LN LOAD LOCAL LOCALE LOCALTIME
LOCALTIMESTAMP LOCATOR LOCATORS LOCK LOCKSIZE LONG LOOP LOWER
MAP MATCH MAX MAXEXTENTS MEMBER MERGE METHOD MICROSECOND MICROSECONDS
MIN MINUS MINUTE MINUTES MOD MODE MODIFIES MODIFY MODULE MONTH MONTHS
NAME NAMED NAMES NATIONAL NATURAL NCHAR NCLOB NEW NEXT NHEADER NO
NOAUDIT NOCHECK NOCOMPRESS NODENAME NODENUMBER NONCLUSTERED NONE
NORMALIZE NOT NOWAIT NULL NULLIF NULLS NUMBER NUMERIC NUMPARTS
OBID OBJECT OCTET_LENGTH OF OFF OFFLINE OFFSETS OLD ON ONLINE ONLY
OPEN OPENDATASOURCE OPENQUERY OPENROWSET OPENXML OPERATION
OPTIMIZATION OPTIMIZE OPTION OR ORDER ORDINARILITY OUT OUTER OUTPUT
OVER OVERLAPS OVERLAY
PACKAGE PAD PAGE PAGES PARAMETER PARAMETERS PART PARTIAL PARTITION
PASCAL PATH PCTFREE PCTINDEX PERCENT PERCENT_RANK PERCENTILE_CONT
PERCENTILE_DISC PIECESIZE PLAN POSITION POSTFIX POWER PRECISION PREFIX
PREORDER PREPARE PRESERVE PRIMARY PRINT PRIOR PRIQTY PRIVATE
PRIVILEGES PROC PROCEDURE PROGRAM PSID PUBLIC
RAISERROR RANGE RANK RAW READ READS READTEXT REAL RECONFIGURE RECOVERY
RECURSIVE REF REFERENCES REFERENCING REGR_AVGX REGR_AVGY REGR_COUNT
REGR_INTERCEPT REGR_R2 REGR_SLOPE REGR_SXX REGR_SXY REGR_SYY RELATIVE
RELEASE RENAME REPEAT REPLICATION RESET RESIGNAL RESOURCE RESTORE
RESTRICT RESULT RETURN RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP
ROUTINE ROW ROW_NUMBER ROWCOUNT ROWGUIDCOL ROWID ROWNUM ROWS RRN RULE
SAVE SAVEPOINT SCHEDULE SCHEMA SCOPE SCRATCHPAD SCROLL SEARCH SECOND
SECONDS SECQTY SECTION SECURITY SELECT SENSITIVE SEQUENCE SESSION
SESSION_USER SET SETS SETUSER SHARE SHUTDOWN SIGNAL SIMILAR SIMPLE
SIZE SMALLINT SOME SOURCE SPACE SPECIFIC SPECIFICTYPE SQL SQLCA
SQLCODE SQLERROR SQLEXCEPTION SQLSTATE SQLWARNING SQRT STANDARD START
STATE STATEMENT STATIC STATISTICS STAY STDDEV_POP STDDEV_SAMP STOGROUP
STORES STORPOOL STRUCTURE STYLESUBPAGES SUBSTRING SUCCESSFUL SUM
SYMMETRIC SYNONYM SYSDATE SYSTEM SYSTEM_USER
TABLE TABLESPACE TEMPORARY TERMINATE TEXTSIZE THAN THEN TIME TIMESTAMP
TIMEZONE_HOUR TIMEZONE_MINUTE TO TOP TRAILING TRAN TRANSACTION
TRANSLATE TRANSLATION TREAT TRIGGER TRIM TRUE TRUNCATE TSEQUAL TYPE
UID UNDER UNDO UNION UNIQUE UNKNOWN UNNEST UNTIL UPDATE UPDATETEXT
UPPER USAGE USE USER USING
VALIDATE VALIDPROC VALUE VALUES VAR_POP VAR_SAMP VARCHAR VARCHAR2
VARIABLE VARIANT VARYING VCAT VIEW VOLUMES
WAITFOR WHEN WHENEVER WHERE WHILE WIDTH_BUCKET WINDOW WITH WITHIN
WITHOUT WLM WORK WRITE WRITETEXT
Identifiers are for identifying! Any convention that tries to do something more than identify, or might fail to identify, needs explanation.
I've tried to find the best representations for different points of view, including programmers, abbreviation fans, and DBAs using IBM, Microsoft, or Oracle.
Bryzek, Michael. Constraint Naming Standards.
Celko, Joe. Ten Things I Hate About You.
Jones, Steve. Standards Part 1 - Abbreviated Programming.
Karbowski, J. Joseph. Synchronize Your Naming Conventions.
Koch, G. and K. Loney. Oracle8i: The Complete Reference (3rd edition). Osborne McGraw Hill, 2000.
Kondreddi, Narayana Vyas. Database object naming conventions.
Mullins, Craig. "What's in a name?"
Sheppard, Simon. Oracle Naming Conventions.
Peter Gulutzan co-wrote a 1078-page book on the current standard: SQL-99 Complete, Really (CMP Books 1999). His understanding of portability grew while he benchmarked IBM, Informix, Ingres, InterBase, Microsoft, MySQL, Oracle, and Sybase DBMSs for a book which Addison-Wesley published approximately yesterday (September 2002): SQL Performance Tuning.
Contributors : Peter Gulutzan
Last modified 2005-05-03 02:08 PM