Wednesday, 9 September 2009

Differences between VARCHAR and NVARCHAR

VARCHAR (variable-length character string) is used to store a string of characters that can be as large as the page size for the database table containing the column (8,196 bytes with no row having more than 8,060 characters). VARCHAR columns are limited to 8,000 bytes. VARCHAR is stored as 8-bit data (1 byte per character). NVARCHAR is similar to VARCHAR but supports two-byte characters (UniCode). Therefore NVARCHAR columns can be used to store data that contains both English and non-English characters. NVARCHAR strings are stored in the database as UTF-16 (two bytes per character) and are converted to the codepage being used by the database connection on output (e.g. UTF-8). NVARCHAR uses two bytes for each character and can therefore only hold a maximum of 4,000 characters. An advantage of VARCHAR over NVARCHAR is that it takes up less space per row. The disadvantage is that you canot store non-English characters.

A note on SQL Server 2005

SQL Server 2005 introduced the MAX identifier which allows columns to go beyond the 8000 character limit and to store up to 2^31-1 (around 2 GB). NB: you still cannnot specify a size greater than 8000 characters (e.g. VARCHAR(12000)) but you need to use MAX (e.g. VARCHAR(MAX) or NVARCHAR(MAX)).