What Are VB Strings?

by Steven Roman

www.romanpress.com

Copyright 2001 © The Roman Press, Inc. All Rights Reserved.

Introduction

This article is devoted to describing the concept of a string as it relates to Visual Basic. It is excerpted and condensed from my book Win32 API Programming with Visual Basic (available August 1999). We assume the reader is familiar with the contents of the articles "Pointers" and "Data Types".

Strings

The subject of strings can be quite confusing, but this confusion tends to disappear with some careful attention to detail (as is usually the case). The problem stems from the tendency of many programmers to think of a string as an array of characters.

Indeed, the Visual Basic documentation tends to support this erroneous viewpoint at times. According to the VB documentation, a string is

A data type consisting of a sequence of contiguous characters that represent the characters themselves rather than their numeric values.

It seems to me that Microsoft is trying to say that the underlying set for the VB String data type is the set of finite-length sequences of characters. For Visual Basic, all characters are represented by two-byte Unicode integers. For instance, the ASCII code for the character h is &H68, so its Unicode representation is &H0068, which appears in memory as 68 00.

Thus, the "string" help is represented as

00 68 00 65 00 6C 00 70

Note, however, that because Intel processors store each two-byte word with its low-order byte first (so the two bytes appear reversed), the "string" help appears in memory as

68 00 65 00 6C 00 70 00

This is fine, but it is definitely not how we should think of strings in VB programming. To avoid any possibility of ambiguity, we will refer to this type of object as a Unicode character array, which is, after all, precisely what it is! This also helps distinguish it from an ANSI character array, that is, an array of characters represented using single-byte ANSI character codes.
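To see both kinds of character array from inside VB, one can use a sketch like the following. It relies on two standard VB behaviors: assigning a String to a dynamic Byte array copies the string's Unicode bytes as they sit in memory, and StrConv with vbFromUnicode converts a string to ANSI (using the system's ANSI code page; for plain ASCII letters such as these, the codes are the same). The procedure name ShowCharArrays is our own, chosen for illustration.

```vb
' Hypothetical demo procedure; place it in a standard module.
Sub ShowCharArrays()
   Dim uni() As Byte, ansi() As Byte
   Dim i As Long

   ' AscW returns the Unicode code of a character: &H68 for "h"
   Debug.Print Hex$(AscW("h"))          ' 68

   ' Assigning a String to a dynamic Byte array copies its Unicode
   ' bytes exactly as stored in memory (low-order byte first)
   uni = "help"
   For i = LBound(uni) To UBound(uni)
      Debug.Print uni(i);
   Next
   Debug.Print                           ' expect the values 104, 0, 101, 0, 108, 0, 112, 0

   ' StrConv with vbFromUnicode produces an ANSI character array,
   ' one byte per character
   ansi = StrConv("help", vbFromUnicode)
   For i = LBound(ansi) To UBound(ansi)
      Debug.Print ansi(i);
   Next
   Debug.Print                           ' expect the values 104, 101, 108, 112
End Sub
```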

Here is the key to understanding VB strings. When we write the VB code

Dim str As String
str = "help"

we are not defining a Unicode character array per se. We are defining a member of a data type called BSTR, which is short for Basic String. A BSTR is, in fact, a pointer to a null-terminated Unicode character array that is preceded by a 4-byte length field. We had better elaborate on this.

The BSTR

Actually, the VB string data type defined by

Dim str As String

underwent a radical change between versions 3 and 4 of Visual Basic, due in part to an effort to make the type more compatible with the Win32 operating system.

Just for comparison (and to show that we are more fortunate now), Figure 1 shows the format for the VB string data type under Visual Basic 3, called an HLSTR (high-level string).

 

Figure 1 - The High-Level String Format (HLSTR) Used by VB3

The rather complex HLSTR format starts with a pointer to a string descriptor, which contains the 2-byte length of the string along with another pointer to the character array, which is in ANSI format (one byte per character).

With respect to the Win32 API, this string format is a nightmare. Beginning with Visual Basic 4, the VB string data type changed. The new data type, called a BSTR, is shown in Figure 2.

Figure 2 - A BSTR

This data type is actually defined in the OLE 2.0 specification; that is, it is part of Microsoft's ActiveX specification.

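There are several important things to note about the BSTR data type:

- A BSTR is a 4-byte pointer. It points to the first character of a Unicode character array, not to the length field.
- The character array is preceded by a 4-byte length field that holds the length of the array in bytes (not characters), not counting the terminating null.
- The character array is terminated by a null character.
- The character array may contain embedded null characters, since its length is given by the length field rather than by the terminating null.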

We should emphasize that an embedded null Unicode character is a 16-bit 0, not an 8-bit 0. Watch out for this when testing for null characters in Unicode arrays.
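The following sketch illustrates the point (the procedure name ShowEmbeddedNull is hypothetical). vbNullChar is VB's built-in constant for a single null character, so in a Unicode string it contributes two 0 bytes. Conversely, a string with no null characters at all still contains plenty of 0 bytes, so scanning the bytes for a 0 is not a valid test for a null character.

```vb
' Hypothetical demo procedure showing an embedded null character.
Sub ShowEmbeddedNull()
   Dim s As String
   Dim b() As Byte
   Dim i As Long, zeroBytes As Long

   ' vbNullChar is one 16-bit null character
   s = "he" & vbNullChar & "lp"
   Debug.Print Len(s)                      ' 5 characters
   Debug.Print LenB(s)                     ' 10 bytes
   Debug.Print InStr(s, vbNullChar)        ' 3: the null is the third character

   ' "help" contains no null characters, yet half of its bytes are 0
   ' (the high-order byte of each character)
   b = "help"
   For i = LBound(b) To UBound(b)
      If b(i) = 0 Then zeroBytes = zeroBytes + 1
   Next
   Debug.Print zeroBytes                   ' 4
   Debug.Print InStr("help", vbNullChar)   ' 0: no null character found
End Sub
```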

Note that it is common practice to speak of "the BSTR help" or to say that a BSTR may contain embedded null characters when what is really being referred to is the character array pointed to by the BSTR.

Because a BSTR may contain embedded null characters, the terminating null is not of much use, at least as far as VB is concerned. However, its presence is extremely important for Win32. The reason is that the Unicode version of a Win32 string (denoted by LPWSTR) is defined as a pointer to a null-terminated Unicode character array which is not allowed to contain embedded null characters.

This makes it clear why BSTRs are null terminated: a BSTR with no embedded nulls is also an LPWSTR. We will not discuss VC++ strings in this article. (For more on VC++ strings, please see my book Win32 API Programming with Visual Basic.)
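A quick way to observe this is to hand a BSTR's character array to a Win32 function that expects an LPWSTR. The sketch below uses the kernel32 function lstrlenW, which counts characters up to the first null, together with StrPtr (discussed in the VarPtr and StrPtr section of this article) to obtain the address of the character array. The Declare statement and the procedure name are our own; treat this as an illustration rather than production code.

```vb
' Declarations section of a standard module.
' lstrlenW counts the characters of an LPWSTR up to the first null.
Private Declare Function lstrlenW Lib "kernel32" (ByVal lpString As Long) As Long

' Hypothetical demo procedure
Sub ShowBstrAsLpwstr()
   Dim s As String

   ' No embedded nulls: the BSTR's character array is a valid LPWSTR
   s = "help"
   Debug.Print Len(s), lstrlenW(StrPtr(s))     ' 4 and 4

   ' One embedded null: VB still sees 5 characters,
   ' but the Win32 function stops at the first null character
   s = "he" & vbNullChar & "lp"
   Debug.Print Len(s), lstrlenW(StrPtr(s))     ' 5 and 2
End Sub
```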

Let us emphasize that code such as

Dim str As String
str = "help"

means that str is the name of a BSTR, not a Unicode character array. In other words, str is the name of the variable that holds the address xxxx, as shown in Figure 2.

Here is a brief experiment we can do to test the fact that a VB string is a pointer to a character array and not a character array. Consider the following code, which defines a structure whose members are strings:

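' Declarations section of a module
Private Type utTest
   astring As String
   bstring As String
End Type

' A procedure in the same module
Sub TestLen()
   Dim uTest As utTest
   Dim s As String

   s = "testing"
   uTest.astring = "testing"
   uTest.bstring = "testing"

   Debug.Print Len(s)       ' length of the character array
   Debug.Print Len(uTest)   ' size of the structure in bytes
End Sub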

The output from this code is

7
8

In the case of the string variable s, the Len function reports the length of the character array; here there are 7 characters in the character array 'testing'. However, in the case of the structure variable uTest, the Len function reports the length of the structure (in bytes). The return value of 8 indicates that each of the two BSTR members occupies 4 bytes. This is because a BSTR is a pointer!
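The same distinction between characters and bytes shows up in VB's LenB function, which returns a string's length in bytes rather than in characters. A minimal sketch (the procedure name is hypothetical):

```vb
' Hypothetical demo: character count versus byte count
Sub ShowLenVsLenB()
   Dim s As String
   s = "testing"
   Debug.Print Len(s)                 ' 7 characters
   Debug.Print LenB(s)                ' 14 bytes
   Debug.Print LenB(s) = 2 * Len(s)   ' True: 2 bytes per Unicode character
End Sub
```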

VarPtr and StrPtr

The functions VarPtr and StrPtr are not documented by Microsoft, but they can be very useful in understanding the structure of BSTRs.

If var is a variable, then

VarPtr(var)

is the address of that variable, returned as a Long. If str is a BSTR variable, then

StrPtr(str)

is the contents of the BSTR, that is, the address of the Unicode character array pointed to by the BSTR.

Let us verify these statements. Figure 3 shows a BSTR.

Figure 3 - A BSTR

The code for this figure is simply

Dim str As String
str = "help"

Note that the variable str is located at address aaaa and the character array begins at address xxxx, which is the contents of the pointer variable str.

To see that

VarPtr(str) = aaaa
StrPtr(str) = xxxx

just run the following code:

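' Declarations section: CopyMemory is the customary VB alias
' for the Win32 RtlMoveMemory function
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
   (Destination As Any, Source As Any, ByVal Length As Long)

Sub ShowBSTR()
   Dim lng As Long, i As Integer, s As String
   Dim b(1 To 10) As Byte
   Dim sp As Long, vp As Long
   Dim ct As Long

   s = "help"

   sp = StrPtr(s)
   Debug.Print "StrPtr:" & sp

   ' The length field occupies the 4 bytes just before the character array
   CopyMemory ct, ByVal sp - 4, 4
   Debug.Print "Length field: " & ct

   vp = VarPtr(s)
   Debug.Print "VarPtr:" & vp

   ' Verify that sp = xxxx and vp = aaaa
   ' by moving the long pointed to by vp (which is xxxx)
   ' to the variable lng and then comparing it to sp
   CopyMemory lng, ByVal vp, 4
   Debug.Print lng = sp

   ' To see that sp contains the address of the character array,
   ' copy 10 bytes from that address to a byte array and print them.
   ' We should get the Unicode codes for "help" followed by the
   ' two bytes of the terminating null.
   CopyMemory b(1), ByVal sp, 10
   For i = 1 To 10
      Debug.Print b(i);
   Next
End Sub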

The output is

StrPtr:1836612
Length field: 8
VarPtr:1243988
True
 104 0 101 0 108 0 112 0 0 0

This shows that the character array in a BSTR is indeed in Unicode format, that it is followed by a terminating null character (the final 0 0), and that the length field holds the byte count (8 for the four characters of help) rather than the character count.
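Reading the length field can be packaged as a small helper. The function BstrLengthField below is a hypothetical name of our own, and the sketch assumes the CopyMemory declaration shown (the usual VB alias for RtlMoveMemory). For any string, its result should agree with LenB, including when the string contains embedded nulls, since an embedded null counts as an ordinary two-byte character.

```vb
' Declarations section of a standard module.
' CopyMemory: the customary VB alias for the Win32 RtlMoveMemory function.
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
   (Destination As Any, Source As Any, ByVal Length As Long)

' Hypothetical helper: returns the value stored in the 4-byte length
' field that precedes the character array of a BSTR.
Function BstrLengthField(ByVal s As String) As Long
   Dim p As Long, ct As Long

   p = StrPtr(s)
   ' A null BSTR (StrPtr = 0) has no character array and no length field
   If p <> 0 Then
      ' The length field sits in the 4 bytes just before the characters
      CopyMemory ct, ByVal p - 4, 4
   End If
   BstrLengthField = ct
End Function

Sub TestBstrLengthField()
   Debug.Print BstrLengthField("help"), LenB("help")      ' 8 and 8
   Debug.Print BstrLengthField("he" & vbNullChar & "lp"), _
               LenB("he" & vbNullChar & "lp")             ' 10 and 10
End Sub
```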

Finally, we note that you can also simulate StrPtr using VarPtr as follows:

' Simulate StrPtr
Dim lng As Long
CopyMemory lng, ByVal VarPtr(s), 4 ' lng = StrPtr(s)

This code copies the contents of the BSTR pointer, which is the value StrPtr would return, into the Long variable lng.
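The simulation can likewise be wrapped in a function. MyStrPtr is a hypothetical name, and the sketch again assumes the CopyMemory declaration. The parameter must be passed ByRef: with ByVal, VB would hand the function a copy of the string, and the pointer read would be that of the copy's character array rather than the caller's.

```vb
' Declarations section of a standard module.
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" _
   (Destination As Any, Source As Any, ByVal Length As Long)

' Hypothetical helper: simulates StrPtr by reading the 4-byte pointer
' stored in the caller's String variable. ByRef is essential here.
Function MyStrPtr(ByRef s As String) As Long
   Dim lng As Long
   ' VarPtr(s) is the address of the caller's variable; its contents
   ' is the address of the character array
   CopyMemory lng, ByVal VarPtr(s), 4
   MyStrPtr = lng
End Function

Sub TestMyStrPtr()
   Dim s As String
   s = "help"
   Debug.Print MyStrPtr(s) = StrPtr(s)   ' True
End Sub
```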