Playing With Strings

Take the following code:

	CString str1("Startt"),
		str2("Start\0");
	str1.SetAt(str1.GetLength()-1, '\0');

	str1 += "End";
	str2 += "End";

What would you see when watching the resulting strings? Probably not what you expect:

This is a simplified version of a much dirtier, very real bug I dealt with recently. Several string and debugger features joined forces to cause this behaviour.

First – the debugger: it apparently watches CStrings as c-strings – displaying their essentially-LPTSTR member m_pszData.  Thus, any null embedded in the string (well, the first null, really) is treated as a terminating null – anything past it would not be displayed. When we force a watch on the full CString buffer, a fuller picture is revealed:

So the ‘End’ suffix was added to str1 after all. But why the difference between str1 and str2? How can initializing a string with an embedded null be any different than setting that null in the next line? The next clue is obtained by observing GetLength() for both strings. Note that GetLength returns the character count that the CString keeps for itself, not the strlen of the underlying c-string. (It is utterly unimaginable that such a basic behaviour goes undocumented.)

So, str1 and str2 are indeed somehow different before adding the ‘End’ suffix. In fact, they are fundamentally different even before manually setting the null in str1:

	CString str1("Startt"),
		str2("Start\0");

	int len1 = str1.GetLength(),	// gives 6
		len2 = str2.GetLength();	// gives 5 !

The issue now has nowhere left to hide. Stepping with the debugger into the CString ctors reveals the root cause: the constructor used for both CStrings accepts a char*-type as argument (in retrospect – how could it be otherwise?). So, just like in the debugger itself, the first embedded null is treated as a terminating null – anything past it would never make it into the CString. Try the following and see for yourself:

	CString str3("First\0Second"); // str3 now contains only "First" !

Once this root cause was understood, the bug was a half-line fix.

Thanks and kudos go to Alexander M. of WordPress support, who found and fixed within 1 hour (!) a WordPress bug that I reported, to make this post possible: until yesterday, WordPress would ignore explicit nulls (backslash + zero) between quotes in a sourcecode section.


6 Responses to Playing With Strings

  1. rmn says:

    Nice post :)

    I have encountered this myself. To avoid problems with strings containig nulls is specify their exact lengths in the constructor, like this:

    #include <string>
    #include <iostream>
    int main () {
    	// The explicit length (11) makes the constructor copy past the embedded null.
    	std::cout << std::string("hello\0world", 11) << std::endl;
    	return 0;
    }

    • rmn says:

      Had a typo up there, the text should read: “..To avoid problems with strings containing nulls you can specify their exact lengths in the constructor, like this..”

      I’m aware of the fact that you’re dealing with an ATL/MFC string, while I’m using an std::string, but it’s the same problem.

      Another important note is this – This behaviour is perfectly normal: When constructing a string from a c-string, the only way to know the length of the given string is by expecting null termination. For any other behaviour, you’d have to define the length yourself.

      • rmn says:

        Btw, according to the standard, std::string doesn’t even have to be null terminated. So basically MSVS is very wrong here by reading the internal data of the string and displaying that as the value, expecting null termination. Not only does it not correctly show a std::string("hi", 3), but it may also read uninitialized values if there’s no null termination at the end of the real string.

        Sorry for spamming a third comment, I wouldn’t mind if you merged all three.

        • rmn says:

          that was

        • Ofek says:

          Thanks rmn. The post already mentions that this CString behaviour is completely normal. I’m actually less sure about the debugger: if string buffer size info is available, I think the debugger should use it. It may be fixable via autoexp; I’ll look into it some day.

          Regarding the standard: CString actually predates std::string, so most probably backward compatibility was favored over standard compliance. If anyone prefers the standard behaviour (and, unlike me, isn’t bound to heavy legacy code), they should definitely use std::string.

          There’s actually a newsgroup discussion somewhere (I’d have to dig it up) where an MS guy explicitly states that MFC’s string and container classes (CArray, CMap etc.) were made obsolete by the STL, and are all but deprecated. They’re kept around just so as not to break the tons of legacy code using them.

  2. hanoh says:

    FYI: The Microsoft documentation for the debugger clearly states that the watch screen does not update every row (actually, in VC6 they introduced the color red for marking which lines were updated). This means that you should log your changes and check the logs, not use the debugger and guess which variables were updated. Other than that, this is a very good example of a debugging bug and of the debugging process. I had a lot of fun reading it.
