
Yeah, yeah, I know I haven't posted on this blog since revamping it and removing some of the outdated content, but... but... ok, I don't have a real good excuse.  Not one you would give a damn about.  However, I started posting (like crazy) on my relatively new MSN Spaces blog: http://spaces.msn.com/mfp2/.  I have been posting on topics that range from Microsoft stuff to photography and entertainment.

 Subscribe to MP's Space RSS feed today!  ;-D

Stay tuned.

Heh, I was reading a fellow Exchange Blogger's blog, and noticed he said exactly what I've been thinking with regards to blogging.  Believe me people, I haven't forgotten about it.  To borrow his text, though, "I am sorely overdue in posting, and I apologize about that.  I will hide behind the sorry excuse that the vast majority of what I do nowadays is work on an upcoming release of Exchange that we aren't talking about yet, so I feel like I can't blog all that much about my current work."  Call it an excuse, or call it what you want.  I'm hoping as time goes on I will have more to talk about regarding the things I’ve been working on within the next version of Exchange.  However, for now I'm completely consumed with many aspects of the product cycle and considering there are only a few Supportability PMs for all of Exchange, we all have our hands very full right now.  I will try to post more frequently going forward (frequently is a relative term of course… and should mean I blog more than every 6 months.)  :)

Although I haven't posted anything here in a long while, I have posted a few times.  See this and this.  I have also proofread a few of EvanD's postings lately per his request, so I hope that counts towards giving back to the Exchange community. :) 

A cool little tool was recently posted called “Performance Monitor Wizard” (a.k.a. PerfWiz).  It can be downloaded here.

<Overview>

The Performance Monitor Wizard simplifies the process of gathering performance monitor logs. It configures the correct counters to collect, sample intervals and log file sizes. This wizard can create logs for troubleshooting operating system or Exchange server performance issues.

</Overview>

Additional thoughts... If you have used System Monitor (i.e., Performance Monitor) to collect data, or worse, have had to walk someone over the phone through the steps to do so, or have had to use Q811237, then you know that Perfmon is not the most user-friendly tool.  The current version of Perfmon leaves a lot of room for error when trying to collect important data.  I can’t begin to tell you how many logs I have looked at over the past couple of years that were missing key objects/counters needed to solve a problem, or had the wrong interval, or were so gigantic that they were hard to open.

With this tool, you just walk through a series of wizard-based dialog boxes, answer some questions, and then PerfWiz takes care of configuring the Perfmon logs so you never even have to open Perfmon. 

Some of you may have already used this tool while troubleshooting an issue with a PSS Support Engineer.  It actually has been around for a little while, but up until this point was only used by and available through PSS.  Now everyone can have FUN collecting performance counter data!  Hehe!

Keep in mind that this tool is only for collecting system performance data. It does not analyze the data; that is still up to you (for now). :)  For help with analyzing the performance data within the logs, make sure to follow the appropriate Troubleshooting whitepapers available on the Exchange Performance Tuning webpage: http://www.microsoft.com/exchange/techinfo/administration/finetune.asp

Oh yeah, I gotta say this about it too... PerfWiz is provided "AS IS" with no warranties. ;)


When your end-users are complaining of “slow mail” or the dreaded Outlook popup box that says, “Requesting Data from Server… blah blah blah” (It doesn’t really say “Requesting Data from Server…blah blah blah”, but don’t you think it would be cool if it did?? Hehe!), the average Admin usually jumps to blaming the mail server.  Anyways, I’ve come to realize that most folks don’t generally suspect a problem with the Global Catalog server(s) in their environment when the mail server is responding sloooooowly. ;)  So I figured I would jot down some of this since I have seen a few issues within the past week that fell into this category…  
 
First of all, I have to mention the recommended ratio of Exchange 200x CPUs and GC CPUs… there should generally be a 4:1 ratio of Exchange processors to global catalog server processors, assuming the processors are similar models and speeds.  However, depending on your situation, higher global catalog server usage, a large Active Directory, or large distribution lists can necessitate more global catalog servers.
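To put a number on that rule of thumb, here is a minimal Python sketch. The function name and the choice to round up are my own assumptions, and the result is only a starting point: as noted above, heavy global catalog usage, a large Active Directory, or large distribution lists can call for more.

```python
import math

def min_gc_processors(exchange_processors, ratio=4):
    """Rule-of-thumb minimum number of GC processors for a given number of
    Exchange processors, assuming similar processor models and speeds."""
    if exchange_processors < 0:
        raise ValueError("processor count cannot be negative")
    # Round up: 10 Exchange CPUs at 4:1 need 3 GC CPUs, not 2.5.
    return math.ceil(exchange_processors / ratio)

print(min_gc_processors(8))   # 2
print(min_gc_processors(10))  # 3
```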

Ok, so there are a ton of reasons why your mail server might appear sluggish to you and the users.  So allow me to add this step to the mix when troubleshooting Exchange Server performance issues (it is also mentioned in the Troubleshooting Exchange 200x Performance whitepapers).  Besides the slow responses to client requests, you might also notice the mail queues getting backed up over the course of a few hours or less, depending on how much mail your organization sends/receives.  You can check the queues in the ESM (Exchange System Manager) under the Queues node.  There is also another important piece of SMTP-related data you can check using Performance Monitor (perfmon.msc).  Look under the “SMTP Server” performance object:

SMTP Server\Categorizer Queue Length: Indicates the number of messages in the SMTP queue for DS attribute searches.  Indicates how well SMTP is processing LDAP lookups against global catalog servers.  This should be at or around zero unless the server is expanding distribution lists. When expanding distribution lists, this counter can occasionally go up higher. This is an excellent counter to tell you how healthy your global catalogs are. If you have slow global catalogs, you will see this counter go up.  Normally, the maximum value should be less than 10.  Just the other day I was helping with a case where this counter was in the 4-5K range on a consistent basis!  Yikes!
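If you have already exported the counter samples from a perfmon log, a quick check against the "less than 10" guideline could look like the sketch below. It assumes the samples are a plain list of numbers; the function name and the returned fields are hypothetical, and an occasional spike during distribution list expansion does not by itself mean the GCs are unhealthy.

```python
def categorizer_queue_status(samples, max_normal=10):
    """Summarize Categorizer Queue Length samples against the guideline
    that the maximum value should normally be less than 10."""
    if not samples:
        raise ValueError("need at least one sample")
    peak = max(samples)
    # Count how many samples sit at or above the guideline value.
    samples_over = sum(1 for value in samples if value >= max_normal)
    return {"peak": peak, "samples_over": samples_over, "healthy": peak < max_normal}

# A log like the case mentioned above, with the queue stuck in the 4-5K range.
print(categorizer_queue_status([0, 4200, 5100])["healthy"])  # False
```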

 
Another very useful piece of data that can be extracted from a valid perfmon log is the DSAccess LDAP times.  The counters I’m talking about here are:

MSExchangeDSAccess Process\LDAP Read Time (for all processes): Shows the time (in ms) that an LDAP read request takes to be fulfilled.
MSExchangeDSAccess Process\LDAP Search Time (for all processes): Shows the time (in ms) that an LDAP search request takes to be fulfilled.

For both of those counters, the average value should be below 50 ms, and spikes should not be higher than 100 ms.  Again, one of the cases I saw recently showed these way off the deep end! 
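The same kind of check works for the two DSAccess counters. This is a rough sketch that assumes you have the LDAP Read Time or LDAP Search Time samples as a list of millisecond values; the function name is made up for illustration.

```python
from statistics import mean

def ldap_latency_ok(samples_ms, avg_limit=50, spike_limit=100):
    """True when the average is below 50 ms and no spike exceeds 100 ms."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    average = mean(samples_ms)
    peak = max(samples_ms)
    return average < avg_limit and peak <= spike_limit

print(ldap_latency_ok([10, 20, 30]))    # True
print(ldap_latency_ok([40, 45, 250]))   # False: the average and the spike are both too high
```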

Ok, I got you this far down the path.  All of the remaining details (and fun) on improving your GC and AD performance can be found in the Troubleshooting Exchange 2003 Performance whitepaper, starting on page 43.


With the introduction of Outlook 2002 came the popup dialog that is displayed when Outlook 2002/2003 has to wait longer than 5 seconds for a response from an Exchange server or global catalog server.  You can tell from the dialog box what server might be causing the delay.  From the Understanding and Troubleshooting Directory Access whitepaper: “Tip: If the server name shown in the Requesting data … dialog box is in the FQDN format, Outlook is waiting for a response from the directory service. If you see a short server name, Outlook is waiting for either the mailbox server or public folder server to respond.”
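The tip boils down to a one-line test on the server name. Here is a small sketch of it; the function name and the server names in the example are hypothetical, and treating "contains a dot" as "FQDN format" is my simplification.

```python
def outlook_waiting_on(server_name):
    """Apply the whitepaper tip: an FQDN points at the directory service,
    a short name points at the mailbox or public folder server."""
    name = server_name.strip().rstrip(".")
    if "." in name:
        return "directory service"
    return "mailbox or public folder server"

print(outlook_waiting_on("gc01.corp.example.com"))  # directory service
print(outlook_waiting_on("MAIL01"))                 # mailbox or public folder server
```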

A little info... When using Outlook in MAPI mode, user actions in Outlook translate to remote procedure calls (RPCs) between the client and the server.  If the user is running in online mode, these RPC calls occur synchronously.  Any delay by the server in fulfilling these synchronous requests directly affects user experience and the responsiveness of Outlook. Conversely, running in cached mode results in the majority of these requests being handled asynchronously.  Asynchronous processing means that the speed at which the server fulfills most user actions should not translate directly into the responsiveness or overall experience of Outlook.

So when an end-user complains of seeing the dreaded pop-up in Outlook, there are some common things you, the Exchange admin, can do to try and find exactly where the latency is coming from.  Once you have that information in hand, you should have a starting place for your troubleshooting efforts.  (These are not listed in any particular order.)

  • Take note of what is referenced in the pop-up message.  Using the tip mentioned above, is the server name in the short (NetBIOS) or FQDN format?  What server is listed?  This information could certainly be useful when troubleshooting an issue.
  • Use Performance Monitor.  During the time of the pop-ups gather performance data from the server.  Look closely at the “MSExchange IS” object, more precisely, the RPC counters within that object.  Generally, spikes in RPC requests that do not increase RPC operations/sec indicate that there are bottlenecks preventing the store from fulfilling the requests in a timely manner. It is relatively simple to identify where the bottlenecks are occurring with regards to RPC requests and RPC operations/sec. If the client experiences delays, but the RPC requests are zero and the RPC operations/sec are low, the performance problem is happening before Exchange processes the requests (that is, before the Microsoft Exchange Information Store service actually gets the incoming requests). All other combinations point to a problem either while Exchange processes the requests or after Exchange processes those requests.
  • Use Network Monitor.  By capturing concurrent traces from a client seeing the pop-up and that user’s mailbox server, you should be able to see where the source of the latency resides (i.e., determine if it is the Exchange server, or the network, or the user’s PC, etc.).  For example, in the client-side trace you should see the request to the server.  In the server-side trace you should see the request arriving from the client and shortly thereafter see the server response to the client.  Back again to the client trace, you should see the server’s response arrive.  When taking this approach, it is important to consider the amount of time that is spent at each point for each request/response.  Obviously, if you see that the server responses are taking an abnormal amount of time to be generated in the server trace, you have some kind of bottleneck on the server.
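The RPC rule from the Performance Monitor bullet can be written down as a tiny decision function. This is only a sketch: the text above says "low" without giving a number, so the `low_ops_threshold` parameter is a placeholder you would have to pick from your own server's baseline, and the function name is hypothetical.

```python
def locate_rpc_delay(rpc_requests, rpc_ops_per_sec, low_ops_threshold):
    """Classify where a client-visible delay most likely sits, based on the
    MSExchangeIS RPC Requests and RPC Operations/sec counters."""
    if rpc_requests == 0 and rpc_ops_per_sec < low_ops_threshold:
        # The store is idle, so the requests are not reaching it yet.
        return "before the store receives the requests"
    # Every other combination points at the store itself or something after it.
    return "while or after the store processes the requests"

# Threshold of 50 ops/sec is purely illustrative.
print(locate_rpc_delay(0, 5, low_ops_threshold=50))
print(locate_rpc_delay(80, 5, low_ops_threshold=50))
```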

For more information on troubleshooting specific bottlenecks, check out the performance whitepapers for either Exchange 2000 or Exchange 2003.


After posting my last couple of blog entries on 9582's and VMF, I’ve received a bunch of questions/feedback.  A few in particular stood out to me.  I figured I would share them and my responses with everyone because this is definitely not the first time I’ve been asked about these very things…

 

Question #1:

 

“My servers typically use around 2.5 GB of the 4GB RAM (Store.exe uses approx 1.3GB) and my server has been registering 1 16MB Free VM block with no 9582 events!?!??. The server seems stable (has been running this way for about a week or two) - on EX2K the server would have BSOD'd by now.”

 

My response:

 

“I want to address your comments about the "registering 1 16MB Free VM block" and "on EX2K the server would have BSOD'd by now". I assume you are checking the Perfmon counter "MSExchangeIS\VM Total 16MB Free Blocks" to get that information. Keep in mind, looking at that one counter doesn't tell you the whole story. You also have to look at the other VM counters. For example, "MSExchangeIS\VM Largest Block Size": of all the blocks available in the virtual address space, this counter reports the size of the largest one. So in a situation where the “Total number of 16MB blocks” reports 1 and the "VM Largest Block Size" counter reports a large value such as 400MB, the server can be considered very healthy. If on the other hand the 16MB block count drops to 1 and the largest block size is 19MB, the server should be carefully observed. With only 19MB for the largest block, the server would be generating 9582 warning events. This is true for Exchange 2000 and 2003. The first part of my Troubleshooting Exchange 2000 Performance WebCast talks about this very topic. http://support.microsoft.com/?id=816893”

 

Question #2:

 

“I'm interested to know why you say removing RAM is not a recommended solution?  In fact, on your blog, you state that it is never a good solution.  Why exactly is that?  If Exchange tunes it's VM usage in part based on the amount of physical memory, why is it a "bad" solution to remove memory so that Exchange can correctly tune itself?  You and I both know that there is no way that W2K Standard running Ex2K is going to be able to utilize more than 2gb of physical RAM.  Or do you think that's not true?  I'm not looking for an argument or anything, just some clarification because I have never heard anywhere else that removing RAM is not a solution. Obviously, the best solution would be to correctly spec out the server, and order Advanced Server if it is going to have more than 1gb or RAM, but I am interested in your views on this (or the views of others from the Support team as well).  Feel free to either post replies here, or simply e-mail me directly.”

 

My response:

 

“Hi ### - I think your point is valid, strictly from a troubleshooting perspective.  The same argument could be made for a lot of different problems, even outside of memory or computer issues.  For example (and it may be a poor one at that!), a race car team is not going to reduce the engine's horsepower because the engine is starving for oxygen at high altitudes. Instead, they are going to fine-tune the engine, fuel intake, or whatever to make sure they maintain top output from the engine.

 

A few points of clarification from your post: Exchange 2000's store cache is hard-coded at 858MB, regardless of the amount of RAM installed.  2003 adjusts the cache size based on the RAM and OS settings.  I think you are referring to the Dynamic Buffer Allocation mechanism.

 

My point, and I'll keep it very brief, is that the Windows 2000 Standard Server OS can support up to 4GB of RAM.  Therefore it was advertised and sold as an OS to support up to that max (though it was also never designed to support the /3GB switch).  Well, then came Exchange 2000, which makes the recommendation that with 1GB of RAM or more, you need the /3GB switch in the boot.ini.  So, YES, what you mentioned about the planning part is definitely correct!  We really try to stress that to customers, but like I said in my weblog, with Win2003 this shouldn't be as big a deal.  But for those who don't plan ahead, tweaking/optimizing the memory usage should always be the way to go, versus MS asking customers to remove memory from a system that was sold and designed to support the amount of RAM they purchased.  It's just a poor solution in more than one way.  Besides that big "marketability" aspect, we can't forget about the smaller one... when you take memory out of the system, you are restricting the overall scalability, because now the server has less to work with.  There are other reasons too that are probably outside the scope of this forum.


I hope this makes sense and sheds some light on what I meant.  I'm going to update my blog with this info, so thanks for the question! ;)”


[This is a continuation of my The Infamous 9582 Event post last week.]

 

Just as I blabbed about last week in my other post, Exchange 2000 turned up a new event ID to Exchange admins and support folks across the lands.  The 9582 events made their first appearance shortly after a lot of admins upgraded their servers to Exchange 2000 SP1 or higher.  As I was saying before, the problem of VMF (virtual memory fragmentation) has been around for a long time, but not until SP1 or higher did we actually come out and report it when the VMF hit a critical state.  For this blog entry I’ll focus on the improvements in Exchange 2003…

 

In E2K, a large portion of the Store.exe's memory is allocated for the ESE Buffer (a.k.a. JET Buffer).  This buffer is placed in the Store.exe VA (i.e., virtual address space) and acts as a software-based disk buffer to help relieve some of the pressure on the disk subsystem.  Remember, going to memory is much faster than going to disk!  Out of the box, E2K uses a hard-coded buffer size of 858MB, regardless of the amount of RAM, memory configuration, or OS.  As many people now know, this buffer size can be adjusted through the msExchESEParamCacheSizeMax parameter in AD.  In most E2K 9582 cases, when trying to delay the onset of VMF, we decrease the size of the ESE cache.  By decreasing the size of the buffer, you leave additional free virtual memory for Store.exe to use during runtime.  Whenever you decrease the size of the buffer, you must consider the potential disk I/O increase.  As you take more of the database out of virtual memory, you run the risk of putting a lot more pressure on the disks where the database resides.

 

In Exchange 2003, the ESE buffer is allocated intelligently based on whether the /3GB switch is set in the boot.ini file.  If the /3GB switch has been set (per Q266096), then the ESE buffer is automatically tuned to 896MB.  If the /3GB switch has not been set (indicating less than 1GB of physical RAM), then the ESE buffer is tuned down to 576MB.  This auto-tuning means that smaller servers will not run out of virtual memory, because the Store.exe is using less of a buffer and leaving more for runtime operations.
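To make the rule concrete, here is a toy Python model of that decision. It is not how Store.exe actually reads its configuration; it just scans the text of a boot.ini for the /3GB switch and returns the buffer size described above. The function names and the sample boot entry are assumptions for illustration.

```python
def has_3gb_switch(boot_ini_text):
    """True if any line of the boot.ini text carries the /3GB switch."""
    return any("/3gb" in line.lower().split() for line in boot_ini_text.splitlines())

def ese_buffer_mb(boot_ini_text):
    """ESE buffer size (in MB) that Exchange 2003 would tune to, per the rule above."""
    return 896 if has_3gb_switch(boot_ini_text) else 576

# Hypothetical boot entries, with and without the switch.
with_switch = r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect /3GB'
without_switch = r'multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003" /fastdetect'

print(ese_buffer_mb(with_switch))     # 896
print(ese_buffer_mb(without_switch))  # 576
```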

 

Another very cool feature/improvement in E2K3 is called “ESE Back Off”.  As you probably have figured out by now, having enough free virtual memory is absolutely crucial when scaling up mailbox servers and keeping them healthy. When the 9582 error events start, you know the Store.exe has determined that the largest contiguous block of free virtual memory is less than 16MB.  When these events are generated, plans to restart the Store.exe should be made and carried out ASAP.  Along with the enhanced ESE Buffer I mentioned above, the 'ESE Back Off' is the built-in emergency throttle.  If the “MSExchangeIS\VM Largest Block Size” (that is the exact perfmon counter) ever reaches the warning state of 32MB, the store will request a one-time back-off of the ESE buffer. This will free up an additional 64MB block of virtual memory, allowing the server to degrade gracefully until you can schedule a reboot during a quiet time. I think that's a pretty nice gesture on the server's part. ;)

 

There is one other enhancement I want to mention… the 9665 event.  This event is just a warning to the admin that during the Store.exe start-up, it recognized that the memory could be optimized a bit for performance and reliability.  The event looks like this:

 

       Event Type: Warning

       Event Source: MSExchangeIS 

       Event ID: 9665

       Description: The memory settings for this server are not optimal for Exchange.

 

The most obvious time this event would be generated is when rebooting a server with 1GB or more of RAM without the /3GB and USERVA=3030 options in the Boot.ini file.  This event will also appear if those two options are in the Boot.ini, but you have not modified the HeapDeCommitFreeBlockThreshold value in the registry.  If the warning still continues, see “How to Optimize Memory Usage in Exchange Server 2003”.
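As a checklist, the conditions in that paragraph could be sketched like this in Python. The function name and its inputs are hypothetical (you would gather them yourself from the Boot.ini and the registry), and it only covers the 1GB-or-more case described above.

```python
def memory_tuning_gaps(ram_mb, boot_options, heap_threshold_set):
    """List the settings from the paragraph above that are still missing
    on a server with 1 GB or more of RAM."""
    if ram_mb < 1024:
        return []  # the post only describes servers with 1 GB or more
    present = {option.lower() for option in boot_options}
    gaps = [switch for switch in ("/3GB", "/USERVA=3030") if switch.lower() not in present]
    if not heap_threshold_set:
        gaps.append("HeapDeCommitFreeBlockThreshold")
    return gaps

print(memory_tuning_gaps(2048, ["/fastdetect", "/3GB", "/USERVA=3030"], False))
# ['HeapDeCommitFreeBlockThreshold']
```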

 

Well, I hope this post and my last one have helped clear up some of the confusion around 9582's, VMF, and Exchange 2003 enhancements (without putting you to sleep!).  Let me know if you have any questions in the feedback section.

 

UPDATE: I forgot to mention this, but I talk about VMF in my webcast on Troubleshooting Exchange 2000 Performance.  Enjoy! ;)

 

UPDATE #2: Part 3 is here.


[I apologize for the length… but I couldn’t come up with a shorter weblog entry on this topic. In fact, this log really is my short, short version.  It leaves out so much of what I wanted to say. ;) ]

9582 Event ID Errors.  Hey Exchange admins, does that ring a bell?! 9582 events have been a huge call generator for PSS.  I know, because my group troubleshoots them all the time with Exchange 2000 support cases, not to mention I see the topic come up a good bit on the public newsgroups I regularly check.  Even though the 9582's are nothing new, I find there is still a huge misunderstanding around the topic.  I want to use this log entry to talk a little about the infamous 9582 events that appear on many of the Exchange 2000 servers out there (note: 2003 has a lot of built-in self tuning that I’ll talk about in one of my next weblogs). First of all, the 9582 events started showing up with Exchange 2000, but only the event IDs were added to the code in E2K SP1.  Virtual Memory Fragmentation (I’ll refer to it as "VMF") has been around since long before Exchange.

Ok- With today’s servers easily containing more than 1GB RAM, let’s talk about the confusion around the /3GB switch on an Exchange 2000 Server… one point of confusion is taken right from a KB article, “Exchange 2000 requires /3GB switch with more than 1-GB of physical RAM”. The reason why we state this is because when the Information Store process starts up it goes through some calculations.  Some of the variables within this calculation take into account how much physical memory, how many CPUs, etc., are installed on the system.  For the purpose of my point, the important one here is obviously the amount of RAM.  That is, “How much physical memory is in the box?” When the server has 1GB of RAM or more, the Exchange Store.exe assumes that this must be a pretty beefy box… "Because I (Store.exe) have a lot of physical memory, I want to take advantage of more virtual memory too." 

This leads me to a quick review of Virtual Address (VA) space in the NT OS’s.  Windows 2000/2003 implement a virtual memory system based on a flat, linear 32-bit address space; 32 bits of address space translates into 4 GB of virtual memory.  On most systems, Windows 2000/2003 allocates half of this address space, the lower half, to individual processes for their unique private storage, and it uses the other half, the upper half of the address space, for its own protected operating system memory utilization. The mappings of the lower half change to reflect the virtual address space of the currently executing process. But the mapping of the upper half always consists of the operating system's virtual memory.  This VA can be modified by using a few special switches in the Boot.ini.  The /3GB “switch” can be placed in the Boot.ini file (Note: Windows 2000 Standard does NOT support the /3GB).  This gives a 3-GB private address space to processes running specially marked executables with the “large address aware” flag set in the header of the executable, leaving 1 GB for the OS.  This option allows applications, such as database servers (e.g., Exchange and SQL Server) to keep larger portions of the database in the process address space.  Being able to keep more of the database in memory keeps some pressure off of the disk subsystem. 
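The arithmetic behind that split is easy to check. Here is a small Python sketch of it; the function name is made up, and it only models the 2GB/2GB default and the /3GB case described above.

```python
def va_split_gb(three_gb_switch, large_address_aware=True):
    """Return (user_gb, kernel_gb) for a 32-bit process.

    2**32 bytes of address space is 4 GB. By default it is split 2/2.
    With /3GB the OS keeps 1 GB, and only executables marked
    "large address aware" get to use the extra user-mode gigabyte."""
    total_gb = 2 ** 32 // 2 ** 30          # 4
    kernel_gb = 1 if three_gb_switch else total_gb // 2
    user_gb = total_gb - kernel_gb if large_address_aware else total_gb // 2
    return user_gb, kernel_gb

print(va_split_gb(three_gb_switch=False))  # (2, 2)
print(va_split_gb(three_gb_switch=True))   # (3, 1)
```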

Anyways, the above is why the 9582’s are generated frequently on Exchange 2000 Servers running on Windows 2000 Standard Servers with >=1GB of RAM.  Without the option to use the /3GB switch, the Store is restricted to just 2GB of VA.  (Remember, the /3GB switch cannot be used on Windows 2000 Standard edition, but ALL versions of Windows 2003 support it! Whoot!!)  It's almost like a double-edged sword... Upon startup, the Store sees 1GB+ of RAM installed and it wants to take advantage of more virtual addresses by caching more of the database information, but with W2K Standard edition, we can't use the /3GB switch.  Therefore, when the Store caches more data in the VA, that leaves less free VA to map in and out of during runtime.  With less free VA, it takes less time for the VA to become fragmented.  So your options are somewhat limited.  It's not an operating system limitation. There are a handful of tweaks, but the major one in this case is unavailable for a server running on Windows 2000 Standard. Unfortunately, there are a lot of customers running in that configuration. Keep in mind too that the 9582 warning is generated when the Store.exe determines the largest contiguous block of memory in the VA is less than 32MB.  If the largest block is less than 16MB, the 9582 error is generated.  When the error is generated, immediate action is required, such as restarting the Store service.
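Those two thresholds are simple enough to express directly. A minimal Python sketch, assuming you feed it the largest free block size in MB (for example from the "VM Largest Block Size" counter); the function name is hypothetical.

```python
def event_9582_severity(largest_free_block_mb):
    """Map the largest contiguous free block of VA to the 9582 event level."""
    if largest_free_block_mb < 16:
        return "error"    # immediate action needed, e.g. restart the Store service
    if largest_free_block_mb < 32:
        return "warning"
    return None           # no 9582 event expected

print(event_9582_severity(400))  # None
print(event_9582_severity(19))   # warning
print(event_9582_severity(12))   # error
```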

Allow me to review what VMF is.  It has long been the focus of research by mathematicians and scientists.  It occurs over time because of the varying size of memory allocations in conjunction with the varying lifetimes of those allocations. When I talk to customers about virtual memory fragmentation, I like to use the example of a hard drive in a PC. For example, I have a PC at home for which I just bought a 60-GB hard drive. I just installed it into my PC. The first thing I'm going to do, obviously, is load Windows XP on it.  After that I'm going to copy family photos and MP3s, and install other files and applications. All of those files, more than likely, are going to be different sizes. Many of those files I might keep forever, and some of them I might delete after X amount of time. It's the same concept as VMF: there are different-sized files that I'm keeping for different amounts of time. So what would happen after, say, six months or a year, when I run the Windows Disk Defragmenter utility? When I analyze my hard disk, it's going to be very colorful. It's going to have a lot of red and green, right?  In other words, the hard disk is going to be fragmented.  The same sort of thing happens in virtual memory as well.  With your hard disk, you combat the problem by running a disk defrag tool.  In memory, you combat the problem by restarting the processes/system to clear the VA and start over.
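Here is a toy simulation of the same idea in Python. It is nothing like a real memory manager: the "address space" is just a 64-slot list (think 1 slot = 1 MB) with a first-fit allocator, and all the names are made up. The point it shows is that after a few allocations with different lifetimes, the total free space can be plenty while the largest contiguous block is too small for the next request.

```python
def largest_free_block(space):
    """Length of the longest run of free (None) slots."""
    best = run = 0
    for owner in space:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best

def allocate(space, owner, size):
    """First-fit: claim the first run of `size` free slots; return start or -1."""
    run = 0
    for index, cell in enumerate(space):
        run = run + 1 if cell is None else 0
        if run == size:
            start = index - size + 1
            space[start:index + 1] = [owner] * size
            return start
    return -1

def free(space, owner):
    """Release every slot held by `owner`."""
    for index, cell in enumerate(space):
        if cell == owner:
            space[index] = None

space = [None] * 64                      # toy 64 MB address space
for name in ("a", "b", "c", "d"):        # four 16 MB allocations fill it
    allocate(space, name, 16)
free(space, "a")                         # short-lived allocations go away...
free(space, "c")                         # ...while "b" and "d" stay put

total_free = space.count(None)           # 32 MB free in total
largest = largest_free_block(space)      # but only 16 MB contiguous
start_of_24 = allocate(space, "e", 24)   # so a 24 MB request fails (-1)
print(total_free, largest, start_of_24)  # 32 16 -1
```

Restarting the process corresponds to resetting `space` to all-free, which is the only "defrag" available here.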

Since most 32-bit applications are built on the theory of utilizing different size memory chunks, fragmentation will continue to be a pain point.  However, since Exchange relies a lot on memory, VMF seems to show up more often.  Well, there are some steps that you can take to delay the effects of VMF if your E2K box is generating 9582s.  Going back to what I mentioned above, one thing to keep in mind is planning for it ahead of time… for instance, not setting up an Exchange 2000 server on Windows 2000 Standard with 1GB or more of RAM.  With all versions of Windows 2003 supporting the /3GB switch, we shouldn’t run into that limitation much in future deployments with W2K3 and E2K3.  But, for those still running Exchange 2000 on W2K Std w/1GB+, there are things that can be done, plenty actually.  But, don’t take this literally and go rip out a stick or two of RAM from your server!  That’s never a good solution. If you are getting the 9582 errors on your Exchange 2000 Server, then check out *the* KB article on troubleshooting VMF, Q325044.  This KB is set up as a guide.  It has a handful of VMF-related tweaks that can be made on the server, plus other things like what to monitor, links to other KBs, etc.

The combination of Exchange 2003 and Windows 2003 has a lot to offer in terms of delaying the negative effects of VMF.  So much so that I haven't heard of one case in PSS yet where an E2K3 Server was generating 9582's!  I think this is a good pausing place for now...

Update: See my continued log entry here.


When using perfmon (perfmon.msc) on my XP box, or any box for that matter, the default background color is always black, it always starts up with three local counters, and the chart doesn't have the grids.  I look at customer perfmons on a daily basis, so having to clear and re-format perfmon to the way I like it each time really irritates me.  Until now.  I recently came across these cool steps on how to set up your preferences in perfmon.msc and have them saved as the defaults.  So, if you use perfmon.msc regularly like me, read on…

There are two ways to do it:

Good Method:
1.) Right-click on perfmon.msc and clear the “read only” setting.
2.) Make your personalized changes to the colors, default local counters, grid settings, etc.
3.) Choose File, Save.
4.) Right-click on perfmon.msc and set it back to “read only”. Settings made in step 2 will be preserved between uses.
***** Note: leaving the read-only flag off of perfmon.msc will cause the console to prompt you to save the changes after each use! *****

Best Method:

1.) Set the background color you want, as well as any other config changes you want to make (scale, grid, delete the default local counters, etc)
2.) Save the .MSC to your desktop.
3.) Go into File, Options, change the Console Mode to “User Mode: Full Access”, and check “Do not Save Changes” (this will prevent the new console from prompting you to save each time you exit).
4.) Choose File, Save As and save your console as perfmon.msc on the desktop.
5.) Rename the default perfmon.msc in the %systemroot%\system32 directory, and move your newly saved perfmon.msc from above into the same folder. Now the Perfmon.exe stub will launch your new MMC version. If you need to, just rename your new version and put the original back to perfmon.msc to revert the changes.
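If you would rather script step 5 than do it by hand, something like the following Python sketch would cover the swap and the revert. Everything here is an assumption for illustration: the function names, the backup file name `perfmon.msc.orig`, and the idea of passing the system32 path in as an argument. You would need to run it with rights to write to that folder, and I would try it against a scratch directory first.

```python
import shutil
from pathlib import Path

def install_custom_console(custom_msc, system_dir, backup_name="perfmon.msc.orig"):
    """Rename the default perfmon.msc to a backup and move the custom one into place."""
    system_dir = Path(system_dir)
    default = system_dir / "perfmon.msc"
    backup = system_dir / backup_name
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; not overwriting it")
    default.rename(backup)                      # keep the original around
    shutil.move(str(custom_msc), str(default))  # drop in the customized console
    return backup

def restore_default_console(system_dir, backup_name="perfmon.msc.orig"):
    """Revert: remove the custom console and put the original back."""
    system_dir = Path(system_dir)
    default = system_dir / "perfmon.msc"
    backup = system_dir / backup_name
    default.unlink()
    backup.rename(default)
```

A call might look like `install_custom_console(Path.home() / "Desktop" / "perfmon.msc", r"C:\WINDOWS\system32")`, with both paths adjusted to your own machine.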