Today I had to dust off the Exchange troubleshooting hat and look at a mail flow issue.
We have one Exchange 2003 server that holds our public folders and a legacy send connector that we cannot seem to migrate to our 2007 Hub Transport (HT) server. A number of users had called to say that reports, and mail sent to public folders, were failing.
I quickly tested internal and external mail and all appeared OK (phew), so the problem was localised to the 2003 server. I did the basic checks: were the services started, could I telnet to the SMTP port, and what did the event logs and the queues show. All was looking good except the queues, which had around 4000 messages sat in the “Messages with an unreachable destination” queue. Looking at the mail addresses
Doing a Find Messages showed that the first message arrived just after midnight on the bank holiday Monday. You cannot force messages to retry from this queue; you can only freeze, unfreeze, or delete them, choosing whether or not to send an NDR.
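The "can I telnet to the SMTP port" check from the basic checks can also be scripted. Below is a minimal Python sketch of that idea, not what I ran on the day: it opens a TCP connection and reads the greeting line, which for a healthy SMTP service should start with reply code 220. The function names and the host name in the usage note are made up for illustration.

```python
import socket


def parse_smtp_banner(line):
    """Split an SMTP reply line such as '220 host ESMTP' into (code, text)."""
    line = line.strip()
    code = line[:3]
    if len(code) < 3 or not code.isdigit():
        raise ValueError(f"not an SMTP reply line: {line!r}")
    # The 4th character is a space (last line) or '-' (more lines follow).
    return int(code), line[4:]


def check_smtp(host, port=25, timeout=5.0):
    """Connect to host:port and return the parsed first line of the greeting.

    Raises OSError if the connection fails or times out, and ValueError
    if the server sends something that does not look like an SMTP reply.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        banner = conn.recv(1024).decode("ascii", errors="replace")
    first_line = banner.splitlines()[0] if banner else ""
    return parse_smtp_banner(first_line)
```

Something like `check_smtp("exch2003.example.local")` (a hypothetical host name) would return a tuple whose first element is 220 if the SMTP service is answering. Like the telnet test, this only proves the port is listening; it says nothing about whether mail is actually being routed onwards, which is exactly what caught me out here.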
So I decided that a quick way to get the messages to retry would be to stop and start the SMTP virtual server. Sure enough, the queue was flushed and I saw the messages retrying. This was the first clue: I saw them hit the routing group connector (RGC) to my 2007 Hub Transport server, and that’s where they stayed for a while before moving back to the unreachable destination queue.
So, off to the Hub Transport server. A quick look over the event logs showed an error that occurred about 1 minute before the first mail was submitted and got stuck. The picture below shows the event log message.
I checked the Microsoft Exchange Transport service and, lo and behold, it was stopped; a simple restart was all that was required. But what of the mails? They were still stuck.
I restarted the SMTP virtual server again and the mails slowly started to send, but not all of them went. About halfway through, it all stopped. On further checking, I found that taking one of the mails out of the queue and then re-sending stopped the service from crashing.
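If the bad message is not obvious, one generic way to narrow it down is bisection: release half of the suspects, see whether the service falls over, and repeat on whichever half contains the culprit. The Python sketch below only illustrates that logic. It assumes there is exactly one bad message and that the crash is repeatable, and `crashes` is a hypothetical stand-in for "release this batch and check whether the transport service stopped", not a real Exchange call.

```python
def find_poison(messages, crashes):
    """Return the one message that makes crashes(batch) true, or None.

    messages: list of message identifiers (any values).
    crashes:  hypothetical callable taking a list of messages and
              returning True if processing that batch causes a crash.
    Assumes exactly one bad message and a deterministic crash.
    """
    candidates = list(messages)
    if not candidates or not crashes(candidates):
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Keep whichever half still contains the bad message.
        candidates = half if crashes(half) else candidates[len(half):]
    return candidates[0]
```

With around 4000 queued messages this needs about a dozen test rounds rather than thousands. Each round would mean another service crash and restart, though, so it is only worth doing when nothing else points at the culprit.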
So it would appear that a corrupt mail was the cause. I did some research on that error just to be sure, and a lot of people were complaining of corrupt queue files and having to rebuild the transport databases. I will keep an eye on things and update this post should it fail again.