Windows Azure Pack is a great technology, but sometimes Microsoft seems to miss the complexity of providing services to hundreds if not thousands of customers, each with different needs. Standardization is good, but it should leave enough flexibility to manage things in a reasonable way. So here’s the scenario: you deployed WAP and customers start flocking to your servers because of how good WAP actually is. Rock solid, able to run the most demanding websites. Great technology, as I said.
However, one of your customers exceeds his allotted quota for one of the seventeen metrics you can monitor for your websites. Since your customer didn’t opt for a pay-per-use billing model, you just suspend his website, like any hoster in the world would do, and your customer calls in to buy more “of that something”. He pays (if needed); if you’re like us, he gets an add-on that increases his quota over a standardized plan and, after a while and a little syncing, WAP states that his account is now back within quota. End of story.
Well, almost! Because your customer calls in again stating that his website is still suspended while his portal shows no over-quota warnings, and you realize that he is right: no warnings, increased quota, but website still suspended. Hmmm…
From now on, you start a journey to understand why WAP didn’t re-enable that website. You try to update plans, subscriptions and add-ons, force a subscription resync, force plans to update, and open up PowerShell to look for a well-hidden parameter that tells WAP to re-evaluate the quota and re-enable the website, or (at least) a cmdlet to tell WAP to re-enable that #&%$@#!$ website because I’m the one who rules here. You discover that, apparently, there isn’t anything like that: no way to re-enable a website that has been suspended for being over quota. So you look at the portal, where it states that counters will be reset in 29 days, and you start wondering whether WAP really intends to re-evaluate the quota only 29 days from today. You wait, hoping it just needs a little time to start that website again, but no luck. After hours, you really start to think that WAP will keep that website suspended until it is time to reset the counters, which is obviously totally unacceptable. But the scariest thing is that you cannot force it to change its mind. You’re basically powerless.
After a little checking, it turns out that what I stated is not true for all quota items. For example, a short while after we raised the sent bytes quota (by adding an add-on), a suspended website was restarted by WAP itself. However, other counters, for example CPU burst and received bytes, seem to be re-evaluated only when their configured period expires. No matter how long you wait after changing those, WAP won’t restart the affected website. I’m talking about hours here, not minutes, but I couldn’t wait days to check whether it would happen after 29 days as suspected. I can tell you that when the period for CPU burst time was set to 24 hours, the website was re-enabled after 24 hours, no matter whether we had increased that quota through add-ons.
That’s why I said that Microsoft didn’t actually understand the complexity of providing services. How on earth could you tell your customer that he “has to wait” until the configured period expires to have his website restored, especially if he has even paid for a quota upgrade? That makes no sense unless you’re a big corporation that can do whatever it wants and whose customers should have thought twice before consuming their quota. Or, probably, you only sell pay-per-use accounts so you don’t care about such situations.
We thought about helping the customer publish his website to a new subscription with an increased quota, but that would have been bad for us, so our last chance was to dig into the WAP databases, hoping to find a table where a “Suspended” field was set to 1 and turn it back to 0. To be honest, we had little hope that such a setting lived in a database; we thought it could be buried somewhere else: the registry, configuration files or who knows where.
We searched through many tables, not knowing whether that setting belonged to the Websites Controller or was buried inside the portal databases or somewhere else. None of the tables that listed sites had a suspended field or anything like it, so after a while we decided to look into other, less obvious tables. Long story short: we were a bit lucky, because that website was the only one suspended on a specific Websites cloud, and that helped us spot the setting and confirm it by tracing it back to the affected website.
The key table is hosting.runtime.ServerFarms, where hosting is the name of the database created by the Websites Controller. In that table, we spotted an IsOverQuota field which was set to 1.
Unfortunately, ServerFarms doesn’t directly link to Sites, the table where all websites are listed. Instead, you have to use the OwnerID field to understand which site that row refers to. In our case, that was OwnerID 14, which we hoped to find inside the Sites table. Luckily enough, that OwnerID was associated with the suspended website!
So we set IsOverQuota to 0 and checked whether WAP would allow our website to start. And, not without joy, we found out that was the case! Website back to normal operation!
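For those who prefer to see it written down, here is a rough sketch of that lookup-then-flip logic. It is only an illustration under heavy assumptions: it runs against an in-memory SQLite stand-in, not the real SQL Server hosting database. Only the ServerFarms and Sites table names and the IsOverQuota and OwnerID fields come from what we actually saw; the Sites columns, the sample rows and the helper function names are made up for the example.

```python
import sqlite3


def build_stand_in_db():
    """Create an in-memory stand-in. This is NOT the real WAP schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical columns: the real Sites table certainly has more.
        CREATE TABLE Sites (OwnerID INTEGER, SiteName TEXT);
        CREATE TABLE ServerFarms (OwnerID INTEGER, IsOverQuota INTEGER);
        INSERT INTO Sites VALUES (14, 'suspended-site'), (15, 'healthy-site');
        INSERT INTO ServerFarms VALUES (14, 1), (15, 0);
        """
    )
    return conn


def find_over_quota(conn):
    """List (OwnerID, SiteName) for every farm row flagged as over quota."""
    return conn.execute(
        "SELECT f.OwnerID, s.SiteName "
        "FROM ServerFarms AS f JOIN Sites AS s ON s.OwnerID = f.OwnerID "
        "WHERE f.IsOverQuota = 1 ORDER BY f.OwnerID"
    ).fetchall()


def clear_over_quota(conn, owner_id):
    """Reset the flag for one owner; roll back unless exactly one row changed."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE ServerFarms SET IsOverQuota = 0 "
            "WHERE OwnerID = ? AND IsOverQuota = 1",
            (owner_id,),
        )
        if cur.rowcount != 1:
            raise RuntimeError(f"expected 1 row, touched {cur.rowcount}")
    return cur.rowcount


if __name__ == "__main__":
    db = build_stand_in_db()
    print("flagged before:", find_over_quota(db))
    clear_over_quota(db, 14)
    print("flagged after:", find_over_quota(db))
```

The point of the rowcount check is simply to refuse the change when the filter matches anything other than the single row you identified, which is the kind of guard you want when poking at a database you don’t know.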
To be honest, I didn’t expect it to be so hard to solve such a simple situation. It felt very weird to have no other choice than to make risky (even if calculated) changes to an unknown database in order to restore normal operation for a website. Worse, I can’t understand how the WAP guys didn’t think that you might need the quota re-evaluated (which is done) before the configured period actually expires (not done) or, simpler than that, that you might just need to re-enable a website because you simply decided so.
We will investigate this problem further, and I do really hope to find out that there actually is a way to perform such an operation without tampering with the main databases. In the meantime, if you find yourself stuck with the same problem, here’s how we solved it. Of course, remember that applying changes to a database you don’t know is very risky and might lead to unknown results, so you have been warned: we accept no responsibility. At the very least, perform a full backup before you do anything!