Good ways to screw up your WFEs part I

Posted: April 2, 2008 in MOSS 2007

I imagine this will be a multi-part posting because, off the top of my head, I think I could write a 1,110-page book on this topic. I got a new one the other day which I had never tried, and when it was shown to me, it scared the crap out of me. One of the biggest things to take from these posts: have a solid and TESTED backup/recovery process in your organization. Do NOT just give it lip service, write it on paper, file it away (ironically, probably in MOSS), and wait until your first major failure to fully test it.

I will give you a real-life configuration I have worked with which would be majorly impacted. Adjust it to your own real-world scenario. So, let's assume you have a 5-server production MOSS farm: 2 active/passive clustered SQL servers, 1 index server, and 2 WFEs. You have 3 web apps in your portal. You have a custom membership and role provider. You have 4 or 5 custom web parts. Maybe you have thrown in a server control or 2 that use an entry or 2 in your web.config app settings. For fun, let's throw in a customized CAS file. You also have some SSL settings you had to configure in IIS, as well as some metabase configuration you had to screw with since you are using host headers across SSL. Maybe you did some IP binding, who knows.

WFE1 is being a PITA. Maybe you are royally annoyed by the WSS services misbehaving, maybe you are getting a freaky error on that box, or maybe you are bored and a masochist, so you are playing with your production configuration. For whatever reason, you decide to stop and start the WSS services on that WFE. You click stop. You give it a second and click start. Your memory issue, or whatever issue prompted you to restart that service, disappears. You are happy, for the moment. Then your custom web parts start kicking off errors. Probably security errors. As a matter of fact, you are getting sporadic errors where users cannot even log into your web applications through FBA anymore. Most likely, you also get issues where the server cannot resolve the URLs of the web apps anymore. All hell is breaking loose. So what happened?

To see what this actually did, grab a virtual machine with MOSS on it and at least 1 MOSS web app. Open IIS, and open a Windows Explorer session to the inetpub folder for your web apps. Now that you've got your windows open, go into Central Administration and stop the WSS services. You will see your IIS web apps totally disappear, along with the inetpub folder for your web apps in Windows Explorer. Start the WSS services. They reappear. Man, how did MOSS do that? Geez, it is smart, you say. Not really. It pulled the information it needed out of the MOSS farm configuration DB. When you reactivated the WSS services, it rebuilt the web applications and your inetpub folder(s).
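If you would rather have a record than eyeball Windows Explorer, you can fingerprint the web app folder before the stop/start and again afterwards. The following is only a rough Python sketch of that idea, not anything MOSS ships with: the function names `snapshot` and `compare` are made up for illustration, and it only looks at files on disk, not at IIS or metabase settings.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to a SHA-256 digest of its contents."""
    root = Path(root)
    digests = {}
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(before, after):
    """Report which files were removed, added, or changed between two snapshots."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(
        rel for rel in set(before) & set(after) if before[rel] != after[rel]
    )
    return {"removed": removed, "added": added, "changed": changed}
```

Take one `snapshot` of the web app folder on your test virtual machine before stopping the services and another after starting them again, then pass both to `compare`. Anything listed under "changed" or "removed" is a candidate for a customization that got rebuilt from the configuration DB.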

Back to your production scenario. You had those SSL settings, the metabase settings for your host header/SSL setup, and your web.config and Code Access Security mods. Then you start to panic as you realize MOSS rebuilt those web apps using the MOSS configuration DB. Your mods to IIS, web.config, CAS, etc.? MOSS knows absolutely nothing about those. You made those settings totally outside of MOSS. In essence, you just blew them away. What you have created is an opportunity to discover just how good your backup/recovery process is for the file system of your WFEs.

If you restart your WSS services through the Central Administration site, keep in mind that MOSS will rebuild the web applications on your WFE from the MOSS configuration DB. Any modifications you made outside the realm of MOSS will be gone. If you have not taken steps to back up your metabase and your file system mods, you will now get to do them again, manually.
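As a starting point for the file system half of that backup, here is a minimal Python sketch that copies every `*.config` file under a folder into a timestamped backup folder, keeping the relative paths. The function name `backup_wfe_files`, the default pattern, and the folder layout are my assumptions, so adjust them to your own customizations. It does not touch the IIS metabase, which needs its own backup.

```python
import shutil
import time
from pathlib import Path

# Assumption: the hand-edited files you care about (web.config, custom CAS
# policy files) all end in .config. Add patterns for anything else you changed.
DEFAULT_PATTERNS = ("*.config",)


def backup_wfe_files(source_root, backup_root, patterns=DEFAULT_PATTERNS):
    """Copy files matching patterns into backup_root/<timestamp>/, preserving relative paths.

    backup_root should live outside source_root so backups are not re-copied.
    Returns the timestamped folder and the sorted relative paths that were copied.
    """
    source_root = Path(source_root)
    target_root = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    copied = set()
    for pattern in patterns:
        for src in source_root.rglob(pattern):
            if not src.is_file():
                continue
            rel = src.relative_to(source_root)
            if rel in copied:
                continue  # already handled by an earlier pattern
            dst = target_root / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also keeps file timestamps
            copied.add(rel)
    return target_root, sorted(copied)
```

On a typical default install you would point `source_root` at the web app folders under `C:\Inetpub\wwwroot\wss\VirtualDirectories` and at the `12\CONFIG` folder where custom CAS policy files usually live, but verify those paths on your own WFEs rather than trusting my memory. And, in keeping with the theme of this post, test the restore on a disposable system before you need it.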

So, a couple of lessons here. First, backing up MOSS involves more than just your DBs. If your DBs are all you are backing up, you are probably missing something. Second, even seemingly minuscule tasks like turning WSS services off and on can have MAJOR impacts on your farm. Giving an untrained individual access to Central Administration and throwing a shiny new "SharePoint Administrator" title on him/her is akin to handing them a case of nitroglycerin and labeling them a demolitions expert. Train your staff for the role, and once trained they will give you point number 3: ALWAYS test even simple actions like this on a disposable test system BEFORE production. Not just a VPC sandbox, but a fully fledged test system that mimics production. A single-server sandbox and a multi-server farm behave differently in many areas. If you have a multi-server production farm and you do all your testing/validation on a single-server sandbox, you will have issues. Sooner or later you will create issues on your production environment, and they will be real fun to debug. You can virtualize the entire test system, but create one and use it.
