Archive for the ‘SharePoint 2013’ Category

SharePoint Distributed Cache Service ….DANGER WILL ROBINSON!!

 

This is a late night blog so sorry for the brevity and less cohesive thought pattern. I just needed to get this out there as much for my benefit as anyone elses.

So I have to admit. This service has always been a bit of a mystery to me. Simply put, it always worked. Never had to dig. Never had to care. Just a few commandlets and it was running and I went on my merry way. Well I had a recent case where I lost a host and had to dig when it would not come back.

What a scary experience. All these tools to use, the App Fabric cache tool, App fabric cache commandlets, SP commandlets, Services MMC, and multiple CA points to manage it. And then the warnings. If you use the wrong ones, you may need to rebuild your farm. Yes, rebuild the farm. Rebuild a 13 server farm. Can you say Holy crap!

I found a lot of blogs on fixing it, a lot of MSDN and Technet articles on App Fabric cache architecture then I got a virtual and began dissecting it. I looked here (http://almondlabs.com/blog/manage-the-distributed-cache/ ) got some great pointers. I looked here http://www.microsoft.com/en-us/download/details.aspx?id=35557 and got some good reference material. I found lots of Commandlets on configuring the cluster and manhandling it. Pushing in providers (which they tell you your choices are XML or SQL), and found plenty of hacks. In many areas the lines between SharePoint 2013 and App Fabric blurred and people were unknowingly straying into dangerous areas without realizing it. Then the clouds parted around 2200 after 3 cups of coffee. The answer came to me when I looked at the HKEY_LOCAL_MACHINE -> SOFTWARE\Microsoft\AppFabric\V1.0\Providers\AppFabricCaching registry key. The provider listed was a third…SPDistributedCacheClusterProvider.

The fog lifted

                So for those who want to better understand it. In VERY simple terms. An app fabric cluster is a set of cache hosts tied together through a central point which is a SQL Server DB or a XML file on a file share. Similar to SharePoint and its reliability on a Config Db to help keep all the servers in line. In the case of SharePoint 2013, that cluster is managed through the SharePoint configuration database and the set of Distributed Cache Service instances in the farm. The SharePoint Distributed Cache Service sits on top of it. It manages it. It maintains the App Fabric cluster underneath, and keeps everything in line. This is why you only run SP commandlets to add/remove, etc. It is also why if you use alternate tools like the App Fabric cluster tool you can kill your farm. A SharePoint farms greatest weakness has always been its config DB. It is the 1 point where you can kill the farm. Using another tool risks putting the cache cluster in App Fabric out of sync with the Configuration database in SharePoint. Once that happens, well anything goes. You might be able to recover it, you might never get another DC service ever running on your farm again.

If you want to manage this service then, stick to those commandlets. Stick to the guidance on the Microsoft documents at http://www.microsoft.com/en-us/download/details.aspx?id=35557 for maintaining it. Be VERY careful on any mods outside those commandlets. The farm is literally at stake here. It is a very dangerous game and from the blog posts I have seen out there, a lot of folks are hacking in fixes.

Another great article I found which just about laid it out for me was here(I am dense and did not get the simple truth of it right away when I read this): http://social.technet.microsoft.com/wiki/contents/articles/20348.sharepoint-2013-appfabric-and-distributed-cache-service.aspx

 

SharePoint and Kerberos for the common man

 

So your company decided it wants an enterprise class SharePoint 2013 farm. Oh and they have seen the awesome capability with PerformancePoint and PowerView, and SSRS. So you will be including the fancy Business intelligence add in. Awesome. Oh and they want ad-hoc reporting, and the DBAs and stake holders who own your data warehouse are not too keen on the idea of a random service account being used to access the data.

Sooner or later you will come to the point that you realize you will need to delegate the end users identity to the data store. In short, you will need to utilize use Kerberos Constrained Delegation (KCD). KCD is great, and at the same time utterly unforgiving. I am not diving into the specific settings. There are plenty of guides for those. This is a guide to KCD for those of us who don’t quite get the technical AD guides to it, and for those who need to troubleshoot it.

So first, when it comes to the basics, what do you need to know? You can go to this guide: http://technet.microsoft.com/en-us/library/ee806870(v=office.15).aspx which provides completely accurate guidance. However, for those who know jack about setting up Kerberos, and have no interest in becomes domain admins, what do you need to know? In short 2 things. 1 The SPNs (Service Principal Names), and 2 the service account Delegations.

SPN Delegation Basics

Here is the basic thing to keep in mind, the SPN is the TARGET of your delegation. The delegation itself is run from a source identity to a target (SPN). A SPN has x components. A Service Name, a machine/host name:<optionally a port> and an account name under which the service is being run. In the case of ECS, you set an SPN that would look like this: setSPN –s MSSQLSVC/Servername:1433 Contoso\svcSQLService. Now that the target is set you can set the delegation. This would be set in the AD tool by selecting properties on the account and then selecting the delegation tab…but wait. Your ECS account does have a delegation tab?

Dummy SPNs

              So when looking at guidance on SPNs for SharePoint you see a few on accounts such as your ECS service account that have service names like “SP/Excel”. So what is THIS for? Primarily, this is a “Dummy” SPN. And once you add it to the Excel Service account, the delegation tab we were missing in the last section will show up. You can now set your delegation

Duplicate SPNs

              So MANY domains have duplicate SPNs. You can add them to a domain. What’s the REAL harm? So remember the SPNs were your delegation target? In the previous example I set the SPN for delegating to the SQL server through port 1433 to account svcSQLService. Assume I also set one for : setSPN –s MSSQLSVC/Servername:1433 Contoso\svcSomeOtherAcct? Well, now I have 2 delegation targets for MSSQLSVC/ServerName:1433. When I set my delegation, it MAY set to the duplicate. If it does, my delegation will fail. This type of failure can be tough to locate. A good way us non-AD Admins can find it, running setspn commands. SetSPN –x will list duplicates, but depending on your list it could be HUGE. You could run a SetSPN –l Contoso\ServiceAcctName for all your service account and you may find it. Lastly, assuming you don’t have rights to actually add SPNs, you can run your own setspn –s MSSQLSVC/Servername:1433 Contoso\YOURACCOUNT. By default setspn will then list out all duplicates SPNs on that servicename/server.

Don’t get sloppy

You have to realize what an SPN/Delegation chain is. You are granting an account the right to delegate a person’s identity to another server/service. You need to think about that. In a standard ASP.Net application, if I were to delegate your identity to another web server, I could take actions on your behalf. If I were delegating a high rights service account used across an enterprise, I could wreak havoc. You should not add any delegation that you do not explicitly need. You should always use constrained delegation to enforce that delegation only be permitted by specific accounts to specific accounts on specific service types and ports. It is easy to throw SPNs out there “Just in case” and delegations “just in case” or because it is a totally time consuming frustrating experience of requesting tickets and service requests to have your AD team add additional ones later on that you were unsure of.

 

Other SharePoint SPN/Delegation troubleshooting

So how else can you look and figure out is your SPNs/Delegation point are causing issues with your services and where? In a full farm, there are multiple points. So brace yourself, these steps will have you looking through very large ULS logs. I will give you some pointers to hopefully cut down on that hunt.

Look on your SQL log. If you see NT Authority\Anonymous logging attempts, then your delegation is failing. If you look in your security log (through event viewer) on a server attempting delegation and see something like:

Error Code: 0x7  KDC_ERR_S_PRINCIPAL_UNKNOWN

Extended Error: 0xc0000035 KLIN(0)

Client Realm:

Client Name:

Server Realm: Domain.ACME.com

Server Name: MSSQLSvc/ServerName.Domain.ACME.com:1433

Target Name: MSSQLSvc/ServerName.Domain.ACME.com:1433@Domain.ACME.com

Well you are really lucky. You now know your delegation is failing and you know specifically which SPN is to blame. In the sample above, you have an SPN on MSSQLSVC/ServerName:1433 that is at fault. You now can look for duplicates or an improper service account name which most likely caused this.

If you are still seeing nothing though, quit delaying…it is time for the ULS logs.

So first when doing these steps coordinate. Run them all in sequence, and make sure you and your tester are the only ones on the farm at the time. If not, you will get a very large log and a lot of noise entries. First identify the server(s)/connections you need to validate. It would be good to navigate to within 1 click of the action/page/workbook you need to test. Keep a SP PowerShell window open on a remote session into the server(s) you wish to test.

  1. Execute new-splogfile (this will create a brand new log file for us to look at)
  2. set-sploglevel –traceseverity verboseex (your logs are now growing at a very fast rate)
  3. Run your test/data refresh/etc
  4. Re-execute New-splogfile to create a new log file and stop the growth of the one you will be analyzing
  5. Execute Clear-sploglevel to clear the high log level
  6. Collect logs

 

You will need to crack open your logs around the time your requests were being made. Some key things to look for:

  1. SPTokenCache.ReadTokenXml: Successfully read token XML ‘domain\<user name>’.
    1. This should be the domain\username that you logged into SP for to test. If you see this, then your C2WTS is successfully translating your windows claim into a Windows token. This is one less place you need to worry about in your search for your failure.
  2. PF_CHECK_ERROR returned ‘hresult error’ 0x80040e4d ; Stack Trace:NA
    1. This is a solid confirmation that you are missing a delegation point. Review your delegations on this server and you will likely find one missing, OR you have a duplicate SPN. In some cases, I have seen where a delegation set up with a duplicate SPN that has been deleted later will cause an issue. Sometimes resetting delegations will fix this.
  3. Lastly search for your data connections/connection string, as well as xlsx name (if ECS is what you are troubleshooting), or rdl filename (IS SSRS is what you are troubleshooting).

My PowerPivot Service App has fallen and it can’t get up

This was a rather fun “issue” I ran into on a client farm. I found very little documentation on this so it took me a few minutes to sort through it. Maybe it can help you out.

So I had a great PowerShell script that setup a large farm (13 servers). Had the full BI stack attached to SharePoint. Everything seems good until I started clicking on links in Central Administration, particularly on my PowerPivot Service Application. When I clicked on it, I got a 404. I thought that was weird. All the posts I saw in this issue told me to recreate it. I did this repeatedly and it did not work. It always gave me a 404.

So then I started to look a little more at the architecture for this site and the PowerPivot service itself. For those who have done this configuration there are some “special” things about it. Like for instance, it installs a group of WSPs in your farm to get all the pieces working. I thought, hmmm, that usually means features. I would have bet they were all hidden. But on a whim I went into the Site Settings page on the Central Administration site, opened up the Manage site features page. About ¾ of the way down the list there is a feature called “PowerPivot Administrative Feature” and it is not activated. So I activate this feature. No more 404 but I still get an error. So now I recreate the PowerPivot Service App again. This time is comes up and I am back in business.

So for those seeing this issue, go ahead and give this a try. My particular environment is very heavily GPO’d and I suspect the initial roll out of the corporate GPOs broke the initial service and blocked that feature. Either way though, something to consider.