Certificate-based authentication scenarios in Operations Manager 2007 are something we've tested and documented, but one question has come up a few times that we have yet to address:
What if I assign a certificate to a server in a network across the Internet that has an FQDN that cannot be resolved by the management server?
Let's take the following example:
We have a management server and a potential agent-managed computer on distant, untrusted networks connected only by the Internet.
- management server: OPS-MS1.momresources.org
- agent-managed machine: SVR03.contoso.local
You'll notice the agent-managed machine has an FQDN with the non-routable extension of .local. It could be anything really - the point is, servers in protected networks generally have an FQDN that is not resolvable across the Internet on remote networks. Operations Manager is capable of bridging this gap, but there can be some additional difficulty involved in mutual authentication in this case.
Why would this be a problem?
Quite simply, when you assign a certificate to this machine and the management server attempts to contact it by its FQDN, the connection will FAIL if the management server cannot resolve the name (and, for that matter, if resolution fails in the other direction). You could try to work around this by assigning a DNS alias in a publicly accessible domain (like SVR03.contoso.com, for example) and using that FQDN in the certificate request for the agent.
The result? Mutual authentication will fail. OM 2007 requires that the FQDN reported by the machine match the one present in the certificate, and the public alias (SVR03.contoso.com) does not match the name the machine reports (SVR03.contoso.local).
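To make the mismatch concrete, here is a minimal Python sketch of the kind of comparison involved. It is an illustration only, not Operations Manager's actual validation code: the helper names `normalize` and `names_match` are hypothetical, and the case-insensitive, trailing-dot-tolerant comparison is an assumption about how DNS names are usually compared.

```python
def normalize(name: str) -> str:
    """Lower-case a DNS name and drop surrounding whitespace and a trailing dot."""
    return name.strip().rstrip(".").lower()


def names_match(reported_fqdn: str, cert_subject_name: str) -> bool:
    """Return True when the FQDN a machine reports equals the certificate's name.

    Hypothetical helper: it only illustrates why a public DNS alias in the
    certificate does not satisfy a check against the machine's local FQDN.
    """
    return normalize(reported_fqdn) == normalize(cert_subject_name)


# The machine reports its local FQDN, so a certificate issued to that name matches...
local_ok = names_match("SVR03.contoso.local", "svr03.contoso.local")

# ...while a certificate issued to the public alias does not.
alias_ok = names_match("SVR03.contoso.local", "SVR03.contoso.com")
```

On a real machine you could pass the value of Python's `socket.getfqdn()` as the reported name to see which name a certificate request would need to carry.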
Why does Ops Mgr require this?
According to Ian Jirka from the product team, "The fundamental reason for us forcing the FQDN of the local machine is it greatly reduces the chance of having a name collision when using certificate based authentication, and provides a consistent tie back to the actual machine. If you have a name collision you end up with two devices with the same identity. Issues can arise from that ranging from strangely-broken functionality to security problems."
What should you do?
According to Ian, "You should give it (the FQDN you supply in the certificate request) the FQDN of the local machine-- we won't load it otherwise. In general things will work except for that which relies on the name for remotely accessing the machine. If you do need to reach out to that box from within an OpsMgr workflow, and want OpsMgr to give the name to you, it will give that local FQDN."
So in this case you still have the issue that the name (svr03.contoso.local in our example) cannot be resolved. To work around this, you could simply add a hosts file entry on the management server (%windir%\system32\drivers\etc\hosts) that contains the PUBLIC IP ( in this example) of the agent-managed machine and its internal, non-routable name (SVR03.contoso.local). And if the agent-managed computer cannot resolve the management server's FQDN, you'll likely need to add a reciprocal entry in the hosts file on the agent-managed computer.
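The hosts file workaround above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `hosts_entry` and `has_entry` are hypothetical, and `203.0.113.10` is a placeholder from the range reserved for documentation, not the address from the original example. Substitute the real public IP of the agent-managed machine.

```python
import ipaddress


def hosts_entry(ip: str, fqdn: str) -> str:
    """Build one hosts-file line that maps an IP address to a host name."""
    ipaddress.ip_address(ip)  # raises ValueError if the address is malformed
    if not fqdn or any(ch.isspace() for ch in fqdn):
        raise ValueError("host name must be non-empty and contain no whitespace")
    return f"{ip}\t{fqdn}"


def has_entry(hosts_text: str, fqdn: str) -> bool:
    """Check whether hosts-file text already maps the given name (case-insensitive)."""
    wanted = fqdn.lower()
    for line in hosts_text.splitlines():
        # Drop comments, then split into: address, name, optional aliases.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return True
    return False


# Placeholder public IP (assumption) paired with the internal name from the example.
entry = hosts_entry("203.0.113.10", "SVR03.contoso.local")
```

The resulting line would be appended by an administrator to %windir%\system32\drivers\etc\hosts on the management server; `has_entry` can be run against the file's current contents first to avoid adding a duplicate.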
And let's face it, this doesn't scale well from a process perspective. A gateway server fills the gap nicely when you have to manage multiple machines on a network across the Internet. At worst, you'd need hosts file entries on only the gateway and management servers. From a security perspective, remember that agent communication is encrypted by default, so coupled with the mutual authentication sequence between agent and management server, you have secure communication even across unprotected networks.