We're looking to improve the standardization & quality of data in our Active Directory. In doing this we've found inconsistencies in the format of the distinguishedName; specifically, the format of the CN itself. We often have reasonable (though never 100%) consistency within one legal entity (top-level OU), but across these entities various formats are used. The most common are:
- GivenName + ' ' + Surname, e.g. "John Bevan"
- Surname + ', ' + GivenName, e.g. "Bevan, John"
- string.ToUpper(SURNAME) + ' ' + GivenName, e.g. "BEVAN John"
- sAmAccountName, e.g. "ukjlb023"
Are there any guidelines around what makes a good format, or considerations to be made in making this decision?
- Values must be unique (at minimum within the containing OU, since two objects in the same container cannot share an RDN), so you want something that is unlikely to collide with existing values. For example, if you used only givenName, you'd have to append an additional value to many entries to avoid collisions for all but the rarest of names.
- Immutable values are better. Some systems use the DN as an identifier, so changing the CN, which forms part of the DN, may have knock-on effects on access to those systems.
- Presumably there's a benefit to using meaningful names, in that a person can work out whose account a CN/DN refers to without having to look up other attributes... but that's not critical, as such information can easily be queried.
- The format Surname + ', ' + GivenName includes a comma, which is a special character in the DN and so must be escaped (e.g. CN=Bevan\, John). By avoiding special characters in the CN we limit the impact of bugs in software, queries and scripts that don't handle escaping (though the downside is that such bugs are then more likely to go unnoticed).
- Presumably there is an index on this attribute, so whatever comes first in the value will have the greater impact on search performance. Given that people are more likely to know a colleague's given name than their surname or sAmAccountName, perhaps there's some small performance benefit in putting the givenName first.
- Some people are known by other names (e.g. Robert -> Bob, Madeeha -> Dee, Ashish -> Ash, or people who simply prefer to be called by their middle name or an entirely different name). Where the nickname is not a substring of the real name, there may be some benefit to searchability in including it in the CN; e.g. John \"JB\" Bevan (the escape characters are required in the DN, but not in the CN attribute itself).
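To make the escaping points concrete, here is a minimal Python sketch of how a CN value is escaped when it is placed into a DN. It follows the RFC 4514 rules only (Active Directory tooling may escape a slightly larger set of characters), and the helper names and the parent OU are hypothetical. Only the DN form is escaped; the cn attribute itself stores the plain value.

```python
"""Minimal sketch of RFC 4514 escaping for an RDN attribute value.

Assumption: only the RFC 4514 rules are applied; Active Directory tooling
may escape a slightly larger set of characters.
"""

# Characters that must be backslash-escaped anywhere in the value (RFC 4514, section 2.4).
_SPECIAL = set('"+,;<>\\')


def escape_rdn_value(value: str) -> str:
    """Return `value` escaped for use on the right-hand side of CN=... in a DN."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        if ch == "\0":
            # NUL has no printable form, so it is hex-escaped.
            out.append("\\00")
        elif ch in _SPECIAL:
            out.append("\\" + ch)
        elif i == 0 and ch in " #":
            # A leading space or '#' must be escaped.
            out.append("\\" + ch)
        elif i == last and ch == " ":
            # A trailing space must be escaped.
            out.append("\\ ")
        else:
            out.append(ch)
    return "".join(out)


def build_dn(cn: str, parent_dn: str) -> str:
    """Prefix an unescaped CN value onto an already-escaped parent DN."""
    return "CN=" + escape_rdn_value(cn) + "," + parent_dn


if __name__ == "__main__":
    # Hypothetical parent OU, for illustration only.
    parent = "OU=Users,OU=UK,DC=example,DC=com"
    for cn in ["John Bevan", "Bevan, John", "BEVAN John", "ukjlb023", 'John "JB" Bevan']:
        print(build_dn(cn, parent))
```

Running it prints a DN for each of the example formats; only "Bevan, John" and the nickname variant gain backslashes. In practice a library's own helper (e.g. python-ldap's ldap.dn.escape_dn_chars) is preferable to hand-rolled escaping.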
Are my assumptions correct, or are any of these just overthinking the issue / way off?
Are there other considerations to be taken into account?
NB: There's a related question: https://stackoverflow.com/questions/7814569/what-do-people-use-for-cn-with-inetorgperson-in-ldap-directories, though it focuses on the "how to avoid collisions" part of the discussion.
In my experience there aren't any established guidelines. I've seen huge organizations (100,000+ users) that have adopted different formats over time and are a mess. I suspect some organizations use givenName surName simply because that is the default.
Something to keep in mind for large global organizations: I've seen places where women may not want to use their last name, where half the men are named Muhammad, where people have single-character names like 's', or where the surname is so long that it's abbreviated to 'M'. So the givenName/surName combination may not work as well as it does in Western cultures.
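Where many people share a given name and surname, a GivenName + ' ' + Surname CN will eventually collide within a container. Below is a minimal Python sketch of one workaround. It assumes the CNs already present in the target OU have been fetched into a set, that comparisons are case-insensitive, and that collisions are resolved by appending a number; these choices and the function names are hypothetical, not a standard.

```python
"""Sketch: build a 'GivenName Surname' CN that is unique within one container.

Assumptions (hypothetical): the CNs already present in the target OU are
passed in, comparisons are case-insensitive, and collisions are resolved
by appending ' 2', ' 3', ...
"""
from typing import Iterable, Optional


def base_cn(given_name: Optional[str], surname: Optional[str]) -> str:
    """Join the name parts, tolerating a missing or blank given name or surname."""
    # split()/join() also collapses doubled or stray whitespace.
    cn = " ".join(f"{given_name or ''} {surname or ''}".split())
    if not cn:
        raise ValueError("need at least a given name or a surname")
    return cn


def unique_cn(given_name: Optional[str], surname: Optional[str],
              existing_cns: Iterable[str]) -> str:
    """Return the base CN, or the base CN plus the lowest free number."""
    base = base_cn(given_name, surname)
    taken = {cn.casefold() for cn in existing_cns}
    if base.casefold() not in taken:
        return base
    n = 2
    while f"{base} {n}".casefold() in taken:
        n += 1
    return f"{base} {n}"


if __name__ == "__main__":
    existing = {"Muhammad Khan", "muhammad khan 2", "John Bevan"}
    print(unique_cn("Muhammad", "Khan", existing))  # Muhammad Khan 3
    print(unique_cn("Dee", None, existing))         # Dee
    print(unique_cn("John", "Bevan", existing))     # John Bevan 2
```

The numeric suffix keeps the CN readable but carries no meaning of its own; appending a stable identifier such as an employee ID is an alternative. A CN chosen this way may still need escaping when it is placed in a DN.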
There is also the matter of acquisitions. Depending on how an acquisition is handled, it's possible to end up with multiple directories, a metadirectory, and multiple formats. Standardizing what is used for cn may not be a high priority, or the 'standard' cn format may only be applied to employees onboarded afterwards, so it's a bit of a mosh pit.
cn=samAccountName may seem logical if samAccountName has a standardized format. However, some organizations let people choose their own username, so if someone picks 'jj89' the result may not be what was intended.
Some organizations don't rely on cn at all, and instead use an employee ID with a standard format, stored in a separate attribute (e.g. employeeID).
I wouldn't be concerned about characters that need to be escaped. A distinguished name can contain almost any character, and knowing which characters need to be escaped is part of the job for people who work with LDAP directories.