Anyone here have to work with name formatting?

Scarpozzi

Lifer
Jun 13, 2000
Powershell...

I'm trying to properly format names to add them to Active Directory and a few other systems. Basically, the data I'm getting isn't very clean. I've found some random characters in the past where people have entered parentheses and apostrophes...and of course, I see Latin-1 characters too. This is all UTF-8, but it catches a lot of junk.

I'm curious if anyone knows of any standard process to run these names through so they come out somewhat consistent on the other side. Right now, I'm just taking the fields and making them a little more friendly with a -replace command.

For example, adding a second apostrophe to allow them (they are valid in AD, but must be escaped) and then removing parentheses:
$FIRST = ($FIRSTNAME -replace "'","''" -replace "\(","" -replace "\)","")

Then, I make sure the first letter is set to TitleCase:
$FIRST = (Get-Culture).TextInfo.ToTitleCase($FIRST.ToLower())  # ToTitleCase leaves ALL-CAPS words untouched, so lowercase first

These two steps handle most of the situations I can think of...someone entered as all caps, or all lowercase....it allows names like O'Malley through, but changes them to O'malley.
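A minimal sketch of the two steps rolled into one function. The name Format-NamePart is made up for illustration. It lowercases before title-casing because .NET's ToTitleCase treats a word that is entirely uppercase as an acronym and leaves it alone.

```powershell
# Hypothetical helper (not from any module): clean one name field, then title-case it.
function Format-NamePart {
    param([string]$Name)

    # Drop parentheses and trim stray whitespace.
    $clean = ($Name -replace '[()]', '').Trim()

    # Lowercase first: ToTitleCase skips words that are entirely uppercase.
    $titled = (Get-Culture).TextInfo.ToTitleCase($clean.ToLower())

    # Double the apostrophes last, for the systems that need them escaped.
    $titled -replace "'", "''"
}

Format-NamePart 'SMITH'      # Smith
Format-NamePart "o'malley"   # O''malley
```

The apostrophe doubling only belongs in the value that gets embedded in a quoted string; the display value would skip that last step.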

Names like McKay or MacGruber would come out as Mckay/Macgruber. Anyone know of a good reference for building in the logic to fix those without negatively impacting people with NORMAL names? Sorry...gotta make fun since I'm of Scottish and Irish descent.
 

purbeast0

No Lifer
Sep 13, 2001
you can do all of this with regular expressions. i'm no pro at regex by any means, but i've done a little bit of it to strip out characters and format things.
 

Aluvus

Platinum Member
Apr 27, 2006
purbeast0 said:
you can do all of this with regular expressions. i'm no pro at regex by any means, but i've done a little bit of it to strip out characters and format things.

You can, but as always with regular expressions (and double-always with human names), doing it right can be complicated.

For example, fixing capitalization in the O'Malley case is easy enough (I'm using Perl's regex syntax, which is a bit different from Powershell's):

Code:
s/^O'(.+)$/O'\u$1/i

This will look for a string that starts with "O'", and then capitalize the next character. The /i option just causes the match to be case-insensitive, which probably doesn't matter in this case.
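For the PowerShell side, a rough counterpart might look like the following. It is only a sketch: .NET replacement strings have no \u, so the capitalizing happens in a script block handed to [regex]::Replace, and the (?i) prefix stands in for /i. The variable names are arbitrary.

```powershell
# Rough PowerShell counterpart of the Perl substitution above.
$name = "O'malley"

# Match a leading O' plus the next character; (?i) makes it case-insensitive.
$fixed = [regex]::Replace($name, "(?i)^o'(.)", {
    param($m)
    # Rebuild the prefix as O' and uppercase the captured character.
    "O'" + $m.Groups[1].Value.ToUpper()
})

$fixed   # O'Malley
```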

McKay and such are similarly easy, so long as you can assume a mostly American (or similar) workforce - that is to say, so long as you can assume that anything starting with "Mc" will conform to your expected behavior.

Code:
s/^Mc(.+)$/Mc\u$1/i;

You might decide to get brave, and cover MacGruber with the same regex:

Code:
s/^(Ma?c)(.+)$/\u$1\u$2/i;

But, as the link in Scarpozzi's follow-up post notes, this can catch a lot of names that are not normally written with the additional capital letter (e.g., Macauley). That problem is probably intractable through automated means alone; the best you can do is to (at least partially) use a whitelist or blacklist approach.
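A minimal sketch of the blacklist idea, in PowerShell this time rather than Perl, assuming the name has already been title-cased. The exception list and the function name are made-up placeholders; a real list would have to come from your own data.

```powershell
# Hypothetical exception list: Mac/Mc names that keep a lowercase letter after the prefix.
$noInnerCapital = @('Macauley', 'Mack', 'Macy', 'Machado')

function Repair-McName {
    param([string]$Name)

    # Leave listed names alone (-contains compares case-insensitively).
    if ($noInnerCapital -contains $Name) { return $Name }

    # Otherwise capitalize the prefix and the letter that follows it.
    [regex]::Replace($Name, '(?i)^(ma?c)(\p{L})', {
        param($m)
        $prefix = $m.Groups[1].Value.ToLower()
        $prefix.Substring(0, 1).ToUpper() + $prefix.Substring(1) + $m.Groups[2].Value.ToUpper()
    })
}

Repair-McName 'Macgruber'   # MacGruber
Repair-McName 'Mckay'       # McKay
Repair-McName 'Macauley'    # Macauley
```

Anything missing from the list still gets the extra capital, so the list only shrinks the problem rather than solving it.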

That same link notes the problems that can come with names like DeBeers, Van Hoff (or was that van Hoff?), etc. These are again very hard to resolve without a whitelist/blacklist or manual intervention.

The link also briefly touches on the extra fun that can come from Roman numerals.

@OP:

As it happens, I have dealt with a similar problem. I have a tool that takes free-form text, usually all-caps, and converts it to something more readable. The only workable solution was to combine a set of rules implemented with regular expressions (things that look like sentences should start with a capital letter, etc) with a list of words that should always contain capital letters. This works reasonably well, but only because it is used within a limited domain (so the list of special cases doesn't blow up). My approach to names was simply to include them in the list of special cases (I got lucky and don't have to deal with many of them).
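A rough PowerShell sketch of that rules-plus-list combination. The table contents and the function name are invented for illustration and are not taken from the tool described above.

```powershell
# Hypothetical special-case table, keyed by the lowercased word.
$specialCases = @{
    'mcdonald' = 'McDonald'
    'debeers'  = 'DeBeers'
    'iii'      = 'III'
}

function Convert-NameCase {
    param([string]$Text)

    $textInfo = (Get-Culture).TextInfo
    $words = foreach ($word in ($Text -split '\s+' | Where-Object { $_ })) {
        $key = $word.ToLower()
        if ($specialCases.ContainsKey($key)) {
            # A listed word wins over the general rule.
            $specialCases[$key]
        }
        else {
            # General rule: plain title case.
            $textInfo.ToTitleCase($key)
        }
    }
    $words -join ' '
}

Convert-NameCase 'JOHN MCDONALD III'   # John McDonald III
```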

Out of curiosity, where are the parentheses coming from? Names like John Smith (II)?

As for "standard processes": unless you can do something to clean up the names as sent to you (i.e. improve how they are originally typed in), automation can only take you so far.

Failing that, my experience has been that the standard process is to just use title case and let the exceptions suffer.
 

Scarpozzi

Lifer
Jun 13, 2000
I posed this question (and posted this question) in the Powershell forums. The Canadian response was to take the authoritative source verbatim and let someone else maintain case and such. I'm sure they get a lot more interesting international names, though we're seeing more and more these days. I'm going to remove or escape the special characters so my script doesn't break and leave the rest to the data entry folks.

To answer your question, the parentheses I originally noted were actually around a nickname embedded in someone's first name field. It would be like "John (Tucker) Smith".
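If the nickname were ever worth keeping instead of just dropping the parentheses, something like this could split it out. It is only a sketch: the variable names are made up, and it assumes at most one parenthesized chunk per field.

```powershell
$raw = 'John (Tucker)'

# Capture the text inside the first pair of parentheses, if any.
$nickname = if ($raw -match '\(([^)]*)\)') { $Matches[1].Trim() } else { $null }

# Remove the parenthesized chunk, then collapse leftover whitespace.
$first = (($raw -replace '\s*\([^)]*\)', '') -replace '\s+', ' ').Trim()

$first      # John
$nickname   # Tucker
```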

I don't know who or how it was entered that way as there may be multiple sources of the data that gets pulled in. In cases like that, I'm assuming it was the end user that entered it on a webform or a temp that didn't know any better.

I can't do much to fix the process and can only *react* when new issues arise. My number one goal is to build something robust enough to adapt when new characters are introduced, so strings don't end up becoming more than strings... Luckily, I don't think double quotes can possibly make it to me, and I only have to worry about apostrophes and parentheses as special characters (so far those are the only two I've seen).
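One way to avoid chasing each new character is to flip the logic and keep only what is known to be safe. A sketch, assuming letters, spaces, hyphens and apostrophes are the only things a name field should contain; that allow-list and the function name are assumptions, not something established in this thread.

```powershell
function Remove-UnexpectedCharacter {
    param([string]$Name)

    # \p{L} matches any Unicode letter, so accented Latin-1 letters survive.
    # Everything that is not a letter, space, hyphen or apostrophe is dropped.
    $kept = $Name -replace "[^\p{L} '\-]", ''

    # Collapse runs of spaces left behind by removed characters.
    ($kept -replace ' {2,}', ' ').Trim()
}

Remove-UnexpectedCharacter 'José (Pepe) O''Malley-Smith;'   # José Pepe O'Malley-Smith
```

Apostrophes pass through untouched here, so they would still need to be doubled before the value is placed inside a quoted string.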
 