- Mar 15, 2007
I have an FTP site that receives files from time to time, and a Perl script that runs over the directory and moves things around automatically for me. The trouble is that lately I've seen some files with Windows-1252 and Big-5 characters in their names. Perl is unable to move these files because it expects UTF-8 and scrambles the characters when it reads them from the directory. I could easily encode things correctly using Perl's Encode *if* I knew what the encoding of each file name was. Most of the files properly use UTF-8 file names, so I can't just switch encodings without breaking what normally works. The files aren't text, so I can't use 'file' to get the encoding.
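To make the problem concrete, here is a minimal sketch of one possible per-name check, not a definitive solution. It assumes the names are still raw bytes (for example, straight from `readdir`) and that any name that is not well-formed UTF-8 is in a single legacy encoding chosen by the caller. The helper names `is_utf8_name` and `to_utf8_name` are hypothetical, and the fallback encoding is a guess passed in, not something this code detects.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: true if the raw bytes of a file name are well-formed UTF-8.
sub is_utf8_name {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical helper: return the name as UTF-8 bytes.
# $assumed (e.g. 'cp1252') is an assumption supplied by the caller, not detected.
sub to_utf8_name {
    my ($bytes, $assumed) = @_;
    return $bytes if is_utf8_name($bytes);    # already fine, leave it alone
    my $copy = $bytes;
    return encode('UTF-8', decode($assumed, $copy, Encode::FB_CROAK));
}
```

The strict UTF-8 test only separates "valid UTF-8" from "something else"; it cannot tell Windows-1252 from Big-5, so choosing the fallback encoding remains the open part of the question.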
I've tried things like guessEncoding before but it's been unreliable. Is there a way to get Unix to be consistent with the encoding of its filenames? Maybe get FTP to do it somehow?