I seem to have some weird encoding issues on my network

Red Squirrel

No Lifer
May 24, 2003
67,882
12,354
126
www.anyf.ca
My coding environment consists of editing files directly on the server via NFS. It's just easier because everything is then backed up centrally, is on RAID, and if my workstation crashes or I decide to reload my OS, the files are on the dev server. In the case of web apps it's also nice to be able to run them directly. Ex: simply hit refresh in the browser after editing code.

Anyway I seem to often run into weird issues where even a plain text file does not load, or does not load correctly. It seems to be random and depends on the text editor I open it with. I have posted about it before, but the more I run into these things the more I think I have something weird going on with my network.

Basically I often run into a situation where a certain text file fails to open. I've never even heard of that before, it's just text! But it will fail to load like a corrupted file would fail to load in an office suite. Sometimes another editor will open it, or I can use "cat" etc but in a specific editor it will just error out saying the encoding is wrong or something. But now a new problem, I have some files that open fine in my text editor, but in Vim, there are no returns. Everything is just jumbled up together.

I don't really understand how this whole encoding thing works, but from what I do understand it means the file is the "wrong" encoding when it fails to load. I assume it should be UTF-8, on typical storage devices as bytes are 8 bits and 1 byte = 1 character. Somehow, it seems my files are not always registering in that format and that's where I get weird corruption issues. I have even run into files that open fine in one editor but in another look like Chinese.

Could it be some kind of NFS setting that is causing this to happen? Or rsync/ssh? Where would be a good place to look to figure out what is causing files to get "formatted" wrong?
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,282
3,903
75
No returns? Sounds like something unix2dos might fix.
 

Red Squirrel

Yeah some of those tools work, but why do I keep having to do that? And what exactly are those tools doing to the file? Since other than it not working in a specific editor it's still the same file, same text etc in it. Does that work at a lower level, like how it's stored on disk? I presume each file has some kind of file system level header, is that what is being written wrong?

Edit: Scratch that. For this particular file dos2unix or unix2dos don't seem to work. I have seen some weird vim commands work, but it's rather convoluted. Trying to figure out why this is happening in the first place, and why it hits random files.
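
On the question of what those tools do to a file: they only rewrite the newline bytes inside it, nothing at the file system level. A rough Python sketch of the idea follows. The function names are made up for illustration, and the real tools do more (skipping binary files, for instance). It also shows why a file that uses a bare carriage return (0x0d) as its newline passes through a dos2unix-style conversion unchanged.

```python
import re


def dos2unix_bytes(data: bytes) -> bytes:
    """Roughly the default dos2unix conversion: CR+LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def unix2dos_bytes(data: bytes) -> bytes:
    """Roughly the unix2dos conversion: LF becomes CR+LF."""
    # Collapse existing CR+LF pairs first so they do not become CR+CR+LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def mac2unix_bytes(data: bytes) -> bytes:
    """Roughly the mac2unix conversion: a bare CR becomes LF."""
    # The lookahead leaves CR+LF pairs alone.
    return re.sub(rb"\r(?!\n)", b"\n", data)


# A file whose "returns" are bare CR bytes contains no CR+LF pair and no LF,
# so the first two conversions have nothing to act on.
cr_only = b"line one\rline two\r"
```

With that sample, `dos2unix_bytes(cr_only)` returns the input unchanged, while `mac2unix_bytes(cr_only)` turns each 0x0d into 0x0a.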
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
I've had a PHP file end up as double-spaced because of CR+LF conversion into 2 lines, and have had some weird code bugs because of pasting in the Visual Studio editor going the other direction -- LF only being treated as 0 returns. It can be annoying.
 

Gryz

Golden Member
Aug 28, 2010
1,551
204
106
As you probably know, Unix OSes, Windows and MacOS all used different encodings for newlines in text-files. I think there is still no single standard. (Someone correct me if I'm wrong).
I think today still Windows uses CR+LF, while most other OSes use just LF.
More details here: https://en.wikipedia.org/wiki/Newline

Nowadays people have forgotten about that, because they hardly notice that the formats are different: the applications they use to edit text-files actually try to deal with the issue without bothering the user. An application like vim or NotePad or Emacs sees what encoding a text-file uses for newlines, deals with it, and saves it in the same format.
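
To illustrate the kind of check an editor might do when it opens a file, here is a minimal Python sketch that counts each newline style in a chunk of bytes. It is not how vim or any particular editor actually implements detection; `count_line_endings` is a hypothetical helper name.

```python
import re
from collections import Counter

# Map each newline byte sequence to a readable label.
LABELS = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def count_line_endings(data: bytes) -> Counter:
    """Count CR+LF pairs, bare CRs and bare LFs in raw file content."""
    counts = Counter()
    # CR+LF is listed first so a pair is counted once, not as CR plus LF.
    for match in re.finditer(rb"\r\n|\r|\n", data):
        counts[LABELS[match.group()]] += 1
    return counts
```

A file that reports only one style is consistent. A file that reports a mix, or only "CR", is the kind that tends to display oddly in an editor that guesses a different style.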

Now I don't know this for sure, but I can imagine that some (old) software doesn't do this properly. Or maybe it recognizes one of the formats, but doesn't save in the original format and instead uses the format it thinks is best. Or whatever. So some software on your system might do this while you don't realize it. Maybe your web-server software ? Maybe some middleware ? No idea.


But there's a more important issue you should deal with.
You should take a day or two and learn git.
http://product.hubspot.com/blog/git-and-github-tutorial-for-beginners
Did you ever use git ? Or another version management system ? (Cvs maybe ?)

What you should do is: create a git-repository on your NFS server.
Then check out a local work-space on a hard-disk on the system on which you are working (editing and compiling).
You can mess around in your local work-space. It's quick. No NFS involved.
It might even solve the problem you are seeing with the newlines (maybe not).
When you are happy with a certain step of progress you have made, you commit and push your changes back into the repository.
And then you continue with the next step. And when it's done, you commit and push again.

This is much safer than just having one copy of your files and editing them directly.
If your local HDD dies, you still got all your stuff in the repository. (Except the changes you made in the day or days after your last commit).
But also, if you mess up your code, and lose track of what you were doing, it is very easy to go back to the last sane version of your code. (Happens more often to me than I'd like to admit).
If you ever want someone else to work on your program, git will make it very easy to work together.
Also, while you work on improving your code, it is very easy to see what you have recently done (git status and git diff).

Git is very very elaborate. But don't let that scare you. Learn to set up a repository. Learn to do the basics: check out a work-space, commit changes, and push your changes back into the repository. That's all. Within a year you'll be happy you took the time to learn git.

Besides, many many commercial companies use git in their development centers. Knowing git is a skill that will be appreciated in many places.
 

Red Squirrel

I don't want to mess with git when I can edit directly on the server (I do want to learn git for other reasons though, but I'm not changing my main development method as it works), and then simply relaunch what I'm working on to immediately see the changes. I don't want anything stored on my local machine. The server already has a dev->test->prod system for projects. A lot of projects just go straight to test and then to prod, and an external script is then called to put it on the live environment. Ex: websites. So to make changes to my website I edit the dev server version then push to test, and then prod, which updates the real website. I'm also not sure how git would solve this particular issue. There must be some way I can fix it in rsync/nfs so it uses the right encoding for all the files.

I'm aware of the newline thing, but why would some files have wrong new lines and some files not? Take these files for example:

http://www.uovalor.com/misc/fileissue432424.tar.gz

The file account.t.php is all messed up if you open it in vim (it's fine in other text editors) but the file customfields.t.php is fine. Both these files have been through the exact same development process. Ex: stored on same server, edited with same editor. Why would one behave different than the other?

I have seen the same thing happen with other projects too. I've even run into files where the editor actually refuses to open them at all. Sometimes tools like dos2unix and unix2dos work, but for these particular two files, nothing.
 

Red Squirrel

Upon further inspection it seems the files that work use 0x0d 0x0a style returns, which is correct. For some reason the files that don't work use 0x0d 0x0d style returns. Why would it do this and why would it be so random? I use Kate as my editor.

One way I have found that does fix files is to copy and paste the text into a new file, which is odd. But I'm more wondering why I seem to run into so many random files that are messed up.
 

Gryz

You should have told us exactly what OSes you are using. Are you working on a Mac or Windows or Linux ? What is your server ? Please note that some protocols do automatic conversion of different styles of newline (e.g. ftp does it in text-mode). No idea if ssh does it (it might), or rsync does it (probably not).

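"0xd 0xd" isn't a special encoding. 0x0d is a carriage return (\r), so it just means 2 carriage returns back to back, i.e. an empty line in a file that uses \r alone as its newline. Your file account.t.php has single, double and even triple \r's in it. I don't think there's anything wrong with that.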

You use hard-tabs in your files. Don't do that. People hate it. Configure your editor to put 4 spaces instead of a hard tab. (Although I suspect you're gonna be stubborn again, and ignore common practices).

The benefit of using git would be that if things get messed up, it is very easy to get the original file back. I guess now you are just editing all files to bring them back to original state. Plus the other benefits I wrote about in my previous post.
 

Merad

Platinum Member
May 31, 2010
2,586
19
81
It sounds like your server is windows and you're editing the files using linux/mac (or vice versa). Editors usually make assumptions about line endings based on their host OS. I really doubt the networked file system has anything to do with it. It's also possible, depending on the OS and editor, that the editor is inserting "magic bytes" at the start of the file to indicate the file encoding, causing it to break on other OS's.

git can automatically transform line endings between windows and unix style as you move the code between OS's (depending on its core.autocrlf setting and any .gitattributes rules). It would also allow you to recover screwed up files in a moment if the issue managed to happen anyway. As is you may end up having to inspect the files with a hex editor to figure out the problem.
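
The "magic bytes" mentioned above are usually a byte order mark (BOM), and they are ordinary bytes at the start of the file content, so they are easy to check for. A small Python sketch, assuming only the standard `codecs` constants (`detect_bom` is a made-up helper name):

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(data: bytes):
    """Return the name of the BOM at the start of data, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

A PHP file that starts with a UTF-8 BOM (bytes ef bb bf before `<?php`) would report "UTF-8" here, while a plain file reports None. No BOM does not prove the file is not UTF-8; most UTF-8 files on Linux have none.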
 

Red Squirrel

My OS is Linux Mint 18, and the server is running CentOS 6.8. Occasionally I code from a different machine which is running Mint 17. All connect back to the same server. It's the master code repository for all my projects. So far I've had a few projects get hit with this plague, not sure if all of them are now affected, most are dormant/stuff that's not really regularly worked on.

I have a project that I use git for so I can commit to a web accessible server that someone else can access, so we can all commit to the same place and keep things synced (well, I want to, it's not fully set up yet), but I still run it on the dev server. I don't like having anything local; the dev server handles all my projects in one place, is on RAID, is backed up, has various helper scripts etc. I've had this setup for a long time and it's worked fine. A decade+ ago I used Samba instead of NFS, but I've been using NFS for at least a decade now. What I'm trying to figure out is what would cause some files to end up encoded weirdly while some aren't. The one with the 0xd's is the one that causes problems in some editors. Like if I decide to open it in vim, everything is just jumbled up. If I copy and paste the content into a new file then it's ok, so I suspect the problem is more involved, like something in the file descriptor on the hard drive or something causing problems.

Tabs vs spaces seems to be one of those endless debates. Suppose I could switch to spaces but don't think that would really solve anything. I don't really have any preference I just hit tab because it makes a nice long space and it does whatever the editor does by default. Never really had a reason to go change it.
 

Red Squirrel

Would any special signature show up in a hex editor? Or is that something that is at the file system level like a header of sorts? Trying to figure out what the difference is between a broken file and a non-broken one, and not much comes up except for the broken ones having 0x0d returns instead of 0x0d 0x0a ones.
 

Gryz

If you want to know exactly what is in a file, don't use an editor. Use the od unix command. You can use different flags if you prefer a different format. Most useful is probably: "od -t x1 -c account.t.php".
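
If od or hexdump is not at hand, the same raw view is easy to approximate. This is a minimal Python sketch, not a replacement for od; `hex_lines` is a hypothetical helper and its output layout is its own, not od's.

```python
def hex_lines(data: bytes, width: int = 16):
    """Format raw bytes as offset, hex values and printable characters."""
    pad = width * 3 - 1  # room for "xx " per byte, minus the last space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes such as 0x0d and 0x0a are shown as dots.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return lines
```

Reading the file in binary mode, e.g. `open("account.t.php", "rb").read()`, and printing `hex_lines(...)` of the result makes the newline bytes visible: look for `0d 0a` pairs versus a bare `0d` or a bare `0a`.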
 

Red Squirrel

Was using a GUI hex editor, also have used hexdump, is that one different? Actually is there a command that shows/converts file encodings? I'd like to just mass encode everything in UTF-8 (or is another format better?), assuming wrong encoding might be part of my issue. It would also be interesting to see what encoding the broken files are, if they are different. Maybe my issue is not even encoding related, I just kind of assume it is.
 

Gryz

Unix doesn't know file encodings. A file is a file. It's a list of bytes. In Windows the filename (the extension) matters. In MacOS, the OS used to keep a 2nd file with specifics about the file. Unix/linux doesn't do all that. Everything is inside the file. Changing the filename, moving the file, it all doesn't matter.
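
Since the encoding is not stored anywhere, a program has to guess how to turn the bytes into characters, and a wrong guess changes what is displayed without changing the file. A small Python sketch of that effect (the sample strings and the `decode_as` helper are illustrative only):

```python
def decode_as(data: bytes, encoding: str) -> str:
    """Interpret the same bytes under a given encoding guess."""
    return data.decode(encoding, errors="replace")


# One character is not always one byte: U+00E9 (e with acute accent)
# takes two bytes in UTF-8.
e_acute_utf8 = "\u00e9".encode("utf-8")      # b"\xc3\xa9"

# Guessing Latin-1 for those two bytes shows two wrong characters.
mojibake = decode_as(e_acute_utf8, "latin-1")

# Guessing UTF-16 for plain ASCII text pairs the bytes up, and many
# of the resulting code points land in the CJK range.
# b"ab" read as UTF-16 LE is the single code point U+6261.
cjk_lookalike = decode_as(b"ab", "utf-16-le")
```

That last case is one plausible way an ordinary text file ends up looking like Chinese in one editor and fine in another: the bytes are identical, only the guess differs.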

hexdump is fine probably. I've always used od, that's why I recommended. My point was that I wouldn't trust any editor here, including hex-editors, because lots of editors try to do "smart things to help the user". And in this case you don't want any help, you want to look at the raw file content. Hexdump and od do that.

As others have said, we suspect that the issue is that you load a file into some program, the program tries to be smart with newlines/carriagereturns/linefeeds. But it has a bug. And when it writes the file, it messes it up. The trick is to find out which program. If I were you, I'd start with a fresh file, with proper newlines, and then do all the usual stuff you do (edit it with your different editors, copy it around, use rsync, etc). Load the file, change 1 character, save the file. At each step, use hexdump/od to see if it has changed. A little bothersome, but I wouldn't know what else to do.
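
The check-after-every-step routine above can be made less bothersome with a small script. This is only a sketch in Python under the assumption that a content hash plus newline counts is enough to spot the step that changes the file; `fingerprint` and `changed_fields` are hypothetical names.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Summarise a file's raw content: hash, size and newline counts."""
    data = Path(path).read_bytes()
    crlf = data.count(b"\r\n")
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "crlf": crlf,
        "cr_only": data.count(b"\r") - crlf,  # CRs not part of a CR+LF pair
        "lf_only": data.count(b"\n") - crlf,  # LFs not part of a CR+LF pair
    }


def changed_fields(before, after):
    """List which parts of the fingerprint differ between two snapshots."""
    return sorted(key for key in before if before[key] != after[key])
```

Take a fingerprint of the fresh file, do one step (open and save in one editor, one rsync run, and so on), take another fingerprint and compare. If a step that should not touch newlines shows "cr_only" or "lf_only" in `changed_fields`, that step is the suspect.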
 

Red Squirrel

"A file is a file" see that's what I used to think, till I learned about encodings. I still don't quite understand how it works or where the information is stored, I just know that some files for whatever reason don't "format" correctly when displayed, whether in a text editor or even using something like cat. A text editor should not even care, a file is a file. Just show me the bytes/characters. Just find it odd how some won't open at all. (been a while since I ran into that though) or some show as Chinese (like literally Chinese characters), or some don't have proper returns etc.

For the particular example I posted, it seems copy and pasting the text into a new file fixes it, but I still need to figure out the root cause or I'll keep having to do it. It seems to only be vim that's affected but I've had it where my main text editor was affected too.
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,101
126
Do you switch between Windows & Linux editors? Windows editors default to CR+LF (which is hex 0d 0a), and Linux or Unix editors default to LF (hex 0d) only.
 

Red Squirrel

Sometimes, but very rarely. Could that cause an issue?

The documents that do pose an issue seem to be the ones that use only 0xd. Vim seems to interpret that as ^M and then everything is jumbled up together.
 

mxnerd

^M is 0xd (carriage return), isn't it? All text editors in Linux should handle it properly as End of Line character,

don't know why it would cause crashes on your system or mess the file in Vim?

You probably should follow Gryz's suggestions and find what causes this.

===

Found an easy to use hex editor/viewer called Okteta, might be helpful.
 

Red Squirrel

Okteta is the one I was using, and hex dump if I'm SSHed in. So far all my files seem to open in Kate though. I was using Xed before as Kate had a nasty bug that made it useless over NFS (it would write to disk for every character typed) that finally got fixed. Xed would struggle with random files, it would actually refuse to even open them. I don't quite get how a text file can be "corrupted". Vim opens these files but it just messes up. And it's only random files. If I open it in Kate, copy and paste the content into another new file, then it fixes it and vim will open it. I don't use vim a lot other than if I want to do a quick change, it's just that trying to use it pointed me to the fact that I really got some weird issue going on.

I will have to experiment with every possible text editor I've used to see if one of them may be posing an issue I guess.
 

mxnerd

Have you tried Notepadqq (a Linux editor modelled on Windows Notepad++), Geany, Atom, VSCode, Sublime, etc?

Atom, VScode & Sublime might be larger programs, but also more polished, I think.
 

Red Squirrel

Never heard of Notepadqq++ but I used to like Notepad++ in Windows. The others I never heard of except for Sublime, but don't you have to pay for that one?

Kate works ok though. (unless it turns out to be the source of the issue?)
 

mxnerd

VSCode is by Microsoft. It's built on Electron, the framework that came out of the Atom editor open source project. Both are free. Tons of plugins.

You are supposed to pay for Sublime, but you can keep using it and it won't expire.

Notepadqq is a lesser version of Notepad++, not really a complete clone.

You can see CR and LF characters right inside Geany.
 

mxnerd

I was wrong about the end of line character for Linux. It's LINEFEED LF, hex 0a. Classic Mac OS (before OS X) used Carriage Return CR, which is hex 0d.

Your uploaded file account.t.php when opened in Vim showed a lot of ^M and won't break lines in the editor. But you can see there is "noeol" indicator showing in Vim status bar at the bottom.

So likely that file was edited on a MacOS machine or imported from a MacOS machine earlier.

Probably have to use text process utility to clean up all the files in your projects to make them all have same EOL character(s).
 

Red Squirrel

Hmm nope I don't own any macs lol. That's weird that would happen.

So what should it be? Is there a utility that will fix it recursively? If not I don't think it would be too hard to write, and I can just have it run in a cron job or something.
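
A recursive fixer is indeed short to write. Here is a minimal Python sketch, assuming LF is the target (the usual convention on Linux, which both the workstation and server here run) and that only files with listed extensions should be touched. The extension list, the dry-run default and the function names are all assumptions for illustration; try it on a copy or a backed-up tree first.

```python
import os
import re


def normalize_newlines(data: bytes) -> bytes:
    """Turn CR+LF pairs and bare CRs into LF."""
    # CR+LF is matched first so a pair becomes one LF, not two.
    return re.sub(rb"\r\n|\r", b"\n", data)


def looks_binary(data: bytes) -> bool:
    """Crude check: treat anything containing a NUL byte as binary."""
    return b"\0" in data


def fix_tree(root, extensions=(".php", ".txt"), dry_run=True):
    """Walk root and normalize newlines; return the paths that need changes."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            if looks_binary(data):
                continue  # also skips UTF-16 text, which contains NUL bytes
            fixed = normalize_newlines(data)
            if fixed != data:
                changed.append(path)
                if not dry_run:
                    with open(path, "wb") as fh:
                        fh.write(fixed)
    return changed
```

Running `fix_tree("/path/to/project")` with the default `dry_run=True` only lists the affected files; passing `dry_run=False` rewrites them. Running it from cron would hide the symptom rather than find the program that writes the bare CRs, so it works best alongside tracking down the cause.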
 