windows code interpretation

Status
Not open for further replies.

mutz

Senior member
Jun 5, 2009
343
0
0
I've been taking a look at ISAs, code compilation/interpretation, JIT and so on, and I'm wondering:
what happens to Windows code? Is there some internal compiler or interpreter which transforms a program's code into machine code?
Is this something that happens within the CPU itself, like higher-level code being emulated down to the ISA?
Say you compile code from C or so into assembly; what happens then?
Does Windows have some internal interpreter which translates the assembly code into machine code or so, which is then executed?
What happens in Linux? And is there any use for an interpreter in Windows, or is that something used with other OSes, such as IBM's?

I'm missing some details here..
thanks!
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
C/assembly/Win32 code is all native x86. Calling a function like fopen in the C runtime on a Windows system ultimately calls, via a wrapper, the native Win32 API function OpenFile, and the only way that can interact with the file system through the kernel is a call to NtCreateFile in ntdll.dll, which invokes an "int 2e" instruction, a kernel system service dispatch. That transfers control to the OS to run more x86 code that performs rights management, file system traversal, I/O on the hard disk controller, communication with drivers, and so forth. *All* of this, from the fopen to the calling of main and the crt0 startup that was invoked by CreateProcess from inside explorer.exe when you double clicked the icon for q3arena.exe, is real, native, hard, concrete assembly code per the Intel 80x86 programming interface specifications; there is no interpreter. The bytes in memory are the bytes the CPU "runs" naturally at 3 GHz.

On a Java or interpreted language system (including C#, wscript, .NET, and a multitude of similar technologies on Win32) you have VM/interpreter software that runs as a user-mode process (the interpreter itself being a real x86 and Win32 coded application, or whatever the host platform is) and acts as an emulator, parsing a single common pseudo instruction set (a big switch/case, in simple terms). The beauty of this is that a program written in an interpreted language is compiled once and runs the same on any host environment that has a native interpreter for that language.
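To make that "big switch/case" concrete, here is a minimal sketch of an interpreter's core loop in C. The opcodes and the tiny stack machine are made up for illustration; they're not any real VM's instruction set:

#include <stdio.h>

/* Hypothetical one-byte opcodes for a toy stack machine. */
enum { OP_PUSH = 1, OP_ADD = 2, OP_POP = 3, OP_HALT = 4 };

int main(void) {
    /* "push 1; push 2; add; pop answer" as made-up bytecode. */
    int code[] = { OP_PUSH, 1, OP_PUSH, 2, OP_ADD, OP_POP, OP_HALT };
    int stack[64], sp = 0, pc = 0, answer = 0;

    for (;;) {
        switch (code[pc++]) {              /* fetch and decode one pseudo instruction */
        case OP_PUSH: stack[sp++] = code[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_POP:  answer = stack[--sp]; break;
        case OP_HALT: printf("answer = %d\n", answer); return 0;
        }
    }
}

Every pseudo instruction costs a full trip around that loop, which is exactly the overhead JIT compilation removes.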

The JIT system, built into the interpreter on a given host, is just a runtime compiler or cross-assembler that translates the pseudo instructions into real instructions, allocates executable virtual memory for them using system calls from the host OS, and simply calls them directly as native code with a JMP or CALL instruction (think function pointer semantics). It has to be JIT because it's dependent on the host environment at run time, and compiling to native in advance defeats the purpose of languages like Java and would require dozens of versions. Think of JIT as delayed final assembly and linking to CPU instructions.

For example, if the Java byte code says

"push 1; push 2; add; pop answer"

Any system with a Java interpreter can read that code and produce the desired results. The JIT cross-assembler built into an x86 version of the interpreter might write that as the equivalent x86 code

"mov ax, 1; add ax, 2; mov [answer], ax"

And the instruction opcodes in x86 for "mov ax, 1; add ax, 2" are simply "B8 01 00 05 02 00", 6 bytes written to the address of a function pointer, then called. Two native x86 instructions, compared to the hundreds it would take the interpreter's main loop to read the original pseudo code, decode the parameters, access the variables, increment the pseudo-code pointer, etc. over two iterations. Where the program interacts with the OS, a simple CALL or JMP instruction is created in the translated target code with the appropriate Windows DLL thunk (or the call is simply dispatched by the virtual machine directly the normal way, since a system call is bound by its own overhead anyway).
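To make the function-pointer mechanics concrete, here is a minimal sketch, assuming 32/64-bit Windows (so the encodings are the eax forms, slightly different from the 16-bit ax bytes above, with a C3 ret appended so the fragment can be called like a C function). It does what a JIT's final step does: copy the emitted machine code bytes into executable memory with VirtualAlloc and call straight into them:

#include <stdio.h>
#include <string.h>
#include <windows.h>

int main(void) {
    unsigned char code[] = {
        0xB8, 0x01, 0x00, 0x00, 0x00,   /* mov eax, 1 */
        0x05, 0x02, 0x00, 0x00, 0x00,   /* add eax, 2 */
        0xC3                            /* ret        */
    };

    /* Ask the host OS for memory we're allowed to execute. */
    void *mem = VirtualAlloc(NULL, sizeof(code),
                             MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (!mem) return 1;
    memcpy(mem, code, sizeof(code));

    /* "Function pointer semantics": call into the bytes we just wrote. */
    int (*fn)(void) = (int (*)(void))mem;
    printf("%d\n", fn());               /* prints 3 */

    VirtualFree(mem, 0, MEM_RELEASE);
    return 0;
}

A production JIT would flip the page to execute-only after writing, but the principle is the same.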

The OS, with the aid of hardware memory and privilege management, always has absolute authority, provided it's the first task to run on boot and properly sets up the environment. Any low-level system access must be granted by the OS, or provided by a device driver with kernel privilege that was installed with the user's permission and admin rights.

This is the way-oversimplified explanation (who uses 16-bit x86, honestly!). Also, in reality, in silicon, modern CISC CPUs operate on the principle of microcode, where the CISC assembly instructions, the lowest-level x86 that we can program against and place in RAM to run, are actually broken down internally by the CPU scheduler into re-arrangeable micro-ops that we never see, have no control over, and that are for all intents and purposes irrelevant. Even a simple "load register" instruction on a RISC CPU breaks down into numerous micro-ops that control various address generators, the multiplexers that select the appropriate register, mixed with unrelated micro-ops, etc. This is all internal and invisible; what we see at the lowest level possible as software programmers is the CPU fetching x86 assembly from RAM.

In summary, no: as far as programming is concerned (if you aren't a CPU silicon designer, pretend micro-ops don't exist), something written in C/asm on a Win32 system is as low as it gets. It's the real deal; it's not interpreted at all. The raw x86 assembly code sits in a .exe/.dll file, is loaded into memory, and executed as is. It's all hardware silicon running it natively from that point. The x86 assembly code IS the machine code that the CPU transistors process. Loading a native Windows .exe and calling WinMain is the same thing as sticking ARM7 instructions in a flash cartridge and booting up your Game Boy Advance.

Keep in mind, though, that many interpreted languages are also used on Win32 and use native applications (called runtimes or virtual machines) to process non-native languages (cmd.exe or wscript.exe running a simple .bat or .vbs file, for example). However, any I/O with the system must be native at some point when Windows is called to do something, since Windows itself is mostly native x86.

Cross-OS application compatibility does not exist unless you have an entire x86 PC virtual environment (e.g. VMware), or the application is purposely built to abstract and minimize OS dependency. Even then, the .exe file will be incompatible and will need to be rebuilt, and that only works if the software was written for multiple OSes in the first place (there is far too much information in a PE/ELF file to simply write a loader without writing the entire environment the executable depends on).
 
Last edited:

fail

Member
Jun 7, 2010
37
0
0
C/assembly/Win32 code is all native x86. [...]

(quoting exdeath's full post above)

Interesting but fundamentally incorrect. Only binary machine code can be executed by the processor. Assembly code, C code, or Win32 code cannot be executed by an x86 processor; it must be converted to x86 machine code. Also incorrect on RISC micro-ops: there are many RISC processors with instructions that are not broken down into micro-ops.
 

Venix

Golden Member
Aug 22, 2002
1,084
3
81
It just looks like he's using incorrect terminology. Replace "x86 assembly" with "x86 machine code" and it's correct. His comments about C pretty obviously meant "compiled and linked C", not "C source code". He also quite obviously mistyped RISC when he meant CISC.

If you really want to be pedantic, int 2e was deprecated a long time ago in favor of syscall/sysenter. Not that it matters, since the point was just that the process is running directly on the CPU, not indirectly through some kind of OS translation layer. At its most basic conceptual level, the OS is pretty much just some system calls a process can explicitly call into, plus some exception/interrupt handlers that pre-empt the process when the timer interrupt fires, some peripheral raises an interrupt, a page fault occurs, etc.
 
Last edited:

exdeath

Lifer
Jan 29, 2004
13,679
10
81
Interesting but fundamentally incorrect. Only binary machine code can be executed by the processor. Assembly code, C code, or Win32 code cannot be executed by an x86 processor; it must be converted to x86 machine code. Also incorrect on RISC micro-ops: there are many RISC processors with instructions that are not broken down into micro-ops.

Being that assembly and machine code are one and the same to me, that is to say "mov ax, 13" == "B8 13 00", it's very correct, as that is what the CPU is pulling over the data bus on an instruction fetch. It's a given from the OP's post that C must be compiled into assembly/machine code; it's obvious everyone here knows that a CPU does not run C source. And given the OP's prior knowledge and the topic, I'm confident he knows that by "running assembly code" I mean assembled opcodes, not ASCII text .s or .asm files.

If you're hung up on the semantic definition of assembly being "English mnemonics" and machine code being the opcodes those mnemonics represent, fine, but it's really one and the same. The assembly mnemonic representation is no different than representing binary as hex because it's more convenient.

I was also being as vague and general as possible, not covering the "many" exceptions.
 
Last edited:

mutz

Senior member
Jun 5, 2009
343
0
0
Guys, take it slowly here (please?).
Take away all these "function calls", "privilege rights", all these strange functions, wrappers and so on (please?). OK, thank you.
Let's make it as simple as possible, as I'm almost 100% unaware of these internal processes happening within the OS, and ~99.9% unaware of whatever is involved in programming..

A few questions:
you take a pre-compiled program and you compile and link it; the linking is to the DLLs, right?
Then you've got an executable file or a DLL or whatever you'd like.
If you open it up with a debugger, it is shown as assembly and/or ASCII.

If you were to open it up with a text editor, you'd see all these strange ASCII marks jumbled together.

Now, the OS, or the kernel or whatever (I'll read the article later, ModelWorks, thanks), takes this code, loads it into memory, and then it is generally fed to the CPU.

First, why is it scrambled when you open it up with a text editor? Why isn't it shown like 8B C0 03 D1 and so on? That is just a general question.

Second, and this is a great confusion:
the code you see on the display is actually electrical signals, which one way or the other represent different byte streams, which in turn represent different symbols/instructions and so on.

Now, as I see it, the display is of course only for the user; the HW doesn't need it. It is just an outside source for checking what is happening within the CPU, HDD and memory.
So it's like a water facility which operates while you take samples every now and then, or constantly monitor the quality or characteristics of the monitored medium;
the water represents the current or byte stream.

The fact that what you see on the monitor is bytes, or shapes for that matter, while what actually happens within the HW is electrical, something you cannot see, hear or feel except for the heat, is like a sort of Fata Morgana, causing me a great deal of confusion and grief.

Maybe before getting into these sub-levels of even the ISA or microcode itself, it is important to figure out the electrical aspects of the technology, and sorry if that takes this subject a little bit east of its original meaning or the label of the thread.

Are we ready for it?
I'll start and see if you're following.
First, what we have is oscillators, right?
We have piezoelectric crystals fitted inside these small silvery boxes on the MB, which in response to an electrical voltage emit a continuous square signal.
Now this signal is stabilized, multiplied or divided by a PLL, and goes somewhere.
I'm not sure whether the CPU has PLLs in it, one or many, or whether it gets an outside signal from the MB's PLL chip or so, and then whether the CPU multiplier works off this sole PLL or there are others affected by it or...

Now we have to sort of cut or shape this signal into ups and downs, meaning 0s and 1s, right?
Every piece of the modern PC, the HDD, the GCs, the RAID cards, everything works with these electric binary phone calls, highways, chit-chats: ASICs, FPGAs, CNC machines, almost every piece of electrical technology on this planet, maybe except these newer photonic chips that are being developed.

Now, we have this technology and we want it to talk to actual printed HW, which is amazing.
The fact that these machines, these pieces of metal, can actually communicate with each other, even though they cannot actually think, is pretty impressive.
The fact that these small instruments have such tremendous power, and that man has managed to make them work, is, even though it can be obvious to some, remarkably miraculous, or certainly sometimes very hard to believe.

So what happens now?
Yes, of course you have logic gates, and maybe every function or microcode is printed inside the CPU, the ALU: different gates to ADD, other gates to JMP, and so on.
I'm quite stunned by the immense complexity of this and the amount of work it probably takes to create or redesign each of these CPUs, which they sometimes sell for a mere $100 in the markets.

So what is happening inside?
There are many questions here.
You get this signal, which is probably divided among different parts of the CPU (CPU as a general term); now where does it go then?
If any of you can try to describe what happens inside, where it starts, where it goes,
maybe that can clear up the confusion, as it is quite a complex matter where it's hard to pull out a simple question or a route to follow on the quest to understanding it profoundly, so profoundly that you can actually feel like you hold it in the palm of your hand.
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
The lowest level of programming by a software developer is assembly. Assembly is just shorthand for the opcodes it represents: "mov ax, 1" stands for the opcode B8 01 00 (3 bytes in hex). It is those bytes that an x86 CPU natively consumes and executes. If you look at that in binary instead of hex, each group of 1s and 0s is an individual field that ultimately ends up as the electrical state of the inputs to the CPU's control system, e.g. the base or gate inputs of the transistors that select the source and destination registers (the strings of an abacus), what operation the ALU performs, etc. Think of the inside of the CPU as a railroad yard, and the instruction bits as the positions of the levers in the control room, to put it as simply as possible.

A CPU is a very primitive device, like an abacus that performs very simple operations on its strings of beads repeatedly and very fast. Instructions are simple: multiply string A with string B, copy what's on string B to string C, look at the next value on your piece of paper (RAM) and put it on string B, etc.

Bytes are just bytes, regardless of code or data. The value FF0000 can be an instruction, such as "add 0 to string f", but when placed in a graphics frame buffer it's interpreted by the display as a red pixel (RGB 255 0 0). You can execute data until you crash, since the result is random incoherent instructions that have no structure or meaning, or you can display code as data and see random colors and garbage instead of an organized picture. The ASCII garbage you see when you open a binary file is because bytes like B8 01 00 do not map to A-Z, 0-9, etc. "0" is actually the ASCII character for a byte with the value 30 (hex), not 00 (hex).
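A few lines of C make the point; the byte values are arbitrary examples. The same array is shown as a hex editor would render it, as a text editor would, and as a frame buffer would read its last three bytes:

#include <ctype.h>
#include <stdio.h>

int main(void) {
    /* Six bytes with no inherent meaning until something interprets them. */
    unsigned char bytes[] = { 0xB8, 0x01, 0x00, 0xFF, 0x00, 0x00 };

    for (int i = 0; i < 6; i++)         /* a hex editor's view */
        printf("%02X ", bytes[i]);
    printf("\n");

    for (int i = 0; i < 6; i++)         /* a text editor's view */
        putchar(isprint(bytes[i]) ? bytes[i] : '.');
    printf("\n");

    printf("pixel: R=%u G=%u B=%u\n",   /* a frame buffer's view of FF 00 00 */
           bytes[3], bytes[4], bytes[5]);
    return 0;
}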

You should really learn machine organization with a RISC system like MIPS, where you can see the actual machine functions, e.g. the switch positions for all the components in the rail yard, in the instruction bits themselves. x86 is rather complicated and not so straightforward.
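As an illustration of that (a sketch I'm adding, not part of the post): MIPS32 R-type instructions pack their control fields into fixed bit positions, so you can assemble one by hand with shifts. The field layout is the standard R-type encoding; the helper function is made up:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: pack the six fields of a MIPS32 R-type instruction.
   Layout: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) */
static uint32_t mips_rtype(uint32_t op, uint32_t rs, uint32_t rt,
                           uint32_t rd, uint32_t shamt, uint32_t funct) {
    return (op << 26) | (rs << 21) | (rt << 16) |
           (rd << 11) | (shamt << 6) | funct;
}

int main(void) {
    /* add $t0, $t1, $t2 -> op=0, rs=$t1(9), rt=$t2(10), rd=$t0(8), funct=0x20 */
    uint32_t insn = mips_rtype(0, 9, 10, 8, 0, 0x20);
    printf("add $t0, $t1, $t2 = 0x%08X\n", (unsigned)insn);   /* 0x012A4020 */
    return 0;
}

Each field is literally a bundle of the "lever positions" described above: rs and rt select which registers feed the ALU, and funct selects the operation.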
 
Last edited:

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
A few questions:
you take a pre-compiled program and you compile and link it; the linking is to the DLLs, right?
Then you've got an executable file or a DLL or whatever you'd like.
If you open it up with a debugger, it is shown as assembly and/or ASCII.

If you were to open it up with a text editor, you'd see all these strange ASCII marks jumbled together.

A pre-compiled program consists of 'binary' data - basically, just a stream of numbers, because, for convenience purposes, computers are designed to group 8 bits into bytes - and each byte can therefore represent a number between 0 and 255.

If you load that data into an editor, how the editor displays them is up to the designer of the editor. You can use decimal numbers, hexadecimal or whatever is convenient for you. The thing about computer data is that it is just binary numbers, what they mean is all due to context. The number 169 could represent the copyright symbol (if the file is a text file), or it could represent the command 'load the next value into the accumulator register' (if this is machine code intended to run on a 6502 CPU). The trick is to ensure that your editor (or decoder) is the right type of editor for the data.

So, if I have a set of bytes that read: 169, 255, 141, 4, 0, 96
Written like that it's meaningless to most people. A text editor would render that data as gibberish. A hex editor would render it as A9 FF 8D 04 00 60 (neater, but still meaningless). A disassembler (which decodes CPU instructions - in this case, for a 6502 CPU) would render it as:
LDA #$FF
STA $0004
RTS
This is a human-readable representation of a series of CPU instructions.

An 'assembler' is a program that translates human-readable CPU instructions into 'machine code' - the stream of numbers that the CPU will pull from memory. E.g. it will convert the line 'LDA #$FF' into the numbers 169, 255.

A compiler takes a human-readable program, which is not directly representative of what the CPU will do, and converts it into some form of binary code (usually machine code). E.g. a compiler will take the line 'B=A*10; Print B' and produce a series of CPU instructions. This is not necessarily the whole conversion into a working program; the instruction sequence corresponding to 'print' may be contained in another file. The job of the linker is to take your compiled program, the compiled program corresponding to the 'print' command, and link them together - into a single working program.
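A tiny concrete illustration of that split (file names invented for the example): the compiler turns each source file into an object file, leaving calls into other files unresolved, and the linker matches them up:

/* main.c - the compiler emits a call to double_it() here but cannot resolve it;
   the declaration just promises the function exists somewhere. */
int double_it(int x);

int main(void) { return double_it(21); }

/* util.c - compiled separately into its own object file; the linker pairs
   this definition with the unresolved call in main.c's object file. */
int double_it(int x) { return x * 2; }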

Now, the OS, or the kernel or whatever (I'll read the article later, ModelWorks, thanks), takes this code, loads it into memory, and then it is generally fed to the CPU.

First, why is it scrambled when you open it up with a text editor? Why isn't it shown like 8B C0 03 D1 and so on? That is just a general question.

When you run a program, the job of the OS is to load the CPU instructions into RAM so that the CPU can access them, and then to instruct the CPU to start executing them.

Seeing as you mentioned DLLs, I'll explain them. We talked about the 'linker' merging multiple compiled (translated) program fragments into a single program. But imagine you've got an app, or an OS, that has hundreds of individual programs. It's a waste of disk space for every single app to have its own copy of the 'print' command (and 1000 other commands) merged in. The programs get big and bloated, slow to load, need more memory, etc.

So MS invented the DLL. Programs could be left 'unlinked' (or incompletely linked) by the linker. Instead, the OS would have its own built-in linker, which would activate when an incompletely linked program was loaded. The DLL would contain frequently used program fragments, which would be used by multiple programs. When one of the programs is loaded, the fragments from the DLL are loaded in and merged in RAM.
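In Win32 you can even watch this OS-resident linking happen by driving it yourself at run time. A minimal sketch using the real LoadLibrary/GetProcAddress calls (the function pointer type matches MessageBoxA's documented signature):

#include <windows.h>

int main(void) {
    /* Ask the OS loader to map the DLL into this process (if it isn't already). */
    HMODULE user32 = LoadLibraryA("user32.dll");
    if (!user32) return 1;

    /* Look up one exported fragment by name and get its address in RAM. */
    typedef int (WINAPI *MsgBoxFn)(HWND, LPCSTR, LPCSTR, UINT);
    MsgBoxFn msgbox = (MsgBoxFn)GetProcAddress(user32, "MessageBoxA");
    if (msgbox)
        msgbox(NULL, "Hello from a dynamically linked call", "DLL demo", MB_OK);

    FreeLibrary(user32);
    return 0;
}

Normal programs don't do this by hand; the loader performs the equivalent lookups automatically from the import table in the .exe.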

The display is scrambled because you've used the wrong editor. You've used an editor that assumes that a file is text, and renders the data as if it were text. If you used an editor that assumed the file is hex, then it would show in hex, etc.

second, this is a great confusion,
the code you see on the display is actually electrical signals which in this way or\\or the other represents different byte streams which in regards represents different symbols/instructions and so on.

now as i see it, the display is of course only for the user, the HW doesn't need it, it is just like an out side source for checking out what is happening within the CPU HDD and memory.
so it's like in a water facility which operates and you take samples every now or then or constantly monitoring the quality or characteristics of the monitored medium,
water represents the current or byte stream.

That's right. The code represents electrical signals. The code '8D' would actually mean that the wires connecting RAM to the CPU labelled D7, D3, D2 and D0 were 'on' and D6, D5, D4 and D1 were 'off'. You could write 8D in binary (10001101) to accurately represent the state of the signals. But it's a lot more difficult for humans to read and write the latter.
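The same mapping in a few lines of C, just reading 8D's bits as wire states:

#include <stdio.h>

int main(void) {
    unsigned char value = 0x8D;              /* 10001101 */
    for (int bit = 7; bit >= 0; bit--)       /* D7 down to D0 */
        printf("D%d=%s ", bit, (value >> bit) & 1 ? "on" : "off");
    printf("\n");  /* D7=on D6=off D5=off D4=off D3=on D2=on D1=off D0=on */
    return 0;
}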

On the very earliest computers, the programs were indeed entered in binary (either through flipping switches on the front panel, or by punching holes in a card - so that in a card reader the holes would trigger switches that sent signals to the CPU).

Maybe before getting into these sub-levels of even the ISA or microcode itself, it is important to figure out the electrical aspects of the technology, and sorry if that takes this subject a little bit east of its original meaning or the label of the thread.

So what is happening inside?
There are many questions here.
You get this signal, which is probably divided among different parts of the CPU (CPU as a general term); now where does it go then?
If any of you can try to describe what happens inside, where it starts, where it goes,
maybe that can clear up the confusion, as it is quite a complex matter where it's hard to pull out a simple question or a route to follow on the quest to understanding it profoundly, so profoundly that you can actually feel like you hold it in the palm of your hand.

One of the problems is that modern CPUs are so astonishingly complex that there may be only one or two people (maybe none at all) who actually know how the whole thing works. They are designed by huge teams, each working on a module - or working out ways in which to connect the modules.

But you are right, each operation has a circuit associated with it, and the instruction signals, activate and deactivate the relevant circuits.

In many ways, it's helpful to go back to much earlier computers to see how they work. The 6502 CPU was sufficiently simple that 2 guys designed it by hand, drawing the entire schematic out on a big sheet of paper, on a kitchen table. (Apparently, the name 6502 came from the fact that the schematic had 6502 transistors in it).
 
Last edited:

mutz

Senior member
Jun 5, 2009
343
0
0
OK, a bit more,
The ASCII garbage you see when you open a binary file is because bytes like B8 01 00 do not map to A-Z, 0-9
So basic Notepad only has the capability to read code as ASCII marks, like when you open up the command prompt, while the debugger has built-in libraries which allow it to translate B8 or so into ADD, 02 or so into ax or bx, and so on?
Does it translate the byte stream, i.e. 00101110 and so on, into opcodes and assembly, which would mean you have to input these streams when building one? Or does it go another way, i.e. it reads ASCII from the file, like opening it first and doing JIT debugging, reading the ASCII letters and marks and translating them into readable output?
(JIT meaning rather than AOT, as opening a 1MB file in Notepad can take some time)
Though it actually manages to open files bigger than 1MB almost in an instant, with the memory table and all, so it probably isn't working as described above; so basically, it reads the binary stream straight from memory and translates it?
Why does it take so much time through Notepad, then? Or is that just for non-text files, such as images, where the rendering engine takes much longer to process?
It sounds odd, actually, as it is only ASCII anyway; it's not like opening some complex fractal, though maybe it could still take time to process all those thousands of letters.

A pre-compiled program consists of 'binary' data [...]

(quoting Mark R's explanation above)
Let's just make it clear here:
machine code means the byte stream, 01010100 and so on, what the CPU actually reads, and not opcode?
Opcode is the commands issued as hex bytes, B0 C1 00 20 F1 and so on.
What is the opcode for?
The assembly instructions are readable; the byte stream is barely readable but is viewable in the debugger, or at least some of them.
So is the opcode just another human-readable representation, in hex, or does it have some other usable function? And the ASCII as well?

The job of the linker is to take your compiled program, the compiled program corresponding to the 'print' command, and link them together - into a single working program.
So it takes functions from DLLs? It takes a bit of the DLL file, transforms the "print" mnemonic into program code which the compiler adds to the compilation, and transforms that into machine code, so it basically saves you the typing time, like an API.

So does it run the DLL?
Does the DLL have functions like "if print, then write in all the opcode required for that operation at the input file (the compiled program)", and so on for every command?

btw,
The display is scrambled because you've used the wrong editor. You've used an editor that assumes that a file is text, and renders the data as if it were text. If you used an editor that assumed the file is hex, then it would show in hex, etc.
So that's exactly like trying to open a BMP image with a GIF editor or so?
That's pretty much understood; each editor is for its own purpose, operating with a different engine, same as with compilers, or instruction sets which fit one OS and not another (Windows for ARM or RISC processors). There are so many loose ends in this field.
I just read an old article at Ars Technica about the meaning of an ISA and the evolution of emulated ISAs, with all the issues the idea solved but also brought up.
The ISA gave developers the chance to use their programs across many platforms, whereas before they had to fit the program to each specific machine, which meant a lot of work, i.e. a lot of expenditure, and being unable to tweak the HW after the software was built for it; it might have caused too many instabilities, if it would have worked at all.
It's quite unbelievable that they used to work like that in those days; it seems very tedious.
The other side of this formula was that they built so many programs for the current ISAs that these would not fit the later ones when the need came to move on.
This is a terribly complex field, lol; one solution creates the next problem.
Here's that article; it's a bit complex, but for the people here it should be fairly easy. It is very interesting, worth having a look.
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
If you have XP, go to a command prompt and run the program "debug".

Google for "DOS debug tutorial" or similar search strings.

Learn lots. Real-time view of both code and data: A to type in instructions; U to unassemble and see what bytes represent as code; D to dump the same bytes as data and see the same values as both hex and ASCII; T to run the instructions one at a time; R to view the effects on the CPU registers. (Just summarizing the main commands here; you'll need to see a proper tutorial to understand them.)

The interactive, instant environment of MS debug is probably the simplest and greatest tool for learning how assembly, machine code, and the CPU work.

You're asking for a lot of overwhelming info to be dumped here in an instant. Best way to learn is to do.
 
Last edited:

exdeath

Lifer
Jan 29, 2004
13,679
10
81
OK, a bit more,

So basic Notepad only has the capability to read code as ASCII marks, like when you open up the command prompt, while the debugger has built-in libraries which allow it to translate B8 or so into ADD, 02 or so into ax or bx, and so on?
Does it translate the byte stream, i.e. 00101110 and so on, into opcodes and assembly, which would mean you have to input these streams when building one? Or does it go another way, i.e. it reads ASCII from the file, like opening it first and doing JIT debugging, reading the ASCII letters and marks and translating them into readable output?

1) You type out: mov ax, 1 in an ASCII text file.

2) You assemble it with an assembler that writes a binary file of length 3 bytes containing the opcode for mov ax, 1, which is B8 01 00 (in hex) or 10111000 00000001 00000000 (in binary).

3) By some means the contents of that binary file are placed into RAM at an address (typically by the OS, as in Windows loading an .exe using NtCreateProcess).

4) The currently running process, which loaded that binary file into RAM at some decided-upon location in memory, or address, then executes a JMP or CALL instruction with the address of where it was loaded. After the CPU executes the CALL or JMP, it's staring right at our instruction B8 01 00, and it becomes the next instruction executed.

5) The bytes B8 01 00 are fetched by the CPU over the data bus and executed; the CPU's AX register now contains the number 1, in less than a billionth of a second at 1+ GHz.

6) The CPU continues fetching instructions from where it left off, one after the other. In this case, it just happily runs whatever random bytes are in memory after our one instruction until it crashes, more likely than not from an invalid opcode (since they are just random bytes and not actually a coherent program or valid x86 formatted instructions).

When you compile in a high-level language like C, the C-to-assembly conversion is an intermediate step; you typically go straight from C to a machine code binary .exe when you hit "build", compiling in one seamless step.

Ok, so back to the ASCII file with "mov ax, 1" in it. That is a 9-byte file containing the bytes 6D 6F 76 20 61 78 2C 20 31, which are the ASCII characters "m" "o" "v" "<space>" "a" "x" "," "<space>" "1". This byte sequence coincidentally does not represent any valid 8088 instruction and would fail immediately with an "illegal instruction" if you tried to execute it. Loaded in an assembly editor as opcodes, you would see lots of ?? ?? ?? because the bytes aren't instructions. Loaded in Notepad, you'd simply see the ASCII text "mov ax, 1".

Now we use an assembler to assemble that ASCII assembly source into the actual 3 bytes it literally represents, which are B8 01 00. When viewed in Notepad, you see the characters for the bytes B8 01 00, which is the single character "¸" followed by two unprintable bytes, which probably won't show up here because they are special characters and not A-Z a-z 0-9, etc. When you execute it on the CPU, it puts the number 1 in the AX register and is done with that one instruction in a billionth of a second.

Typically when working at this level you don't have an ASCII text file with the assembler code written out like the former above; you are working in a debugging environment looking at the actual instructions in memory, like the latter B8 01 00, and the software interface that writes out the text lines will interpret that instruction and appropriately display "mov ax, 1" in a font on your screen. Debug will do this, for example. When you assemble in debug, you write "mov ax, 1" and it writes B8 01 00 into memory; when you dump or unassemble, it is shown back to you as B8 01 00 or "mov ax, 1" respectively. The ASCII string 6D 6F 76 20 61 78 2C 20 31 for the characters "m" "o" "v" "<space>" "a" "x" "," "<space>" "1" never actually exists, except in display memory as a font for you to read. It's the debug program translating for you in real time into ASCII text so you can read it. "Add 2 to the contents of ax" is much more meaningful to a human than the machine opcode 05 02 00 when writing complex programs. Thus any time the byte sequence 05 02 00 is encountered in RAM, the assembly/debug environment showing you that RAM writes the string "add ax, 2" to your screen so you can read it.
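What debug's unassemble step is doing there can be sketched as a trivial table lookup. A toy version in C, covering only the two opcodes from this thread (a real disassembler handles hundreds of encodings):

#include <stdio.h>

int main(void) {
    /* The bytes sitting in RAM: mov ax, 1 / add ax, 2 (16-bit encodings). */
    unsigned char ram[] = { 0xB8, 0x01, 0x00, 0x05, 0x02, 0x00 };

    for (size_t i = 0; i < sizeof(ram); ) {
        switch (ram[i]) {
        case 0xB8: printf("mov ax, %d\n", ram[i+1] | (ram[i+2] << 8)); i += 3; break;
        case 0x05: printf("add ax, %d\n", ram[i+1] | (ram[i+2] << 8)); i += 3; break;
        default:   printf("db %02X\n", ram[i]); i += 1; break;  /* unknown byte */
        }
    }
    return 0;
}

The mnemonic text exists only in the output; the "program" is still just the six bytes in the array.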

Play with debug for a few hours, then it will suddenly click and you'll see how awesome it all is, especially when you start writing random bytes to B800:0000 and see characters magically appearing on your screen. Don't worry: in the Windows cmd prompt, debug runs in a 16-bit virtual machine environment; you can't hurt anything. Windows will catch anything dangerous and kill it immediately with the well known "illegal operation" or "privileged instruction" if you try. That leaves flat protected mode, virtual memory, interrupts, multitasking, and kernel mode for you in another topic one day.
 
Last edited:

exdeath

Lifer
Jan 29, 2004
13,679
10
81
I cannot stress enough how awesome debug is as an intro to assembly and CPU theory. Do play with it :awe:

Shame they finally removed it in Windows 7.
 