
Coding in Binary

Discussion in 'Developer General' started by 3nvisi0n, Jun 21, 2011.

Thread Status:
This thread is more than 180 days old.
  1. 3nvisi0n

    3nvisi0n The R3v0lu710n Super-Mod

    Hello Again,
    *Warning: In some areas this is not exactly how it works, but it's close enough for you to understand the concept, and it lets me explain it without losing you.
    * The Intel manual I refer to is available at the link at the bottom of this post.
    This tutorial is pretty much just for knowledge's sake. I don't expect any of you to follow along; I partly did this just to try it on Windows (I've never done this on a 32-bit Windows machine, only on a MIPS32 machine for a class I took in university).

    Anyway, basically what I will be doing is taking some code from a 3rd generation language, moving it to 2nd generation by hand, and finally to 1st generation.

    You are probably wondering what I mean by generation. Well, it comes from the history of programming. We started by entering code through switches that were essentially on or off (1 and 0 respectively). These were executed directly by the CPU; on and off is what the CPU understood. So, digitizing that from the switches, 1s and 0s were the first programming language (symbolically). This is considered the 1st generation. 1st generation code is specific to the individual machine, making porting it to another computer difficult.

    After some time assembly languages came about; they are the second generation. This code, while still somewhat difficult to understand, could be learned and used by a programmer. Like the 1st generation it is specific to the processor, using command words specific to it. This code can be directly turned into 1st generation 1s and 0s.

    Third generation is what most of us are used to; languages like C, C++, and Visual Basic are 3rd generation languages. Ideally a 3rd generation language is not processor specific. For example, if I write some C code I can compile that code for a MIPS32 machine, or an Intel machine, or PowerPC. Visual Basic is not quite like this: it is a 3rd generation language, but it leans heavily upon Windows-only commands, so even if you could compile it for another system it would not work unless that system was running Windows...

    So when you write your code in a 3rd generation language, your compiler or interpreter (google for the difference) will change the code you gave it into lines of code at the earlier generation (2), and then that 2nd generation code can be made into 1st generation.

    While it is translating the code it is also optimizing the code. This is why I don't recommend writing code in binary or actually doing this tutorial. The computer can optimize your code and make it better and faster. Not to mention it is a pain to actually write the code in binary (or any number system). Even if you write the code in assembly, which is almost direct to the processor, the assembler can still make small improvements over your own code, such as picking a shorter encoding for an instruction. I say this because some months ago I heard of some people on XAT who claimed to program directly in binary. Of course I highly doubt that, but even if they did, it is unwise, since they forsake the ability to let a computer help speed up their code. Writing in assembly is where you go if you want to write fast code with minimal overhead; binary is too direct.

    Moving on... The tutorial is going to cover, in general, the steps to creating an exe by hand in a 32-bit Windows environment.

    The program we will make is going to be VERY simple: a Windows message box containing the message "newhax.com" with the title "3nvisi0n" and containing only an OK button.

    C/C++ (not very different in this case)
    Code:
    #include <windows.h>
    int main(void)
    {
        /* Arguments: owner window, message text, title, button style */
        MessageBoxA(NULL, "newhax.com", "3nvisi0n", MB_OK);
        return 0;
    }
    Visual Basic 2010
    Code:
    Module Module1
    
        Sub Main()
            MsgBox("newhax.com", vbOKOnly, "3nvisi0n")
        End Sub
    
    End Module
    
    The first stage is to turn this 3rd generation code into assembly.
    Just so you guys know, I'm not doing this exactly how a compiler would; I'm just writing the simplest assembly I can muster up to do the job.
    Code:
    extern MessageBoxA
    import MessageBoxA user32.dll
    
    segment .data USE32
    creator	db "3nvisi0n",0
    display db "newhax.com",0
    
    segment .code USE32
    ..start:
    push dword 0h 		;uType: 0h is the value of vbOKOnly/MB_OK
    push dword creator	;lpCaption: the title, "3nvisi0n"
    push dword display	;lpText: the message, "newhax.com"
    push dword 0		;hWnd: no owner window
    call [MessageBoxA]	;Call it
    ;I should be calling ExitProcess but I'm not...less work, the program will just stop working (error) after the box closes
    
    The above assembly is what I will be turning into binary. This is where it gets difficult.
    As I said before, this assembly code can be directly turned into 1st generation code. I can take these assembly commands like push and turn them into binary. How is this done? Well, by understanding what the processor wants.

    I'm not going to bore you or waste your time explaining how to do this for every line, as the extern and import lines get pretty nasty, but I will show how to figure out how to do it. First you need to see how the instructions are set up. I got this diagram from an Intel manual...

    http://img824.imageshack.us/img824/7339/instructionformat.png

    In the same manual you can find the 'opcodes' in the appendix; this is what I used to figure it out.
    The line I will demonstrate is 'push dword 0h'
    push - this is the command; it pushes the value onto the system stack
    dword - the type of data (a dword is 32 bits) and finally
    0h - the value; 0h means 0 in hex. Translated to binary: (00000000 00000000 00000000 00000000)
    Now you might be wondering why 0h and how I got that. Well, 0h is the value of vbOKOnly in Visual Basic and of MB_OK in C. Try it out in Visual Basic: Debug.Print vbOKOnly will print 0, vbOK will print 1, and you can get the values that way, or by reading the C header file.

    On page 415 we find the opcode for push. We are performing a push imm32 (immediate32; it means we have the 32-bit value right there in the instruction). So according to the manual the opcode is 68 (hex) or 0110 1000.

    Just a quick note on the side: converting from hex to binary or binary to hex is very easy as long as you can read both. Each hex digit becomes exactly four binary digits.

    Take this example: hex ABCD = 43981 in normal terms. Converting ABCD to binary looks like this. (*The number in () is the decimal value, the normal way of reading the number.)
    A(10) | B(11) | C(12) | D(13) || ABCD
    1010  | 1011  | 1100  | 1101  || 1010 1011 1100 1101
    Decimal: 43981
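
    If you'd rather let the computer do the lookup, the digit-by-digit conversion above can be sketched in C like this. It is only an illustration; the function names nibble_to_bits and word_to_bits are made up for this example and don't come from any library.

```c
#include <assert.h>

/* Writes the 4-bit binary form of one hex digit (0-15) into out[0..3],
   followed by a terminating NUL in out[4]. */
static void nibble_to_bits(unsigned value, char out[5]) {
    assert(value < 16);
    for (int i = 0; i < 4; i++)
        out[i] = (value & (8u >> i)) ? '1' : '0';   /* test bit 3, 2, 1, 0 */
    out[4] = '\0';
}

/* Converts a 16-bit value into four space-separated groups of bits.
   out must have room for 20 characters (19 visible + NUL). */
static void word_to_bits(unsigned value, char out[20]) {
    assert(value <= 0xFFFFu);
    for (int n = 0; n < 4; n++) {
        /* Highest hex digit first, exactly as you would read it. */
        nibble_to_bits((value >> (12 - 4 * n)) & 0xFu, out + 5 * n);
        out[5 * n + 4] = (n < 3) ? ' ' : '\0';      /* separator or end */
    }
}
```

    Calling word_to_bits(0xABCD, buf) with a 20-character buffer fills it with 1010 1011 1100 1101, the same result as the table.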



    So the opcode according to the diagram is 1 or 2 bytes; according to the manual, for this instruction it is one byte (8 bits).
    Opcode - Got this above: 68
    ModR/M - for this command we don't use any registers (google if you don't know what those are) or memory operands, so this byte is left out
    SIB - only ever follows a ModR/M byte, so again we don't use this
    Displacement - only used when addressing memory, so it is left out too
    Immediate - This is the value we are pushing, so 0 in this case, and we said it was a dword, so 32 bits (4 bytes)
    | opcode    | ModR/M | SIB  | Displacement | Immediate                               |
    | 68(hex)   | none   | none | none         | 00 00 00 00                             |
    | 0110 1000 |        |      |              | 0000 0000 0000 0000 0000 0000 0000 0000 |
    Final:
    Hex: 68 00 00 00 00

    Now that seems like such a waste of space: five bytes to push a zero. If I were doing it by hand I'd be stuck with that, but if you use an assembler it will catch my mistake of calling it a DWORD and will change the command to 'push 0'. It drops the DWORD I stated and encodes the number as an 8-bit byte (the processor sign-extends it back to 32 bits when it pushes it). I guess I shouldn't be copying and pasting code from the internet :S I guess I don't need DWORD in any of it.

    So we need to convert that command again:
    'push 0'
    opcode - 6A according to the manual for push imm8
    ModR/M - still not using registers, so not used
    SIB - no memory in play, so also not used
    Displacement - not used either
    Immediate - one byte equaling 0, so now we get
    | opcode    | Immediate |
    | 6A(hex)   | 00        |
    | 0110 1010 | 0000 0000 |
    Final:
    Hex: 6A 00
    Binary:0110 1010 0000 0000


    Now, moving on to the next line of code (remember, no need for DWORD like I had in the first source).
    Code:
    push creator	;Argument 
    Okay, so this one is a little different.
    opcode - We are dealing with memory now (the creator variable is in memory), so this time we really do use the imm32 form, 68, because we are going to push the location (32 bits) of the data we want (the string), not the actual data
    ModR/M - isn't needed; no registers this time, and the address is just a number as far as this instruction is concerned
    SIB - since we are only using the location and not reading the memory itself, not needed for this
    Displacement - not used, for the same reason
    Immediate - This is a tricky one. I don't want to get into how the code is stored to actually determine where the string will be (this is what compilers, assemblers and linkers do for you). I'm just going to skip ahead and tell you that in this example the variables start at address 00401000 (hex). The processor stores multi-byte numbers lowest byte first (little-endian), so in machine code the address comes out as 00 10 40 00.

    | opcode    | ModR/M | SIB  | Displacement | Immediate                               |
    | 68(hex)   | none   | none | none         | 00 10 40 00                             |
    | 0110 1000 |        |      |              | 0000 0000 0001 0000 0100 0000 0000 0000 |
    Final:
    Hex: 68 00 10 40 00
    Binary:0110 1000 0000 0000 0001 0000 0100 0000 0000 0000
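
    The two forms of push seen so far (6A with a one-byte value, 68 with a four-byte value) can be sketched in C as below. This is a rough illustration of the choice an assembler makes, not how NASM is actually written; encode_push is a made-up name, and the sketch assumes the four bytes after 68 are one 32-bit number stored lowest byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes "push <value>" into out (room for at least 5 bytes) and returns
   the number of bytes written. The 2-byte form is used when the value fits
   in a signed byte, because the processor sign-extends it to 32 bits. */
static size_t encode_push(int32_t value, uint8_t *out) {
    assert(out != NULL);
    if (value >= -128 && value <= 127) {
        out[0] = 0x6A;                      /* push imm8 */
        out[1] = (uint8_t)value;
        return 2;
    }
    out[0] = 0x68;                          /* push imm32 */
    uint32_t bits = (uint32_t)value;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(bits >> (8 * i));   /* lowest byte first */
    return 5;
}
```

    With this sketch, encode_push(0, buf) writes 6A 00, and encode_push(0x00401000, buf) writes 68 00 10 40 00, matching the bytes worked out by hand.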

    The next line
    Code:
    push display	;Argument
    Is very similar; this one's location immediately follows the location of the variable creator.
    opcode - 68(hex) just like the last line
    ModR/M - still not used
    SIB - like before, not used
    Displacement - still not used
    Immediate - our data still starts at the same location, 00401000, but the variable 'creator' comes first (value: '3nvisi0n\0'; \0 is the null character, it ends the string) and takes up 9 bytes (3nvisi0n = 8 characters + 1 for the null). So 'display' sits at 00401000 + 9 = 00401009, which is stored lowest byte first as 09 10 40 00.

    | opcode    | ModR/M | SIB  | Displacement | Immediate                               |
    | 68(hex)   | none   | none | none         | 09 10 40 00                             |
    | 0110 1000 |        |      |              | 0000 1001 0001 0000 0100 0000 0000 0000 |
    Final:
    Hex: 68 09 10 40 00
    Binary: 0110 1000 0000 1001 0001 0000 0100 0000 0000 0000
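
    Where the 09 comes from can be sketched in C as follows. DATA_BASE and address_after are names made up for this example, and the base address 00401000 is just the value assumed in this post; a real linker decides it for you.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed start of the data segment in this example. */
#define DATA_BASE 0x00401000u

/* Address of whatever is stored directly after the string `previous`:
   skip its characters plus the terminating null byte. */
static uint32_t address_after(uint32_t previous_address, const char *previous) {
    assert(previous != NULL);
    return previous_address + (uint32_t)strlen(previous) + 1u;
}
```

    Here address_after(DATA_BASE, "3nvisi0n") gives 0x00401009, which is the address of display that gets pushed.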

    The next line is the same as the first line

    Hex: 6A 00
    Binary:0110 1010 0000 0000

    The final line is
    Code:
    call [MessageBoxA]	;Call it
    This gets a little tricky.
    First the opcode: On page 536 in the manual we find the information on call. We are calling MessageBoxA, a procedure whose address is read from memory, so it is a call to a procedure in the same segment and is indirect.
    The manual tells us this about its form: "memory indirect 1111 1111 : mod 010 r/m", so that is what we will start with, opcode: 1111 1111 (FF in hex)
    Opcode: FF
    ModR/M - We use this finally. The manual tells us to put 010 between mod and r/m (this middle field is an opcode extension; it is what tells the processor that this FF instruction is a call). Looking at page 26 of the manual, we want the form that is just a 32-bit address with no register involved, which is in the first major row. (Careful which table you read: in the 32-bit table r/m 101 with mod 00 means a plain 32-bit displacement; in the 16-bit table the same bits would mean [DI].)
    Since we are in the first major row the mod is 00, so we have 00 (mod) 010 (opcode extension) and then r/m 101, giving us 00 010 101 (or in hex 15)
    SIB - We don't need this for this ModR/M so it is blank again
    Note: I am using 00403000 as the location where the imported procedure addresses start in memory in this example, so this particular one, after the extern and import, sits at 0040303C.
    Displacement - the 32-bit address 0040303C, stored lowest byte first: 3C 30 40 00
    Immediate - not used
    | opcode    | ModR/M    | SIB  | Displacement                            | Immediate |
    | FF(hex)   | 15(hex)   | none | 3C 30 40 00                             | none      |
    | 1111 1111 | 0001 0101 |      | 0011 1100 0011 0000 0100 0000 0000 0000 |           |
    Final:
    Hex: FF 15 3C 30 40 00
    Binary: 1111 1111 0001 0101 0011 1100 0011 0000 0100 0000 0000 0000
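
    The 15 in the middle is just three bit fields packed into one byte, and that packing can be sketched in C. The names make_modrm and split_modrm are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Packs the three ModR/M fields into one byte:
   mod (2 bits) | reg or opcode extension (3 bits) | r/m (3 bits). */
static uint8_t make_modrm(unsigned mod, unsigned reg, unsigned rm) {
    assert(mod < 4 && reg < 8 && rm < 8);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Splits a ModR/M byte back into its three fields. */
static void split_modrm(uint8_t byte, unsigned *mod, unsigned *reg, unsigned *rm) {
    assert(mod != 0 && reg != 0 && rm != 0);
    *mod = byte >> 6;
    *reg = (byte >> 3) & 7u;
    *rm  = byte & 7u;
}
```

    For the call above, make_modrm(0, 2, 5) packs mod 00, opcode extension 010 and r/m 101 into 0x15.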

    And so when all is said and done for those 5 main lines of program code we get:

    Hex:
    Code:
    6A 00 68 00 10 40 00 68 09 10 40 00 6A 00 FF 15 3C 30 40 00
    Binary:
    Code:
    0110 1010 0000 0000 0110 1000 0000 0000 0001 0000 0100 0000 0000 0000 0110 1000 0000 1001 0001 0000 0100 0000 0000 0000 0110 1010 0000 0000 1111 1111 0001 0101 0011 1100 0011 0000 0100 0000 0000 0000
    And that is how your normal code becomes executable by the computer. Of course there is more to it; you still need to convert the segments and everything else to binary, but this is just an introduction.
    Just to prove that this bit is actually correct, I ran the code shown here (the first assembly listing) through NASM to assemble it and had it linked with alinker. Then I opened it up in a disassembler to see the code it produced, and I am right on :D It did indeed catch my error of stating dword and changed it to fit as it should be. Here is a screenshot of the hex code produced by the program (and not by hand); it is green because it represents code. The ??s are not used.

    [​IMG]
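
    If you don't have a disassembler handy, a very rough check of the 20 bytes can be sketched in C. This is nowhere near a real disassembler; it only knows the three encodings used in this post, and the names instruction_length and count_instructions are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the instruction starting at code[0], or 0 if it is not one of
   the three encodings used in this post (or the buffer is cut short). */
static size_t instruction_length(const uint8_t *code, size_t remaining) {
    assert(code != NULL);
    if (remaining >= 2 && code[0] == 0x6A) return 2;   /* push imm8 */
    if (remaining >= 5 && code[0] == 0x68) return 5;   /* push imm32 */
    if (remaining >= 6 && code[0] == 0xFF && code[1] == 0x15)
        return 6;                                      /* call [disp32] */
    return 0;
}

/* Walks the buffer instruction by instruction and counts them.
   Returns -1 as soon as something unrecognised is found. */
static int count_instructions(const uint8_t *code, size_t size) {
    int count = 0;
    size_t pos = 0;
    while (pos < size) {
        size_t len = instruction_length(code + pos, size - pos);
        if (len == 0) return -1;
        pos += len;
        count++;
    }
    return count;
}
```

    Fed the hex bytes from above (6A 00 68 00 10 40 00 68 09 10 40 00 6A 00 FF 15 3C 30 40 00), count_instructions walks them as 2 + 5 + 5 + 2 + 6 bytes and returns 5, one for each line of the assembly.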

    So as you should be able to see, writing your own code in binary...is NOT realistic. At the least you'll have to write it in assembly anyhow to get the binary. So you are not saving time by doing it yourself, nor are you writing better code, as the assembler can optimize your code better than you can in most cases.

    Intel Manual: http://www.mediafire.com/?p24qzc3r8ku4t5k


  2. That One Guy

    That One Guy Junior Member Member

    Envy dude, you make the best tutorials. This might just come in handy someday. +1 Rep!
  3. 3nvisi0n

    3nvisi0n The R3v0lu710n Super-Mod

    I sure hope not :P
    This tutorial really isn't intended to be helpful but to show you what a pain it is. Let the assembler do it; it does it better.
  4. dns

    dns Active Member Admin

    This truly goes to show everyone just how much simpler and quicker it is to create a program with an IDE. Imagine having to write an entire program in only binary (without backtracking from a higher level language). I don't think he meant this as a tutorial, but more or less as a demonstration of the superior abilities of a high level programming language. Thanks for this envy, it's very interesting to see posts like this. +1