Chapter 3: From source code to binary

Before learning how to work backwards from a binary application, we first need to understand how source code is compiled into binary form. Binary form here can mean machine code that is executed directly by the processor, or bytecode that is executed by a virtual machine.

Different languages reach binary form in different ways. C code can be compiled directly into an executable containing machine code (an EXE on Windows, a binary file on Unix and other operating systems). Java code is compiled into .class files that are run by the Java Virtual Machine.

To a layperson, a program such as py2exe appears to turn Python code into an .exe, but what actually happens is that the Python interpreter and the Python code are packed into a single executable file. When the exe is run, the Python code is extracted (either to memory or to disk) and then executed.

To keep things manageable, I will cover several common ways in which source code is transformed into a binary/executable file. These are only a few examples. Whenever you study a language, you should also study how code in that language is transformed (compiled, interpreted) and executed.

Compiling C to native code
Compiling Pascal to native code
Compiling Go to native code
Compiling Java to .class
Compiling C# to Common Intermediate Language (CIL)
1. Compiling
2. Viewing the CIL

Compiling C to native code

C is a very low-level language, so it is easy to follow the transformation of C source code into assembly and then into binary. The process usually goes as follows:

the C file is processed by the preprocessor, producing expanded C code
the expanded C code is translated by the compiler into assembly
the assembly code is assembled (by the assembler) into object/machine code
the object code is combined by the linker with other object code and with library functions into an executable

I use the GNU Compiler Collection (GCC) in the examples for several reasons:

It runs on many operating systems (Linux, OS X, and also Windows)
It supports many CPUs (for example Intel, ARM, MIPS)
It is free
It is open source, so we can read its code

Preprocessor

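The C preprocessor performs only a few basic, purely textual operations, such as inserting the contents of #include files and substituting #define macros. It does not understand the source code; it is essentially copy and paste.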

C to assembly

Let's try this with a small C example. Note that I will not discuss the assembly code itself yet; I only want to show a few things:

The compiler can be asked to produce assembly only and then stop
The compiler's optimization level can be adjusted
At some optimization levels, things can disappear during compilation (because the compiler optimized them away)
There are many assembly languages, depending on the target CPU, and even for the same CPU there is sometimes more than one syntax

Unless stated otherwise, the examples use Linux on AMD64 (64-bit Linux on an Intel-compatible x86-64 processor).

#include <stdio.h>

static int double_it(int x)
{
    return 2*x;
}

int main(int argc, char *argv[])
{
    printf("hello world\n");
    printf("double of 100 is: %d\n", double_it(100));
    return 0;
}

I deliberately used the static keyword on double_it to state that the function is visible only within this file (so the compiler is free to optimize it away).

If we only want to see the translation into assembly, we can use:

 gcc -S simple.c

The result is the file simple.s. Some of the strings are still clearly visible, as are the function name double_it and the number 100. Note that gcc has replaced the first printf call with puts and dropped the trailing newline from that string. gcc emits assembly in AT&T syntax.

Assembly code is plain text, just like C code, so we can also write it by hand; it does not have to be generated from a C program.

ONCE AGAIN: you are not expected to understand the assembly listings in this section yet. They are here only to give you a picture of the different kinds of assembly that exist. Many beginners who were taught Intel assembly get confused when they try to reverse engineer iOS and find that the assembly looks completely different. The listings also show how different the assembly for the same C code can be once the compiler optimizes it.

    .file   "simple.c"
    .text
    .type   double_it, @function
double_it:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    %edi, -4(%rbp)
    movl    -4(%rbp), %eax
    addl    %eax, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   double_it, .-double_it
    .section    .rodata
.LC0:
    .string "hello world"
.LC1:
    .string "double of 100 is: %d\n"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movq    %rsi, -16(%rbp)
    movl    $.LC0, %edi
    call    puts
    movl    $100, %edi
    call    double_it
    movl    %eax, %esi
    movl    $.LC1, %edi
    movl    $0, %eax
    call    printf
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1:
    .size   main, .-main
    .ident  "GCC: (Debian 6.1.1-11) 6.1.1 20160802"
    .section    .note.GNU-stack,"",@progbits

As an aside: the same experiment can be done on Windows with Visual Studio:

cl /FA /c simple.c

Visual Studio produces the assembly listing in Intel syntax.

; Listing generated by Microsoft (R) Optimizing Compiler Version 19.00.23918.0 

include listing.inc

INCLUDELIB LIBCMT
INCLUDELIB OLDNAMES

_DATA   SEGMENT
$SG4261 DB  'hello world', 0aH, 00H
    ORG $+3
$SG4262 DB  'double of 100 is: %d', 0aH, 00H
_DATA   ENDS
PUBLIC  __local_stdio_printf_options
PUBLIC  _vfprintf_l
PUBLIC  printf
PUBLIC  main
EXTRN   __acrt_iob_func:PROC
EXTRN   __stdio_common_vfprintf:PROC
_DATA   SEGMENT
COMM    ?_OptionsStorage@?1??__local_stdio_printf_options@@9@9:QWORD                            ; `__local_stdio_printf_options'::`2'::_OptionsStorage
_DATA   ENDS
;  COMDAT pdata
pdata   SEGMENT
$pdata$_vfprintf_l DD imagerel $LN3
    DD  imagerel $LN3+67
    DD  imagerel $unwind$_vfprintf_l
pdata   ENDS
;  COMDAT pdata
pdata   SEGMENT
$pdata$printf DD imagerel $LN3
    DD  imagerel $LN3+87
    DD  imagerel $unwind$printf
pdata   ENDS
pdata   SEGMENT
$pdata$main DD  imagerel $LN3
    DD  imagerel $LN3+56
    DD  imagerel $unwind$main
pdata   ENDS
xdata   SEGMENT
$unwind$main DD 010d01H
    DD  0420dH
xdata   ENDS
;  COMDAT xdata
xdata   SEGMENT
$unwind$printf DD 011801H
    DD  06218H
xdata   ENDS
;  COMDAT xdata
xdata   SEGMENT
$unwind$_vfprintf_l DD 011801H
    DD  06218H
xdata   ENDS
; Function compile flags: /Odtp
_TEXT   SEGMENT
argc$ = 48
argv$ = 56
main    PROC
; File c:\gitwork\yohan.es\reverse-engineering\simple.c
; Line 9
$LN3:
    mov QWORD PTR [rsp+16], rdx
    mov DWORD PTR [rsp+8], ecx
    sub rsp, 40                 ; 00000028H
; Line 10
    lea rcx, OFFSET FLAT:$SG4261
    call    printf
; Line 11
    mov ecx, 100                ; 00000064H
    call    double_it
    mov edx, eax
    lea rcx, OFFSET FLAT:$SG4262
    call    printf
; Line 12
    xor eax, eax
; Line 13
    add rsp, 40                 ; 00000028H
    ret 0
main    ENDP
_TEXT   ENDS
; Function compile flags: /Odtp
_TEXT   SEGMENT
x$ = 8
double_it PROC
; File c:\gitwork\yohan.es\reverse-engineering\simple.c
; Line 4
    mov DWORD PTR [rsp+8], ecx
; Line 5
    mov eax, DWORD PTR x$[rsp]
    shl eax, 1
; Line 6
    ret 0
double_it ENDP
_TEXT   ENDS
; Function compile flags: /Odtp
;  COMDAT printf
_TEXT   SEGMENT
_Result$ = 32
_ArgList$ = 40
_Format$ = 64
printf  PROC                        ; COMDAT
; File c:\program files (x86)\windows kits\10\include\10.0.10240.0\ucrt\stdio.h
; Line 950
$LN3:
    mov QWORD PTR [rsp+8], rcx
    mov QWORD PTR [rsp+16], rdx
    mov QWORD PTR [rsp+24], r8
    mov QWORD PTR [rsp+32], r9
    sub rsp, 56                 ; 00000038H
; Line 953
    lea rax, QWORD PTR _Format$[rsp+8]
    mov QWORD PTR _ArgList$[rsp], rax
; Line 954
    mov ecx, 1
    call    __acrt_iob_func
    mov r9, QWORD PTR _ArgList$[rsp]
    xor r8d, r8d
    mov rdx, QWORD PTR _Format$[rsp]
    mov rcx, rax
    call    _vfprintf_l
    mov DWORD PTR _Result$[rsp], eax
; Line 955
    mov QWORD PTR _ArgList$[rsp], 0
; Line 956
    mov eax, DWORD PTR _Result$[rsp]
; Line 957
    add rsp, 56                 ; 00000038H
    ret 0
printf  ENDP
_TEXT   ENDS
; Function compile flags: /Odtp
;  COMDAT _vfprintf_l
_TEXT   SEGMENT
_Stream$ = 64
_Format$ = 72
_Locale$ = 80
_ArgList$ = 88
_vfprintf_l PROC                    ; COMDAT
; File c:\program files (x86)\windows kits\10\include\10.0.10240.0\ucrt\stdio.h
; Line 638
$LN3:
    mov QWORD PTR [rsp+32], r9
    mov QWORD PTR [rsp+24], r8
    mov QWORD PTR [rsp+16], rdx
    mov QWORD PTR [rsp+8], rcx
    sub rsp, 56                 ; 00000038H
; Line 639
    call    __local_stdio_printf_options
    mov rcx, QWORD PTR _ArgList$[rsp]
    mov QWORD PTR [rsp+32], rcx
    mov r9, QWORD PTR _Locale$[rsp]
    mov r8, QWORD PTR _Format$[rsp]
    mov rdx, QWORD PTR _Stream$[rsp]
    mov rcx, QWORD PTR [rax]
    call    __stdio_common_vfprintf
; Line 640
    add rsp, 56                 ; 00000038H
    ret 0
_vfprintf_l ENDP
_TEXT   ENDS
; Function compile flags: /Odtp
;  COMDAT __local_stdio_printf_options
_TEXT   SEGMENT
__local_stdio_printf_options PROC           ; COMDAT
; File c:\program files (x86)\windows kits\10\include\10.0.10240.0\ucrt\corecrt_stdio_config.h
; Line 75
    lea rax, OFFSET FLAT:?_OptionsStorage@?1??__local_stdio_printf_options@@9@9 ; `__local_stdio_printf_options'::`2'::_OptionsStorage
; Line 76
    ret 0
__local_stdio_printf_options ENDP
_TEXT   ENDS
END

Now back to gcc. For comparison, if we enable compiler optimization, the compiler produces much shorter code. Here the double_it function has disappeared entirely and has been replaced by the constant 200.

 gcc -S -O1 simple.c

    .file   "simple.c"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "hello world"
.LC1:
    .string "double of 100 is: %d\n"
    .text
    .globl  main
    .type   main, @function
main:
.LFB12:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $.LC0, %edi
    call    puts
    movl    $200, %esi
    movl    $.LC1, %edi
    movl    $0, %eax
    call    printf
    movl    $0, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
.LFE12:
    .size   main, .-main
    .ident  "GCC: (Debian 6.1.1-11) 6.1.1 20160802"
    .section    .note.GNU-stack,"",@progbits

We can raise the optimization level further. In this example, the instruction movl $0, %eax becomes xorl %eax, %eax. XORing a value with itself always yields 0, so both instructions set the register to zero. This is a very low-level optimization: the compiler knows, from the CPU manuals, that the XOR form is shorter and at least as fast.

 gcc -S -O2 simple.c

    .file   "simple.c"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "hello world"
.LC1:
    .string "double of 100 is: %d\n"
    .section    .text.startup,"ax",@progbits
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB12:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $.LC0, %edi
    call    puts
    movl    $200, %esi
    movl    $.LC1, %edi
    xorl    %eax, %eax
    call    printf
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
.LFE12:
    .size   main, .-main
    .ident  "GCC: (Debian 6.1.1-11) 6.1.1 20160802"
    .section    .note.GNU-stack,"",@progbits

If we raise the optimization level again, the output does not change because this example is too small. In other cases, each optimization level can produce different code.

That was just a small example of assembly for AMD64. Here is another example, for ARM:

    .arch armv6
    .eabi_attribute 27, 3
    .eabi_attribute 28, 1
    .fpu vfp
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 6
    .eabi_attribute 34, 1
    .eabi_attribute 18, 4
    .file   "simple.c"
    .text
    .align  2
    .type   double_it, %function
double_it:
    @ args = 0, pretend = 0, frame = 8
    @ frame_needed = 1, uses_anonymous_args = 0
    @ link register save eliminated.
    str fp, [sp, #-4]!
    add fp, sp, #0
    sub sp, sp, #12
    str r0, [fp, #-8]
    ldr r3, [fp, #-8]
    mov r3, r3, asl #1
    mov r0, r3
    sub sp, fp, #0
    @ sp needed
    ldr fp, [sp], #4
    bx  lr
    .size   double_it, .-double_it
    .section    .rodata
    .align  2
.LC0:
    .ascii  "hello world\000"
    .align  2
.LC1:
    .ascii  "double of 100 is: %d\012\000"
    .text
    .align  2
    .global main
    .type   main, %function
main:
    @ args = 0, pretend = 0, frame = 8
    @ frame_needed = 1, uses_anonymous_args = 0
    stmfd   sp!, {fp, lr}
    add fp, sp, #4
    sub sp, sp, #8
    str r0, [fp, #-8]
    str r1, [fp, #-12]
    ldr r0, .L5
    bl  puts
    mov r0, #100
    bl  double_it
    mov r3, r0
    ldr r0, .L5+4
    mov r1, r3
    bl  printf
    mov r3, #0
    mov r0, r3
    sub sp, fp, #4
    @ sp needed
    ldmfd   sp!, {fp, pc}
.L6:
    .align  2
.L5:
    .word   .LC0
    .word   .LC1
    .size   main, .-main
    .ident  "GCC: (Raspbian 4.9.2-10) 4.9.2"
    .section    .note.GNU-stack,"",%progbits

And the optimized ARM version (-O1):

    .arch armv6
    .eabi_attribute 27, 3
    .eabi_attribute 28, 1
    .fpu vfp
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 1
    .eabi_attribute 34, 1
    .eabi_attribute 18, 4
    .file   "simple.c"
    .text
    .align  2
    .global main
    .type   main, %function
main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
    stmfd   sp!, {r3, lr}
    ldr r0, .L3
    bl  puts
    ldr r0, .L3+4
    mov r1, #200
    bl  printf
    mov r0, #0
    ldmfd   sp!, {r3, pc}
.L4:
    .align  2
.L3:
    .word   .LC0
    .word   .LC1
    .size   main, .-main
    .section    .rodata.str1.4,"aMS",%progbits,1
    .align  2
.LC0:
    .ascii  "hello world\000"
.LC1:
    .ascii  "double of 100 is: %d\012\000"
    .ident  "GCC: (Raspbian 4.9.2-10) 4.9.2"
    .section    .note.GNU-stack,"",%progbits

Another example, for MIPS:

    .file   1 "simple.c"
    .section .mdebug.abi32
    .previous
    .gnu_attribute 4, 1
    .abicalls
    .text
    .align  2
    .set    nomips16
    .ent    double_it
    .type   double_it, @function
double_it:
    .frame  $fp,8,$31       # vars= 0, regs= 1/0, args= 0, gp= 0
    .mask   0x40000000,-4
    .fmask  0x00000000,0
    .set    noreorder
    .set    nomacro
    
    addiu   $sp,$sp,-8
    sw  $fp,4($sp)
    move    $fp,$sp
    sw  $4,8($fp)
    lw  $2,8($fp)
    nop
    sll $2,$2,1
    move    $sp,$fp
    lw  $fp,4($sp)
    addiu   $sp,$sp,8
    j   $31
    nop

    .set    macro
    .set    reorder
    .end    double_it
    .size   double_it, .-double_it
    .rdata
    .align  2
$LC0:
    .ascii  "hello world\000"
    .align  2
$LC1:
    .ascii  "double of 100 is: %d\012\000"
    .text
    .align  2
    .globl  main
    .set    nomips16
    .ent    main
    .type   main, @function
main:
    .frame  $fp,40,$31      # vars= 0, regs= 3/0, args= 16, gp= 8
    .mask   0xc0010000,-4
    .fmask  0x00000000,0
    .set    noreorder
    .set    nomacro
    
    addiu   $sp,$sp,-40
    sw  $31,36($sp)
    sw  $fp,32($sp)
    sw  $16,28($sp)
    move    $fp,$sp
    lui $28,%hi(__gnu_local_gp)
    addiu   $28,$28,%lo(__gnu_local_gp)
    .cprestore  16
    sw  $4,40($fp)
    sw  $5,44($fp)
    lui $2,%hi($LC0)
    addiu   $4,$2,%lo($LC0)
    lw  $2,%call16(puts)($28)
    nop
    move    $25,$2
    jalr    $25
    nop

    lw  $28,16($fp)
    lui $2,%hi($LC1)
    addiu   $16,$2,%lo($LC1)
    li  $4,100          # 0x64
    .option pic0
    jal double_it
    nop

    .option pic2
    lw  $28,16($fp)
    move    $4,$16
    move    $5,$2
    lw  $2,%call16(printf)($28)
    nop
    move    $25,$2
    jalr    $25
    nop

    lw  $28,16($fp)
    move    $2,$0
    move    $sp,$fp
    lw  $31,36($sp)
    lw  $fp,32($sp)
    lw  $16,28($sp)
    addiu   $sp,$sp,40
    j   $31
    nop

    .set    macro
    .set    reorder
    .end    main
    .size   main, .-main
    .ident  "GCC: (Debian 4.4.5-8) 4.4.5"

And the optimized version (-O1):

    .file   1 "simple.c"
    .section .mdebug.abi32
    .previous
    .gnu_attribute 4, 1
    .abicalls
    .section    .rodata.str1.4,"aMS",@progbits,1
    .align  2
$LC0:
    .ascii  "hello world\000"
    .align  2
$LC1:
    .ascii  "double of 100 is: %d\012\000"
    .text
    .align  2
    .globl  main
    .set    nomips16
    .ent    main
    .type   main, @function
main:
    .frame  $sp,32,$31      # vars= 0, regs= 1/0, args= 16, gp= 8
    .mask   0x80000000,-4
    .fmask  0x00000000,0
    .set    noreorder
    .set    nomacro
    
    addiu   $sp,$sp,-32
    sw  $31,28($sp)
    lui $28,%hi(__gnu_local_gp)
    addiu   $28,$28,%lo(__gnu_local_gp)
    .cprestore  16
    lui $4,%hi($LC0)
    lw  $25,%call16(puts)($28)
    nop
    jalr    $25
    addiu   $4,$4,%lo($LC0)

    lw  $28,16($sp)
    lui $4,%hi($LC1)
    addiu   $4,$4,%lo($LC1)
    lw  $25,%call16(printf)($28)
    nop
    jalr    $25
    li  $5,200          # 0xc8

    move    $2,$0
    lw  $31,28($sp)
    nop
    j   $31
    addiu   $sp,$sp,32

    .set    macro
    .set    reorder
    .end    main
    .size   main, .-main
    .ident  "GCC: (Debian 4.4.5-8) 4.4.5"

If you skim the assembly listings above, you can see that the mnemonics (the human-readable instruction names) differ greatly from one CPU to another. For example, to call a function: call on Intel, bl on ARM, and jal on MIPS. Because explaining each CPU takes a fair amount of space, this is covered in other chapters.

Assembly to object code

The assembly listings in the previous section are still human-readable. They can be translated into machine language with an assembler. We can use the program as (the assembler) to do this, or keep using gcc, which is smart enough to run as automatically when its input is a .s file.

Object code is machine code, but it is not yet complete. For example, in the code above we do not yet know how printf is implemented. On a desktop it most likely prints to the screen, but in a program for an embedded system the output might go to a serial port. At this point the implementation of such functions is unknown, so their addresses are left "blank".

Only in the next step (linking) do we combine the code produced by the assembler with library code.

Before discussing libraries, let's continue the previous example (I am using the unoptimized version).

To invoke the assembler:

as simple.s -o simple.o

Or, more simply:

gcc -c simple.s

The result is simple.o.

$ file simple.o
simple.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

This is a binary file, which we can inspect with hexdump or any other hex editor. Because it is quite long, I will show only the first part. The strings "hello world" and "double of 100 is: %d" are visible, and so is the function name double_it.

The generated machine code comes with a header: extra information needed by tools such as the linker and the loader (note that this has nothing to do with the .h header files used when programming in C). The header format depends on the compiler and the operating system. On Linux the format is ELF (Executable and Linkable Format). There is no need to understand this format for now. In an ELF file, the header contains the readable text "ELF", and the file usually also records the compiler that produced it ("GCC") along with its version. Windows executables use the MZ/PE format, which starts with "MZ" and has "PE" a few dozen bytes later.

$ hexdump -C simple.o
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|
00000020  00 00 00 00 00 00 00 00  88 03 00 00 00 00 00 00  |................|
00000030  00 00 00 00 40 00 00 00  00 00 40 00 0d 00 0a 00  |....@.....@.....|
00000040  55 48 89 e5 89 7d fc 8b  45 fc 01 c0 5d c3 55 48  |UH...}..E...].UH|
00000050  89 e5 48 83 ec 10 89 7d  fc 48 89 75 f0 bf 00 00  |..H....}.H.u....|
00000060  00 00 e8 00 00 00 00 bf  64 00 00 00 e8 cf ff ff  |........d.......|
00000070  ff 89 c6 bf 00 00 00 00  b8 00 00 00 00 e8 00 00  |................|
00000080  00 00 b8 00 00 00 00 c9  c3 68 65 6c 6c 6f 20 77  |.........hello w|
00000090  6f 72 6c 64 00 64 6f 75  62 6c 65 20 6f 66 20 31  |orld.double of 1|
000000a0  30 30 20 69 73 3a 20 25  64 0a 00 00 47 43 43 3a  |00 is: %d...GCC:|
000000b0  20 28 44 65 62 69 61 6e  20 36 2e 31 2e 31 2d 31  | (Debian 6.1.1-1|
000000c0  31 29 20 36 2e 31 2e 31  20 32 30 31 36 30 38 30  |1) 6.1.1 2016080|
000000d0  32 00 00 00 00 00 00 00  14 00 00 00 00 00 00 00  |2...............|
000000e0  01 7a 52 00 01 78 10 01  1b 0c 07 08 90 01 00 00  |.zR..x..........|
000000f0  1c 00 00 00 1c 00 00 00  00 00 00 00 0e 00 00 00  |................|
00000100  00 41 0e 10 86 02 43 0d  06 49 0c 07 08 00 00 00  |.A....C..I......|
00000110  1c 00 00 00 3c 00 00 00  00 00 00 00 3b 00 00 00  |....<.......;...|
00000120  00 41 0e 10 86 02 43 0d  06 76 0c 07 08 00 00 00  |.A....C..v......|
00000130  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000140  00 00 00 00 00 00 00 00  01 00 00 00 04 00 f1 ff  |................|
00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000160  00 00 00 00 03 00 01 00  00 00 00 00 00 00 00 00  |................|
00000170  00 00 00 00 00 00 00 00  00 00 00 00 03 00 03 00  |................|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000190  00 00 00 00 03 00 04 00  00 00 00 00 00 00 00 00  |................|
000001a0  00 00 00 00 00 00 00 00  0a 00 00 00 02 00 01 00  |................|
000001b0  00 00 00 00 00 00 00 00  0e 00 00 00 00 00 00 00  |................|
000001c0  00 00 00 00 03 00 05 00  00 00 00 00 00 00 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 03 00 07 00  |................|
000001e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001f0  00 00 00 00 03 00 08 00  00 00 00 00 00 00 00 00  |................|
00000200  00 00 00 00 00 00 00 00  00 00 00 00 03 00 06 00  |................|
00000210  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000220  14 00 00 00 12 00 01 00  0e 00 00 00 00 00 00 00  |................|
00000230  3b 00 00 00 00 00 00 00  19 00 00 00 10 00 00 00  |;...............|
00000240  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000250  1e 00 00 00 10 00 00 00  00 00 00 00 00 00 00 00  |................|
00000260  00 00 00 00 00 00 00 00  00 73 69 6d 70 6c 65 2e  |.........simple.|
00000270  63 00 64 6f 75 62 6c 65  5f 69 74 00 6d 61 69 6e  |c.double_it.main|
00000280  00 70 75 74 73 00 70 72  69 6e 74 66 00 00 00 00  |.puts.printf....|
00000290  1e 00 00 00 00 00 00 00  0a 00 00 00 06 00 00 00  |................|
000002a0  00 00 00 00 00 00 00 00  23 00 00 00 00 00 00 00  |........#.......|
000002b0  02 00 00 00 0b 00 00 00  fc ff ff ff ff ff ff ff  |................|
000002c0  34 00 00 00 00 00 00 00  0a 00 00 00 06 00 00 00  |4...............|
000002d0  0c 00 00 00 00 00 00 00  3e 00 00 00 00 00 00 00  |........>.......|
000002e0  02 00 00 00 0c 00 00 00  fc ff ff ff ff ff ff ff  |................|
000002f0  20 00 00 00 00 00 00 00  02 00 00 00 02 00 00 00  | ...............|
00000300  00 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000310  02 00 00 00 02 00 00 00  0e 00 00 00 00 00 00 00  |................|
00000320  00 2e 73 79 6d 74 61 62  00 2e 73 74 72 74 61 62  |..symtab..strtab|
00000330  00 2e 73 68 73 74 72 74  61 62 00 2e 72 65 6c 61  |..shstrtab..rela|
00000340  2e 74 65 78 74 00 2e 64  61 74 61 00 2e 62 73 73  |.text..data..bss|
00000350  00 2e 72 6f 64 61 74 61  00 2e 63 6f 6d 6d 65 6e  |..rodata..commen|
00000360  74 00 2e 6e 6f 74 65 2e  47 4e 55 2d 73 74 61 63  |t..note.GNU-stac|
00000370  6b 00 2e 72 65 6c 61 2e  65 68 5f 66 72 61 6d 65  |k..rela.eh_frame|
00000380  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

Machine code can be turned back into assembly, though of course without any comments. The objdump flags I use are: -a to display archive header information (for a single object file this just prints its file format), -d to display the disassembly (see the part labeled Disassembly of section .text:), -s to display section contents (Contents of section), -t to display the symbol table (SYMBOL TABLE), and -r to display relocation information.

The relocation information is what the linker will later use to "fill in" the parts that are still unknown at this point (for example puts and printf).

objdump -a -d -s -t -r simple.o

simple.o:     file format elf64-x86-64
simple.o

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 simple.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l     F .text  000000000000000e double_it
0000000000000000 l    d  .rodata    0000000000000000 .rodata
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment
000000000000000e g     F .text  000000000000003b main
0000000000000000         *UND*  0000000000000000 puts
0000000000000000         *UND*  0000000000000000 printf


Contents of section .text:
 0000 554889e5 897dfc8b 45fc01c0 5dc35548  UH...}..E...].UH
 0010 89e54883 ec10897d fc488975 f0bf0000  ..H....}.H.u....
 0020 0000e800 000000bf 64000000 e8cfffff  ........d.......
 0030 ff89c6bf 00000000 b8000000 00e80000  ................
 0040 0000b800 000000c9 c3                 .........       
Contents of section .rodata:
 0000 68656c6c 6f20776f 726c6400 646f7562  hello world.doub
 0010 6c65206f 66203130 30206973 3a202564  le of 100 is: %d
 0020 0a00                                 ..              
Contents of section .comment:
 0000 00474343 3a202844 65626961 6e20362e  .GCC: (Debian 6.
 0010 312e312d 31312920 362e312e 31203230  1.1-11) 6.1.1 20
 0020 31363038 303200                      160802.         
Contents of section .eh_frame:
 0000 14000000 00000000 017a5200 01781001  .........zR..x..
 0010 1b0c0708 90010000 1c000000 1c000000  ................
 0020 00000000 0e000000 00410e10 8602430d  .........A....C.
 0030 06490c07 08000000 1c000000 3c000000  .I..........<...
 0040 00000000 3b000000 00410e10 8602430d  ....;....A....C.
 0050 06760c07 08000000                    .v......        

Disassembly of section .text:

0000000000000000 <double_it>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   89 7d fc                mov    %edi,-0x4(%rbp)
   7:   8b 45 fc                mov    -0x4(%rbp),%eax
   a:   01 c0                   add    %eax,%eax
   c:   5d                      pop    %rbp
   d:   c3                      retq   

000000000000000e <main>:
   e:   55                      push   %rbp
   f:   48 89 e5                mov    %rsp,%rbp
  12:   48 83 ec 10             sub    $0x10,%rsp
  16:   89 7d fc                mov    %edi,-0x4(%rbp)
  19:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
  1d:   bf 00 00 00 00          mov    $0x0,%edi
            1e: R_X86_64_32 .rodata
  22:   e8 00 00 00 00          callq  27 <main+0x19>
            23: R_X86_64_PC32   puts-0x4
  27:   bf 64 00 00 00          mov    $0x64,%edi
  2c:   e8 cf ff ff ff          callq  0 <double_it>
  31:   89 c6                   mov    %eax,%esi
  33:   bf 00 00 00 00          mov    $0x0,%edi
            34: R_X86_64_32 .rodata+0xc
  38:   b8 00 00 00 00          mov    $0x0,%eax
  3d:   e8 00 00 00 00          callq  42 <main+0x34>
            3e: R_X86_64_PC32   printf-0x4
  42:   b8 00 00 00 00          mov    $0x0,%eax
  47:   c9                      leaveq 
  48:   c3                      retq

Also note this part:

 e: 55                      push   %rbp
 f: 48 89 e5                mov    %rsp,%rbp

On the left is the address in hexadecimal (e: means 0xe, or 14 in decimal; f: means 0xf, or 15 in decimal). To its right are the bytes that form the machine-language representation: in the example above, 55 is push %rbp on AMD64, and 48 89 e5 is mov %rsp,%rbp.

From the machine-code bytes we can recover the assembly text, but we have to know which architecture they were built for. A sequence of bytes disassembled for the wrong architecture produces assembly text that makes no sense.

If we look again at the hexdump above:

$ hexdump -C simple.o
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|
00000020  00 00 00 00 00 00 00 00  88 03 00 00 00 00 00 00  |................|
00000030  00 00 00 00 40 00 00 00  00 00 40 00 0d 00 0a 00  |....@.....@.....|
00000040  55 48 89 e5 89 7d fc 8b  45 fc 01 c0 5d c3 55 48  |UH...}..E...].UH|

In the last line there is the byte sequence 55 48 89 e5 89 7d fc ...; this is the machine code stored in the file (the same bytes that appear to the left of push %rbp and the following instructions in the disassembly).

Debug information

There is a part I skipped in the first assembly example: debugging information. If we compile with the -g option, we ask the compiler to emit debugging information, for example which line of the original source each instruction came from, the original variable names, and so on.

gcc -S -g simple.c
gcc -c simple.s

Or, in a single step:

gcc -c -g simple.c

We can then get an assembly listing interleaved with the source code:

objdump -d -S simple.o

simple.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <double_it>:
#include <stdio.h>

static int double_it(int x)
{
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   89 7d fc                mov    %edi,-0x4(%rbp)
    return 2*x;
   7:   8b 45 fc                mov    -0x4(%rbp),%eax
   a:   01 c0                   add    %eax,%eax
}
   c:   5d                      pop    %rbp
   d:   c3                      retq   

000000000000000e <main>:

int main(int argc, char *argv[])
{
   e:   55                      push   %rbp
   f:   48 89 e5                mov    %rsp,%rbp
  12:   48 83 ec 10             sub    $0x10,%rsp
  16:   89 7d fc                mov    %edi,-0x4(%rbp)
  19:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
    printf("hello world\n");
  1d:   bf 00 00 00 00          mov    $0x0,%edi
  22:   e8 00 00 00 00          callq  27 <main+0x19>
    printf("double of 100 is: %d\n", double_it(100));
  27:   bf 64 00 00 00          mov    $0x64,%edi
  2c:   e8 cf ff ff ff          callq  0 <double_it>
  31:   89 c6                   mov    %eax,%esi
  33:   bf 00 00 00 00          mov    $0x0,%edi
  38:   b8 00 00 00 00          mov    $0x0,%eax
  3d:   e8 00 00 00 00          callq  42 <main+0x34>
    return 0;
  42:   b8 00 00 00 00          mov    $0x0,%eax
}
  47:   c9                      leaveq 
  48:   c3                      retq

Linking

Linking can be done with gcc:

gcc simple.o -o simple

It can also be done manually. Note that the paths below are for Linux AMD64.

ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o  -lc simple.o /usr/lib/x86_64-linux-gnu/crtn.o -o simple

Since this is not a C-specific tutorial, you can read the explanation here.

Now we can run the resulting executable:

$ ./simple
hello world
double of 100 is: 200

The executable can also be dumped with objdump, but the output is now much longer.

See the full dump

Look at the important part of the output: we can still see the assembly code from earlier. Note a few things:

The addresses are now the real addresses (not entirely real, though; this is covered later when we study Address Space Layout Randomization/ASLR)
The interleaved source comments are gone
We can see the addresses of "hello world" ($0x400634) and "double of 100 is: %d" ($0x400640)

Contents of section .rodata:
 400630 01000200 68656c6c 6f20776f 726c6400  ....hello world.
 400640 646f7562 6c65206f 66203130 30206973  double of 100 is
 400650 3a202564 0a00                        : %d..          


0000000000400566 <double_it>:
  400566:   55                      push   %rbp
  400567:   48 89 e5                mov    %rsp,%rbp
  40056a:   89 7d fc                mov    %edi,-0x4(%rbp)
  40056d:   8b 45 fc                mov    -0x4(%rbp),%eax
  400570:   01 c0                   add    %eax,%eax
  400572:   5d                      pop    %rbp
  400573:   c3                      retq   

0000000000400574 <main>:
  400574:   55                      push   %rbp
  400575:   48 89 e5                mov    %rsp,%rbp
  400578:   48 83 ec 10             sub    $0x10,%rsp
  40057c:   89 7d fc                mov    %edi,-0x4(%rbp)
  40057f:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
  400583:   bf 34 06 40 00          mov    $0x400634,%edi
  400588:   e8 a3 fe ff ff          callq  400430 <puts@plt>
  40058d:   bf 64 00 00 00          mov    $0x64,%edi
  400592:   e8 cf ff ff ff          callq  400566 <double_it>
  400597:   89 c6                   mov    %eax,%esi
  400599:   bf 40 06 40 00          mov    $0x400640,%edi
  40059e:   b8 00 00 00 00          mov    $0x0,%eax
  4005a3:   e8 98 fe ff ff          callq  400440 <printf@plt>
  4005a8:   b8 00 00 00 00          mov    $0x0,%eax
  4005ad:   c9                      leaveq 
  4005ae:   c3                      retq   
  4005af:   90                      nop

Had we compiled with -O2, main would look like this (double_it is gone; its result is passed directly as the constant 0xc8, i.e. 200):

0000000000400470 <main>:
  400470:   48 83 ec 08             sub    $0x8,%rsp
  400474:   bf 24 06 40 00          mov    $0x400624,%edi
  400479:   e8 b2 ff ff ff          callq  400430 <puts@plt>
  40047e:   be c8 00 00 00          mov    $0xc8,%esi
  400483:   bf 30 06 40 00          mov    $0x400630,%edi
  400488:   31 c0                   xor    %eax,%eax
  40048a:   e8 b1 ff ff ff          callq  400440 <printf@plt>
  40048f:   31 c0                   xor    %eax,%eax
  400491:   48 83 c4 08             add    $0x8,%rsp
  400495:   c3                      retq   
  400496:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)

Compiling Pascal to native code

Here is another example of how a language other than C is compiled to assembly and then to an executable. For this example I use Free Pascal.

program simple;

function double_it(x :integer ):integer;
begin
   double_it := 2*x;
end;


begin
   writeln('hello world');
   writeln('double of 100 is: ', double_it(100));
end.

We can produce an assembly listing with:

fpc -a simple.pas

Without optimization, the resulting assembly looks like this:

    .file "simple.pas"
# Begin asmlist al_procedures

.section .text.n_p$simple_$$_double_it$smallint$$smallint
    .balign 16,0x90
.globl  P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT
    .type   P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT,@function
P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT:
.Lc1:
    pushq   %rbp
.Lc3:
.Lc4:
    movq    %rsp,%rbp
.Lc5:
    leaq    -16(%rsp),%rsp
    movw    %di,-8(%rbp)
    movswl  -8(%rbp),%eax
    shll    $1,%eax
    movw    %ax,-12(%rbp)
    movswl  -12(%rbp),%eax
    leave
    ret
.Lc2:
.Le0:
    .size   P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT, .Le0 - P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT

.section .text.n_main
    .balign 16,0x90
.globl  PASCALMAIN
    .type   PASCALMAIN,@function
PASCALMAIN:
.globl  main
    .type   main,@function
main:
.Lc6:
    pushq   %rbp
.Lc8:
.Lc9:
    movq    %rsp,%rbp
.Lc10:
    leaq    -16(%rsp),%rsp
    movq    %rbx,-8(%rbp)
    call    FPC_INITIALIZEUNITS
    call    fpc_get_output
    movq    %rax,%rbx
    movq    $_$SIMPLE$_Ld1,%rdx
    movq    %rbx,%rsi
    movl    $0,%edi
    call    fpc_write_text_shortstr
    call    FPC_IOCHECK
    movq    %rbx,%rdi
    call    fpc_writeln_end
    call    FPC_IOCHECK
    call    fpc_get_output
    movq    %rax,%rbx
    movq    $_$SIMPLE$_Ld2,%rdx
    movq    %rbx,%rsi
    movl    $0,%edi
    call    fpc_write_text_shortstr
    call    FPC_IOCHECK
    movl    $100,%edi
    call    P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT
    movw    %ax,%dx
    movswq  %dx,%rdx
    movq    %rbx,%rsi
    movl    $0,%edi
    call    fpc_write_text_sint
    call    FPC_IOCHECK
    movq    %rbx,%rdi
    call    fpc_writeln_end
    call    FPC_IOCHECK
    call    FPC_DO_EXIT
    movq    -8(%rbp),%rbx
    leave
    ret
.Lc7:
.Le1:
    .size   main, .Le1 - main

.section .text
# End asmlist al_procedures
# Begin asmlist al_globals

.section .data.n_INITFINAL
    .balign 8
.globl  INITFINAL
    .type   INITFINAL,@object
INITFINAL:
    .quad   1,0
    .quad   INIT$_$SYSTEM
    .quad   0
.Le2:
    .size   INITFINAL, .Le2 - INITFINAL

.section .data.n_FPC_THREADVARTABLES
    .balign 8
.globl  FPC_THREADVARTABLES
    .type   FPC_THREADVARTABLES,@object
FPC_THREADVARTABLES:
    .long   1
    .quad   THREADVARLIST_$SYSTEM
.Le3:
    .size   FPC_THREADVARTABLES, .Le3 - FPC_THREADVARTABLES

.section .data.n_FPC_RESOURCESTRINGTABLES
    .balign 8
.globl  FPC_RESOURCESTRINGTABLES
    .type   FPC_RESOURCESTRINGTABLES,@object
FPC_RESOURCESTRINGTABLES:
    .quad   0
.Le4:
    .size   FPC_RESOURCESTRINGTABLES, .Le4 - FPC_RESOURCESTRINGTABLES

.section .data.n_FPC_WIDEINITTABLES
    .balign 8
.globl  FPC_WIDEINITTABLES
    .type   FPC_WIDEINITTABLES,@object
FPC_WIDEINITTABLES:
    .quad   0
.Le5:
    .size   FPC_WIDEINITTABLES, .Le5 - FPC_WIDEINITTABLES

.section .data.n_FPC_RESSTRINITTABLES
    .balign 8
.globl  FPC_RESSTRINITTABLES
    .type   FPC_RESSTRINITTABLES,@object
FPC_RESSTRINITTABLES:
    .quad   0
.Le6:
    .size   FPC_RESSTRINITTABLES, .Le6 - FPC_RESSTRINITTABLES

.section .fpc.n_version
    .balign 8
    .ascii  "FPC 3.0.0+dfsg-8 [2016/09/03] for x86_64 - Linux"

.section .data.n___stklen
    .balign 8
.globl  __stklen
    .type   __stklen,@object
__stklen:
    .quad   8388608

.section .data.n___heapsize
    .balign 8
.globl  __heapsize
    .type   __heapsize,@object
__heapsize:
    .quad   0

.section .data.n___fpc_valgrind
.globl  __fpc_valgrind
    .type   __fpc_valgrind,@object
__fpc_valgrind:
    .byte   0

.section .data.n_FPC_RESLOCATION
    .balign 8
.globl  FPC_RESLOCATION
    .type   FPC_RESLOCATION,@object
FPC_RESLOCATION:
    .quad   0
# End asmlist al_globals
# Begin asmlist al_typedconsts

.section .rodata.n__$SIMPLE$_Ld1
    .balign 8
.globl  _$SIMPLE$_Ld1
_$SIMPLE$_Ld1:
    .ascii  "\013hello world\000"

.section .rodata.n__$SIMPLE$_Ld2
    .balign 8
.globl  _$SIMPLE$_Ld2
_$SIMPLE$_Ld2:
    .ascii  "\022double of 100 is: \000"
# End asmlist al_typedconsts
# Begin asmlist al_dwarf_frame

.section .debug_frame
.Lc11:
    .long   .Lc13-.Lc12
.Lc12:
    .long   -1
    .byte   1
    .byte   0
    .uleb128    1
    .sleb128    -4
    .byte   16
    .byte   12
    .uleb128    7
    .uleb128    8
    .byte   5
    .uleb128    16
    .uleb128    2
    .balign 4,0
.Lc13:
    .long   .Lc15-.Lc14
.Lc14:
    .quad   .Lc11
    .quad   .Lc1
    .quad   .Lc2-.Lc1
    .byte   4
    .long   .Lc3-.Lc1
    .byte   14
    .uleb128    16
    .byte   4
    .long   .Lc4-.Lc3
    .byte   5
    .uleb128    6
    .uleb128    4
    .byte   4
    .long   .Lc5-.Lc4
    .byte   13
    .uleb128    6
    .balign 4,0
.Lc15:
    .long   .Lc17-.Lc16
.Lc16:
    .quad   .Lc11
    .quad   .Lc6
    .quad   .Lc7-.Lc6
    .byte   4
    .long   .Lc8-.Lc6
    .byte   14
    .uleb128    16
    .byte   4
    .long   .Lc9-.Lc8
    .byte   5
    .uleb128    6
    .uleb128    4
    .byte   4
    .long   .Lc10-.Lc9
    .byte   13
    .uleb128    6
    .balign 4,0
.Lc17:
# End asmlist al_dwarf_frame
.section .note.GNU-stack,"",%progbits

With optimization:

fpc -a -O4 simple.pas

    .file "simple.pas"
# Begin asmlist al_procedures

.section .text.n_p$simple_$$_double_it$smallint$$smallint
    .balign 16,0x90
.globl  P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT
    .type   P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT,@function
P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT:
.Lc1:
    pushq   %rbp
.Lc3:
.Lc4:
    movq    %rsp,%rbp
.Lc5:
    leaq    -16(%rsp),%rsp
    movw    %di,-8(%rbp)
    movswl  -8(%rbp),%eax
    shll    $1,%eax
    movw    %ax,-12(%rbp)
    movswl  -12(%rbp),%eax
    leave
    ret
.Lc2:
.Le0:
    .size   P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT, .Le0 - P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT

.section .text.n_main
    .balign 16,0x90
.globl  PASCALMAIN
    .type   PASCALMAIN,@function
PASCALMAIN:
.globl  main
    .type   main,@function
main:
.Lc6:
    pushq   %rbp
.Lc8:
.Lc9:
    movq    %rsp,%rbp
.Lc10:
    leaq    -16(%rsp),%rsp
    movq    %rbx,-8(%rbp)
    call    FPC_INITIALIZEUNITS
    call    fpc_get_output
    movq    %rax,%rbx
    movq    $_$SIMPLE$_Ld1,%rdx
    movq    %rbx,%rsi
    movl    $0,%edi
    call    fpc_write_text_shortstr
    call    FPC_IOCHECK
    movq    %rbx,%rdi
    call    fpc_writeln_end
    call    FPC_IOCHECK
    call    fpc_get_output
    movq    %rax,%rbx
    movq    $_$SIMPLE$_Ld2,%rdx
    movq    %rbx,%rsi
    movl    $0,%edi
    call    fpc_write_text_shortstr
    call    FPC_IOCHECK
    movl    $100,%edi
    call    P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT
    movw    %ax,%dx
    movswq  %dx,%rdx
    movq    %rbx,%rsi
    movl    $0,%edi
    call    fpc_write_text_sint
    call    FPC_IOCHECK
    movq    %rbx,%rdi
    call    fpc_writeln_end
    call    FPC_IOCHECK
    call    FPC_DO_EXIT
    movq    -8(%rbp),%rbx
    leave
    ret
.Lc7:
.Le1:
    .size   main, .Le1 - main

.section .text
# End asmlist al_procedures
# Begin asmlist al_globals

.section .data.n_INITFINAL
    .balign 8
.globl  INITFINAL
    .type   INITFINAL,@object
INITFINAL:
    .quad   1,0
    .quad   INIT$_$SYSTEM
    .quad   0
.Le2:
    .size   INITFINAL, .Le2 - INITFINAL

.section .data.n_FPC_THREADVARTABLES
    .balign 8
.globl  FPC_THREADVARTABLES
    .type   FPC_THREADVARTABLES,@object
FPC_THREADVARTABLES:
    .long   1
    .quad   THREADVARLIST_$SYSTEM
.Le3:
    .size   FPC_THREADVARTABLES, .Le3 - FPC_THREADVARTABLES

.section .data.n_FPC_RESOURCESTRINGTABLES
    .balign 8
.globl  FPC_RESOURCESTRINGTABLES
    .type   FPC_RESOURCESTRINGTABLES,@object
FPC_RESOURCESTRINGTABLES:
    .quad   0
.Le4:
    .size   FPC_RESOURCESTRINGTABLES, .Le4 - FPC_RESOURCESTRINGTABLES

.section .data.n_FPC_WIDEINITTABLES
    .balign 8
.globl  FPC_WIDEINITTABLES
    .type   FPC_WIDEINITTABLES,@object
FPC_WIDEINITTABLES:
    .quad   0
.Le5:
    .size   FPC_WIDEINITTABLES, .Le5 - FPC_WIDEINITTABLES

.section .data.n_FPC_RESSTRINITTABLES
    .balign 8
.globl  FPC_RESSTRINITTABLES
    .type   FPC_RESSTRINITTABLES,@object
FPC_RESSTRINITTABLES:
    .quad   0
.Le6:
    .size   FPC_RESSTRINITTABLES, .Le6 - FPC_RESSTRINITTABLES

.section .fpc.n_version
    .balign 8
    .ascii  "FPC 3.0.0+dfsg-8 [2016/09/03] for x86_64 - Linux"

.section .data.n___stklen
    .balign 8
.globl  __stklen
    .type   __stklen,@object
__stklen:
    .quad   8388608

.section .data.n___heapsize
    .balign 8
.globl  __heapsize
    .type   __heapsize,@object
__heapsize:
    .quad   0

.section .data.n___fpc_valgrind
.globl  __fpc_valgrind
    .type   __fpc_valgrind,@object
__fpc_valgrind:
    .byte   0

.section .data.n_FPC_RESLOCATION
    .balign 8
.globl  FPC_RESLOCATION
    .type   FPC_RESLOCATION,@object
FPC_RESLOCATION:
    .quad   0
# End asmlist al_globals
# Begin asmlist al_typedconsts

.section .rodata.n__$SIMPLE$_Ld1
    .balign 8
.globl  _$SIMPLE$_Ld1
_$SIMPLE$_Ld1:
    .ascii  "\013hello world\000"

.section .rodata.n__$SIMPLE$_Ld2
    .balign 8
.globl  _$SIMPLE$_Ld2
_$SIMPLE$_Ld2:
    .ascii  "\022double of 100 is: \000"
# End asmlist al_typedconsts
# Begin asmlist al_dwarf_frame

.section .debug_frame
.Lc11:
    .long   .Lc13-.Lc12
.Lc12:
    .long   -1
    .byte   1
    .byte   0
    .uleb128    1
    .sleb128    -4
    .byte   16
    .byte   12
    .uleb128    7
    .uleb128    8
    .byte   5
    .uleb128    16
    .uleb128    2
    .balign 4,0
.Lc13:
    .long   .Lc15-.Lc14
.Lc14:
    .quad   .Lc11
    .quad   .Lc1
    .quad   .Lc2-.Lc1
    .byte   4
    .long   .Lc3-.Lc1
    .byte   14
    .uleb128    16
    .byte   4
    .long   .Lc4-.Lc3
    .byte   5
    .uleb128    6
    .uleb128    4
    .byte   4
    .long   .Lc5-.Lc4
    .byte   13
    .uleb128    6
    .balign 4,0
.Lc15:
    .long   .Lc17-.Lc16
.Lc16:
    .quad   .Lc11
    .quad   .Lc6
    .quad   .Lc7-.Lc6
    .byte   4
    .long   .Lc8-.Lc6
    .byte   14
    .uleb128    16
    .byte   4
    .long   .Lc9-.Lc8
    .byte   5
    .uleb128    6
    .uleb128    4
    .byte   4
    .long   .Lc10-.Lc9
    .byte   13
    .uleb128    6
    .balign 4,0
.Lc17:
# End asmlist al_dwarf_frame
.section .note.GNU-stack,"",%progbits

We can also look at the assembly after it has become an executable. By default no symbols are emitted, so the -g option needs to be added.

fpc -g -O4 simple.pas

    

00000000004001c0 <P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT>:
  4001c0:   66 89 f8                mov    %di,%ax
  4001c3:   0f bf c0                movswl %ax,%eax
  4001c6:   d1 e0                   shl    %eax
  4001c8:   0f bf c0                movswl %ax,%eax
  4001cb:   c3                      retq   
  
00000000004001d0 <PASCALMAIN>:
  4001d0:   53                      push   %rbx
  4001d1:   e8 9a 66 01 00          callq  416870 <FPC_INITIALIZEUNITS>
  4001d6:   e8 75 c0 01 00          callq  41c250 <fpc_get_output>
  4001db:   48 89 c3                mov    %rax,%rbx
  4001de:   48 ba c0 2c 42 00 00    movabs $0x422cc0,%rdx
  4001e5:   00 00 00 
  4001e8:   48 89 de                mov    %rbx,%rsi
  4001eb:   bf 00 00 00 00          mov    $0x0,%edi
  4001f0:   e8 0b c3 01 00          callq  41c500 <FPC_WRITE_TEXT_SHORTSTR>
  4001f5:   e8 d6 64 01 00          callq  4166d0 <FPC_IOCHECK>
  4001fa:   48 89 df                mov    %rbx,%rdi
  4001fd:   e8 2e c2 01 00          callq  41c430 <fpc_writeln_end>
  400202:   e8 c9 64 01 00          callq  4166d0 <FPC_IOCHECK>
  400207:   e8 44 c0 01 00          callq  41c250 <fpc_get_output>
  40020c:   48 89 c3                mov    %rax,%rbx
  40020f:   48 ba d0 2c 42 00 00    movabs $0x422cd0,%rdx
  400216:   00 00 00 
  400219:   48 89 de                mov    %rbx,%rsi
  40021c:   bf 00 00 00 00          mov    $0x0,%edi
  400221:   e8 da c2 01 00          callq  41c500 <FPC_WRITE_TEXT_SHORTSTR>
  400226:   e8 a5 64 01 00          callq  4166d0 <FPC_IOCHECK>
  40022b:   bf 64 00 00 00          mov    $0x64,%edi
  400230:   e8 8b ff ff ff          callq  4001c0 <P$SIMPLE_$$_DOUBLE_IT$SMALLINT$$SMALLINT>
  400235:   66 89 c2                mov    %ax,%dx
  400238:   48 0f bf d2             movswq %dx,%rdx
  40023c:   48 89 de                mov    %rbx,%rsi
  40023f:   bf 00 00 00 00          mov    $0x0,%edi
  400244:   e8 e7 cb 01 00          callq  41ce30 <fpc_write_text_sint>
  400249:   e8 82 64 01 00          callq  4166d0 <FPC_IOCHECK>
  40024e:   48 89 df                mov    %rbx,%rdi
  400251:   e8 da c1 01 00          callq  41c430 <fpc_writeln_end>
  400256:   e8 75 64 01 00          callq  4166d0 <FPC_IOCHECK>
  40025b:   e8 b0 6a 01 00          callq  416d10 <FPC_DO_EXIT>
  400260:   5b                      pop    %rbx
  400261:   c3                      retq

Even at the maximum optimization level, the assembly code produced by FPC is still fairly long.

Compiling Go to native code

Go is another example of a language that is compiled to native code.

package main

import "fmt"

func double_it(x int) int {
    return 2*x
}

func main() {
    fmt.Println("hello world")
    fmt.Println("double of 100 is: ", double_it(100))
}

go tool compile -S  simple.go

The generated code is quite long, so I will not show all of it. Below is the disassembly of main.main:

Disassembly of section .text:

0000000000401000 <main.main>:
  401000:   64 48 8b 0c 25 f8 ff    mov    %fs:0xfffffffffffffff8,%rcx
  401007:   ff ff 
  401009:   48 8d 44 24 f0          lea    -0x10(%rsp),%rax
  40100e:   48 3b 41 10             cmp    0x10(%rcx),%rax
  401012:   0f 86 7b 01 00 00       jbe    401193 <main.main+0x193>
  401018:   48 81 ec 90 00 00 00    sub    $0x90,%rsp
  40101f:   48 89 ac 24 88 00 00    mov    %rbp,0x88(%rsp)
  401026:   00 
  401027:   48 8d ac 24 88 00 00    lea    0x88(%rsp),%rbp
  40102e:   00 
  40102f:   48 8d 05 8b 48 0a 00    lea    0xa488b(%rip),%rax        # 4a58c1 <go.string.*+0xb11>
  401036:   48 89 44 24 58          mov    %rax,0x58(%rsp)
  40103b:   48 c7 44 24 60 0b 00    movq   $0xb,0x60(%rsp)
  401042:   00 00 
  401044:   48 c7 44 24 38 00 00    movq   $0x0,0x38(%rsp)
  40104b:   00 00 
  40104d:   48 c7 44 24 40 00 00    movq   $0x0,0x40(%rsp)
  401054:   00 00 
  401056:   48 8d 05 83 95 08 00    lea    0x89583(%rip),%rax        # 48a5e0 <type.*+0xd5e0>
  40105d:   48 89 04 24             mov    %rax,(%rsp)
  401061:   48 8d 4c 24 58          lea    0x58(%rsp),%rcx
  401066:   48 89 4c 24 08          mov    %rcx,0x8(%rsp)
  40106b:   48 c7 44 24 10 00 00    movq   $0x0,0x10(%rsp)
  401072:   00 00 
  401074:   e8 97 9f 00 00          callq  40b010 <runtime.convT2E>
  401079:   48 8b 44 24 20          mov    0x20(%rsp),%rax
  40107e:   48 8b 4c 24 18          mov    0x18(%rsp),%rcx
  401083:   48 89 4c 24 38          mov    %rcx,0x38(%rsp)
  401088:   48 89 44 24 40          mov    %rax,0x40(%rsp)
  40108d:   48 8d 44 24 38          lea    0x38(%rsp),%rax
  401092:   48 89 04 24             mov    %rax,(%rsp)
  401096:   48 c7 44 24 08 01 00    movq   $0x1,0x8(%rsp)
  40109d:   00 00 
  40109f:   48 c7 44 24 10 01 00    movq   $0x1,0x10(%rsp)
  4010a6:   00 00 
  4010a8:   e8 d3 45 05 00          callq  455680 <fmt.Println>
  4010ad:   48 8d 05 1e 55 0a 00    lea    0xa551e(%rip),%rax        # 4a65d2 <go.string.*+0x1822>
  4010b4:   48 89 44 24 48          mov    %rax,0x48(%rsp)
  4010b9:   48 c7 44 24 50 12 00    movq   $0x12,0x50(%rsp)
  4010c0:   00 00 
  4010c2:   48 c7 44 24 30 c8 00    movq   $0xc8,0x30(%rsp)
  4010c9:   00 00 
  4010cb:   48 c7 44 24 68 00 00    movq   $0x0,0x68(%rsp)
  4010d2:   00 00 
  4010d4:   48 c7 44 24 70 00 00    movq   $0x0,0x70(%rsp)
  4010db:   00 00 
  4010dd:   48 c7 44 24 78 00 00    movq   $0x0,0x78(%rsp)
  4010e4:   00 00 
  4010e6:   48 c7 84 24 80 00 00    movq   $0x0,0x80(%rsp)
  4010ed:   00 00 00 00 00 
  4010f2:   48 8d 05 e7 94 08 00    lea    0x894e7(%rip),%rax        # 48a5e0 <type.*+0xd5e0>
  4010f9:   48 89 04 24             mov    %rax,(%rsp)
  4010fd:   48 8d 44 24 48          lea    0x48(%rsp),%rax
  401102:   48 89 44 24 08          mov    %rax,0x8(%rsp)
  401107:   48 c7 44 24 10 00 00    movq   $0x0,0x10(%rsp)
  40110e:   00 00 
  401110:   e8 fb 9e 00 00          callq  40b010 <runtime.convT2E>
  401115:   48 8b 44 24 18          mov    0x18(%rsp),%rax
  40111a:   48 8b 4c 24 20          mov    0x20(%rsp),%rcx
  40111f:   48 89 44 24 68          mov    %rax,0x68(%rsp)
  401124:   48 89 4c 24 70          mov    %rcx,0x70(%rsp)
  401129:   48 8d 05 f0 8f 08 00    lea    0x88ff0(%rip),%rax        # 48a120 <type.*+0xd120>
  401130:   48 89 04 24             mov    %rax,(%rsp)
  401134:   48 8d 44 24 30          lea    0x30(%rsp),%rax
  401139:   48 89 44 24 08          mov    %rax,0x8(%rsp)
  40113e:   48 c7 44 24 10 00 00    movq   $0x0,0x10(%rsp)
  401145:   00 00 
  401147:   e8 c4 9e 00 00          callq  40b010 <runtime.convT2E>
  40114c:   48 8b 44 24 20          mov    0x20(%rsp),%rax
  401151:   48 8b 4c 24 18          mov    0x18(%rsp),%rcx
  401156:   48 89 4c 24 78          mov    %rcx,0x78(%rsp)
  40115b:   48 89 84 24 80 00 00    mov    %rax,0x80(%rsp)
  401162:   00 
  401163:   48 8d 44 24 68          lea    0x68(%rsp),%rax
  401168:   48 89 04 24             mov    %rax,(%rsp)
  40116c:   48 c7 44 24 08 02 00    movq   $0x2,0x8(%rsp)
  401173:   00 00 
  401175:   48 c7 44 24 10 02 00    movq   $0x2,0x10(%rsp)
  40117c:   00 00 
  40117e:   e8 fd 44 05 00          callq  455680 <fmt.Println>
  401183:   48 8b ac 24 88 00 00    mov    0x88(%rsp),%rbp
  40118a:   00 
  40118b:   48 81 c4 90 00 00 00    add    $0x90,%rsp
  401192:   c3                      retq

Compiling Java to .class

The Java example mirrors the C one:

Compilation

class Simple {

    private static int doubleIt(int d) {
        return d*2;
    }

    public static void main(String argv[]) {
        System.out.println("hello world");
        System.out.println("double of 100 is: " + doubleIt(100));     
    }
    
}

javac Simple.java

The result is a .class file directly.

hexdump -C Simple.class
00000000  ca fe ba be 00 00 00 34  00 31 0a 00 0d 00 18 09  |.......4.1......|
00000010  00 19 00 1a 08 00 1b 0a  00 1c 00 1d 07 00 1e 0a  |................|
00000020  00 05 00 18 08 00 1f 0a  00 05 00 20 0a 00 0c 00  |........... ....|
00000030  21 0a 00 05 00 22 0a 00  05 00 23 07 00 24 07 00  |!...."....#..$..|
00000040  25 01 00 06 3c 69 6e 69  74 3e 01 00 03 28 29 56  |%...<init>...()V|
00000050  01 00 04 43 6f 64 65 01  00 0f 4c 69 6e 65 4e 75  |...Code...LineNu|
00000060  6d 62 65 72 54 61 62 6c  65 01 00 08 64 6f 75 62  |mberTable...doub|
00000070  6c 65 49 74 01 00 04 28  49 29 49 01 00 04 6d 61  |leIt...(I)I...ma|
00000080  69 6e 01 00 16 28 5b 4c  6a 61 76 61 2f 6c 61 6e  |in...([Ljava/lan|
00000090  67 2f 53 74 72 69 6e 67  3b 29 56 01 00 0a 53 6f  |g/String;)V...So|
000000a0  75 72 63 65 46 69 6c 65  01 00 0b 53 69 6d 70 6c  |urceFile...Simpl|
000000b0  65 2e 6a 61 76 61 0c 00  0e 00 0f 07 00 26 0c 00  |e.java.......&..|
000000c0  27 00 28 01 00 0b 68 65  6c 6c 6f 20 77 6f 72 6c  |'.(...hello worl|
000000d0  64 07 00 29 0c 00 2a 00  2b 01 00 17 6a 61 76 61  |d..)..*.+...java|
000000e0  2f 6c 61 6e 67 2f 53 74  72 69 6e 67 42 75 69 6c  |/lang/StringBuil|
000000f0  64 65 72 01 00 12 64 6f  75 62 6c 65 20 6f 66 20  |der...double of |
00000100  31 30 30 20 69 73 3a 20  0c 00 2c 00 2d 0c 00 12  |100 is: ..,.-...|
00000110  00 13 0c 00 2c 00 2e 0c  00 2f 00 30 01 00 06 53  |....,..../.0...S|
00000120  69 6d 70 6c 65 01 00 10  6a 61 76 61 2f 6c 61 6e  |imple...java/lan|
00000130  67 2f 4f 62 6a 65 63 74  01 00 10 6a 61 76 61 2f  |g/Object...java/|
00000140  6c 61 6e 67 2f 53 79 73  74 65 6d 01 00 03 6f 75  |lang/System...ou|
00000150  74 01 00 15 4c 6a 61 76  61 2f 69 6f 2f 50 72 69  |t...Ljava/io/Pri|
00000160  6e 74 53 74 72 65 61 6d  3b 01 00 13 6a 61 76 61  |ntStream;...java|
00000170  2f 69 6f 2f 50 72 69 6e  74 53 74 72 65 61 6d 01  |/io/PrintStream.|
00000180  00 07 70 72 69 6e 74 6c  6e 01 00 15 28 4c 6a 61  |..println...(Lja|
00000190  76 61 2f 6c 61 6e 67 2f  53 74 72 69 6e 67 3b 29  |va/lang/String;)|
000001a0  56 01 00 06 61 70 70 65  6e 64 01 00 2d 28 4c 6a  |V...append..-(Lj|
000001b0  61 76 61 2f 6c 61 6e 67  2f 53 74 72 69 6e 67 3b  |ava/lang/String;|
000001c0  29 4c 6a 61 76 61 2f 6c  61 6e 67 2f 53 74 72 69  |)Ljava/lang/Stri|
000001d0  6e 67 42 75 69 6c 64 65  72 3b 01 00 1c 28 49 29  |ngBuilder;...(I)|
000001e0  4c 6a 61 76 61 2f 6c 61  6e 67 2f 53 74 72 69 6e  |Ljava/lang/Strin|
000001f0  67 42 75 69 6c 64 65 72  3b 01 00 08 74 6f 53 74  |gBuilder;...toSt|
00000200  72 69 6e 67 01 00 14 28  29 4c 6a 61 76 61 2f 6c  |ring...()Ljava/l|
00000210  61 6e 67 2f 53 74 72 69  6e 67 3b 00 20 00 0c 00  |ang/String;. ...|
00000220  0d 00 00 00 00 00 03 00  00 00 0e 00 0f 00 01 00  |................|
00000230  10 00 00 00 1d 00 01 00  01 00 00 00 05 2a b7 00  |.............*..|
00000240  01 b1 00 00 00 01 00 11  00 00 00 06 00 01 00 00  |................|
00000250  00 01 00 0a 00 12 00 13  00 01 00 10 00 00 00 1c  |................|
00000260  00 02 00 01 00 00 00 04  1a 05 68 ac 00 00 00 01  |..........h.....|
00000270  00 11 00 00 00 06 00 01  00 00 00 04 00 09 00 14  |................|
00000280  00 15 00 01 00 10 00 00  00 46 00 03 00 01 00 00  |.........F......|
00000290  00 26 b2 00 02 12 03 b6  00 04 b2 00 02 bb 00 05  |.&..............|
000002a0  59 b7 00 06 12 07 b6 00  08 10 64 b8 00 09 b6 00  |Y.........d.....|
000002b0  0a b6 00 0b b6 00 04 b1  00 00 00 01 00 11 00 00  |................|
000002c0  00 0e 00 03 00 00 00 08  00 08 00 09 00 25 00 0a  |.............%..|
000002d0  00 01 00 16 00 00 00 02  00 17                    |..........|
000002da

JAR format

Java programs can be packaged into a .jar file (for libraries, desktop apps, applets), a .war file (web applications), or an .ear file. All of them are ordinary zip files that can be opened with any zip extraction program (from the command line or with a GUI such as 7-Zip).

To create a JAR file, we first need a manifest.txt with the following content:

Main-Class: Simple

Build the jar file:

jar cvfm simple.jar manifest.txt Simple.class

The jar file can now be run:

java -jar simple.jar

We can also build a jar without the manifest, but to run it the name of the main class must then be given, like this:

java -cp simple.jar Simple

Viewing the bytecode

We can view the bytecode with:

javap -c -p -s Simple

Compiled from "Simple.java"
class Simple {
  Simple();
    descriptor: ()V
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return
    LineNumberTable:
      line 1: 0

  private static int doubleIt(int);
    descriptor: (I)I
    Code:
       0: iload_0
       1: iconst_2
       2: imul
       3: ireturn
    LineNumberTable:
      line 4: 0

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    Code:
       0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #3                  // String hello world
       5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
      11: new           #5                  // class java/lang/StringBuilder
      14: dup
      15: invokespecial #6                  // Method java/lang/StringBuilder."<init>":()V
      18: ldc           #7                  // String double of 100 is:
      20: invokevirtual #8                  // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      23: bipush        100
      25: invokestatic  #9                  // Method doubleIt:(I)I
      28: invokevirtual #10                 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
      31: invokevirtual #11                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      34: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      37: return
    LineNumberTable:
      line 8: 0
      line 9: 8
      line 10: 37
}

Compiling C# to Common Intermediate Language (CIL)

Another technology that uses bytecode is .NET. All .NET languages (C#, VB.NET, etc.) are compiled to Common Intermediate Language (CIL).

Compilation

Compiling on the command line:

mcs Simple.cs

public class Simple
{
   private static int DoubleIt(int a) {
       return 2*a;
   }

   public static void Main()
   {
      System.Console.WriteLine("hello world");
      System.Console.WriteLine("double of 100 is: "  + DoubleIt(100));
   }
}

Viewing the CIL

To view the CIL (also called MSIL) with Mono:

monodis Simple.exe

On Windows, we can also use ildasm.exe.

.assembly extern mscorlib
{
  .ver 4:0:0:0
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 ) // .z\V.4..
}
.assembly 'Simple'
{
  .custom instance void class [mscorlib]System.Runtime.CompilerServices.RuntimeCompatibilityAttribute::'.ctor'() =  (
        01 00 01 00 54 02 16 57 72 61 70 4E 6F 6E 45 78   // ....T..WrapNonEx
        63 65 70 74 69 6F 6E 54 68 72 6F 77 73 01       ) // ceptionThrows.

  .hash algorithm 0x00008004
  .ver  0:0:0:0
}
.module Simple.exe // GUID = {25A1C972-470F-4604-A6EE-176F3D367F7D}


  .class public auto ansi beforefieldinit Simple
    extends [mscorlib]System.Object
  {

    // method line 1
    .method public hidebysig specialname rtspecialname 
           instance default void '.ctor' ()  cil managed 
    {
        // Method begins at RVA 0x2050
    // Code size 7 (0x7)
    .maxstack 8
    IL_0000:  ldarg.0 
    IL_0001:  call instance void object::'.ctor'()
    IL_0006:  ret 
    } // end of method Simple::.ctor

    // method line 2
    .method private static hidebysig 
           default int32 DoubleIt (int32 a)  cil managed 
    {
        // Method begins at RVA 0x2058
    // Code size 4 (0x4)
    .maxstack 8
    IL_0000:  ldc.i4.2 
    IL_0001:  ldarg.0 
    IL_0002:  mul 
    IL_0003:  ret 
    } // end of method Simple::DoubleIt

    // method line 3
    .method public static hidebysig 
           default void Main ()  cil managed 
    {
        // Method begins at RVA 0x205d
    .entrypoint
    // Code size 38 (0x26)
    .maxstack 8
    IL_0000:  ldstr "hello world"
    IL_0005:  call void class [mscorlib]System.Console::WriteLine(string)
    IL_000a:  ldstr "double of 100 is: "
    IL_000f:  ldc.i4.s 0x64
    IL_0011:  call int32 class Simple::DoubleIt(int32)
    IL_0016:  box [mscorlib]System.Int32
    IL_001b:  call string string::Concat(object, object)
    IL_0020:  call void class [mscorlib]System.Console::WriteLine(string)
    IL_0025:  ret 
    } // end of method Simple::Main

  } // end of class Simple
