Assembly language (or just ASM) is a low-level programming language that is closely related to machine code, which is directly executed by the computer's CPU. Each instruction in ASM corresponds to a specific operation in the machine code of the CPU. ASM is specific to a particular computer architecture, so different processors have different assembly languages.
For this example, we'll use x86 assembly language, which is used by Intel and AMD processors. We'll write a simple "Hello, World!" program for a 32-bit Linux system.
Here’s the complete code for a "Hello, World!" program in assembly language:
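A minimal sketch of such a program, consistent with the breakdown that follows, could look like this. The syscall numbers (4 for sys_write, 1 for sys_exit) belong to the 32-bit Linux int 0x80 interface. The length 13 is an assumption: it counts the characters of the string without the trailing 0 byte.

```nasm
section .data
    hello db 'Hello, World!', 0    ; the string, followed by a 0 byte

section .text
    global _start

_start:
    ; write(1, hello, 13)
    mov eax, 4          ; syscall number 4: sys_write
    mov ebx, 1          ; file descriptor 1: stdout
    mov ecx, hello      ; address of the string
    mov edx, 13         ; number of bytes to write (the 0 byte is not sent)
    int 0x80            ; call the kernel

    ; exit(0)
    mov eax, 1          ; syscall number 1: sys_exit
    xor ebx, ebx        ; exit status 0
    int 0x80            ; call the kernel
```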
Let's break down the "Hello, World!" program to understand how it works:
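section .data: Starts the data section, which holds the program's initialized data.
hello db 'Hello, World!', 0: Defines the label hello for the string 'Hello, World!', followed by a terminating zero byte (0).
db (Define Byte): db is used to define a byte or string. It allocates storage space for a variable and initializes it with a specified value, as in hello db 'Hello, World!', 0.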
section .text: Starts the code section, which holds the program's instructions. The .text section is usually marked as read-only and executable.
global: The global directive makes the specified symbol (label) visible to the linker. This is necessary for defining the entry point of the program.
global _start: Exports _start as the entry point of the program, making it accessible to the linker.
_start: The label for the entry point of the program. When the program starts executing, it begins at this label.
To assemble the code, use an assembler such as NASM (Netwide Assembler). First, save the above code to a file named hello.asm. Then assemble, link, and run the program as follows:
nasm -f elf hello.asm # Assemble the code to an object file
ld -m elf_i386 -s -o hello hello.o # Link the object file to create the executable
./hello # Run the executable
Assembly language provides a way to write programs that are very close to the hardware, giving the programmer fine-grained control over the CPU and memory. Here are some key concepts:
Registers: small, fast storage locations within the CPU, such as eax, ebx, ecx, and edx.
Control flow: unconditional jumps (jmp), conditional jumps (je, jne, etc.), and function calls (call and ret).
Understanding assembly language requires a solid grasp of computer architecture and how the CPU executes instructions. By learning assembly, you gain a deeper understanding of how high-level programming languages are translated into machine code and how the hardware executes those instructions.
Instructions in assembly language are the basic building blocks of a program. Each instruction corresponds to a specific operation that the CPU can perform. Here are some common instructions in x86 assembly language:
mov dest, src: Copy data from source to destination (the source is left unchanged).
push value: Push a value onto the stack.
pop dest: Pop a value from the stack into a destination.
add dest, src: Add source to destination.
sub dest, src: Subtract source from destination.
imul dest, src: Multiply destination by source (signed).
idiv divisor: Divide the contents of edx:eax by the divisor (signed); the quotient is stored in eax and the remainder in edx.
and dest, src: Bitwise AND operation.
or dest, src: Bitwise OR operation.
xor dest, src: Bitwise XOR operation.
not dest: Bitwise NOT operation.
cmp op1, op2: Compare op1 and op2 by setting the flags as if op2 were subtracted from op1; neither operand is modified.
jmp label: Unconditional jump to a label.
je label: Jump if equal (ZF=1).
jne label: Jump if not equal (ZF=0).
jg label: Jump if greater (SF = OF and ZF = 0).
jl label: Jump if less (SF ≠ OF).
call label: Call a function or subroutine.
ret: Return from a function.
Registers are small, fast storage locations within the CPU that hold data to be processed. They are essential for executing instructions and performing calculations. Here are some common registers in the x86 architecture:
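To make the role of registers concrete, here is a small sketch that uses the four general-purpose registers mentioned earlier (eax, ebx, ecx, edx). The role names in the comments are conventions, not hard restrictions. The values are arbitrary, cdq is used only to prepare edx:eax for idiv, and the exit through int 0x80 assumes 32-bit Linux.

```nasm
section .text
    global _start

_start:
    mov eax, 10         ; eax (accumulator): often holds results
    mov ebx, 3          ; ebx (base): general-purpose
    mov ecx, eax        ; ecx (counter): now holds a copy of eax, 10
    add ecx, ebx        ; ecx = 13
    cdq                 ; sign-extend eax into edx, so edx:eax = 10
    idiv ebx            ; eax = 3 (quotient), edx = 1 (remainder)

    mov ebx, ecx        ; use ecx (13) as the exit status
    mov eax, 1          ; syscall number 1: sys_exit
    int 0x80            ; call the kernel
```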
Memory addressing modes in assembly language allow you to specify where data is located. Some common addressing modes include:
mov eax, 5: Loads the constant value 5 into the eax register (immediate operand).
mov eax, ebx: Copies the value in ebx to eax (register operand).
mov eax, [1234h]: Loads the value at memory address 1234h into eax (direct addressing).
mov eax, [ebx]: Loads the value at the address pointed to by ebx into eax (register indirect addressing).
mov eax, [ebx + ecx]: Adds the values in ebx and ecx to form the address from which the value is loaded into eax (base plus index addressing).
Control flow instructions manage the sequence of execution of instructions in a program. They include jumps, loops, and function calls.
Jumps are used to transfer control to another part of the program. They can be unconditional or conditional.
jmp label: Jumps to the specified label unconditionally.
je label: Jump if equal (ZF=1).
jne label: Jump if not equal (ZF=0).
jg label: Jump if greater (SF = OF and ZF = 0).
jl label: Jump if less (SF ≠ OF).
Example of an unconditional jump:
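A minimal sketch of an unconditional jump might look like this. The label name done and the values in ebx are hypothetical, chosen so that the program's exit status shows which path was executed.

```nasm
section .text
    global _start

_start:
    mov ebx, 1          ; ebx = 1
    jmp done            ; always jump to done, skipping the next instruction
    mov ebx, 2          ; never executed

done:
    mov eax, 1          ; syscall number 1: sys_exit; ebx is still 1
    int 0x80            ; call the kernel
```

After running the program, echo $? in the shell prints the exit status, which here is the value left in ebx.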
Example of a conditional jump:
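A minimal sketch of a conditional jump, using cmp followed by je, might look like this. The labels is_equal and finish and the compared values are hypothetical.

```nasm
section .text
    global _start

_start:
    mov eax, 5
    cmp eax, 5          ; operands are equal, so ZF is set to 1
    je is_equal         ; taken because ZF=1
    mov ebx, 1          ; reached only if the values differ
    jmp finish

is_equal:
    mov ebx, 0          ; exit status 0 signals "equal"

finish:
    mov eax, 1          ; syscall number 1: sys_exit
    int 0x80            ; call the kernel
```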
Loops allow repeated execution of a block of code. The loop instruction decrements the ecx register and jumps to the specified label if ecx is not zero.
Example of a loop:
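A minimal sketch of the loop instruction might look like this. The label repeat_body and the choice to sum the counter values are assumptions made for illustration.

```nasm
section .text
    global _start

_start:
    mov ecx, 5          ; loop counter: run the body 5 times
    xor ebx, ebx        ; ebx = 0, running total

repeat_body:
    add ebx, ecx        ; adds 5, 4, 3, 2, 1 in turn
    loop repeat_body    ; ecx = ecx - 1; jump back while ecx != 0

    mov eax, 1          ; syscall number 1: sys_exit, status in ebx (15)
    int 0x80            ; call the kernel
```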
Functions in assembly language are blocks of code that perform a specific task and can be called from various places in the program. Functions typically save the state of the registers they use and restore it before returning to the caller.
Example of a function:
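A minimal sketch of a function that adds two numbers might look like this. The name add_numbers is hypothetical, and passing the arguments on the stack with the caller cleaning up afterwards (a cdecl-style convention) is an assumption.

```nasm
section .text
    global _start

; add_numbers(a, b): returns a + b in eax.
; Assumes both arguments were pushed on the stack by the caller.
add_numbers:
    push ebp            ; save the caller's base pointer
    mov ebp, esp        ; set up this function's stack frame
    mov eax, [ebp + 8]  ; first argument
    mov edx, [ebp + 12] ; second argument
    add eax, edx        ; eax = a + b
    pop ebp             ; restore the caller's base pointer
    ret                 ; pop the return address and jump to it

_start:
    push 4              ; second argument
    push 3              ; first argument
    call add_numbers    ; eax = 7
    add esp, 8          ; caller removes the two arguments from the stack

    mov ebx, eax        ; use the result as the exit status
    mov eax, 1          ; syscall number 1: sys_exit
    int 0x80            ; call the kernel
```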
In this function:
push and pop are used to save and restore the base pointer (ebp), ensuring that the caller's stack frame is not disrupted.
mov instructions are used to access function arguments from the stack.
ret returns control to the caller by popping the return address from the stack.
System calls allow a program to request services from the operating system kernel, such as input/output operations, process control, and communication. In 32-bit Linux x86 assembly, the int 0x80 instruction is used to make a system call. The syscall number is loaded into the eax register, and any arguments are loaded into ebx, ecx, edx, and so on, in that order.
Example of a system call to write to the screen:
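A minimal sketch of the write system call might look like this. The names msg and len are hypothetical, and the syscall numbers (4 for sys_write, 1 for sys_exit) are those of the 32-bit Linux int 0x80 interface.

```nasm
section .data
    msg db 'Hello, World!', 0xA    ; message followed by a newline
    len equ $ - msg                ; length computed by the assembler

section .text
    global _start

_start:
    mov eax, 4          ; syscall number 4: sys_write
    mov ebx, 1          ; first argument: file descriptor 1 (stdout)
    mov ecx, msg        ; second argument: address of the buffer
    mov edx, len        ; third argument: number of bytes to write
    int 0x80            ; call the kernel

    mov eax, 1          ; syscall number 1: sys_exit
    xor ebx, ebx        ; exit status 0
    int 0x80            ; call the kernel
```

Computing len with equ $ - msg avoids hard-coding the byte count, so the message can be edited without updating the length by hand.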