xisto Community
vizskywalker

Assembly Tutorial Part I The basics through Hello world


Okay, because assembly is an awesome language that is being viewed as archaic in the face of high level languages, I'm writing this series of tutorials.


Note: This tutorial assumes a working knowledge of binary and hex. If you don't have a working knowledge of these number systems and you are interested in learning assembler, let me know and I'll write a brief introduction to binary. Currently, until I can do some testing, this tutorial also only works for Windows and MS-DOS computers (the example is a 16-bit DOS program, so it will not run directly on 64-bit versions of Windows). If anyone with a Linux box is willing to test a routine for me, please let me know.


The first thing you need to write assembly is an assembler (often loosely called a compiler). I use TASM (Turbo Assembler), which is made by Borland and can be found in their C++ Builder 5. All of the code provided here will be written for TASM. If you don't feel like buying the Borland C++ compiler, you can also use NASM or FASM, which are both free. I will try to point out how to change the code so it assembles under these two, but I won't promise anything because I may forget.


There are also some utilities that will prove very important. The first is Ralf Brown's Interrupt List. The second is the Intel Software Developer's Manuals, but these are not needed at this point.


Unlike higher level languages, assembly does not really have variables. It has two kinds of data storage units. The first, registers, you can actually perform operations on. The second, labels, are simply placeholders for information, like pointers in a higher level language.


The 386 and newer chips have four general-purpose registers: ax, the accumulator register; bx, the base register; cx, the count register; and dx, the data register.


Each of these registers is subdivided into two smaller registers: the high register, xh, and the low register, xl (for example, ah and al). The full register contains 16 bits, while each sub-register contains 8 bits. There is also an extended form of each register, named by adding an "e" to the beginning of the full register's name: eax, ebx, ecx, and edx. These registers contain 32 bits, and the 16-bit register is their lower half. (See Figure 1)


Figure 1: The registers using the accumulator as an example

|-------------------------------EAX-------------------------------|
[][][][][][][][] [][][][][][][][] [][][][][][][][] [][][][][][][][]

                                  |--------------AX---------------|
                                  [][][][][][][][] [][][][][][][][]

                                  |------AH------| |------AL------|
                                  [][][][][][][][] [][][][][][][][]
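As a rough illustration (in Python rather than assembly, so it can be run anywhere), the sketch below models EAX as a single 32-bit number and reads AX, AH, and AL out of it with bit masks. The Accumulator class and its property names are hypothetical; on the real chip there is just one register, and AX, AH, and AL are simply names for parts of it.

```python
class Accumulator:
    """Models EAX and its sub-registers as views of one 32-bit value (illustration only)."""

    def __init__(self, eax=0):
        self.eax = eax & 0xFFFFFFFF

    @property
    def ax(self):
        # AX is the low 16 bits of EAX
        return self.eax & 0xFFFF

    @ax.setter
    def ax(self, value):
        # Writing AX replaces only the low 16 bits; the upper half of EAX is kept
        self.eax = (self.eax & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def ah(self):
        # AH is bits 8-15, the high byte of AX
        return (self.eax >> 8) & 0xFF

    @ah.setter
    def ah(self, value):
        self.eax = (self.eax & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def al(self):
        # AL is bits 0-7, the low byte of AX
        return self.eax & 0xFF

    @al.setter
    def al(self, value):
        self.eax = (self.eax & 0xFFFFFF00) | (value & 0xFF)


reg = Accumulator(0x12345678)
print(hex(reg.ax), hex(reg.ah), hex(reg.al))   # 0x5678 0x56 0x78

reg.ax = 0x4C00                                # like "mov ax, 4c00h"
print(hex(reg.ah), hex(reg.al), hex(reg.eax))  # 0x4c 0x0 0x12344c00
```

The last two lines show why loading 4c00h into ax sets ah to 4Ch and al to 00h while leaving the upper half of eax untouched.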


The 386 and beyond also have segment registers: ds, the data segment; cs, the code segment; es, the extra segment; and ss, the stack segment. There are also pointer and index registers: si, the source index; di, the destination index; bp, the base pointer; sp, the stack pointer; and ip, the instruction pointer. Finally, the 386 added two more segment registers, fs and gs.


Data labels appear in the data segment. They are simply names that stand for a spot in memory. You use a directive (db, covered below) to place data at that spot in memory, and then use the label to refer to the data. The label is only the address, though: most of the time you move the data into a register, work on it there, and move the result back.


It is also important to understand how memory is addressed. In the real mode that DOS programs run in, memory is divided into 64 KB segments. A new segment starts every 16 bytes, so the segments overlap, and the last segments in the 1 MB address space run out of memory before reaching their full 64 KB. Individual bytes are addressed by their offset from the start of a segment, written as segment:offset in hex. To illustrate this, let's look at segment 1000 and start at its first byte, which is addressed as 1000:0000. The byte at offset 16 (0010 in hex) can be addressed in two ways: 1000:0010 or 1001:0000. What this boils down to is the distance from the beginning of RAM. To find it, append a 0 to the segment (which multiplies it by 16) and add the offset: 10000 + 0010 = 10010. I know this probably isn't that clear, but really, all you need to know is that memory is addressed by segment and offset. (See Figure 2)


Figure 2: Segment:offset style memory (offsets within each segment are shown in decimal)

Segment: 1000

[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]

Segment: 1001 (starts 16 bytes past 1000, so 1001:0000 is also 1000:0010)

[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]

Segment: 1002 (starts 32 bytes past 1000, so 1002:0000 is also 1000:0020)

[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]

Segment: 1003 (starts 48 bytes past 1000, so 1003:0000 is also 1000:0030)

[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]

Segment: 1004 (starts 64 bytes past 1000, so 1004:0000 is also 1000:0040)

[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]

Segment: 1005 (starts 80 bytes past 1000, so 1005:0000 is also 1000:0050)

[0][1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]
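The segment:offset arithmetic described above (append a hex 0 to the segment, i.e. multiply it by 16, then add the offset) can be checked with a short Python sketch. The function name linear_address is hypothetical, and the sketch ignores what happens past the end of the 1 MB real-mode address space.

```python
def linear_address(segment, offset):
    """Real-mode address: segment * 16 + offset (appending a hex 0 to the segment)."""
    return segment * 16 + offset


# Two names for the same byte, as in the text
print(f"{linear_address(0x1000, 0x0010):05X}")   # 10010
print(f"{linear_address(0x1001, 0x0000):05X}")   # 10010

# Each new segment starts 16 bytes after the previous one (compare Figure 2)
for seg in range(0x1000, 0x1006):
    offset_from_1000 = linear_address(seg, 0) - linear_address(0x1000, 0)
    print(f"{seg:04X}:0000 = 1000:{offset_from_1000:04X}")
```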


And now the moment you have all been waiting for: the sample code. Copy the code below into a blank text file, then follow the instructions to compile it.

---Begin Copy Next Line---

1: .MODEL SMALL ;small memory model: one code segment, one data segment
2: .STACK 200h ;reserve 200h (512) bytes for the stack
3: .386 ;allow the use of 386 instructions
4:
5: .DATA ;start the data segment
6:
7: Message db "Hello World$" ;the string to print, ended by $
8:
9: .CODE ;start the code segment
10:
11: Start: ;execution begins here
12: mov ax, seg Message ;put the segment of Message in ax
13: mov dx, offset Message ;put the offset of Message in dx (data register)
14: mov ds, ax ;copy the segment into ds (data segment register)
15: mov ah, 09h ;DOS function 09h: print a $-terminated string
16: int 21h ;call DOS (interrupt 21h, 33 decimal)
17:
18: mov ah, 00h ;BIOS keyboard function 00h: wait for a key
19: int 16h ;call the BIOS keyboard service (interrupt 16h, 22 decimal)
20:
21: mov ax, 4c00h ;ah = 4Ch (terminate program), al = 00h (return code)
22: int 21h ;call DOS to exit the program
23: End Start ;end of source; Start is the entry point

---End Copy on Previous Line---


Okay, now it's time to compile:

Instructions for everyone

1) First, remove the line numbers; they are for reference only

2) Create a new folder on the C: drive and call it "asm"

3) Save the file as "Hello.asm" in the newly created folder


TASM instructions

1) Open up the command prompt

2) Change to the directory that contains TASM

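3) Type in "tasm c:\asm\hello.asm, c:\asm\hello.obj" and hit enter (without the second name, TASM writes hello.obj to the current directory)

4) Type in "tlink c:\asm\hello.obj, c:\asm\hello.exe" and hit enter (TLINK expects a comma between the object file and the .exe name)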

5) To run the program, type in "c:\asm\hello.exe" and hit enter


FASM instructions

1) Change line 1 to "format MZ"

2) Change line 2 to "stack 200h"

3) Change line 3 to "entry cod:Start"

4) Change line 5 to "segment dat"

5) Change line 9 to "segment cod"

6) Change line 12 to "mov ax, dat"

7) Change line 13 to "mov dx, Message"

8) Remove line 23

9) Save the file

10) Open up the command prompt

11) Change to the directory that contains FASM

12) Type in "fasm c:\asm\hello.asm c:\asm\hello.exe" and hit enter

13) To run the program type in "c:\asm\hello.exe" and hit enter


NASM Instructions

Forthcoming


Here is an explanation of the code above:


The first three lines set up the format of the file.

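Line 1 tells TASM which memory model to use; SMALL means one code segment and one data segment, and the linker turns the result into a .EXE file. In the FASM version, "format MZ" tells FASM to produce a DOS .EXE file directly.

Line 2 sets up the stack, a special storage area that will be covered in a later part, and reserves 200h (512) bytes for it.

Line 3 tells TASM that we want access to all of the instructions a 386 has. In the FASM version, line 3 instead sets the entry point, the label where the program starts running.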


Lines 5 through 8 set up the data:

Line 5 defines the data segment.

Line 7 uses our first assembler directive, "db" (define byte). Db tells the assembler to insert the listed data at this point in memory as a sequence of bytes, and the label Message names the address of the first of those bytes.
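As a rough model of what line 7 asks the assembler to do, this Python sketch turns the string into the sequence of bytes that db would place in the data segment, one byte per character. The variable names are hypothetical, and placing Message at offset 0 is an assumption (it is the first item in the data segment here). The point is that Message only names the address of the first byte, and the $ is stored as an ordinary byte (24h) at the end.

```python
# The bytes that 'Message db "Hello World$"' places in memory, one per character
message = "Hello World$".encode("ascii")

print(len(message))                          # 12: 11 characters plus the '$'
print(" ".join(f"{b:02X}h" for b in message))
# 48h 65h 6Ch 6Ch 6Fh 20h 57h 6Fh 72h 6Ch 64h 24h

# The label stands for the offset of the first byte, so
# Message+0 is 'H', Message+1 is 'e', and so on
offset_of_message = 0                        # assumption: Message starts at offset 0
print(chr(message[offset_of_message + 1]))   # e
```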


Lines 9 through 23 contain the code:

Line 9 defines the code segment.

Line 11 sets the start of the code.

Line 12 introduces the "mov" instruction. Mov copies data from one place to another, and frequently one of those places is a register. This line moves the segment number of the data segment into the accumulator.

Line 13 moves the offset of Message within the data segment into the data register, dx.

Line 14 moves the contents of the accumulator (the segment number of the data segment) into the data segment register. It has to go through the accumulator first because a segment register cannot be loaded with a number directly, only from another register or from memory.

Line 15 moves 9 into the high byte of the accumulator.

Line 16 introduces another instruction, "int". Int triggers a software interrupt, which calls a service routine provided by DOS or the BIOS. If you downloaded Ralf's interrupt list, open up the program you use to view it and open the interrupt list. Scroll down until you see "2109" in the left column (interrupt 21h, function 09h) and click on it. Notice that when you call interrupt 21h for this function, writing a string to the screen, ah must be 09h, which we set up. Ds:dx must also point to the segment and offset of the string, which it does. The string must end with a "$", which, if you look at Message, it does.
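The rules listed above for interrupt 21h, function 09h (ds:dx points at the string, and output stops at the first $) can be simulated in Python. This is a model of the behavior, not how DOS implements it: memory is a bytearray, dos_print_string is a hypothetical name, and segment 1234h is an arbitrary stand-in for whatever "seg Message" turns out to be when the program runs.

```python
def dos_print_string(memory, ds, dx):
    """Simulate INT 21h with AH = 09h: return the bytes at DS:DX up to (not including) '$'."""
    address = ds * 16 + dx          # segment:offset -> address from the start of memory
    chars = []
    while memory[address] != ord("$"):
        chars.append(chr(memory[address]))
        address += 1
    return "".join(chars)


memory = bytearray(0x100000)        # 1 MB of simulated real-mode memory, all zeros
ds = 0x1234                         # hypothetical value produced by "seg Message"
dx = 0x0000                         # hypothetical value produced by "offset Message"
data = b"Hello World$"
start = ds * 16 + dx
memory[start:start + len(data)] = data

print(dos_print_string(memory, ds, dx))   # Hello World
```

If the $ were missing, this loop, like DOS, would keep reading past the end of the string, which is why the terminator matters.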

Line 18 moves 0 into the high byte of the accumulator.

Line 19 performs interrupt 16h (22 in decimal), the BIOS keyboard service, which with function 0 in ah waits for a keypress before the program continues.

Line 21 moves the hex number 4c00h into the accumulator. That puts 4Ch in ah, the DOS "terminate program" function, and 00h in al, the return code handed back to DOS.

Line 22 calls the DOS interrupt again; this time the function exits the program.

Line 23 marks the end of the source file and tells TASM that the program starts at the Start label.


All of the "h"s after numbers mean that those numbers are in hexadecimal. Without an h, a number is assumed to be decimal. The ";" on each line starts a comment: everything that comes after a ; on a line is ignored by the assembler.
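Both rules can be illustrated with a small Python sketch. The helper names are hypothetical; parse_number only handles plain decimal and h-suffixed hex numbers (not every form the assembler accepts), and strip_comment is deliberately naive because it does not skip a ; inside a quoted string.

```python
def parse_number(text):
    """Convert '21h'-style (hex) or '33'-style (decimal) literals to an integer."""
    text = text.strip()
    if text.lower().endswith("h"):
        return int(text[:-1], 16)   # trailing h: hexadecimal
    return int(text, 10)            # no suffix: decimal


def strip_comment(line):
    """Drop everything from the first ';' onward (naive: ignores ';' inside strings)."""
    return line.split(";", 1)[0].rstrip()


for literal in ["09h", "16h", "21h", "200h", "4c00h", "33"]:
    print(literal, "=", parse_number(literal))

print(strip_comment("int 21h ;perform interrupt 33"))   # int 21h
```

The loop confirms the decimal values mentioned in the code comments: 21h is 33, 16h is 22, and 200h is 512.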


Review:

Registers

Memory

MOV instruction

DB directive

INT instruction


Coming up next week:

Basic arithmetic

Explanation of the stack


Questions? Comments? Something you'd like to see? Let me know and I'll add it in.


;'s. :D you forgot to define the different types of segmentation schemes, but still, you did a very fine job introducing everything, better than many I've seen. :applause:


Update for Linux/NASM: Okay, this update covers NASM for both Linux and Windows users. First, replace the "h" suffix on hex numbers with a "0x" prefix (for example, 21h becomes 0x21); this is the usual NASM format for hex numbers, although NASM also accepts the h suffix. Then, remove lines 1 through 3. Replace line 5 with "segment data", and line 9 with "segment code". Replace line 12 with "mov ax, data" and switch lines 13 and 14. Then, for Linux users, replace line 21 with "mov eax, 0x01" and replace line 22 with "int 0x80". Note that int 21h and int 16h are DOS and BIOS services that Linux programs cannot call, so the printing and keypress code would also need Linux equivalents. I haven't yet had a chance to test this for either format, so if it doesn't work, please let me know.
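The first step of this update (turning a trailing h into a leading 0x) is mechanical enough to sketch with a regular expression in Python. to_nasm_hex is a hypothetical helper, not part of NASM. It only rewrites tokens that start with a digit, so register names such as ah and bh are left alone, but it does not skip quoted strings, so a string containing something like 12h would be rewritten too.

```python
import re

# A TASM-style hex literal: starts with a digit, then hex digits, then h or H
HEX_SUFFIX = re.compile(r"\b([0-9][0-9A-Fa-f]*)[hH]\b")


def to_nasm_hex(line):
    """Rewrite TASM-style hex literals (21h) as 0x-prefixed literals (0x21)."""
    return HEX_SUFFIX.sub(r"0x\1", line)


for line in ["mov ah, 09h", "int 21h", "mov ax, 4c00h ;exit", ".STACK 200h"]:
    print(to_nasm_hex(line))
# mov ah, 0x09
# int 0x21
# mov ax, 0x4c00 ;exit
# .STACK 0x200
```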

A problem running the application (Assembly Tutorial Part I)

I'm using the FASM compiler. When I go into the command prompt and type in the directory, it says the following: "C:asmfasm168tutorials is not recognized as an internal or external, operational program or batch file."

-feedback by steve siverling

Hehe, I think I understood most of it! Will you make any more similar tutorials soon?


The post was made over 5 years ago...



Anyways, great tutorial and it helps understand assembly a lot better than I did already. It's something I've been lightly treading over a while but didn't really know where to start.

