xisto Community
vizskywalker

Assembly Tutorial Part II: Memory, Arithmetic


Sorry this one came out so late, busy week. As recompense, there will be an extra two this week, the next one, and then a VGA intro. Say thank you to my history teacher.

 

Also, in case you've heard about this: the assembler I'm working on is coming along well. I need suggestions for what you'd like to see in an assembler, as well as people willing to test it. (I'll be doing extensive alpha testing, so I promise with a 99.5% guarantee that it won't crash your computer.) Currently it is for Windows only, sorry Linux users. And Mac people, I haven't forgotten about you, but I don't have access to an Apple computer, so I can't test anything I would pass on to you. Once I get an Apple, I will write tutorials for you guys. And without further Adieu (yes, purposely misspelled, it's a lame pun, but I'm writing these things for you so back off), the tutorial:

 

Okay, if you missed the previous tutorial, you can catch it here.

 

Note: This tutorial is updated for the following OS/assembler combinations

1: Windows/TASM

2: Windows/FASM

3: Linux/NASM

Others are forthcoming.

 

Last week we ended with a very simple Hello World Program. Today is going to build a little bit off of the code used in that program.

 

But first, something rarely covered this early on in an assembly tutorial set, and at the request of the wonderful osknockout, memory in depth.

 

As stated last week, memory in the computer is addressed by segments and offsets. Each segment is 64 KB long, with a new segment starting every 16 bytes. This leads to overlapping segments, and means the last segments do not contain a full complement of 64 KB.
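
To make the overlap concrete, here is a small sketch in TASM syntax of how a segment and an offset combine into one physical address (segment times 16, plus offset). The values 1234h and 0010h are made up for illustration, and the fragment borrows "mul" (covered further down) and "adc" (add with carry, not covered in this tutorial). It is meant to sit inside the code segment of a program like last week's.

```asm
; physical address = segment * 16 + offset
mov ax, 1234h   ; example segment value (made up for illustration)
mov dx, 0010h   ; the multiplier, 16
mul dx          ; dx:ax = 1234h * 10h = 0001h:2340h
add ax, 0010h   ; add the example offset, ax = 2350h
adc dx, 0       ; carry any overflow into the high word
; dx:ax = 0001h:2350h, so the physical address is 12350h
; 1235h:0000h names the same byte: 1235h * 10h + 0000h = 12350h
```

Two different segment:offset pairs reaching the same byte is exactly the overlap described above.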

 

Now, the question that should be on your mind is, "So what does this have to do with my programs?" The answer is, in short, everything. A program without variables is worthless to a user, and variables are stored in memory.

 

Note: The example structure I am going to be working with is a Windows .exe file, however, the principles I will cover apply to everything.

 

When an .exe file is loaded by Windows, several segments are defined by Windows to have a special meaning. These are the data segment, the code segment, and the stack segment.

 

The data segment is the segment at which all preset data in a program can be found. While it is possible to access this data using another segment with a greater offset, this is the standard segment address for the data. The reason this predefined segment is important, is that if you look at the binary of an .exe file, the data directly follows the code with no separation. This segment is set up so the program will know where to look for the data.

 

The code segment serves the exact same purpose as the data segment, except for the information that is stored there. The code segment contains, as the name suggests, all of the instructions that the program must execute. The code segment identifier for the .exe file also labels the entry point, where the program should begin execution of instructions.

 

The stack is a special part of a program. It is like an extra data segment that has special rules. Items are placed on the stack in a last-in first-out method, like stacking plates. (See figure 1)

 

Figure 1:

Note: The data is vertical to help illustrate last in, first out

Offset  Before  First action  Second action  Third action  Fourth action
0       []      []            []             []            []
1       []      []            []             []            []
2       []      []            []             []            []
3       []      []            []             []            []
4       []      []            []             []            []
5       []      []            []             []            []
6       []      []            []             []            []
7       []      []            []             []            []
8       []      []            [22]           []            []
9       []      [55]          [55]           [55]          []

 

As the above figure illustrates, data is put on the stack at the end of the stack first. This means that sp, the stack pointer, gets decremented instead of incremented every time something goes on the stack.
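
The four actions in figure 1 can be sketched with the "push" and "pop" instructions, which get a proper introduction in a later tutorial. This is only an illustration in TASM syntax; the registers bx and cx are picked arbitrarily to receive the values.

```asm
mov ax, 55h
push ax         ; first action: sp goes down by 2, 0055h is stored at ss:sp
mov ax, 22h
push ax         ; second action: sp goes down by 2 again, 0022h is stored
pop bx          ; third action: bx = 0022h (last in, first out), sp goes up by 2
pop cx          ; fourth action: cx = 0055h, sp is back where it started
```

Note that in 16-bit code each push moves sp by two bytes, since a whole word goes on the stack; the one-slot-per-item layout in figure 1 is a simplification.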

 

Okay, on to accessing memory. Remember last tutorial where I said that variables were declared by labeling a memory offset? Well, here's how you access those variables. In TASM, which is different from other assemblers, directly stating the label means access the data pointed to by the label. (See figure 2). Enclosing the label in square brackets ([label]) accesses the same data. When we need the memory offset of the data instead, we use the "offset" keyword. (See figure 2). In FASM and NASM, the situation is different. Stating a label directly accesses its memory offset, while enclosing it in square brackets accesses its data. (Remember, the "db" command defines a label and data)

 

Figure 2:

 

TASM
Label db
Offset:  0  1  2  3  4  5  6
Data:   [A][C][D][E][F][G][H]

mov al, Label        ;al now contains "A"
mov ax, offset Label ;ax now contains 0

FASM/NASM
Label db
Offset:  0  1  2  3  4  5  6
Data:   [A][C][D][E][F][G][H]

mov ax, Label        ;ax now contains 0
mov al, [Label]      ;al now contains "A"

 

Note: The data skips from A straight to C because a b in square brackets causes bold text.

 

The astute observers among you (by which I mean those of you not operating on less than 6 hours of sleep, by which I most likely mean no one, we're human after all, except for you, yes you, the one from Mars, sorry I got carried away) noticed that we used ax to store the offset and al to store the data. This leads to an interesting question: why? The answer lies in how we defined the data. Segments and offsets are always 16 bits, so they require a 16 bit register, such as ax. However, we used "db" to define the data. "Db" stands for define byte, so the data must be accessed as a byte. "Db" is actually a special form of what I like to call the "d" command (this command technically doesn't exist, but giving it a name lets me reference it later). The "d" command has many different forms: "db", "dw", "dd", "dq", "dt". They stand for define byte, define word, define double word, define quad word, and define tenbyte respectively.

 

Perhaps now would be a good time to define word, double word, etc.

A word is simply two bytes, a doubleword is exactly that, so four bytes, a quadword is four words or eight bytes, and a tenbyte is ten bytes.
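
As a quick sketch of the sizes above, here is one definition per form of the "d" command, in TASM syntax. The label names are invented for this example, and the lines belong in the data segment.

```asm
ByteVar  db 12h         ; 1 byte
WordVar  dw 1234h       ; 2 bytes (one word)
DwordVar dd 12345678h   ; 4 bytes (one doubleword)
QwordVar dq ?           ; 8 bytes (one quadword), no initial value
TbyteVar dt ?           ; 10 bytes (one tenbyte), no initial value
```

The "?" leaves the value unspecified at assembly time, the same way the program further down uses it.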

 

The way you define data determines how it must be accessed. If you define something as a byte, it must always be dealt with as a byte. If it is defined as a word, it must always be dealt with as a word. So how can we use "db" to define multiple bytes, like a string? In reality the "d" command does two things. 1) It defines a label of size byte, word, etc. 2) It places data into the code. The second function of the "d" command is not really related to the first.
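
A minimal sketch of those two functions at work, in TASM syntax. The label Message and its contents are made up for this example, and the fragment assumes ds already points at the data segment.

```asm
; in the data segment
Message db "Hi!", 0     ; one label of size byte, but four bytes of data

; in the code segment, after ds has been set up
mov al, Message         ; al = 48h, the "H"
mov al, Message + 1     ; al = 69h, the "i"
```

The label only names the first byte; the rest of the string is reached by adding to the label's offset, one byte at a time.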

 

Okay, I think that about covers memory. On to arithmetic.

 

First, the "add" command. This command is pretty basic: it adds things. The format is as follows:

 

add register, {memory, immediate, register}

 

Note: the {} means any one of the options in the brackets.

In case you were wondering, an immediate is simply a number written into the code.

 

The "add" command adds whatever the second part is to the first part and stores the result in the first part. So in "add ax, 1", whatever is in ax gets increased by one and stored in ax. Users of higher level languages probably know a special name for what we just did: incrementing. Assembly has the ability to increment using the "inc" command. "Inc" simply is followed by the register you wish to increment.

 

Next up is the "sub" command. Guess what, it works exactly like the "add" command, except it subtracts. There is also an equivalent for incrementing here: decrementing. The command for this is "dec", and it works the same way as "inc".
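
Here is a short sketch that runs all four of these commands on ax, with the value after each step in the comments. The starting value 5 is arbitrary.

```asm
mov ax, 5       ; ax = 5
add ax, 3       ; ax = 8
inc ax          ; ax = 9
sub ax, 4       ; ax = 5
dec ax          ; ax = 4
```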

 

Multiplication and division are a bit harder. Multiplication, the "mul" command, takes one parameter, the register to multiply by. The other value is in either al, ax, or eax depending on the size of the multiplying register. Accessing the result is also tricky. If the size is one byte, al is used and the result is found in ax. If the size is a word, ax is used, and the result is in dx:ax with dx being the higher part and ax being the lower part. If the size is a double word, then eax is used, and the result is found in edx:eax with edx being the high part and eax being the low part.
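
A small sketch of the byte and word cases, with values chosen so the results are easy to check by hand. In the word case the high word of the product lands in dx and the low word in ax.

```asm
mov al, 10h     ; al = 16
mov bl, 03h     ; bl = 3
mul bl          ; byte case: ax = al * bl = 0030h (48)

mov ax, 1000h
mov bx, 0010h
mul bx          ; word case: dx:ax = 0001h:0000h (1000h * 10h = 10000h)
```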

 

Division, the "div" command, is similar to "mul". Once again, it takes only one parameter, the register containing the value to divide by. If this register is a byte, ax is divided by it, and the result is put in al with the remainder in ah. If the register is a word, then dx:ax (dx high part, ax low part) is divided by it, and the result is found in ax with the remainder in dx. If the register is a doubleword, then edx:eax (edx high part, eax low part; whenever you see xxxx:xxxx, the left part is always the high part and the right is always the low part, so shift the high part left by the width of the low part and add the low part to get the true value. The exception is when dealing with memory, when the left is the segment and the right is the offset) is divided by it, and the result is stored in eax with the remainder in edx.
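
The same two cases for "div", again with hand-checkable values. Both quotients here fit in the result register; a quotient that does not fit is an error condition this sketch does not cover.

```asm
mov ax, 0064h   ; ax = 100
mov bl, 07h     ; bl = 7
div bl          ; byte case: al = 0eh (14), ah = 02h (remainder 2)

mov dx, 0001h
mov ax, 0000h   ; dx:ax = 10000h (65536)
mov bx, 0003h
div bx          ; word case: ax = 5555h (21845), dx = 0001h (remainder 1)
```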

 

Now, with some basic math and the ability to use variables, let us create a program.

 

Disclaimer: Once again, the program is designed for TASM; Windows FASM adjustments follow the code. I don't have Linux because I need Windows to be compatible with the rest of the house and don't have enough space on my hard drive for a dual boot, so the Linux version is not provided. Line numbers are for reference only, remove them before compiling.

 

---Begin Copy Next Line---

1: .MODEL SMALL ;define the type of program

2: .STACK 200h ;define the stack size

3: .386 ;state the use of 386 instructions

4:

5: .DATA ;define the data segment

6: Divide db ? ;divide is a byte to hold the result of division. the ?

7: ;means don't specify a value at compilation

8: Remainder db ? ;to hold the remainder from the division

9: Multiply dw ? ;multiply is a word to hold the result of a multiplication

10: Addition db 01h ;initial value for addition (1)

11: Subtraction db 40h ;initial value for subtraction (64)

12:

13: .CODE ;define the code segment

14:

15: Start: ;start the program

16: mov ax, seg Multiply ;move the data segment to the accumulator

17: mov ds, ax ;move data segment to data segment register

18: mov es, ax ;move data segment to es register

19:

20: mov al, Addition ;set al equal to addition

21: add al, 30h ;add 48 to al

22: mov ah, 0eh ;prepare for int 10h function 0eh, teletype character (see

23: ;Ralf Brown's Interrupt List)

24: mov bh, 00h ;page 0

25: mov bl, 07h ;color light grey

26: int 10h ;print it

27:

28: mov ah, 00h ;once again, pause until a key is pressed

29: int 16h

30:

31: mov al, Subtraction ;mov the value in subtraction to al

32: dec al ;subtract 1 from al

33: mov ah, 0eh ;prepare for int 10h function 0eh, teletype character (see

34: ;Ralf Brown's Interrupt List)

35: mov bh, 00h ;page 0

36: mov bl, 07h ;color light grey

37: int 10h ;print it

38:

39: mov ah, 00h ;once again, pause until a key is pressed

40: int 16h

41:

42: mov al, 0eeh ;set al to 0eeh (238)

43: mov bl, 60h ;set bl to 60h (96)

44: mul bl ;multiply al by bl, store result in ax (5940h)

45: mov Multiply, ax ;store the result in Multiply

46:

47: mov ax, 1301h ;int 10h function 13h, write string; al = 01h moves the cursor

48: mov bx, 07h ;page 0, color 7

49: mov cx, 02h ;two characters

50: mov dx, 00h ;start at top left

51: mov bp, offset Multiply ;move the offset of Multiply to base pointer

52: int 10h

53:

54: mov ah, 00h ;once again, pause until a key is pressed

55: int 16h

56:

57: mov ax, 2141h ;set ax to 2141h (8513)

58: mov bl, 80h ;set bl to 80h (128)

59: div bl ;divide ax by bl, quotient in al (42h), remainder in ah (41h)

60: mov Divide, al ;store the quotient in Divide

61: mov Remainder, ah

62:

63: mov ax, 1301h ;int 10h function 13h, write string; al = 01h moves the cursor

64: mov bx, 07h ;page 0, color 7

65: mov cx, 02h ;two characters

66: mov dx, 00h ;start at top left

67: mov bp, offset Divide ;move the offset of Divide to base pointer

68: int 10h

69: ;Note: in all of these displays, the printed character is the ascii

70: ;value of the result, not the actual result

71:

72: mov ah, 00h ;once again, pause until a key is pressed

73: int 16h

74:

75: mov ax, 4c00h ;move the close program directive to the accumulator

76: int 21h ;perform interrupt 21h (33 in decimal) to exit to DOS

77: End Start ;the program ends

---End Copy on Previous Line---

 

TASM compilation:

1) Save the program as "math.asm" in the asm folder

2) open the command prompt and change to the directory that contains TASM

3) type in "tasm c:\asm\math.asm" and hit enter

4) type in "tlink c:\asm\math.obj" and hit enter

 

Windows FASM

1) change line 1 to "format MZ"

2) change line 2 to "stack 200h"

3) change line 3 to "entry cod:Start"

4) change line 5 to "segment dat"

5) change line 13 to "segment cod"

6) change line 16 to "mov ax, dat"

7) place square brackets around all references to memory labels not preceded by "offset"

8) remove all instances of the word offset

9) remove line 77

10) Save the program as "math.asm" in the asm folder

11) open the command prompt and change to the directory that contains FASM

12) type in "fasm c:\asm\math.asm c:\asm\math.exe" and hit enter

 

To run the program in Windows:

1) type "c:\asm\math" and hit enter

2) press any key every time the program pauses (4 times)

 

Linux NASM

Forthcoming: I finally downloaded NASM for Windows and found that it does not provide a linker to create .exe files, so I'm looking into this before I provide NASM support. Hopefully I will have another hard drive soon so I can install Linux and provide true tested support for all my Linux readers. Until then, the commands are still the same.

 

Program explanation:

There is not a single command in this program that we have not covered, and the comments explain what each individual piece of code does. So I'm not going to repeat all of that. Instead, I'm going to go over the output.

The first display of a character uses the teletype function. The ASCII character for the value of al, in this case 31h which is the character "1", is displayed onscreen at the current cursor location. Then the cursor is moved ahead one. This is why the next display, the question mark, follows the "1". The question mark is also displayed using the teletype function; al contains 3fh (40h - 01h = 3fh), which is the ASCII code for "?". The cursor is once again updated. The next display would follow the question mark, except we used the display string VGA function. This function takes the location to print text, and we specified the upper left corner of the screen. We specified the same location for the display after that, so the multiplication result is overwritten by the division result.

 

If you examine the code, you will find that the value of ax that we stored in Multiply is 5940h, yet the characters that were displayed, "@Y", have the codes 40h and 59h in that order. This is because the PC is a little-endian system. What this means is that in multipart data, the lower part is stored first. (See figure 3) Because of this, if you move ax into data, al is stored before ah is stored. The computer automatically loads the first part of the data into al and the second part of the data into ah when you move data into ax. This automatic switch is why data defined in bytes can only be accessed by byte registers, and data defined in words can only be accessed by word registers. However, the display string VGA interrupt reads straight from memory without adjusting for how the data was defined. So it displays the lower part before the higher part. (In reality, this access limitation is an assembler-made limitation. Machine code fully allows you to access data stored in words as bytes or vice versa.)

 

Figure 3:

 

Data dw 00h, 00h, 00h, 00h ;the commas separate different pieces of data

Data: [00][00][00][00][00][00][00][00] (represented in hex)

 

mov Data, 1234h

Data: [34][12][00][00][00][00][00][00] (represented in hex)

 

mov ax, Data

ax: 1234 (represented in hex)
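
To see the byte order from inside a program, TASM lets you override the defined size with "byte ptr". The following is a sketch only: the label Value is made up for this example, and the fragment assumes ds already points at the data segment.

```asm
; in the data segment
Value dw 1234h              ; stored in memory as 34h, then 12h

; in the code segment, after ds has been set up
mov ax, Value               ; ax = 1234h
mov al, byte ptr Value      ; al = 34h, the low byte comes first in memory
mov ah, byte ptr Value + 1  ; ah = 12h, the high byte follows it
```

In FASM and NASM the equivalent override is written "mov al, byte [Value]".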

 

Review:

ADD instruction

SUB instruction

MUL instruction

DIV instruction

INC instruction

DEC instruction

 

Coming up next:

Control structures

Procedures

We finally use the stack

 

Questions? Comments? Something you'd like to see? Let me know and I'll add it in.


Wow, sounds like you're typing with 4 hours of sleep. Can take some time to process, but very nice. If anyone else happens to be reading this, you can get more information about memory addressing at the Art of Assembly Language sites like this one. I know vizskywalker won't have time to address all types of memory addressing. Wow, I actually learned something from these tuts. Keep it up.


Thank you, yeah this was done on snippets of sleep, whenever I had a chance. I've been trying to get a wireless network card to work on my laptop so I can post more frequently and access this site (because the formatting is all weird if I copy from notepad). So far no luck, but I'm trying to be more consistent with the posts.
