Monday, April 12, 2010

assembler tutorial: 1 - hello, world!


but thunder interrupted all their fears


now that you're ready to begin writing we'll dig right in. here is our first heavily-annotated program, hello.exe.



open up your dosbox and change to the mycode directory.
cd mycode
run editv with a new file, hello.asm.
editv hello.asm
i strongly advise you to turn on line numbering. you can do this by pressing ctrl-o followed by b, or you can just select the options menu with alt-o, scroll down to line numbers and press enter.

we begin with comments. if you've ever read a book on programming it has probably emphasized the need for good comments. in assembler comments are even more important because the language syntax is so terse. a comment begins with a semi-colon and continues to the end of the line of text.


; hello.asm
; Demonstrates how to write an assembly program for DOS with NASM using
; the ubiquitous 'Hello World' string. To create an EXE file we'll
; first assemble an OBJ file then link it. I'm using the public domain
; linker WarpLink.
; To assemble: nasm -f obj hello.asm
; To link: warplink hello.obj
;
; Robert Ritter <rritter@centriq.com>
; 12 Apr 2010



remember that dos addresses memory in segments. the first thing we'll need to do is define these segments. since this program doesn't work with a lot of data, our data segment is pretty small. it defines a label, message, which is the memory address of the first byte of the message that we're going to print. the db (define byte) pseudo-instruction emits the sequence of bytes that makes up our data. there is also dw (define word) for 16-bit values and dd (define double word) for 32-bit values. most of the time you'll probably just treat data as a sequence of bytes, so you'll likely use db more than the others.
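to make db, dw and dd a little more concrete, here's a small python model of the bytes each one emits. python is only acting as a calculator here; nasm does the real work. the model assumes the x86 little-endian byte order, where the low byte of a multi-byte value is stored first, and the helper name define_data is made up for illustration.

```python
import struct

# Hypothetical helper: model the bytes NASM emits for db, dw and dd.
# x86 is little-endian, so the least significant byte is stored first.
def define_data(directive: str, value: int) -> bytes:
    formats = {"db": "<B", "dw": "<H", "dd": "<I"}  # 8, 16 and 32 bits
    return struct.pack(formats[directive], value)

if __name__ == "__main__":
    for directive, value in [("db", 0x41), ("dw", 0x1234), ("dd", 0x12345678)]:
        print(directive, hex(value), "->", define_data(directive, value).hex(" "))
```

running it prints 41 for the db, 34 12 for the dw and 78 56 34 12 for the dd. if you ever dump an exe with a hex viewer and your words look backwards, this is why.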

you may have noticed the values that follow the obvious string, "Hello World". if we want to advance our output to the next line, we must add a newline. this is like pressing the enter key on a keyboard. in high-level languages like c we write "\n" to represent a newline, but dos expects a two-byte sequence: 0dh (carriage return, which moves the cursor back to the start of the line) and 0ah (linefeed, which moves it down one line). the dollar sign is a terminator that marks the end of the string. not all strings must end with a dollar sign, but the dos printing service that we're going to use requires it.

notice that the characters that make up a string are enclosed in quotes. in nasm, double and single quotes work the same way. the values outside the quotes, like 0dh and 0ah, are treated as literal byte values.
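here's a sketch of the exact bytes the message line produces, listed by their offset from the message label. it's a python model of what the assembler emits, not something nasm runs; MESSAGE and describe are illustrative names.

```python
# Model of the bytes produced by:
#   message db 'Hello World', 0dh, 0ah, '$'
# Quoted text becomes its ASCII codes; the bare values are literal bytes.
MESSAGE = b"Hello World" + bytes([0x0D, 0x0A]) + b"$"

def describe(data: bytes) -> None:
    # Print each byte's offset from the label 'message' and its value.
    for offset, byte in enumerate(data):
        shown = chr(byte) if 0x20 <= byte < 0x7F else "."
        print(f"message+{offset:2d}: {byte:02X}h  {shown}")

if __name__ == "__main__":
    describe(MESSAGE)
    print("total:", len(MESSAGE), "bytes")
```

the whole thing is 14 bytes: 11 characters of text, the carriage return and linefeed, and the dollar sign at offset 13.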


; ----------------------------------------------------------------------
segment data
; DOS EXE files can have several segments, which lets them use more
; than 64KB of memory. Here we define the data segment to store the
; message that we're going to print on the screen.
;
message db 'Hello World', 0dh, 0ah, '$'



the next thing that we want to do is reserve some memory for our stack segment. the resb (reserve byte) pseudo-instruction sets aside an uninitialized block of memory of a given size. there is also resw (reserve word) for 16-bit values and resd (reserve double word) for 32-bit values. we're going to allocate a 64-byte hunk'o'ram for the stack and set the label stackTop to the address immediately following it. we point at the end rather than the beginning because the x86 stack grows downward: each push moves the stack pointer down before storing data. for more info on how the stack works, see my previous post.
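if the top-of-stack idea seems backwards, a toy model may help. on the x86, push subtracts 2 from sp and then stores a 16-bit word at ss:sp, and pop reads the word and adds 2. this python class is a hypothetical illustration of that rule for a 64-byte stack whose sp starts at stackTop (offset 64). the overflow and underflow checks are mine; the real cpu does no such checking.

```python
# Toy model (not real CPU code) of a 64-byte stack segment whose
# top is the label stackTop, i.e. offset 64.
class Stack:
    def __init__(self, size: int = 64):
        self.memory = bytearray(size)  # resb 64: uninitialized in reality, zeros here
        self.sp = size                 # mov sp, stackTop

    def push(self, word: int) -> None:
        # PUSH: decrement SP by 2, then store the word little-endian at SS:SP.
        if self.sp < 2:
            raise OverflowError("stack overflow (the real CPU would not check)")
        self.sp -= 2
        self.memory[self.sp] = word & 0xFF
        self.memory[self.sp + 1] = (word >> 8) & 0xFF

    def pop(self) -> int:
        # POP: read the word at SS:SP, then increment SP by 2.
        if self.sp + 2 > len(self.memory):
            raise IndexError("stack underflow (the real CPU would not check)")
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2
        return word

if __name__ == "__main__":
    s = Stack()
    s.push(0x1234)
    s.push(0xABCD)
    print(hex(s.sp))     # 0x3c: two pushes moved SP down by 4
    print(hex(s.pop()))  # 0xabcd: last in, first out
    print(hex(s.sp))     # 0x3e
```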


; ----------------------------------------------------------------------
segment stack stack
; The stack is used as temporary storage for values during the
; program's execution. Sometimes we use it in our code, and sometimes
; DOS uses it, especially when we call DOS interrupts. We'll set up a
; small but serviceable stack for this program since we're going to be
; calling on DOS services.
;
        resb 64
stackTop:       ; The label 'stackTop' is the address of the end (top)
                ; of the stack. We'll need this to initialize the
                ; stack pointer in the CPU. The colon is optional, but
                ; NASM may warn about a label alone on a line without one.



the code segment is where the cool stuff happens. remember that a dos exe file may have more than one code segment to get around that pesky 64kb barrier we discussed last time. though multiple code segments are allowed, only one of them contains the actual entry point of our program. we mark the entry point with a special label, ..start, which nasm recognizes when it writes an obj file. note that i used a colon at the end of this label. a label may end with a colon, but this is not required. you may find code examples that are pretty inconsistent on the use of colons in labels. even examples in the official nasm documentation waffle a little on this. personally, i choose to use a colon when the label refers to a block of code, and to forgo the colon when the label refers to data. remember, though, that to the assembler they're all just addresses.

we're giving the mov instruction a real workout here. the instruction
mov dest, src
tells the cpu to copy the value in src into dest. yes, it goes right to left, but you get used to it pretty quickly. in this instance we're loading segment addresses into their respective cpu registers. the x86 has no instruction that moves an immediate value straight into a segment register, so we'll use ax for temporary storage.
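the values we load into ds and ss are segment numbers, not complete addresses. in real mode the cpu builds a 20-bit physical address by multiplying the segment by 16 (a 4-bit left shift) and adding a 16-bit offset. here's that arithmetic as a python sketch; the segment value 0x1000 is made up, since dos decides where your segments land when it loads the exe.

```python
def physical_address(segment: int, offset: int) -> int:
    # Real-mode 8086: shift the 16-bit segment left 4 bits, add the 16-bit offset.
    # An 8086 has 20 address lines, so the result wraps at 1MB.
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF

if __name__ == "__main__":
    ds = 0x1000  # hypothetical value DOS might assign to 'data'
    print(hex(physical_address(ds, 0x0000)))  # 0x10000
    print(hex(physical_address(ds, 0xFFFF)))  # 0x1ffff: one segment spans 64KB
    # Different segment:offset pairs can name the same byte:
    print(physical_address(0x1000, 0x0010) == physical_address(0x1001, 0x0000))  # True
```

that last line is why segment:offset pairs can be confusing: many different pairs point at the same byte.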


; ----------------------------------------------------------------------
segment code
; The code segment is where our program actually does stuff. Executable
; instructions go here.
;
..start:
; First we need to do some housekeeping. Our program needs to know at
; what addresses its segments can be found. The Intel CPU contains some
; special registers just to hold this information, so we'll load them
; up now. Since we can't move an immediate value directly into a segment
; register, we'll copy it into the AX general purpose register first.
        mov ax, data
        mov ds, ax          ; DS: data segment register
        mov ax, stack
        mov ss, ax          ; SS: stack segment register
        mov sp, stackTop    ; SP: stack pointer register. Loading SS holds
                            ; off interrupts until after this instruction,
                            ; so SS and SP change together.



now we're going to call on dos to print our message on the screen. dos and the system bios offer several services to our programs. these are accessed by triggering an interrupt with the int instruction. each service has its own requirements, so we need to look up the particular service we want in our handy dos developer's guide in order to use it properly. the dos service we're using here is service 09h of the general purpose interrupt 21h. to use it we place the offset of a dollar-sign-terminated string into register dx (ds must already point to the string's segment, which is why we loaded it earlier), place the service id 09h into register ah, then execute int 21h.
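to see why the dollar sign matters, here's a toy python model of what service 09h does with ds:dx: turn the pair into a physical address, then output bytes until it reaches a '$'. the memory layout and the segment value 0x1000 are hypothetical, and real dos writes to the screen rather than returning a string.

```python
# Toy model (not real DOS) of INT 21h, service 09h: display string.
def int21h_service_09h(memory: bytes, ds: int, dx: int) -> str:
    # DS:DX -> physical address, using the same real-mode rule as the CPU.
    start = (ds << 4) + dx
    # Scan forward for the '$' terminator; the '$' itself is not printed.
    end = memory.index(b"$", start)
    return memory[start:end].decode("ascii")

if __name__ == "__main__":
    memory = bytearray(0x20000)   # a small slice of the address space
    ds = 0x1000                   # hypothetical data segment
    message_offset = 0x0000       # offset of the label 'message'
    text = b"Hello World\r\n$"
    start = (ds << 4) + message_offset
    memory[start:start + len(text)] = text
    print(repr(int21h_service_09h(bytes(memory), ds, message_offset)))
```

if you forget the terminator, the model raises a ValueError; real dos just keeps printing whatever bytes follow until it happens to hit a '$'.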


; We're going to use a DOS service to write a string to the screen.
; The documentation for this service says that we have to terminate the
; string we want to print with a dollar sign (see how we did this in
; the data segment above) and we must put the address of the string
; into DS:DX before calling the service. DOS interrupt 21h provides
; all kinds of cool services. To use it we place the service ID in
; register AH and call INT 21h.
        mov dx, message     ; DS:DX -> '$'-terminated string
        mov ah, 09h         ; service 09h: display string
        int 21h



finally we exit the program. we'll use service 4ch of dos interrupt 21h. if you want to return a specific exit code (for example, to signal an error), you place it in register al. just as before, we put the service id into ah and trigger the interrupt. since we have no error condition, we'll do a clean exit with code 0. ah is the high byte of ax and al is the low byte, so we can load both at the same time by putting 4c00h into ax.
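since ah and al are just the high and low halves of ax, loading 4c00h sets both in one instruction. here's a small python sketch of that split; split_ax and join_ax are illustrative names. the same arithmetic shows that mov ax, 4c01h would exit with code 1, which a batch file could check with if errorlevel 1.

```python
from typing import Tuple

# Model of how the 16-bit AX register overlaps its 8-bit halves.
def split_ax(ax: int) -> Tuple[int, int]:
    # AH is the high byte, AL is the low byte.
    return (ax >> 8) & 0xFF, ax & 0xFF

def join_ax(ah: int, al: int) -> int:
    # Build AX from its two halves.
    return ((ah & 0xFF) << 8) | (al & 0xFF)

if __name__ == "__main__":
    ah, al = split_ax(0x4C00)
    print(f"AH={ah:02X}h AL={al:02X}h")  # AH=4Ch AL=00h: exit with code 0
    print(f"{join_ax(0x4C, 0x01):04X}h")  # 4C01h: exit with code 1
```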


; We also use INT 21h to exit our program. The exit function is 4Ch,
; which goes into AH. The exit code that is used to report errors back
; to the operating system goes into AL. We'll just load both at the
; same time, then call INT 21h.
        mov ax, 4c00h       ; AH = 4Ch (exit), AL = 00h (exit code 0)
        int 21h



you have just written a program in assembly language. save the file and exit editv. assemble the file with nasm:
nasm -f obj hello.asm
this will create an object file suitable for linking into a dos exe. link the file with warplink:
warplink hello.obj
this creates the file hello.exe. notice that i included these instructions in the comments at the top of the source file. this is useful if you come back to the program at a later time and want to make changes. now run your program and bask in the warmth of the knowledge that you have made this cpu do your explicit bidding. a little bit more of this and you'll be ready for live minions.

next time we'll pass command-line parameters into our program, and we'll shake things up a bit with the dos com file format.
