Showing posts with label nasm. Show all posts
Showing posts with label nasm. Show all posts

Friday, April 30, 2010

assembler tutorial: 2 - goodbye, mr. chips


and bade them curt 'hello,' and then 'good-bye.'


for our second heavily-annotated example, we'll modify the 'hello world' program a bit. beyond simply saying "goodbye" this new app will allow the user to provide a name on the command line to which we may bid adieu. if no argument is provided, the program will simply say goodbye to mr. chips. this time we will be assembling a dos com file.

in dosbox, move to the mycode directory and launch editv to create a new file called goodbye.asm.

comment, comment, comment. note that this file will not require a linker.


1 ; goodbye.asm
2 ; This variation on the 'Hello World' program expands upon the original
3 ; a little by accepting a name from the command line and printing that
4 ; in the message. If no parameter is supplied, the program will use a
5 ; default value. This one is written as a COM file, so it doesn't
6 ; require linking.
7 ; To assemble: nasm -o goodbye.com goodbye.asm
8 ;
9 ; Robert Ritter
10 ; 25 Apr 2010
11


we begin with a directive for the assembler. org is not a machine instruction, but rather a note to the assembler so that it can configure all of the segment addresses for you. in a dos com file the entire program fits into a single 64kb segment, so loading up all those segment registers seems rather silly. org 100h tells nasm that this is a com file, so all segments are at exactly the same address, and the program starts at offset 100h. why this offset? well the first 256 bytes (0h through ffh) makes up the program segment prefix, or psp. byte 100h is the first place real code can be loaded. since i'm defining my data at the top of my source file, i'm putting a jmp (unconditional jump) at this address to tell the system to skip right to the good stuff. jmp works like the much-maligned goto statement in other languages: it transfers execution to the code found at the given label. we're going to let the program flow jump to the label called 'Start' and we'll catch up in a bit.


12 ; ----------------------------------------------------------------------
13 org 100h
14 ; We set up a COM file by defining the address of the program location
15 ; in memory, which will always be 100h. Then we jump to the start of
16 ; the code block.
17 ;
18 jmp Start
19


you've seen data before, and you will recognize db from our last program. one new thing here is the equ directive. this creates a constant. data defined with db may be modified during program execution, but data defined with equ cannot. any attempt to change the value stored in endMsgLen in this program will cause the assembler to balk with the message that the label has been redefined.

another new thing is the use of the dollar sign outside the quotation marks. what does that mean? well, we're going to be copying strings into a buffer and we'll need to tell the cpu exactly how many bytes to copy. it's easy to find the size of the beginMsg string: we subtract its address from the address of defaultMsg.
defaultMsg - beginMsg
will give us the length of beginMsg. remember that labels are just aliases for memory addresses. we can use the same technique to find the length of defaultMsg. to find the length of the last string, endMsg, we subtract endMsg from $. the dollar sign in line 31 means "this byte right here." so endMsgLen will contain the difference between endMsgLen and endMsg. that's pretty cool.


20 ; ----------------------------------------------------------------------
21 section .data
22 ; DOS COM files don't use segmented memory. The whole program fits
23 ; into a single 64KB block, so there's no need to worry about segments
24 ; at all. The assembler still expects to find defined data and code
25 ; sections, though, and it helps us to organize our source if we keep
26 ; things compartmentalized like this.
27 ;
28 beginMsg db 'Goodbye, '
29 defaultMsg db 'Mr. Chips'
30 endMsg db '!', 0dh, 0ah, '$'
31 endMsgLen equ $ - endMsg
32


the bss section (so named for historic reasons) contains data that is not initialized to a specific value; at least, no value that we care about. here we're creating a working buffer to which we may copy the elements of our final string before we send it to standard output.


33 ; ----------------------------------------------------------------------
34 section .bss
35 ; This section contains unintialized storage space. We allocate space
36 ; here for data that we won't have until runtime. COM files don't
37 ; require an explicit STACK section. The assembler will take care of
38 ; the stack for us.
39 ;
40 fullMsg resb 1024 ; This is the message we will print.
41 ; We'll assemble it from parts and
42 ; copy each part into this memory area.
43


the rep movsb instructions copy a sequence of bytes from one place in memory to another. the number of bytes that get copied is found in the cx register. so what we're doing here is concatenating strings and storing the result in fullMsg. first strings first...

 
44 ; ----------------------------------------------------------------------
45 section .code
46
47 Start:
48 ; First we'll copy the beginning of the message, 'Goodbye,' to our
49 ; allocated memory. The number of bytes to copy (the length of our
50 ; data) goes into CX.
51 mov cx, defaultMsg - beginMsg
52 ; The address of the data goes into SI (think Source Index) and the
53 ; address of the allocated memory into DI (as in Destination Index.)
54 mov si, beginMsg
55 mov di, fullMsg
56 rep movsb ; REP MOVSB copies CX bytes from SI to DI.
57 ; DI is automatically incremented.
58


remember that psp? the first 128 bytes (00h through 7fh) is full of stuff that we're really not interested in, but the second 128 bytes (80h through ffh) contains information from the command line that we used to run our program. since we want to get a name from the parameter list on the command line, we want to read this part of the psp. byte 80h tells us how long the parameter list is, so if it's zero (the program was run without any parameters) we'll say goodbye to the default name; otherwise we'll read the parameter list and take our name from there.

throughout our sojourn in assembler we've been working with memory addresses. the programming savvy among you may have said to yourself, "ah, these are pointers." most of what we work with in assembler is addresses, or pointers. if you learned in computer programming class that pointers were hard, then you learned them incorrectly; but that's a rant for another post. suffice it to say that pointers are the way to manipulate data in assembler. however, sometimes we need to get at the data in a memory location directly. on line 71 we need to compare zero to the value at address 80h, not the address itself. nasm makes this pretty easy: we use square brackets around an address to access the value inside. the instruction in line 69 means, "copy the value stored at address 80h into the cl register." there, you've just dereferenced a pointer. no big deal.

this section contains a couple of logic branches using the cmp operator to compare two values, a jz operator to jump if zero to a particular label, and an unconditional jmp instruction to skip parts of the program that won't be used if a name was given on the command line. remember that labels are just memory addresses in assembler.


59 ; Next we'll copy the command line parameter into our allocated memory.
60 ; When we start a program, DOS creates a data structure for it called
61 ; the PSP (Program Segment Prefix) that loads ahead of it in the first
62 ; 256 (100h) bytes of memory. (This is why the COM file has to point to
63 ; address 100h to start.) The first 128 bytes of the PSP is "stuff," so
64 ; we won't worry about that. The last half of the PSP contains the
65 ; parameter string. Byte 80h contains the length of the string and the
66 ; remaining bytes contain the parameter string terminated by a carriage
67 ; return (0dh.)
68 xor cx, cx ; Set CX to zero.
69 mov cl, [80h] ; Put the parameter length
70 ; into CL.
71 cmp cl, 0 ; Test CL to see if it's zero.
72 jz NoParam ; If CL contains zero jump to
73 ; another part of the program.
74 ; If the JZ (Jump if Zero) wasn't executed, then the user ran the
75 ; program with a command line parameter. CX now contains the number of
76 ; bytes in the string, but the first byte is always a space, so we'll
77 ; decrement CX and start copying the string from byte 82h. DI already
78 ; points to the end of the last thing we copied to memory.
79 dec cx
80 mov si, 82h
81 rep movsb
82 jmp FinishString ; Skip the NoParam part since
83 ; there was a parameter.
84
85 NoParam:
86 ; No parameter was given on the command line. We'll use the default
87 ; goodbye message.
88 mov cx, endMsg - defaultMsg
89 mov si, defaultMsg
90 rep movsb
91
92 FinishString:
93 ; Now we copy the last part of the message to memory.
94 mov cx, endMsgLen
95 mov si, endMsg
96 rep movsb
97
98 ; Use the DOS service call to print the string that is in memory.
99 mov dx, fullMsg
100 mov ah, 09h
101 int 21h
102
103 ; Exit with no error code.
104 mov ax, 4c00h
105 int 21h
106


save the file and exit editv. assemble the program directly into a com file:
nasm -o goodbye.com goodbye.asm
run the program with and without parameters.

now that was a pretty sophisticated program. you fetched data from the command line and used a condition to branch to a specific part of your program, much like if..then logic found in so-called high-level languages, and you used a default parameter if one wasn't provided. your skills are progressing nicely, padawan. you're training is almost complete.

Monday, April 12, 2010

assembler tutorial: 1 - hello, world!


but thunder interrupted all their fears


now that you're ready to begin writing we'll dig right in. here is our first heavily-annotated program, hello.exe.



open up your dosbox and change to the mycode directory.
cd mycode
run editv with a new file, hello.asm.
editv hello.asm
i strongly advise you to turn on line numbering. you can do this by pressing ctrl-o followed by b, or you can just select the options menu with alt-o, scroll down to line numbers and press enter.

we begin with comments. if you've ever read a book on programming it has probably emphasized the need for good comments. in assembler comments are even more important because the language syntax is so terse. a comment begins with a semi-colon and continues to the end of the line of text.


1 ; hello.asm
2 ; Demonstrates how to write an assembly program for DOS with NASM using
3 ; the ubiquitous 'Hello World' string. To create an EXE file we'll
4 ; first assemble an OBJ file then link it. I'm using the public domain
5 ; linker WarpLink.
6 ; To assemble: nasm -f obj hello.asm
7 ; To link: warplink hello.obj
8 ;
9 ; Robert Ritter <rritter@centriq.com>
10 ; 12 Apr 2010
11


remember that dos addresses memory in segments. the first thing we'll need to do is reserve some memory for these segments. since this program doesn't work with a lot of data our data segment is pretty small. it defines a label, message, that will be the memory address of the first byte of the message that we're going to print out. the db (define byte) operator identifies a sequence of bytes that make up our data. there is also a dw (define word) for 16-bit values, and dd (define double word) for 32-bit values. most of the time you'll probably just treat data as a sequence of bytes, so you'll likely use db more than the others.

you may have noticed the characters that follow the obvious string, "Hello World." if we want to advance our output to the next line we must insert a newline character. this is like pressing the enter key on a keyboard. in high-level languages like c we use a string like "\n" to represent a newline, but in dos this is actually a two-byte sequence: 0dh (carriage return) and 0ah (linefeed.) the dollar sign character is a terminator that marks the end of the string. not all strings must be terminated with a dollar sign, but the dos printing service that we're going to use requires it.

notice that the characters that make up a string are enclosed in quotes. double or single quotes, it makes no difference. those characters outside the quotes are treated as literal bytes.


12 ; ----------------------------------------------------------------------
13 segment data
14 ; DOS EXE files use segmented memory which allows them to address more
15 ; than 64KB at a time. Here we define the data segment to store the
16 ; message that we're going to print on the screen.
17 ;
18 message db 'Hello World', 0dh, 0ah, '$'
19


the next thing that we want to do is reserve some memory for our stack segment. the resb (reserve byte) operator is used to set aside an uninitialized piece of memory of a given size. there is also a resw (reserve word) for 16-bit values and resd (reserve double word) for 32-bit values. we're going to allocate a 64-byte hunk'o'ram for the stack and set the label stackTop to point to the address immediately following the stack. for more info on how the stack works, see my previous post.


20 ; ----------------------------------------------------------------------
21 segment stack stack
22 ; The stack is used as temporary storage for values during the
23 ; program's execution. Sometimes we use it in our code, and sometimes
24 ; DOS uses it, especially when we call DOS interrupts. We'll set up a
25 ; small but serviceable stack for this program since we're going to be
26 ; calling on DOS services.
27 ;
28 resb 64
29 stackTop ; The label 'stackTop' is the address of the end (top)
30 ; of the stack. We'll need this to initialize the
31 ; stack pointer in the CPU.
32


the code segment is where the cool stuff happens. remember that a dos exe file may have more than one code segment to get around that pesky 64kb barrier we discussed last time. though multiple code segments are allowed, only one can be the actual entry point of our program. this is defined with a special label, ..start. note that i used a colon at the end of this label. a label may end with a colon, but this is not required. you may find code examples that are pretty inconsistent on the use of colons in labels. even examples in the official nasm documentation waffle a little on this. personally, i choose to use a colon when the label refers to a block of code, and to forgo the colon when the label refers to data. remember, though, that to the assembler they're all just addresses.

we're giving the mov operator a real workout here. the instruction
mov dest, src
tells the assembler to copy the data at src into dest. yes, it goes right to left, but you get used to it pretty quickly. in this instance we're loading segment addresses into their respective cpu registers. since we can't copy immediate data directly into a segment register, we'll use ax for temporary storage.


33 ; ----------------------------------------------------------------------
34 segment code
35 ; The code segment is where our program actually does stuff. Executable
36 ; instructions go here.
37 ;
38 ..start:
39 ; First we need to do some housekeeping. Our program needs to know at
40 ; what addresses its segments can be found. The Intel CPU contains some
41 ; special registers just to hold this information, so we'll load them
42 ; up now. Since we can't put addresses directly into these registers,
43 ; we'll copy them to the AX general purpose register first.
44 mov ax, data
45 mov ds, ax ; DS: data segment register
46 mov ax, stack
47 mov ss, ax ; SS: stack segment register
48 mov sp, stackTop ; SP: stack pointer register
49


now we're going to call on dos to print our message on the screen. dos and the system bios have several services that they offer to our programs. these are accessed by triggering an interrupt with the int instruction. each service has its own requirements, so we need to look up the particular service we want in our handy dos developer's guide in order to properly use it. the dos service we're using here is service 09h of the general purpose interrupt 21h. to use it we place the address of a dollar-sign-terminated string into register dx, place the service id 09h into register ah, then call interrupt 21h.


50 ; We're going to use a DOS service to write a string to the screen.
51 ; The documentation for this service says that we have to terminate the
52 ; string we want to print with a dollar sign (see how we did this in
53 ; the data segment above) and we must put the address of the string
54 ; into the DX register and call the service. DOS interrupt 21h provides
55 ; all kinds of cool services. To use it we place the service ID in
56 ; register AH and call INT 21h.
57 mov dx, message
58 mov ah, 09h
59 int 21h
60


finally we exit the program. we'll use service 4ch of dos interrupt 21h. if you have a specific exit code (for example, to signal an error) you place it into register al. just as before, we put the service id into ah and call the interrupt. since we have no error condition we'll do a clean exit. here we load al and ah at the same time by putting 4c00h into ax.


61 ; We also use INT 21h to exit our program. The exit function is 4Ch,
62 ; which goes into AH. The exit code that is used to report errors back
63 ; to the operating system goes into AL. We'll just load both at the
64 ; same time, then call INT 21h.
65 mov ax, 4c00h
66 int 21h
67


you have just written a program in assembly language. save the file and exit editv. assemble the file with nasm:
nasm -f obj hello.asm
this will create an object file suitable for linking into a dos exe. link the file with warplink:
warplink hello.obj
this creates the file hello.exe. notice that i included these instructions in the comments at the top of the source file. this is useful if you come back to the program at a later time and want to make changes. now run your program and bask in the warmth of the knowledge that you have made this cpu do your explicit bidding. a little bit more of this and you'll be ready for live minions.

next time we'll pass command-line parameters into our program, and we'll shake things up a bit with the dos com file format.