and bade them curt 'hello,' and then 'good-bye.'
For our second heavily annotated example, we'll modify the 'Hello World' program a bit. Beyond simply saying "goodbye," this new app will let the user provide a name on the command line to which we may bid adieu. If no argument is provided, the program will simply say goodbye to Mr. Chips. This time we will be assembling a DOS COM file.
In DOSBox, move to the mycode directory and launch editv to create a new file called goodbye.asm.
Comment, comment, comment. Note that this file will not require a linker.
1 ; goodbye.asm
2 ; This variation on the 'Hello World' program expands upon the original
3 ; a little by accepting a name from the command line and printing that
4 ; in the message. If no parameter is supplied, the program will use a
5 ; default value. This one is written as a COM file, so it doesn't
6 ; require linking.
7 ; To assemble: nasm -o goodbye.com goodbye.asm
8 ;
9 ; Robert Ritter
10 ; 25 Apr 2010
11
We begin with a directive for the assembler.
org is not a machine instruction, but rather a note to the assembler so that it can calculate all of the addresses for you. In a DOS COM file the entire program fits into a single 64KB segment, so loading up all those segment registers seems rather silly. org 100h tells NASM that the program will be loaded at offset 100h, which is where DOS places a COM file, with every segment register pointing at the same segment. Why this offset? Well, the first 256 bytes (0h through ffh) make up the Program Segment Prefix, or PSP. Byte 100h is the first place real code can be loaded. Since I'm defining my data at the top of my source file, I'm putting a jmp (unconditional jump) at this address to tell the system to skip right to the good stuff. jmp works like the much-maligned goto statement in other languages: it transfers execution to the code found at the given label. We're going to let the program flow jump to the label called 'Start' and we'll catch up in a bit.
12 ; ----------------------------------------------------------------------
13 org 100h
14 ; We set up a COM file by defining the address of the program location
15 ; in memory, which will always be 100h. Then we jump to the start of
16 ; the code block.
17 ;
18 jmp Start
19
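The jmp is only needed because the data sits ahead of the code. As a minimal sketch (not part of goodbye.asm; the file name nojump.asm and the label msg are made up for illustration), a COM file can put its data after the code instead, so that the first byte at 100h is already an instruction:

```nasm
; nojump.asm (hypothetical): a COM file with its data after the code.
; To assemble: nasm -o nojump.com nojump.asm
        org 100h            ; DOS loads a COM file at offset 100h.

        mov dx, msg         ; DX = address of the '$'-terminated string.
        mov ah, 09h         ; DOS service 09h: print string.
        int 21h
        mov ax, 4c00h       ; DOS service 4Ch: exit with return code 0.
        int 21h

msg     db 'No jump needed.', 0dh, 0ah, '$'
```

Either layout works. Keeping the data at the top, as goodbye.asm does, costs one jmp but lets you read the definitions before the code that uses them.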
You've seen data before, and you will recognize db from our last program. One new thing here is the equ directive. This creates a constant. Data defined with db may be modified during program execution, but a value defined with equ cannot. Any attempt to give endMsgLen a new value in this program will cause the assembler to balk with the message that the label has been redefined.
Another new thing is the use of the dollar sign outside the quotation marks. What does that mean? Well, we're going to be copying strings into a buffer and we'll need to tell the CPU exactly how many bytes to copy. It's easy to find the size of the beginMsg string: we subtract its address from the address of defaultMsg. defaultMsg - beginMsg will give us the length of beginMsg. Remember that labels are just aliases for memory addresses. We can use the same technique to find the length of defaultMsg. To find the length of the last string, endMsg, we subtract endMsg from $. The dollar sign in line 31 means "this byte right here." So endMsgLen will contain the difference between the current address and the address of endMsg. That's pretty cool.
20 ; ----------------------------------------------------------------------
21 section .data
22 ; DOS COM files don't use segmented memory. The whole program fits
23 ; into a single 64KB block, so there's no need to worry about segments
24 ; at all. The assembler still expects to find defined data and code
25 ; sections, though, and it helps us to organize our source if we keep
26 ; things compartmentalized like this.
27 ;
28 beginMsg db 'Goodbye, '
29 defaultMsg db 'Mr. Chips'
30 endMsg db '!', 0dh, 0ah, '$'
31 endMsgLen equ $ - endMsg
32
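Here is a small sketch of the same length trick on its own. The file name and the labels greeting, greetingLen, who and whoLen are hypothetical. It also uses DOS service 40h (write to a file handle), which goodbye.asm does not use, simply because that service takes a byte count and so shows the computed lengths being put to work:

```nasm
; lengths.asm (hypothetical): string lengths computed with $ and equ.
; To assemble: nasm -o lengths.com lengths.asm
        org 100h
        jmp Start

greeting    db 'Hello, '
greetingLen equ $ - greeting    ; 7: $ is the address just past 'Hello, '
who         db 'World', 0dh, 0ah
whoLen      equ $ - who         ; 7: five letters plus CR and LF

Start:
        mov bx, 1                       ; Handle 1 is standard output.
        mov cx, greetingLen + whoLen    ; Number of bytes to write.
        mov dx, greeting                ; The two strings are adjacent.
        mov ah, 40h                     ; DOS service 40h: write to handle.
        int 21h
        mov ax, 4c00h                   ; Exit with no error code.
        int 21h
```

Each equ line has to come immediately after the string it measures, because $ always refers to the assembler's current position.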
The bss section (so named for historic reasons) contains data that is not initialized to a specific value; at least, no value that we care about. Here we're creating a working buffer into which we may copy the elements of our final string before we send it to standard output.
33 ; ----------------------------------------------------------------------
34 section .bss
35 ; This section contains uninitialized storage space. We allocate space
36 ; here for data that we won't have until runtime. COM files don't
37 ; require an explicit STACK section. DOS sets up the stack for us when
38 ; it loads the program.
39 ;
40 fullMsg resb 1024 ; This is the message we will print.
41 ; We'll assemble it from parts and
42 ; copy each part into this memory area.
43
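As a rough sketch of how reserved space behaves (the file name and the labels buffer and words are hypothetical), the program below reserves a few bytes in .bss, fills them in at run time, and prints them. Nothing from the .bss section is written into the COM file itself:

```nasm
; bssdemo.asm (hypothetical): reserving and then filling uninitialized space.
; To assemble: nasm -o bssdemo.com bssdemo.asm
        org 100h

section .text
        mov byte [buffer], 'A'      ; Store a character at run time.
        mov byte [buffer + 1], '$'  ; Terminator for DOS service 09h.
        mov dx, buffer
        mov ah, 09h                 ; DOS service 09h: print string.
        int 21h
        mov ax, 4c00h               ; Exit with no error code.
        int 21h

section .bss
buffer  resb 16     ; 16 bytes reserved.
words   resw 4      ; 4 words (8 bytes), shown only for comparison.
```

resb counts in bytes and resw in words, so the number after the directive is a count of units rather than a value to store.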
The rep movsb instructions copy a sequence of bytes from one place in memory to another. The number of bytes that get copied is found in the cx register. So what we're doing here is concatenating strings and storing the result in fullMsg. First strings first...
44 ; ----------------------------------------------------------------------
45 section .code
46
47 Start:
48 ; First we'll copy the beginning of the message, 'Goodbye,' to our
49 ; allocated memory. The number of bytes to copy (the length of our
50 ; data) goes into CX.
51 mov cx, defaultMsg - beginMsg
52 ; The address of the data goes into SI (think Source Index) and the
53 ; address of the allocated memory into DI (as in Destination Index.)
54 mov si, beginMsg
55 mov di, fullMsg
56 rep movsb ; REP MOVSB copies CX bytes from SI to DI.
57 ; DI is automatically incremented.
58
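rep movsb leans on some state that the listing never mentions: the source is read through DS:SI, the destination is written through ES:DI, and the direction flag decides whether SI and DI count up or down. In a COM file DS and ES start out equal, and the direction flag is usually clear when a program starts, though that is a convention rather than something to rely on. The sketch below (hypothetical file name and labels) makes the direction explicit with cld:

```nasm
; copydemo.asm (hypothetical): one string copy with the direction made explicit.
; To assemble: nasm -o copydemo.com copydemo.asm
        org 100h
        jmp Start

src     db 'copied', 0dh, 0ah, '$'
srcLen  equ $ - src         ; 9 bytes, including CR, LF and '$'.

Start:
        cld                 ; Clear the direction flag: SI and DI count upward.
        mov cx, srcLen      ; Number of bytes to copy.
        mov si, src         ; Source is DS:SI.
        mov di, dest        ; Destination is ES:DI (ES = DS in a COM file).
        rep movsb           ; Copy CX bytes; CX is zero afterwards.

        mov dx, dest
        mov ah, 09h         ; DOS service 09h: print string.
        int 21h
        mov ax, 4c00h       ; Exit with no error code.
        int 21h

section .bss
dest    resb 16
```

After the copy, both SI and DI point just past the bytes they handled, which is what lets the next copy continue where the last one stopped.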
Remember that PSP? The first 128 bytes (00h through 7fh) are full of stuff that we're really not interested in, but the second 128 bytes (80h through ffh) contain information from the command line that we used to run our program. Since we want to get a name from the parameter list on the command line, we want to read this part of the PSP. Byte 80h tells us how long the parameter list is, so if it's zero (the program was run without any parameters) we'll say goodbye to the default name; otherwise we'll read the parameter list and take our name from there.
Throughout our sojourn in assembler we've been working with memory addresses. The programming savvy among you may have said to yourselves, "Ah, these are pointers." Most of what we work with in assembler is addresses, or pointers. If you learned in computer programming class that pointers were hard, then you learned them incorrectly; but that's a rant for another post. Suffice it to say that pointers are the way to manipulate data in assembler. However, sometimes we need to get at the data in a memory location directly. On line 71 we need to compare zero to the value at address 80h, not the address itself. NASM makes this pretty easy: we use square brackets around an address to access the value inside. The instruction in line 69 means, "Copy the value stored at address 80h into the cl register." There, you've just dereferenced a pointer. No big deal.
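To see the difference between an address and the value stored there, here is a minimal sketch (the file name and the label value are hypothetical; DOS service 02h, which prints the character in DL, is not used in goodbye.asm):

```nasm
; derefdemo.asm (hypothetical): an address versus the value stored at it.
; To assemble: nasm -o derefdemo.com derefdemo.asm
        org 100h
        jmp Start

value   db 'X'

Start:
        mov bx, value       ; BX = the address of value (a pointer).
        mov dl, [bx]        ; DL = the byte stored at that address ('X').
        ; mov dl, [value]   ; Same result, using the label directly.
        mov ah, 02h         ; DOS service 02h: print the character in DL.
        int 21h
        mov ax, 4c00h       ; Exit with no error code.
        int 21h
```

Without the brackets you get the address; with them you get whatever lives at that address.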
This section contains a couple of logic branches using the cmp instruction to compare two values, a jz instruction to jump if zero to a particular label, and an unconditional jmp instruction to skip parts of the program that won't be used if a name was given on the command line. Remember that labels are just memory addresses in assembler.
59 ; Next we'll copy the command line parameter into our allocated memory.
60 ; When we start a program, DOS creates a data structure for it called
61 ; the PSP (Program Segment Prefix) that loads ahead of it in the first
62 ; 256 (100h) bytes of memory. (This is why the COM file has to point to
63 ; address 100h to start.) The first 128 bytes of the PSP is "stuff," so
64 ; we won't worry about that. The last half of the PSP contains the
65 ; parameter string. Byte 80h contains the length of the string and the
66 ; remaining bytes contain the parameter string terminated by a carriage
67 ; return (0dh.)
68 xor cx, cx ; Set CX to zero.
69 mov cl, [80h] ; Put the parameter length
70 ; into CL.
71 cmp cl, 0 ; Test CL to see if it's zero.
72 jz NoParam ; If CL contains zero jump to
73 ; another part of the program.
74 ; If the JZ (Jump if Zero) wasn't executed, then the user ran the
75 ; program with a command line parameter. CX now contains the number of
76 ; bytes in the string, but the first byte is always a space, so we'll
77 ; decrement CX and start copying the string from byte 82h. DI already
78 ; points to the end of the last thing we copied to memory.
79 dec cx
80 mov si, 82h
81 rep movsb
82 jmp FinishString ; Skip the NoParam part since
83 ; there was a parameter.
84
85 NoParam:
86 ; No parameter was given on the command line. We'll use the default
87 ; goodbye message.
88 mov cx, endMsg - defaultMsg
89 mov si, defaultMsg
90 rep movsb
91
92 FinishString:
93 ; Now we copy the last part of the message to memory.
94 mov cx, endMsgLen
95 mov si, endMsg
96 rep movsb
97
98 ; Use the DOS service call to print the string that is in memory.
99 mov dx, fullMsg
100 mov ah, 09h
101 int 21h
102
103 ; Exit with no error code.
104 mov ax, 4c00h
105 int 21h
106
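Stripped of the string copying, the cmp / jz / jmp arrangement in the listing is an if/else. The sketch below (hypothetical file name and labels) keeps only the branching and prints one of two fixed messages depending on the length byte at 80h:

```nasm
; ifelse.asm (hypothetical): the branch structure of goodbye.asm on its own.
; To assemble: nasm -o ifelse.com ifelse.asm
        org 100h
        jmp Start

noneMsg db 'No parameter.', 0dh, 0ah, '$'
someMsg db 'Got a parameter.', 0dh, 0ah, '$'

Start:
        cmp byte [80h], 0   ; Is the parameter length zero?
        jz  IsEmpty         ; Yes: take the "if" branch.
        mov dx, someMsg     ; No: this is the "else" branch.
        jmp Print           ; Skip over the "if" branch.
IsEmpty:
        mov dx, noneMsg     ; The "if" branch.
Print:
        mov ah, 09h         ; DOS service 09h: print the chosen string.
        int 21h
        mov ax, 4c00h       ; Exit with no error code.
        int 21h
```

The unconditional jmp at the end of the first branch matters: without it, execution would fall straight through into the code under IsEmpty.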
Save the file and exit editv. Assemble the program directly into a COM file:
nasm -o goodbye.com goodbye.asm
Run the program with and without parameters.
Now that was a pretty sophisticated program. You fetched data from the command line and used a condition to branch to a specific part of your program, much like the if..then logic found in so-called high-level languages, and you used a default parameter if one wasn't provided. Your skills are progressing nicely, padawan. Your training is almost complete.