a traveller from an antique land: assembler tutorial: 2

and bade them curt 'hello,' and then 'good-bye.'

for our second heavily-annotated example, we'll modify the 'hello world' program a bit. beyond simply saying "goodbye" this new app will allow the user to provide a name on the command line to which we may bid adieu. if no argument is provided, the program will simply say goodbye to mr. chips. this time we will be assembling a dos com file.

in dosbox, move to the mycode directory and launch editv to create a new file called goodbye.asm.

comment, comment, comment. note that this file will not require a linker.


1   ; goodbye.asm
2   ; This variation on the 'Hello World' program expands upon the original
3   ; a little by accepting a name from the command line and printing that 
4   ; in the message. If no parameter is supplied, the program will use a
5   ; default value. This one is written as a COM file, so it doesn't
6   ; require linking.
7   ; To assemble: nasm -o goodbye.com goodbye.asm
8   ;
9   ; Robert Ritter 
10  ; 25 Apr 2010
11

we begin with a directive for the assembler. org is not a machine instruction, but rather a note telling the assembler where the program will sit in memory, so that it can calculate the correct address for every label. in a dos com file the entire program fits into a single 64kb segment, and dos points all of the segment registers (cs, ds, es, and ss) at that one segment when it loads the program, so loading them up ourselves would be rather silly. org 100h tells nasm that the program starts at offset 100h within that segment. why this offset? well, the first 256 bytes (0h through ffh) make up the program segment prefix, or psp, so byte 100h is the first place real code can be loaded. since i'm defining my data at the top of my source file, i'm putting a jmp (unconditional jump) at this address to tell the system to skip right to the good stuff. jmp works like the much-maligned goto statement in other languages: it transfers execution to the code found at the given label. we're going to let the program flow jump to the label called 'Start' and we'll catch up in a bit.


12  ; ----------------------------------------------------------------------
13  org 100h
14  ; We set up a COM file by defining the address of the program location
15  ; in memory, which will always be 100h. Then we jump to the start of 
16  ; the code block.
17  ;
18                  jmp     Start
19
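
the jmp is only needed because the data comes first. as a sketch of the alternative (my own example, not part of this tutorial's program; the file name skeleton.asm is made up), here is a minimal com file that puts its code first and its data after the exit call, so execution starts at 100h with a real instruction and never reaches the data:

```nasm
; skeleton.asm - a minimal COM program laid out code-first, so no JMP is
; needed: execution simply begins with the first instruction at 100h.
; Assemble with: nasm -o skeleton.com skeleton.asm
                org     100h            ; DOS loads COM files at offset 100h

Start:          mov     dx, msg         ; DS:DX -> '$'-terminated string
                mov     ah, 09h         ; DOS function 09h: print string
                int     21h

                mov     ax, 4c00h       ; DOS function 4Ch: exit, code 0
                int     21h

; Data placed after the code is never executed, because the program
; exits before reaching it.
msg             db      'Hello from offset 100h!', 0dh, 0ah, '$'
```

either layout works; putting the data at the top, as goodbye.asm does, just means you have to jump over it.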

you've seen data before, and you will recognize db from our last program. one new thing here is the equ directive, which creates a constant. db places bytes in the program itself, and those bytes may be modified during program execution. equ doesn't put anything in memory at all: it just gives a name to a value, and the assembler substitutes that value wherever the name appears. any attempt to change the value of endMsgLen in this program (by defining it a second time, say) will cause the assembler to balk with the message that the label has been redefined.

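another new thing is the use of the dollar sign outside the quotation marks. inside the quotes on line 30, '$' is just the character dos uses to mark the end of a string, but outside quotes it means something else entirely. why do we need it? well, we're going to be copying strings into a buffer and we'll need to tell the cpu exactly how many bytes to copy. it's easy to find the size of the beginMsg string: defaultMsg begins right where beginMsg ends, so we subtract the address of beginMsg from the address of defaultMsg.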

defaultMsg - beginMsg

will give us the length of beginMsg. remember that labels are just aliases for memory addresses. we can use the same technique to find the length of defaultMsg. to find the length of the last string, endMsg, we subtract endMsg from $. the dollar sign in line 31 means "the address right here": the spot just past the last byte of endMsg, since equ doesn't put anything into memory. so endMsgLen is the difference between that address and endMsg, which works out to 4 (the '!', the carriage return, the line feed, and the '$' terminator). that's pretty cool.


20  ; ----------------------------------------------------------------------
21  section .data
22  ; A COM file keeps its code, data, and stack in a single 64KB segment,
23  ; and DOS points all of the segment registers at that segment, so
24  ; there's no need to worry about segments at all. NASM doesn't require
25  ; sections in a flat COM file, but it helps us to organize our source
26  ; if we keep things compartmentalized like this.
27  ;
28  beginMsg        db      'Goodbye, '
29  defaultMsg      db      'Mr. Chips'
30  endMsg          db      '!', 0dh, 0ah, '$'
31  endMsgLen       equ     $ - endMsg
32
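
the $ trick isn't limited to the last string. a common idiom, sketched below under the assumption that you want a length for every string (lengths.asm and the labels in it are illustrative names of my own), is to put an equ right after each string. the sketch also uses dos function 40h, write to a file handle, which takes its byte count in cx instead of looking for a '$' terminator:

```nasm
; lengths.asm - compute a string's length with $ and print it with
; DOS function 40h (write to handle), which needs no '$' terminator.
; Assemble with: nasm -o lengths.com lengths.asm
                org     100h
                jmp     Start

greeting        db      'Goodbye, Mr. Chips!', 0dh, 0ah
greetingLen     equ     $ - greeting    ; 21: $ is the address just past
                                        ; the last byte of greeting

Start:          mov     ah, 40h         ; DOS function 40h: write to handle
                mov     bx, 1           ; handle 1 is standard output
                mov     cx, greetingLen ; number of bytes to write
                mov     dx, greeting    ; DS:DX -> the bytes to write
                int     21h

                mov     ax, 4c00h       ; exit with return code 0
                int     21h
```

because function 40h is told exactly how many bytes to write, the string doesn't need a '$' at the end, and a '$' inside the text would print like any other character.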

the bss section (so named for historic reasons) contains data that is not initialized to a specific value; at least, no value that we care about. here we're creating a working buffer to which we may copy the elements of our final string before we send it to standard output.


33  ; ----------------------------------------------------------------------
34  section .bss
35  ; This section contains uninitialized storage space. We allocate space
36  ; here for data that we won't have until runtime. COM files don't
37  ; require an explicit STACK section. DOS takes care of the stack for
38  ; us, pointing SP at the top of the program's 64KB segment.
39  ;
40  fullMsg         resb    1024    ; This is the message we will print.
41                                  ; We'll assemble it from parts and
42                                  ; copy each part into this memory area.
43
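
since nothing in the bss section has a value we can rely on, we have to put something there before we read it back. here's a small sketch (fill.asm and the stars label are hypothetical names) that reserves a buffer with resb and fills it using rep stosb, a cousin of rep movsb that stores the byte in al into cx bytes starting at di:

```nasm
; fill.asm - give an uninitialized .bss buffer known contents with
; REP STOSB, then print it.
; Assemble with: nasm -o fill.com fill.asm
                org     100h

section .text
Start:          cld                     ; clear the direction flag so DI
                                        ; counts upward
                mov     al, '*'         ; the byte value to store
                mov     cx, 20          ; how many bytes to store
                mov     di, stars       ; ES:DI -> start of the buffer
        rep     stosb                   ; store AL into CX bytes at ES:DI
                mov     word [di], 0a0dh ; CR (0dh) then LF (0ah): the low
                                         ; byte is stored first
                mov     byte [di+2], '$' ; DOS string terminator

                mov     dx, stars       ; DS:DX -> '$'-terminated string
                mov     ah, 09h         ; DOS function 09h: print string
                int     21h

                mov     ax, 4c00h       ; exit with return code 0
                int     21h

section .bss
stars           resb    23              ; 20 stars, CR, LF, and '$'
```

the line ending and '$' are written by hand after the fill because, unlike goodbye.asm, nothing copies a terminator into the buffer for us.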

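rep movsb copies a sequence of bytes from one place in memory to another: from the address in si to the address in di. the number of bytes that get copied is found in the cx register. (movsb moves a single byte; the rep prefix repeats it cx times, stepping si and di forward each time as long as the direction flag is clear, as it normally is when dos starts a program.) so what we're doing here is concatenating strings and storing the result in fullMsg. first strings first...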

 
44  ; ----------------------------------------------------------------------
45  section .code
46  
47  Start:
48  ; First we'll copy the beginning of the message, 'Goodbye,' to our
49  ; allocated memory. The number of bytes to copy (the length of our
50  ; data) goes into CX.
51                  mov     cx, defaultMsg - beginMsg
52  ; The address of the data goes into SI (think Source Index) and the
53  ; address of the allocated memory into DI (as in Destination Index.)
54                  mov     si, beginMsg
55                  mov     di, fullMsg
56          rep     movsb   ; REP MOVSB copies CX bytes from SI to DI.
57                          ; SI and DI advance past the copied bytes.
58
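
to see what rep movsb is doing for us, here is a sketch (copyloop.asm is my own example, not part of the tutorial) that copies a string one byte at a time with an explicit loop and then prints the copy. the loop instruction decrements cx and jumps back to the label until cx reaches zero:

```nasm
; copyloop.asm - what REP MOVSB does, written out as an explicit loop.
; Assemble with: nasm -o copyloop.com copyloop.asm
                org     100h
                jmp     Start

src             db      'Goodbye, Mr. Chips!', 0dh, 0ah, '$'
srcLen          equ     $ - src         ; includes the '$' terminator

Start:          mov     cx, srcLen      ; byte count, just as for REP MOVSB
                mov     si, src         ; source address
                mov     di, buffer      ; destination address
CopyByte:       mov     al, [si]        ; read one byte from the source...
                mov     [di], al        ; ...and write it to the destination
                inc     si              ; advance both pointers, as MOVSB
                inc     di              ; does when the direction flag is clear
                loop    CopyByte        ; decrement CX; repeat until CX is 0

                mov     dx, buffer      ; print the copy to show it worked
                mov     ah, 09h
                int     21h

                mov     ax, 4c00h       ; exit with return code 0
                int     21h

section .bss
buffer          resb    64
```

one difference is worth knowing: rep movsb with cx set to zero copies nothing, but loop decrements cx before testing it, so starting this loop with cx = 0 would run it 65536 times. a general-purpose version would check first with jcxz (jump if cx is zero).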

remember that psp? the first 128 bytes (00h through 7fh) are full of stuff that we're really not interested in, but the second 128 bytes (80h through ffh) hold the command line tail: everything that was typed after the program's name when we ran it. since we want to get a name from the parameter list on the command line, we want to read this part of the psp. byte 80h tells us how long the parameter list is (not counting the carriage return that ends it), so if it's zero (the program was run without any parameters) we'll say goodbye to the default name; otherwise we'll read the parameter list and take our name from there.

throughout our sojourn in assembler we've been working with memory addresses. the programming savvy among you may have said to yourself, "ah, these are pointers." most of what we work with in assembler is addresses, or pointers. if you learned in computer programming class that pointers were hard, then you learned them incorrectly; but that's a rant for another post. suffice it to say that pointers are the way to manipulate data in assembler. however, sometimes we need to get at the data in a memory location directly. on line 71 we need to compare zero to the value stored at address 80h, not to the address itself. nasm makes this pretty easy: we put square brackets around an address to access the value inside. the instruction in line 69 means, "copy the byte stored at address 80h into the cl register," so that line 71 can test it. there, you've just dereferenced a pointer. no big deal.

this section contains a couple of logic branches. the cmp instruction compares two values by subtracting one from the other without keeping the result, and sets the zero flag if they were equal. the jz (jump if zero) instruction then jumps to a particular label only if that flag is set, and an unconditional jmp skips parts of the program that won't be used if a name was given on the command line. remember that labels are just memory addresses in assembler.


59  ; Next we'll copy the command line parameter into our allocated memory.
60  ; When we start a program, DOS creates a data structure for it called
61  ; the PSP (Program Segment Prefix) that occupies the first 256 (100h)
62  ; bytes of the program's segment. (This is why the COM file has to
63  ; start at address 100h.) The first 128 bytes of the PSP are "stuff," so
64  ; we won't worry about that. The last half of the PSP contains the
65  ; parameter string. Byte 80h contains the length of the string and the
66  ; remaining bytes contain the parameter string terminated by a carriage
67  ; return (0dh.)
68                  xor     cx, cx          ; Set CX to zero.
69                  mov     cl, [80h]       ; Put the parameter length
70                                          ; into CL.
71                  cmp     cl, 0           ; Test CL to see if it's zero.
72                  jz      NoParam         ; If CL contains zero jump to
73                                          ; another part of the program.
74  ; If the JZ (Jump if Zero) wasn't executed, then the user ran the
75  ; program with a command line parameter. CX now contains the number of
76  ; bytes in the string, but the first byte is normally a space, so we'll
77  ; decrement CX and start copying the string from byte 82h. DI already
78  ; points to the end of the last thing we copied to memory.
79                  dec     cx
80                  mov     si, 82h
81          rep     movsb
82                  jmp     FinishString    ; Skip the NoParam part since
83                                          ; there was a parameter.
84  
85  NoParam:
86  ; No parameter was given on the command line. We'll use the default
87  ; goodbye message.
88                  mov     cx, endMsg - defaultMsg
89                  mov     si, defaultMsg
90          rep     movsb
91  
92  FinishString:
93  ; Now we copy the last part of the message to memory.
94                  mov     cx, endMsgLen
95                  mov     si, endMsg
96          rep     movsb
97  
98  ; Use the DOS service call to print the string that is in memory.
99                  mov     dx, fullMsg
100                 mov     ah, 09h
101                 int     21h
102 
103 ; Exit with no error code.
104                 mov     ax, 4c00h
105                 int     21h
106
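
here's one more sketch that puts the psp, square brackets, and branching together (echotail.asm is a made-up name, and the program is my own example rather than part of goodbye.asm). it echoes the raw command tail back using dos function 40h, or prints a notice if there isn't one. note the contrast between [80h], the value stored at that address, and 81h, the address itself:

```nasm
; echotail.asm - echo the raw command tail stored in the PSP, or say so
; if there isn't one.
; Assemble with: nasm -o echotail.com echotail.asm
                org     100h
                jmp     Start

noParams        db      'No parameters.', 0dh, 0ah, '$'
crlf            db      0dh, 0ah, '$'

Start:          xor     cx, cx          ; CX = 0
                mov     cl, [80h]       ; brackets: the byte stored AT 80h,
                                        ; which is the length of the tail
                cmp     cl, 0           ; sets ZF when the length is zero
                jz      Empty           ; no parameters: take the other branch

                mov     ah, 40h         ; DOS function 40h: write to handle
                mov     bx, 1           ; handle 1 is standard output
                mov     dx, 81h         ; no brackets: the address 81h itself,
                int     21h             ; where the tail begins; CX bytes
                                        ; are written
                mov     dx, crlf        ; end the line
                mov     ah, 09h
                int     21h
                jmp     Done            ; skip the Empty branch

Empty:          mov     dx, noParams
                mov     ah, 09h
                int     21h

Done:           mov     ax, 4c00h       ; exit with return code 0
                int     21h
```

try running it as echotail and as echotail mr. chips. the echoed tail should normally begin with the space that separated the program name from its parameter, which is why goodbye.asm decrements cx and starts copying at 82h.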

save the file and exit editv. assemble the program directly into a com file:

nasm -o goodbye.com goodbye.asm

run the program with and without parameters.

now that was a pretty sophisticated program. you fetched data from the command line and used a condition to branch to a specific part of your program, much like the if..then logic found in so-called high-level languages, and you used a default parameter if one wasn't provided. your skills are progressing nicely, padawan. your training is almost complete.

Friday, April 30, 2010

assembler tutorial: 2 - goodbye, mr. chips
