Author: Andre Adrian
Date: 2025-01-01
The very early computers worked with integer numbers and used a
"fixed point" interpretation of the bit pattern. No bit set was
the real number 0.000..., all bits set was the real number
0.999... . Today, a typical Digital Signal Processor (DSP) still
uses fixed point calculations. Many calculations need both very
small and very big numbers. One solution is to increase the number
of digits in the fixed point number. This is important in finance:
if you are a multi-billionaire, you still want a cent-accurate
account balance.
Scientific computation uses "floating point". The larger part
of a floating point bit pattern is used as "fixed point"
mantissa (or fraction). The smaller part is used as exponent.
The well-known 32bit IEEE754 floating point number uses 24 bits
for the mantissa including the mantissa sign and 8 bits for the
exponent including the exponent sign.
A computer with floating point (FP) hardware can directly add,
subtract and multiply IEEE754 numbers. The traditional approach
for a computer without FP hardware was to use one "fixed point"
binary number for the mantissa with mantissa sign and another
integer binary number for the exponent with exponent sign.
The computer is totally happy with these binary, or radix 2,
numbers. But input and output need to be done in the decimal, or
radix 10, number system. The conversion of floating point numbers
between radix 2 and radix 10 is done in software, even if the
computer has a hardware floating point unit.
The details of FP radix conversion are tricky. The range 0 to
0.999... for the mantissa has no bit pattern for 1. The range 1 to
1.999... has no bit pattern for 0. The IEEE754 people (the
standards body) decided that a special bit pattern for 0 is better
than a special bit pattern for 1. Let's convert the radix 10 FP
number 1e2 to a radix 2 number. 1e2 is short notation for 10 to
the power of 2, or 10^2. 10^2 is 100. We cannot express 10^2 as an
integer power of 2. The next lower power of 2 is 2^6 or 64. Now we
have the following equation: 10^2 = x*2^6. To calculate x, we
divide the two powers: 10^2 / 2^6 = 100 / 64 = 1.5625. The radix 2
representation of 10^2 is 1.5625*2^6.
If the mantissa is 1.5, the equation becomes 1.5*10^2 =
(1.5*x)*2^6. Value x is again the division result of 10^2 / 2^6.
We just have to multiply the radix 10 mantissa by the "mantissa
multiplier" to get the radix 2 mantissa: 1.5*1.5625=2.34375. Now
our mantissa is out of the range 1 to 1.999... . We avoid this
problem with radix 2 exponent 2^7 instead of 2^6: 1.5*10^2 =
(1.5*x)*2^7. The exponent division result is 10^2 / 2^7 = 0.78125.
The radix 2 mantissa is 1.5*0.78125=1.171875. The radix 2
representation of 1.5*10^2 is 1.171875*2^7. Please check with your
calculator.
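The worked example above can also be checked with a few lines of C. This is only a sketch with double arithmetic, not the table-driven subroutine described in this article. The names bin_fp and dec_to_bin are made up for illustration, and the mantissa is assumed to be positive (the sign is handled separately).

```c
/* Hypothetical illustration: convert m10 * 10^e10 into m2 * 2^e2
   with m2 in the range 1 to 1.999... . Assumes m10 > 0. */
typedef struct { double m2; int e2; } bin_fp;

bin_fp dec_to_bin(double m10, int e10)
{
    bin_fp r = { m10, 0 };
    if (m10 <= 0.0) return r;                  /* zero has no normalized form */
    for (int i = 0; i < e10; ++i) r.m2 *= 10.0; /* apply positive radix 10 exponent */
    for (int i = 0; i > e10; --i) r.m2 /= 10.0; /* apply negative radix 10 exponent */
    while (r.m2 >= 2.0) { r.m2 /= 2.0; ++r.e2; } /* repair overrun */
    while (r.m2 <  1.0) { r.m2 *= 2.0; --r.e2; } /* repair underrun */
    return r;
}
```

Calling dec_to_bin(1.5, 2) gives mantissa 1.171875 and exponent 7, the same values as the hand calculation.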
My floating point conversion or radix conversion subroutines use an exponent table (array) and a mantissa multiplier table, the mantissa multiplication and some specialties. For conversion from radix 10 to radix 2, I use the radix 10 exponent as index into the exponent table and the mantissa multiplier table. Let's convert the radix 10 FP number -1.3*10^-3. For radix 10 exponent -3 I get radix 2 exponent -10 and radix 2 mantissa multiplier 1.024. With radix 10 mantissa -1.3 the mantissa conversion calculation for radix 2 is -1.3*1.024=-1.3312. The radix 2 exponent is a look-up. Together we get -1.3*10^-3=-1.3312*2^-10. Check again.
One specialty of radix conversion is mantissa underrun or
overrun. Underrun is a mantissa less than 1, overrun is a mantissa
larger than 1.999... . We got mantissa overrun with radix 10 FP
number 1.5*10^2, radix 2 exponent 2^6 and radix 2 mantissa
multiplier 1.5625: 1.5*10^2 = 2.34375*2^6. To repair overrun, we
divide the mantissa by 2 and add one to the exponent: 2.34375*2^6
= 1.171875*2^7. To repair underrun, we multiply the mantissa by 2
and subtract one from the exponent. Divide by 2 and multiply by 2
of binary numbers is done by shifting, a simple operation for a
computer. "Repair the mantissa" is often called "normalize the
mantissa".
I use Code::Blocks as C language IDE on MS-Windows. Code::Blocks
uses gcc as C compiler. EASy68K is my 68000 assembler/simulator
IDE. My "real" 68000 computer is the 68k-MBC from Just4Fun. This
is a single board computer (SBC) with a minimum of 3 ICs: the CPU
68008, an SRAM and a 40 pin PIC microcontroller as GLUE chip. The
connection to the host computer, my PC, is via a USB to UART
(RS232) converter board (e.g. CP2102).
The "proof of the pudding" of every piece of software is a test
program. The EASy68K simulator simulates a BIOS, too. There are
BIOS calls for ASCII string print, hexadecimal number print and
more. This is the output of the test program:
I designed my FP conversion source code in the programming
language C. Later I translated this source code by hand from C to
Motorola 68000 assembler. The hand translation was iterative: I
changed the C source code to make good use of the 68000 opcodes.
This subroutine is used in subroutine asc2fp. It is a universal
ASCII to signed integer conversion. The string contains the ASCII
representation of a radix 10 integer number in the range -32768 to
32767, that is a 16bit signed integer. The algorithm is the
traditional "multiply preliminary result by 10 and add the next
digit" idea. Multiply by 10 is done with shift and addition:
First, the integer is shifted by one bit, that is multiplied by
two. Second, a copy of the integer is created. The copy is shifted
by 2 bits or multiplied by 4. Third, the integer*2 and the
integer*2*4 numbers are added. This multiply by 10 through shift
and add is faster than the MULU multiply opcode of the 68000.
The 68000 has a powerful "get pointer value and increment
pointer" opcode. The "move.b (A0)+, D2" opcode is 16bits long.
This is good code density. The opcodes that need an 8bit or a
16bit constant like "cmp.b #'-', D2" are 32bits long. On average,
the Motorola 68000 and Intel 8086 have the same code density. The
68000 processor has 8 32bit data registers, the 8086 has only 4
16bit data registers.
*// converts ASCII string to integer [-32768 .. 32767]
asc2int:
*int16_t asc2int(char* A0buf)
* =====================================================================
* in: A0=pointer to char
* out: D0=signed 16bit integer
* uses D0 to D3, A0
*{
        clr.w   D0              * uint16_t D0v = 0;
        clr.b   D1              * unsigned char D1sgn = 0;
        move.b  (A0)+, D2       * unsigned char D2la = *A0buf++; // la = look ahead
        cmp.b   #'-', D2        * if ('-' == D2la) {
        bne.s   .endi
        moveq   #1, D1          *   D1sgn = 1;
        move.b  (A0)+, D2       *   D2la = *A0buf++;
.endi:
* }
.while:
* while (D2la >= '0' && D2la <= '9') {
        cmp.b   #'0', D2
        bcs.s   .endw
        cmp.b   #'9', D2
        bhi.s   .endw
        add.w   D0, D0          *   D0v <<= 1; // do D0v *= 10; with shifts
        move.w  D0, D3          *   uint16_t D3v = D0v << 2;
        asl.w   #2, D3
        add.w   D3, D0          *   D0v += D3v;
        sub.b   #'0', D2        *   D0v += (D2la - '0');
        ext.w   D2
        add.w   D2, D0
        move.b  (A0)+, D2       *   D2la = *A0buf++;
        bra.s   .while
.endw:
* }
* int16_t D0w = (int16_t)D0v; // D0v and D0w can be a union
        and.b   D1, D1          * if (D1sgn) {
        beq.s   .endi2
        neg.w   D0              *   D0w = 0 - D0w;
.endi2:
* }
* return D0w;
        rts
*}
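The multiply by 10 through shift and add can be tried out in portable C. The following is a minimal sketch that mirrors the structure of asc2int; the name asc2int_sketch is made up for this illustration, and the sign handling uses a 32bit intermediate instead of the NEG opcode.

```c
#include <stdint.h>

/* Hypothetical C counterpart of asc2int: optional '-', then decimal digits.
   Input outside -32768 .. 32767 is not checked, as in the assembler version. */
int16_t asc2int_sketch(const char *buf)
{
    uint16_t v = 0;
    int negative = 0;
    if (*buf == '-') { negative = 1; ++buf; }
    while (*buf >= '0' && *buf <= '9') {
        v = (uint16_t)(v << 1);                  /* v*2 */
        v = (uint16_t)(v + (uint16_t)(v << 2));  /* v*2 + v*2*4 = v*10 */
        v = (uint16_t)(v + (*buf - '0'));        /* add the next digit */
        ++buf;
    }
    int32_t r = negative ? -(int32_t)v : (int32_t)v;
    return (int16_t)r;
}
```

The conversion stops at the first character that is not a digit, so asc2int_sketch("42abc") returns 42.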
This subroutine is used in subroutine fp2asc. It is a universal
signed integer to ASCII conversion for the exponent. IEEE754
allows a radix 10 exponent from -38 to +38. My implementation only
allows a radix 10 exponent from -31 to +32. This reduces the
conversion constant tables a little. Every table is at most 256
bytes long. The C language does not know about the carry flag, but
the CPU has a carry flag. Therefore I wrote C functions like
"sub16()" to simulate the carry flag in C. This was part of the
iterations between C and assembler to get a fast and correct
implementation.
The traditional integer to ASCII algorithm uses divide to build
the output ASCII string from last digit to first digit. My
subroutine uses the table approach: table (array) c0 has the digit
values 10000, 1000, 100, ... as binary numbers. Instead of
division, multiple subtraction is used. The same table approach is
used in the subroutine fp2asc for the mantissa. The variable
D1flag (register D1) and an if() statement implement the
suppression of leading zeros.
*// converts integer [-32768 .. 32767] to ASCII string
int2asc:
*void int2asc(char* A0buf, int16_t D0e)
* =====================================================================
* in: D0=signed integer [-32768 .. 32767], A0=pointer to char
* out: nothing
* uses D0 to D4, A0, A1
*{
        and.w   D0, D0          * if (D0e < 0) {
        bpl.s   .endi
        move.b  #'-', (A0)+     *   *A0buf++ = '-';
        neg.w   D0              *   D0e = 0 - D0e;
.endi:
* }
* uint16_t D0n = (uint16_t)D0e;
        lea     c0, A1          * uint16_t* A1c0ptr = c0;
        clr.b   D1              * uint8_t D1flag = 0;
        moveq   #4, D2          * int8_t D2i = 4;
.do:
* do {
        moveq   #'0', D3        *   uint8_t D3digit = '0';
        move.w  (A1)+, D4       *   uint16_t D4c0val = *A1c0ptr++;
*   // Assembler style unsigned division 16bit/16bit, result is D3digit
.while:
        sub.w   D4, D0          *   while (0 == sub16(D0n, D4c0val)) {
        bcs.s   .endw
        addq.b  #1, D3          *     ++D3digit;
        bra.s   .while
.endw:
*   }
        add.w   D4, D0          *   D0n += D4c0val; // make result positive again
        cmp.b   #'0', D3        *   if ('0' == D3digit && 0 == D1flag && D2i > 0) {
        bne.s   .endi2
        and.b   D1, D1
        bne.s   .endi2
        and.b   D2, D2
        ble.s   .endi2
        bra.s   .dol            *     continue; // suppress leading 0
.endi2:
*   }
        moveq   #1, D1          *   D1flag = 1;
        move.b  D3, (A0)+       *   *A0buf++ = D3digit;
.dol:
        dbra    D2, .do         * } while (--D2i > -1);
        clr.b   (A0)            * *A0buf = '\0';
        rts
*}
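The table approach with multiple subtraction instead of division looks like this in plain C. This is a sketch for experimenting on a PC, not the C source from the download; the name int2asc_sketch and the flag name started are made up, and the carry flag is replaced by an ordinary comparison.

```c
#include <stdint.h>

/* Hypothetical C counterpart of int2asc. buf needs room for 7 bytes
   ("-32768" plus the terminating zero). */
void int2asc_sketch(char *buf, int16_t value)
{
    static const uint16_t c0[] = { 10000, 1000, 100, 10, 1 };
    uint16_t n;
    int started = 0;                       /* set after the first output digit */
    if (value < 0) {
        *buf++ = '-';
        n = (uint16_t)(-(int32_t)value);   /* 32bit negate also handles -32768 */
    } else {
        n = (uint16_t)value;
    }
    for (int i = 0; i < 5; ++i) {
        char digit = '0';
        while (n >= c0[i]) {               /* division by multiple subtraction */
            n = (uint16_t)(n - c0[i]);
            ++digit;
        }
        if (digit == '0' && !started && i < 4)
            continue;                      /* suppress leading 0, keep the last digit */
        started = 1;
        *buf++ = digit;
    }
    *buf = '\0';
}
```

With a buffer char b[8], the call int2asc_sketch(b, 1203) leaves the string 1203 in b, and the value 0 gives the single digit 0.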
The 32bit IEEE754 FP number starts with one bit for the mantissa
sign (1 is negative mantissa), 8 bits for the exponent and 23 bits
for the mantissa. The exponent is in bias 127 coding. The mantissa
coding uses a "hidden leading bit". The mantissa is always
normalized in the range 1 to 1.999... . The leading bit is always
1 and is not stored in the 32bit IEEE754 number. The subroutine
splits the 32bit IEEE754 number into three parts: mantissa with
leading bit in register D6, two's complement exponent in register
D5 and mantissa sign in register D4. A register is a variable that
is not in the main memory, but in the CPU. Register operations are
faster than memory operations.
The traditional approach is to have a signed mantissa in one
register. This approach has pros and cons. Floating point add and
subtract is easy with a two's complement mantissa. But signed
multiply and signed divide are slower than their unsigned
counterparts. Furthermore, the sign bit in a signed mantissa costs
one bit of accuracy. Last but not least, shift as replacement for
divide works with unsigned numbers, but not with signed numbers:
-1/2 = 0 but -1 shift right is 0xFFFFFFFF if you use "arithmetic
shift right", 32bit numbers and two's complement.
A "middle" ground is to use sign-magnitude representation: the
number is unsigned, the sign bit is just "tacked" on. Before
multiply and divide you split sign-magnitude into an unsigned
number and a separate sign. Before add and subtract you change
sign-magnitude into two's complement, afterwards you change it
back to sign-magnitude.
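The change between a separate sign plus unsigned number and a two's complement number is small. The following C sketch shows both directions; the names sm_value, to_twos and from_twos are made up for this illustration, and the magnitude is assumed to fit in 31 bits (a 24bit mantissa does).

```c
#include <stdint.h>

/* Hypothetical helper type: sign flag (1 = negative) plus unsigned magnitude. */
typedef struct { uint8_t sign; uint32_t magnitude; } sm_value;

/* Before add and subtract: separate sign and magnitude to two's complement. */
int32_t to_twos(uint8_t sign, uint32_t magnitude)
{
    return sign ? -(int32_t)magnitude : (int32_t)magnitude;
}

/* After add and subtract: two's complement back to sign and magnitude. */
sm_value from_twos(int32_t v)
{
    sm_value r;
    r.sign = (uint8_t)(v < 0);
    r.magnitude = (v < 0) ? (uint32_t)(-(int64_t)v) : (uint32_t)v;
    return r;
}
```

Multiply and divide work directly on the magnitude field; the result sign is the exclusive or of the two sign flags.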
* IEEE754 32bit to 3 registers
fp2regs:
* =====================================================================
* in: D6=IEEE754
* out: D6=mantissa, D5=exponent, D4=sign
        move.l  D6, D5
        moveq   #0, D4
        and.l   #$0007FFFFF, D6 * mask mantissa
        bset.l  #23, D6         * set leading mantissa bit
        and.l   #$0FF800000, D5 * mask sign, exp
        add.l   D5, D5          * shift sign bit to X flag
        addx.b  D4, D4          * shift X flag to bit 0
        rol.l   #8, D5          * roll top 8 bits to bottom 8 bits
        sub.b   #127, D5        * change bias 127 to two's complement
        rts
This subroutine combines the three parts mantissa, exponent and mantissa sign into one 32bit IEEE754 FP number.
* 3 registers to IEEE754 32bit
regs2fp:
* =====================================================================
* in: D6=mantissa, D5=exponent, D4=sign
* out: D6=IEEE754
* uses D4 to D6
        bclr.l  #23, D6         * clear leading mantissa bit
        add.b   #127, D5        * change two's complement to bias 127
        ror.l   #8, D5          * roll bottom 8 bits to top 8 bits
        lsr.b   #1,D4           * roll sign bit to X
        roxr.l  #1, D5          * roll X to bit 31
        or.l    D5, D6          * combine sgn, exp with frac
        rts
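The split into mantissa, exponent and sign and the reverse operation can be written in C with masks and shifts. This is a sketch under the assumption that the C type float is the 32bit IEEE754 format, which is true for gcc on a PC. The names fp_parts, fp_split and fp_join are made up for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical counterpart of the three registers D6, D5 and D4. */
typedef struct { uint32_t mant; int exp; uint8_t sign; } fp_parts;

fp_parts fp_split(float f)
{
    uint32_t bits;
    fp_parts p;
    memcpy(&bits, &f, sizeof bits);               /* read the bit pattern */
    p.sign = (uint8_t)(bits >> 31);
    p.exp  = (int)((bits >> 23) & 0xFF) - 127;    /* bias 127 to signed */
    p.mant = (bits & 0x007FFFFF) | 0x00800000;    /* set hidden leading bit */
    return p;
}

float fp_join(fp_parts p)
{
    uint32_t bits = ((uint32_t)p.sign << 31)
                  | ((uint32_t)((p.exp + 127) & 0xFF) << 23)
                  | (p.mant & 0x007FFFFF);        /* drop hidden leading bit */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For the FP number 0.0 the split gives exponent -127 and mantissa 0x800000, the same pair of values that fp2asc tests for when it outputs FP zero.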
The Q notation is helpful for fixed point numbers. A Q 4.28
number has 4 bits for the integer part and 28 bits for the
fraction part. We can express Radix 10 numbers from 0 to 15.999..
with Q 4.28. The multiplication for Q numbers is like the integer
multiplication. Two 32bit values result in a 64bit multiplication
result. After the integer multiplication, the result is normalized
(shifted) to a Q 4.28 result. The constant Ibits is 4, the
constant Fbits is 28.
* unsigned 4.28 Q mul 32bit = 32bit * 32bit; D6 = D6 * D3
qumul32:
* =====================================================================
* in: D6=unsigned Q 4.28, D3=unsigned Q 4.28
* out: D6=unsigned Q 4.28
* uses D2, D3, D6, D7
* See Logan, O'Hara; The complete Timex TS1000 & Sinclair ZX81 ROM Disassembly; Part B, page 41
        moveq   #0, D7
        moveq   #32, D2
        bra.s   .start
.loop:
        bcc.s   .noadd
        add.l   D6, D7          * HLHL += DEDE
.noadd:
        roxr.l  #1, D7          * RR HLHL
.start:
        roxr.l  #1, D3          * RR BCBC
        dbra    D2, .loop
* result D7=high 32bit, D6= low 32bit
* D6frac = D76facc >> Fbits;
        exg     d7, d6          * or swap D7, D6 and D6frac = D76facc << Ibits;
        moveq   #Ibits-1, D2
.loop2:
        add.l   D7, D7
        addx.l  D6, D6
        dbra    D2, .loop2
* result D6
        rts
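On a PC with 64bit integers, the Q 4.28 multiplication is a one-liner and is handy as a reference when testing the assembler subroutine. The name qumul32_sketch and the macro FBITS are made up for this sketch; the result is truncated, not rounded.

```c
#include <stdint.h>

#define FBITS 28   /* fraction bits of the Q 4.28 format */

/* Hypothetical reference: Q 4.28 * Q 4.28 gives a Q 8.56 product in 64 bits,
   the shift by FBITS brings it back to Q 4.28. The product must be below 16.0. */
uint32_t qumul32_sketch(uint32_t a, uint32_t b)
{
    uint64_t acc = (uint64_t)a * b;
    return (uint32_t)(acc >> FBITS);
}
```

Example: 1.5 is 0x18000000 and 1.5625 is 0x19000000 in Q 4.28. Their product 2.34375 comes out as 0x25800000.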
The first "tricky" subroutine. The table c1 (the constant values
array c1) contains the Q 4.28 binary values of the radix 10
digits, that is the binary value of 1.0, 0.1, 0.01, ... . As every
radix 10 digit is counted down to zero, the radix 2 mantissa is
counted up. Again, doing multiplication "by hand" is faster than
using the 68000 MULU opcode. This opcode can only multiply 16bit
numbers, but the mantissa is 32bit. If the radix 10 mantissa ASCII
string is 2 or greater, the radix 2 mantissa is not normalized.
The "magic constants" Imask1 and Imask2 help to shift the mantissa
right or left until normalized. Having a not-normalized mantissa
of maximum value 9.999... is the reason for the Q 4.28 fixed point
format.
*// convert ASCII string -?[0-9](.[0-9]*(e-?[0-9]+)?)? to fp
*// RE: ?= 0 to 1 repetition, *=0 to N repetition, +=1 to N repetition
*// () block
asc2fp:
*FPU asc2fp(char* A0buf)
* =====================================================================
* in: A0=pointer to char
* out: D6=IEEE754 floating point
* uses D0 to D7, A0, A1
*{
* // input mantissa part -?[1-9].[0-9]*
        move.b  (A0)+, D1       * unsigned char D1la = *A0buf++; // la = look ahead
        clr.b   D4              * unsigned char D4sgn = 0;
        cmp.b   #'-', D1        * if ('-' == D1la) {
        bne.s   .endi
        moveq   #1, D4          *   D4sgn = 1;
        move.b  (A0)+, D1       *   D1la = *A0buf++;
.endi:
* }
        lea     c1+4, A1        * unsigned long *A1c1ptr = &c1[1];
        moveq   #0, D6          * unsigned long D6frac = 0;
        moveq   #7, D3          * int8_t D3i = sizeof c1 / sizeof c1[0] - 1;
.do:
* do {
        move.l  (A1)+, D2       *   unsigned long D2c1val = *A1c1ptr++;
        cmp.b   #'.', D1        *   if ('.' == D1la) {
        bne.s   .endi2
        move.b  (A0)+, D1       *     D1la = *A0buf++;
.endi2:
*   }
        cmp.b   #'0', D1        *   if (D1la < '0') {
        bcs.s   .endw           *     break;
*   }
        cmp.b   #'9', D1        *   if (D1la > '9') {
        bhi.s   .endw           *     break;
*   }
        sub.b   #'0', D1        *   int8_t D1digit = D1la - '0';
.while2:
        and.b   D1, D1          *   while (D1digit != 0) {
        beq.s   .endw2
        add.l   D2, D6          *     add32(D6frac, D2c1val);
        subq.b  #1, D1          *     --D1digit;
        bra.s   .while2
.endw2:
*   }
        move.b  (A0)+, D1       *   D1la = *A0buf++;
        dbra    D3, .do         * } while (--D3i > -1);
.endw:
* // input exponent part e-?[0-9]+
        clr.w   D0              * int16_t D0exp10 = 0;
        cmp.b   #'e', D1        * if ('e' == D1la) {
        bne.s   .endi3
        jsr     asc2int         *   D0exp10 = asc2int(A0buf);
.endi3:
* }
* FPU f;
        and.l   D6, D6          * if (0 == D6frac && 0 == D0exp10) { // zero 0, 0., 0.e0
        bne.s   .endi4
        and.w   D0, D0
        bne.s   .endi4
        move.b  #-Ebias, D5     *   f.e = -Ebias;
*   f.f = D6frac;
*   f.s = D4sgn;
        bra.s   .ret            *   return f;
.endi4:
* }
        sub.w   #Emin10, D0     * uint8_t D0ndx = D0exp10 - Emin10;
        lea     c3, A0          * signed char D5exp = c3[D0ndx];
        move.b  0(A0, D0), D5
        lea     c2, A0          * unsigned long D3c2val = c2[D0ndx];
        asl.w   #2, D0
        move.l  0(A0, D0), D3
* unsigned long long D76facc = D6frac;
        jsr     qumul32         * qumul32(D76facc, D3c2val);
        add.l   #Rvalue, D6     * D6frac += Rvalue;
* // normalize
.while3:
        move.l  D6, D0          * while (D6frac & Imask1) { // mantissa too large
        and.l   #Imask1, D0
        beq.s   .endw3
        lsr.l   #1, D6          *   D6frac >>= 1;
        addq.b  #1, D5          *   ++D5exp;
        bra.s   .while3
.endw3:
* }
.while4:
        move.l  D6, D0          * while (0 == (D6frac & Imask2)) { // mantissa too small
        and.l   #Imask2, D0
        bne.s   .endw4
        add.l   D6, D6          *   D6frac <<= 1;
        subq.b  #1, D5          *   --D5exp;
        bra.s   .while4
.endw4:
* }
        lsr.l   #Rbits, D6      * D6frac >>= Rbits; // remove "rounding" bits
.ret:
        jmp     regs2fp
* f.f = D6frac;
* f.e = D5exp;
* f.s = D4sgn;
* return f;
*}
The second "tricky" subroutine. This time, the mantissa
multiplication can give radix 10 mantissa underrun or overrun.
Underrun is an ASCII string starting with 00.xxxx, overrun is an
ASCII string starting with Xx.xxxx where X is a digit between 1
and 9. The subroutine performs a mantissa multiplication by
suppressing leading zero, a mantissa division by moving the
decimal point. The radix 10 exponent gets decreased or increased
accordingly.
*// convert floating point to ASCII string
fp2asc:
*void fp2asc(char* A0buf, FPU D6f)
* =====================================================================
* in: A0=pointer to char, D6=IEEE754 floating point
* out: nothing
* uses D0 to D7, A0, A1
*{
* unsigned long D6frac = f.f;
* signed char D5exp = f.e;
* unsigned char D4sgn = f.s;
        jsr     fp2regs
        and.b   D4, D4          * if (D4sgn) { // 1
        beq.s   .endi
        move.b  #'-', (A0)+     *   *A0buf++ = '-';
.endi:
* } // 1
        cmp.b   #-127, D5       * if (-127 == D5exp && 0x800000 == D6frac) { // 2
        bne.s   .endi2
        cmp.l   #$0800000, D6
        bne.s   .endi2
        move.b  #'0', (A0)+     *   *A0buf++ = '0'; // output FP zero
        clr.b   (A0)            *   *A0buf = '\0';
        rts                     *   return;
.endi2:
* } // 2
.while:
* while ((unsigned char)D5exp & 3) { // change radix 2 exponent to pseudo radix 16
        move.b  D5, D4          * don't destroy D5
        and.b   #3, D4
        beq.s   .endw
        add.l   D6, D6          *   D6frac <<= 1;
        sub.b   #1, D5          *   --D5exp;
        bra.s   .while
.endw:
* }
        asl.l   #Rbits, D6      * D6frac <<= Rbits; // add "rounding" bits
        sub.b   #Emin2, D5      * D5ndx = (D5exp - Emin2) >> 2; // get index of radix 2 to radix 10 constants
        lea     c4, A1          * unsigned long D3cfrac = c4[D5ndx]; // get radix 2 to radix 10 mantissa multiplier
        move.l  0(A1, D5), D3
        lsr.b   #2, D5          * int8_t D0cexp = c5[D5ndx]; // get radix 10 exponent
        lea     c5, A1
        move.b  0(A1, D5), D0
* unsigned long long D76facc = D6frac;
        jsr     qumul32         * qumul32(D76facc, D3cfrac); // change mantissa from radix 2 to radix 10
        add.l   #Rvalue2, D6    * D6frac += Rvalue2;
* // output mantissa part
        moveq   #Digits, D2     * int8_t D2digits = Digits;
        clr.b   D1              * uint8_t D1flag = 0;
        lea     c1, A1          * unsigned long* A1c1p = c1;
        moveq   #-1, D4         * for (int8_t D4i = -1; D4i < D2digits; ++D4i) { // start at -1 because "real" output is 0X.XXXXXX
.for:
        cmp.b   D2, D4
        bge.s   .endf
        move.b  #'0', D3        *   int8_t D3digit = '0';
        move.l  (A1)+, D5       *   unsigned long D5c1val = *A1c1p++;
.while2:
        sub.l   D5, D6          *   while (0 == sub32(D6frac, D5c1val) && D3digit < '9') { // fix 5.04:0000e-1 output
        bcs.s   .endw2
        cmp.b   #'9', D3
        bcc.s   .endw2
        addq.b  #1, D3          *     ++D3digit;
        bra.s   .while2
.endw2:
*   }
        add.l   D5, D6          *   add32(D6frac, D5c1val); // make result positive again
*   // without D1flag, output can be 00.XXXXXXX, 0X.XXXXXX or XX.XXXXX
*   // pseudo mul10, div10 by suppress leading zero or move dot. Correct radix 10 exponent
        move.b  D3, (A0)        *   *A0buf = D3digit;
        and.b   D1, D1          *   if (0 == D1flag) { // 3
        bne.s   .endi3
        cmp.b   #'0', D3        *     if (D3digit != '0') { // 4
        beq.s   .endi4
        moveq   #1, D1          *       D1flag = 1;
        addq.w  #1, A0          *       ++A0buf;
        move.b  #'.', (A0)+     *       *A0buf++ = '.'; // output dot after first non zero D3digit
        sub.b   D4, D0          *       D0cexp -= D4i; // D4i=-1 pseudo div10, D4i=1 pseudo mul10
        add.b   D4, D2          *       D2digits += D4i; // D4i=-1 one mantissa digit less, D4i=1 one mantissa digit more
.endi4:
*     } // 4
        bra.s   .forl           *     continue;
.endi3:
*   } // 3
        addq.w  #1, A0          *   ++A0buf;
.forl:
        addq.b  #1, D4
        bra.s   .for
.endf:
* }
        and.b   D0, D0          * if (D0c5val) { // 5
        beq.s   .endi5
* // output exponent part
        move.b  #'e', (A0)+     *   *A0buf++ = 'e';
        ext.w   D0
        jmp     int2asc         *   int2asc(A0buf, D0c5val);
*   return;
.endi5:
* } // 5
        clr.b   (A0)            * *A0buf = '\0';
        rts
*}
My C source code uses "break", "continue" and "multiple return".
Some say this is no longer structured programming, and especially
multiple return is evil. I say, as long as every block has one
entry and one exit, it is structured. I agree, break, continue and
multiple return are C language features that are seldom used. But
they are useful, especially if you are a "never nester"
programmer. By the way, there is no "else" in a never nester
programmer's source code.
The ASP68K PROJECT collected assembler tricks to save program
space or execution time. If a "jump subroutine" is directly
followed by a "return subroutine", you can replace the two opcodes
with a "jump":
before:  jsr int2asc
         rts
after:   jmp int2asc
You can save space by replacing "jsr" with "bsr.w" and "jmp" with
"bra.w", if the target is within 16bit signed distance.
Note: the EASy68K assembler does this itself, if the jsr or jmp
goes to a lower address.
before:  jsr asc2int
         jmp regs2fp
after:   bsr.w asc2int
         bra.w regs2fp
If only one bit is to be set or cleared, the "bset" or "bclr"
opcodes are shorter than the "or.l" or "and.l" opcodes:
before:  or.l #$000800000, D6
         and.l #$0007FFFFF, D6
after:   bset.l #23, D6
         bclr.l #23, D6
If the immediate value is 16bit signed, the "add.w #16, A2" or
"lea 16(A2), A2" opcode is shorter than "add.l #16, A2":
before:  add.l #16, A2
after:   add.w #16, A2
         lea 16(A2), A2
The "add Dx, Dx" opcode executes faster than the "asl #1, Dx"
opcode:
before:  asl.w #1, D0
         asl.l #1, D5
         asl.l #1, D7
after:   add.w D0, D0
         add.l D5, D5
         add.l D7, D7
The "addx Dx, Dx" opcode executes faster than the "roxl #1, Dx"
opcode:
before:  roxl.b #1, D4
         roxl.l #1, D6
after:   addx.b D4, D4
         addx.l D6, D6
The "addq.w #n, Ax" opcode executes faster than the "addq.l #n,
Ax" opcode:
before:  addq.l #8, A2
         addq.l #1, A0
after:   addq.w #8, A2
         addq.w #1, A0
The "moveq #0, Dx" opcode executes faster than the "clr.l Dx"
opcode:
before:  clr.l D4
         clr.l D7
after:   moveq #0, D4
         moveq #0, D7
These assembler tricks saved 8 bytes program size and 2 or 4
clock cycles per faster opcode on a 68000. A 68000 NOP needs 4
clock cycles.
"Algorithms + Data Structures = Programs" was Niklaus Wirth's
mantra. The data structures for the FP conversion subroutines are
primitive. We have the tables (arrays) c0 to c5. The exponent
value plus some offset (Emin10, Emin2) is used as index in these
tables. Only "FORTRAN" style data structures, simple and fast!
c0 Integer constants 10000, 1000, ... for integer conversion
c1 Q constants 10.0, 1.0, ... 0.0000001 for mantissa conversion
c2 Q mantissa constants to convert exponent 10^x to exponent 2^y, values 2^Fbits * 10^x / 2^y
c3 integer exponent constants, radix 10 to radix 2, e.g. 10^0 converts to 2^0 .. 2^3
c4 Q mantissa constants to convert exponent 2^x to exponent 10^y, values 2^Fbits * 2^x / 10^y
c5 integer exponent constants, radix 2 to radix 10, e.g. 2^3 converts to 10^0 .. 10^1
The "equ" constants do not use memory, the "dc" constants do.
*enum {
Digits  equ     7               * Digits = 7,
Ibits   equ     4               * Ibits = 4,
Emin10  equ     -31             * Emin10 = -31,
Emin2   equ     -108            * Emin2 = -108,
*};
*enum {
Ebias   equ     127             * Ebias = 127, // Exponent bias
Leadbit equ     $0800000        * Leadbit = 0x800000, // leading bit
Fbits   equ     32-Ibits        * Fbits = 32 - Ibits, // fraction bits
Rbits   equ     Fbits-23        * Rbits = Fbits - 23, // rounding bits
Rvalue  equ     $010C/2-1       * rounding 0.0000049...
Rvalue2 equ     $01B/2          * rounding 0.00000049...
Imask1  equ     $0E0000000
Imask2  equ     $0F0000000
*};
c1:
*unsigned long c1[] = {
        dc.l    $0A0000000, $010000000, $0199999A, $028F5C3
        dc.l    $041893, $068DC, $0A7C, $010C, $01B
*};
c2:
*unsigned long c2[] = {
        dc.l    $01039D666, $014484BFF, $0CAD2F7F, $0FD87B5F, $013CE9A37, $0C612062
        dc.l    $0F79687B, $01357C29A, $0C16D9A0, $0F1C9008, $012E3B40A, $0BCE5086
        dc.l    $0EC1E4A8, $012725DD2, $0B877AA3, $0E69594C, $01203AF9F, $016849B87
        dc.l    $0E12E134, $011979981, $015FD7FE1, $0DBE6FED, $0112E0BE8, $015798EE2
        dc.l    $0D6BF94D, $010C6F7A1, $014F8B589, $0D1B7176, $010624DD3, $0147AE148
        dc.l    $0CCCCCCD, $010000000, $014000000, $0C800000, $0FA00000, $013880000
        dc.l    $0C350000, $0F424000, $01312D000, $0BEBC200, $0EE6B280, $012A05F20
        dc.l    $0BA43B74, $0E8D4A51, $012309CE5, $0B5E620F, $0E35FA93, $011C37938
        dc.l    $016345786, $0DE0B6B4, $01158E461, $015AF1D79, $0D8D726B, $010F0CF06
        dc.l    $0152D02C8, $0D3C21BD, $0108B2A2C, $014ADF4B7, $0CECB8F2, $01027E72F
        dc.l    $01431E0FB, $0C9F2C9D, $0FC6F7C4, $013B8B5B5
*};
c4:
*unsigned long c4[] = {
        dc.l    $0314DC645, $07E37BE2, $0C9F2C9D, $01431E0FB, $0204FCE5E, $052B7D2E
        dc.l    $08459516, $0D3C21BD, $0152D02C8, $021E19E0D, $056BC75E, $08AC7230
        dc.l    $0DE0B6B4, $016345786, $02386F270, $05AF3108, $09184E73, $0E8D4A51
        dc.l    $0174876E8, $02540BE40, $05F5E100, $09896800, $0F424000, $0186A0000
        dc.l    $027100000, $06400000, $0A000000, $010000000, $01999999A, $028F5C28F
        dc.l    $068DB8BB, $0A7C5AC4, $010C6F7A1, $01AD7F29B, $02AF31DC4, $06DF37F6
        dc.l    $0AFEBFF1, $011979981, $01C25C268, $02D09370D, $0734ACA6, $0B877AA3
        dc.l    $012725DD2, $01D83C950, $02F394219, $078E4804, $0C16D9A0, $01357C29A
        dc.l    $01EF2D0F6, $031848189, $07EC3DB0, $0CAD2F7F, $014484BFF, $02073ACCB
        dc.l    $05313A5E, $084EC3C9
*};
c0:
*uint16_t c0[] = {
        dc.w    10000, 1000, 100, 10, 1
*};
c3:
*signed char c3[] = {
        dc.b    -103, -100, -96, -93, -90, -86, -83, -80, -76, -73, -70, -66
        dc.b    -63, -60, -56, -53, -50, -47, -43, -40, -37, -33, -30, -27
        dc.b    -23, -20, -17, -13, -10, -7, -3, 0, 3, 7, 10, 13
        dc.b    17, 20, 23, 27, 30, 33, 37, 40, 43, 47, 50, 53
        dc.b    56, 60, 63, 66, 70, 73, 76, 80, 83, 86, 90, 93
        dc.b    96, 100, 103, 106
*};
c5:
*signed char c5[] = {
        dc.b    -33, -31, -30, -29, -28, -26, -25, -24, -23, -22, -20, -19
        dc.b    -18, -17, -16, -14, -13, -12, -11, -10, -8, -7, -6, -5
        dc.b    -4, -2, -1, 0, 1, 2, 4, 5, 6, 7, 8, 10
        dc.b    11, 12, 13, 14, 16, 17, 18, 19, 20, 22, 23, 24
        dc.b    25, 26, 28, 29, 30, 31, 33, 34
*};
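Single entries of the tables c2 and c3 can be recomputed with a short C function. This is not the table generator from the download, only a sketch for checking values. It assumes that the radix 2 exponent is chosen so that the mantissa multiplier lies between about 0.707 and 1.414, which is what the published c3 values suggest but is an inference from the tables. The names conv_entry and make_entry are made up, and the double arithmetic may differ in the last bit for some entries.

```c
#include <stdint.h>

/* Hypothetical helper: one c3 value (e2) and one c2 value (q) for the
   radix 10 exponent e10, with q = round(2^28 * 10^e10 / 2^e2). */
typedef struct { int e2; uint32_t q; } conv_entry;

conv_entry make_entry(int e10)
{
    double m = 1.0;
    int e2 = 0;
    conv_entry r;
    for (int i = 0; i < e10; ++i) m *= 10.0;   /* m = 10^e10 for e10 > 0 */
    for (int i = 0; i > e10; --i) m /= 10.0;   /* m = 10^e10 for e10 < 0 */
    while (m >= 1.4142135623730951) { m /= 2.0; ++e2; }  /* assumed upper limit sqrt(2) */
    while (m <  0.7071067811865476) { m *= 2.0; --e2; }  /* assumed lower limit 1/sqrt(2) */
    r.e2 = e2;
    r.q  = (uint32_t)(m * 268435456.0 + 0.5);  /* scale by 2^28 and round */
    return r;
}
```

For radix 10 exponent -3 this gives radix 2 exponent -10 and the Q 4.28 value 0x10624DD3 (1.024), the entries at index -3 - Emin10 = 28 in c3 and c2.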
My floating point conversion subroutines for Motorola 68000 come forty years late. In 1986, each of my friends had an ATARI ST 520. We enhanced the 520 from 512KByte RAM to 1MByte, added a floppy disk and later 20MByte hard disk. My hard disk had even 30 MByte, thanks to a RLL hard disk controller. I used the very fine Lattice C compiler. Only after programming C on 8088 MS-DOS "boxes" I realized how nice 32bit address pointers are and how fine the "real time" behavior of TOS was compared to MS-DOS. Atari ST can do real time MIDI ...
FP conversion Algorithms (program) size is 540 Bytes
FP conversion Data Structures (constants) size is 646 Bytes
Next steps for a full 68000 FP package are implementation of
fpadd, fpsub, fpmul and fpdiv. The 1980 "68000 Motorola Fast
Floating Point" assembler source code is again available. The
DTACK GROUNDED newsletter, beginning in 1981, has a lot of
information about this topic, too. The transcendental functions
like sin and cos can be approximated by series expansion
(Microsoft Altair BASIC, Sinclair ZX81 BASIC) or by CORDIC (HP35,
Motorola FFP). See my CORDIC implementation in Z80 assembler
document. See also my square root implementation in C language
document.
File fp_conversion.zip
contains the C version of the program, the 68000 assembler version
and a C program to create the tables c1 to c5.
Author contact E-mail is: