[02-APR-20] The Implantable Inertial Sensor (A3035) is a wireless accelerometer and gyroscope that may be implanted beneath the skin of a small mammal or attached to the exterior of a fish. As we describe in our IIS Technical Proposal, the design of the IIS is motivated by an experiment to measure the movement of freely-swimming fish in a water tank. The IIS is equipped with a 13-mm helical antenna and two 5-mm helical leads we use to recharge the battery. Antenna and leads are insulated with silicone. The circuit itself is encapsulated in epoxy and silicone. To attach the device to a fish, we wrap it in a custom-made rubber sleeve, which we can super-glue to the fish's scales or skin. We recharge the A3035's battery with a Battery Charger (A3033E).
The table below lists the existing versions of the Implantable Inertial Sensor (IIS).
|A3035A||ML920 11 mA-hr||0.9||1.0||8||Acceleration xyz 16-bit 128 SPS, Gyroscope xyz 16-bit 128 SPS|
The A3035B provides 128 sixteen-bit samples per second of each of the x, y, and z components of acceleration, and of the x, y, and z components of rotation. The total sample rate is 6 × 128 = 768 SPS. We expect total quiescent current to be around 1.2 mA, so we anticipate an operating life of 8 hours on one charge of the 11 mA-hr battery.
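The arithmetic behind these figures can be checked with a short Python sketch. The function names are ours, and the battery-life figure is an idealized capacity-over-current estimate that assumes the full 11 mA-hr is usable at a constant 1.2 mA. It comes out somewhat above the 8 hours quoted, which we take to include some margin, although the text does not say so.

```python
# Sketch: sample-rate and battery-life arithmetic for the IIS.
# All names are illustrative; the numbers come from the paragraph above.

CHANNELS = 6             # x, y, z acceleration plus x, y, z rotation
SPS_PER_CHANNEL = 128    # samples per second on each channel
BATTERY_MAHR = 11.0      # ML920 nominal capacity
QUIESCENT_MA = 1.2       # expected total quiescent current


def total_sps(channels=CHANNELS, sps=SPS_PER_CHANNEL):
    """Total samples per second transmitted by the device."""
    return channels * sps


def ideal_life_hr(capacity_mahr=BATTERY_MAHR, current_ma=QUIESCENT_MA):
    """Idealized operating life: capacity divided by constant current."""
    return capacity_mahr / current_ma


if __name__ == "__main__":
    print(total_sps())                 # 768
    print(round(ideal_life_hr(), 1))   # 9.2
```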
The IIS is managed by a field-programmable gate array (FPGA) in a 2.5-mm square package, LCMXO2-1200ZE. This device provides both volatile and non-volatile memory as well as thousands of programmable logic gates. It is capable of implementing arbitrarily-complex stimuli in response to a single command.
S3035A_1: IIS Version A Schematic.
[04-JUN-20] The A3035A is powered by an ML920 manganese-lithium rechargeable battery. This battery may be recharged through the A3035A's two recharge leads, which protrude a short distance from the device, within the loop of the antenna.
We recharge the ML920 through two internal silicon diodes using an A3033B battery charger. Charging to 80% capacity will be complete in twelve hours, but 100% capacity requires a forty-eight hour charge.
The A3035AV1 appears to need no modifications, but does require a booster power supply to start up.
[03-APR-20] We receive 25,000 custom-made rubber bands that fit around our smaller-sized implantable devices, as shown below.
This rubber adheres well to super-glue. We can glue two bands together instantly with super-glue gel. With the rubber sleeve around the device, we can glue it to the exterior of a fish, and pull it off later. We throw away the rubber sleeve, recharge the battery, and use another sleeve for the next experiment.
[20-AUG-20] We have 100 of A303501A in 10 panels of 10.
[13-OCT-20] We have 3 of A3035AV1 first article. Inactive current consumption 1.1 μA. Turn on and we are able to scan the logic chip.
[14-OCT-20] We prepare A3035A No4 (number 0004 on circuit board label applied by assembly house) with our initial P3035A01 firmware, which does not communicate with the accelerometer or gyroscope, but instead transmits zeros at 512 SPS. We get good reception, current consumption 173 μA. We load ML920 battery, but the battery cannot supply the start-up current required by the LCMXO2-1200 logic chip. We try ML1220, but it can't do it either, nor a BR1225, although a BR2477 is able to get the board started, as well as a 10-mAhr LiPo battery. We can jump-start the board by attaching an external 2.6-V power supply at power up, and after that the board runs fine off its ML920.
We are able to read the BMA423 accelerometer chip identifier register (0x00) and the BMG250 gyroscope sensor time register. We see the values we read from the sensors in the SCT signal plotted in the Recorder Instrument. We conclude that all connections in the circuit work, and approve second article. We leave the A3035A with 11-mAhr battery transmitting.
[16-OCT-20] After 48 hours, the A3035A is still running. Place on metal floor of our FE2F for half an hour, after which it is no longer running, and 2.5 V external power supply delivers 0.9 mA to the battery.
[21-OCT-20] We have No4 battery charged. We program No1 and No2, and re-program No4. Now each has channel number matching its board number.
[22-OCT-20] We assign channel numbers 1, 7, and 17 to boards 1, 2, and 4 in anticipation of using six channel numbers per board for three gyroscope outputs and three accelerometer outputs. We simplify the ring oscillator, trying out the following ring oscillator entity.
entity ring_oscillator is
    port (
        ENABLE : in std_logic;
        calib : in integer range 0 to 31;
        CK : out std_logic);
end;

architecture behavior of ring_oscillator is

-- Functions and Procedures
    function to_std_logic (v: boolean) return std_ulogic is
    begin if v then return('1'); else return('0'); end if; end function;

-- Attributes to guide the compiler.
    attribute syn_keep : boolean;
    attribute nomerge : string;

-- Ring Oscillator and Transmit Clock
    component BUFBA is port (A : in std_logic; Z : out std_logic); end component;
    signal RIN, R1, R2, R3, R4 : std_logic;
    attribute syn_keep of RIN, R1, R2, R3, R4 : signal is true;
    attribute nomerge of RIN, R1, R2, R3, R4 : signal is "";
    signal end_count : integer;

begin
    ring1 : BUFBA port map (RIN,R1);
    ring2 : BUFBA port map (R1,R2);
    ring3 : BUFBA port map (R2,R3);
    ring4 : BUFBA port map (R3,R4);

    divider : process (RIN, calib) is
        constant ck_length : integer := 4;
        variable count : integer range 0 to 31 := 0;
        variable end_count : integer range 0 to 31;
    begin
        RIN <= to_std_logic((ENABLE = '1') and (R4 = '0'));
        end_count := calib;
        if rising_edge(RIN) then
            if (count = end_count - 1) then
                count := 0;
            else
                count := count + 1;
            end if;
            if (count >= 0) and (count <= ck_length) then
                CK <= '1';
            else
                CK <= '0';
            end if;
        end if;
    end process;
end behavior;
We vary calib from 9 to 16 and measure the period of TCK and obtain this progression. The ideal value of the period is 200 ns, with acceptable range 195-215 ns.
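The relationship between calib and the TCK period can be sketched in Python. The divider counts calib ring cycles per output cycle, so the output period is calib times the ring period. The ring period itself depends on the compiled logic and is not given here, so the values passed in below are purely hypothetical, and the function names are ours.

```python
# Sketch: choose the divider value that puts TCK closest to 200 ns.
# The ring period arguments used below are hypothetical examples.

IDEAL_NS = 200.0
ACCEPT_NS = (195.0, 215.0)   # acceptable TCK period range from the text


def tck_period_ns(calib, ring_period_ns):
    """The divider wraps every calib ring cycles, so periods multiply."""
    return calib * ring_period_ns


def best_calib(ring_period_ns, candidates=range(9, 17)):
    """Return the calib in 9..16 whose period is acceptable and nearest
    to the ideal, or None if no candidate lands in the acceptable range."""
    in_range = [
        c for c in candidates
        if ACCEPT_NS[0] <= tck_period_ns(c, ring_period_ns) <= ACCEPT_NS[1]
    ]
    if not in_range:
        return None
    return min(in_range,
               key=lambda c: abs(tck_period_ns(c, ring_period_ns) - IDEAL_NS))


if __name__ == "__main__":
    print(best_calib(16.0))   # 13, giving 208 ns for a 16-ns ring period
    print(best_calib(20.0))   # 10, giving exactly 200 ns
```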
[23-OCT-20] We work on the serial data interface with the sensors, making sure that SCK is HI until after the falling edge of !CSA or !CSG, going HI again before the rising edge of !CSA or !CSG, and checking the setup and hold times of the outgoing address bits on SDI and the incoming data bits on SDO.
In the figure above we send address 0x0019 on SDI, which the gyroscope clocks on the rising edge of SCK. The gyroscope responds immediately after the final address bit, as shown in the detail below. On the next falling edge of SCK, we see the first data bit emerging from the gyroscope, and the A3035A accepts this bit on the rising edge of TCK, which occurs a few nanoseconds before the rising edge of SCK.
Setup time for SDI before the rising edge of SCK is 150 ns, and hold time after is 50 ns. Setup time for SDO before the rising edge of SCK is 100 ns and hold time is 100 ns. We present 24 falling and rising edges of SCK. The first eight cycles transmit the read bit and seven address bits to the gyroscope. The next 16 read out eight-bit registers 0x0019 and 0x001A. The first byte is bits 8..15 of the 24-bit sensor timer. The second byte is bits 16..23. Because the gyroscope byte ordering is little-endian, we cannot read and transmit a sixteen-bit data value without storing and rearranging the bytes beforehand.
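The byte rearrangement described above can be illustrated with a short Python sketch. It models only the arithmetic, not the serial interface, and the function names are ours.

```python
# Sketch: reorder the two gyroscope timer bytes into one sixteen-bit value.
# Register 0x19 holds timer bits 8..15 and register 0x1A holds bits 16..23,
# so the byte that arrives second is the more significant one.


def registers_from_timer(timer24):
    """Return (byte at 0x19, byte at 0x1A) for a 24-bit timer value."""
    return (timer24 >> 8) & 0xFF, (timer24 >> 16) & 0xFF


def timer_word(byte_0x19, byte_0x1a):
    """Combine the bytes in read order into a most-significant-first word."""
    return ((byte_0x1a & 0xFF) << 8) | (byte_0x19 & 0xFF)


if __name__ == "__main__":
    first, second = registers_from_timer(0xABCDEF)
    print(hex(first), hex(second))          # 0xcd 0xab
    print(hex(timer_word(first, second)))   # 0xabcd
```

Both bytes have to be held before the word can be sent most-significant bit first, which is why the firmware needs storage between the read and the transmission.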
[28-OCT-20] We prepare the P3035A02 firmware, which adds a program memory, in which we store a sine wave. We transmit the gyroscope sensor timer on channel No1, and the sine wave on channel No2.
We are running at 1024 SPS, reading the gyroscope 512 times per second, and current consumption is 520 μA. When we transmit only the gyroscope timer, 512 SPS, current consumption is 150 μA.
[30-OCT-20] The excessive current consumption of P3035A02 was due to the 8-kΩ resistor on CK, which we use as a test point for !CSA and !CSG. In A3035A03 we invert the test point output to eliminate almost all this current, and now with 1024 SPS, one channel from gyroscope, one channel being read from ROM, current is 210 μA. We modify the Sample_Controller so that it uses the ROM as a program memory. We compose instructions as shown below, and fill the ROM with instructions that read two sensor timer bytes from the gyroscope and transmit them at 128 SPS.
instruction := to_integer(unsigned(prog_instr));
if (instruction = 1) then SAI <= true; else SAI <= false; end if;
if (instruction = 2) then GYSEL <= true; end if;
if (instruction = 3) then GYSEL <= false; end if;
if (instruction = 4) then SAWR <= true; end if;
if (instruction = 5) then SAWR <= false; end if;
if (instruction = 6) then sensor_addr <= "00011001"; end if;
if (instruction = 7) then sensor_addr <= "00011000"; end if;
if (instruction = 8) then xmit_bits(15 downto 8) <= sensor_byte; end if;
if (instruction = 9) then xmit_bits(7 downto 0) <= sensor_byte; end if;
if (instruction = 10) then channel_offset <= 0; end if;
if (instruction = 11) then channel_offset <= 1; end if;
if (instruction = 12) then TXI <= true; else TXI <= false; end if;
if (instruction = 0) then
    prog_cntr <= "00000000";
else
    prog_cntr <= std_logic_vector(unsigned(prog_cntr)+1);
end if;
We see the sensor timer ramp pattern as before, exactly 128 SPS, and current consumption 116 μA. We adjust program to 1024 SPS and see exactly 1024 SPS at 298 μA. The incremental cost of transmission is (298 − 116) μA ÷ (1024 − 128) SPS ≈ 0.20 μA/SPS, including the two byte reads from the gyroscope.
[04-NOV-20] Our Sample_Controller now provides an instruction that loads a register A with the next byte in the program memory, and another that decrements the register A until it is zero. By this means, we can program the sample rate with the value we load into A. We set the sample rate to 512 SPS and current is 208 μA.
[07-NOV-20] We have the following Z80 instructions defined in our VHDL firmware for our embedded microprocessor. We are using little-endian byte ordering, as in the Z80, and all the same operation codes. We have 4 kBytes of ROM for instruction memory and 1 KByte of RAM for program use. The logic program now occupies 359 of 1280 look-up tables available in the logic chip (28% full).
constant nop      : integer := 16#00#;
constant dec_A    : integer := 16#3D#;
constant inc_A    : integer := 16#3C#;
constant inc_B    : integer := 16#04#;
constant dec_B    : integer := 16#05#;
constant ld_A_B   : integer := 16#78#;
constant ld_B_A   : integer := 16#47#;
constant ld_A_n   : integer := 16#3E#;
constant ld_A_mm  : integer := 16#3A#;
constant ld_mm_A  : integer := 16#32#;
constant jp_nn    : integer := 16#C3#;
constant jp_z_nn  : integer := 16#CA#;
constant jp_nz_nn : integer := 16#C2#;
We define a prototype memory map in the Z80 sixteen-bit address space.
|0000-0FFF||Instruction Read-Only Memory||0000-0FFF|
|1000-1FFF||Program Random-Access Memory||1000-13FF|
Writing to addresses in the gyroscope and accelerometer ranges initiates serial access to the respective sensor. The lower six bits of the address are transmitted to the sensor. Serial access completes in 4 μs. We have three transmission control registers. Location 0x4000 is the least significant byte of a sixteen-bit data transmission, 0x4001 is the most significant, and writing n to 0x4002 initiates the transmission of sixteen bits with channel number id + n, where id is the device's ID number. Addresses 0x1000-0x13FF are 1-kByte RAM for use by the program. The program memory is read-only, and no instruction can access it other than to load instructions. We install the GNU Z80 Assembler. We use it to compile the following code, which demonstrates the use of RAM to store a value, as well as sensor reading, sample transmission, and conditional jumps.
start:
ld A,8          ; 2 load delay count into A
ld (0x1000),A   ; 3 save delay to RAM
ld A,(0x2018)   ; 4 read low byte of gyro timer
ld (0x4000),A   ; 3 write to low byte of xmit bits
ld A,(0x2019)   ; 4 read middle byte of gyro timer
ld (0x4001),A   ; 3 write to hi byte of xmit bits
ld A,0          ; 2 load A with channel offset
ld (0x4002),A   ; 3 initiate xmit
ld A,(0x1000)   ; 4 load delay from RAM
nop             ; 1
loop_1:
dec A           ; 1 decrement A
jp nz,loop_1    ; 3 loop delay (3+1)*8 = 32
jp start        ; 3 Total 64
We use a TCL script to read in the object code produced by the assembler and translate it into the line-by-line format accepted by the Lattice Diamond compiler. The above program gives us exactly 512 SPS with the gyroscope clock on channel No1, consuming 224 μA from 2.6 V.
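The 512 SPS figure follows from the cycle counts annotated in the listing, as the Python sketch below shows. It assumes the processor is clocked at 32.768 kHz, which this entry does not state explicitly, and that the conditional jump costs three cycles whether or not it is taken, as the listing's annotations imply. The names are ours.

```python
# Sketch: sample rate of the delay-loop program from its cycle counts.
# CPU_CLOCK_HZ is an assumption; this entry does not state the CPU clock.

CPU_CLOCK_HZ = 32768

# Cycle counts of the ten instructions from "ld A,8" through "nop".
SETUP_CYCLES = [2, 3, 4, 3, 4, 3, 2, 3, 4, 1]
LOOP_CYCLES = 1 + 3      # dec A plus jp nz,loop_1
JUMP_BACK_CYCLES = 3     # jp start


def cycles_per_sample(delay_count):
    """Clock cycles for one pass of the program, one sample transmitted."""
    return sum(SETUP_CYCLES) + delay_count * LOOP_CYCLES + JUMP_BACK_CYCLES


def sample_rate(delay_count, clock_hz=CPU_CLOCK_HZ):
    """Samples per second for a given delay count loaded into A."""
    return clock_hz / cycles_per_sample(delay_count)


if __name__ == "__main__":
    print(cycles_per_sample(8))   # 64
    print(sample_rate(8))         # 512.0
```

Under the same assumptions, each additional count in the delay adds four cycles per sample, so the delay value sets the sample rate in coarse steps.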
[16-NOV-20] We now have a sufficient subset of Z80 instructions to perform arithmetic calculations, a stack, an index register, eight-bit and sixteen-bit data access cycles, and increment and decrement instructions for all seven eight-bit registers. The logic takes up around 1160 of the 1280 available LUTs (look-up tables) in the device. When we try to add CALL and RTN functions for subroutines, the design uses 1280 LUTs, and so will not fit. If we remove the increment and decrement functions for registers B, C, D, E, H, and L, the design drops to 1151 LUTs. If instead we remove INC SP, the design drops to 1276 and barely fits. We tell the compiler to keep the INC_Out bits and the design drops from 1276 to 1228. We do the same for ADD_Out and the design rises from 1228 to 1240. We undo that change. We explicitly reduce the cpu_addr, SP, and IX to 13 bits. We re-define our memory map.
|0000-13FF||Random Access Memory (RAM), Initialized to Configuration File||0000-13FF|
|1700-17FF||Interrupt and Control Registers||1700-170F|
Design drops to 1057 LUTs. We allow the compiler to merge SBYI, SBYD, SCKE and design rises to 1067 LUTs, so we restore our no-merge constraints. Timing analysis gives a maximum delay from RCK to its derivatives of 42 ns, and the compiler accepts a clock speed of 40 MHz. Current consumption of the CPU at 10 MHz with 10% activity fraction is 1.1 mA. We add the LD r,n instructions and design is 1140. We remove LD r,n and add instead indirect load instructions for BC and DE as address and A as data. We are at 1085 LUTs. We do not have INC BC or INC DE, but we can increment BC, DE, or HL with the help of the carry flag, which is set when we increment any eight-bit register, and the jump-if-carry instruction. We give instructions and clock cycles required below.
inc E         ; 1
jp nc,mark1   ; 3 skip the high-byte increment unless E overflowed
inc D         ; 1
mark1:
Most often, the two-register increment will take 4 clock cycles, compared to 6 for the Z80's single instruction to do the same thing. At 1085 LUTs we have space to add a few shift and rotate instructions if we find we need them for sixteen-bit arithmetic operations with signed integers. For now, we believe we have everything we need for the IIS. We can read 6144 bytes from address block 0x0000-0x13FF and write them in two-byte serial transfers to the accelerometer Features In register at 0x155E like this:
ld A,0x00       ; 3
ld B,A          ; 1
ld C,A          ; 1
ld A,0x13       ; 2
ld D,A          ; 1
ld A,0xFF       ; 2
ld E,A          ; 1
lp:
ld A,(BC)       ; 2
ld L,A          ; 1
dec C           ; 1
jp nc,m1        ; 3
dec B           ; 1
m1:
ld A,(BC)       ; 2
ld H,A          ; 1
ld (0x13FF),HL  ; 4 Sixteen-bit write
dec C           ; 1
jp nc,m2        ; 3
dec D           ; 1
m2:
jp nc,lp        ; 3
Here the loop itself takes at most 20 clock cycles. If the CPU runs off the IIS's 32.768 kHz clock, the above process loads the initial value of the 6-KByte RAM into the accelerometer feature memory in 3.75 s.
[17-NOV-20] We add an Interrupt Controller to the logic, which provides four memory locations as shown in the table below. The controller generates an interrupt request signal for the CPU from five possible sources: a timer, the sample transmitter, the sensor interface, and direct interrupt lines from the gyroscope and accelerometer. Our maximum timer interrupt interval is 256 ÷ 32768 = 7.8 ms. We can turn on and off individual interrupt signals with the interrupt mask and reset individual interrupts with the reset location. When the CPU receives an interrupt request, it completes its current operation, then pushes the program counter onto the stack and jumps to address interrupt_pc = 0x0003. Meanwhile, the CPU will start off at start_pc = 0x0000 when it powers up, so we should begin any program with a three-byte jump instruction for the start, and another three-byte jump instruction for the interrupt. We return from an interrupt with a RET instruction, and the CPU will pop the program counter off the stack and continue with the previous operation. Our logic takes 1163 LUTs. We add INC SP and DEC SP back into the CPU and the space required drops to 1078 LUTs.
We would like to add PUSH r and POP r for individual eight-bit registers, because these could then serve as exchange operations for all registers. But Z80 does not include a single-byte stack transfer. We would like to be able to store and set the stack pointer so that we can switch tasks, thus building a multi-threaded microprocessor that can run in a single logic chip. The LD SP,HL instruction allows us to set the stack pointer to the contents of HL. The LD (nn),SP instruction allows us to store SP in a memory location. We will implement these. We want to check that we have all the shift and rotate instructions needed for signed sixteen-bit arithmetic.
[18-NOV-20] We abandon the Z80 instruction set. Instead of providing load instructions between the registers, we provide individual eight-bit push and pop instructions PUSH r and POP r for all seven eight bit registers, plus the flags register, and two index registers IX and IY. We allow for SP, IX, and IY to be moved to and from HL. We permit arithmetic and logical operations only between A and a constant byte or register B. If we want to add H to A, without losing the current contents of B, we push B and H onto the stack, pop B then H off the stack, add B to A, then swap B and H back again. We can swap or load from one register to another through the stack. We eliminate all sixteen-bit data access, with the exception of pushing IX, IY, and PC (the program counter) onto the stack. We re-structure the sensor interface so that we write data to outgoing data bits, then initiate serial communication with another write to a control register. On reads, we initiate then read sensor bits. We select eight or sixteen-bit serial access with a control bit. The new logic is 1051 LUTs. With the 10-ns part, the maximum delay between RCK and an output is 50 ns, so we have no doubt the CPU will run at 10 MHz. Our processor is untested as yet, firmware P3035A05.
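The register exchange through the stack described above can be modelled in a few lines of Python. This is a behavioural sketch only: the register file is a dictionary, the stack is a list, and the function names are ours rather than anything in the firmware.

```python
# Sketch: swap two registers through the stack, then add H to A via B.
# Models the PUSH r / POP r sequence described in the text.


def swap_via_stack(regs, r1="B", r2="H"):
    """PUSH r1, PUSH r2, POP r1, POP r2 exchanges the two registers."""
    stack = []
    stack.append(regs[r1])   # PUSH B
    stack.append(regs[r2])   # PUSH H
    regs[r1] = stack.pop()   # POP B receives the old H
    regs[r2] = stack.pop()   # POP H receives the old B


def add_h_to_a(regs):
    """Add H to A without losing B, using only stack moves and ADD A,B."""
    regs = dict(regs)                        # work on a copy
    swap_via_stack(regs)                     # B now holds H
    regs["A"] = (regs["A"] + regs["B"]) & 0xFF
    swap_via_stack(regs)                     # restore B and H
    return regs


if __name__ == "__main__":
    print(add_h_to_a({"A": 0x10, "B": 0x01, "H": 0x20}))
    # {'A': 48, 'B': 1, 'H': 32}
```

The same four-instruction pattern serves as a general exchange between any two eight-bit registers, at the cost of four stack accesses per swap.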
We move the CPU behavior definition out of our main VHDL file and into our entities file. We define the above address map with constants, and implement it with the Memory Management Unit process. We add general-purpose interrupts to the Interrupt Handler, and allow the CPU to set them with the Interrupt Set Register.
[19-NOV-20] We convert our processor to big-endian byte ordering rather than little-endian. We begin work on the Open-Source Reconfigurable Eight-Bit Central Processing Unit (OSR8-CPU) assembler.
|0000||R/W||Program Variable Memory and Configuration Data (6 KB = 6144 Bytes)|
|17FF||R/W||Top of Program Stack (SP = 0x17FF = 6143)|
|1800||R/W||Sensor Data Hi Byte (D0-D7 give SD8-SD15)|
|1801||R/W||Sensor Data Lo Byte (D0-D7 give SD0-SD7)|
|1802||W||Sensor Register Location (D0-D6 give SA0-SA6)|
|1803||W||Sensor Control Register (initiates access, D0-D3: GYSEL, SAWR, SA16)|
|1A00||W||Transmit Hi Byte (D0-D7 give TD8-TD15)|
|1A01||W||Transmit Lo Byte (D0-D7 give TD0-TD7)|
|1A02||W||Transmit Channel Offset (channel = device_id + offset)|
|1A03||W||Transmit Control Register (initiates transmission)|
|1C00||R||Interrupt (D0-D7: TMR, TXD, SAD, INTG, INTA, GPI1-3)|
|1C02||R/W||Interrupt Mask (store "1" to Dn to enable interrupt n)|
|1C04||W||Interrupt Reset (write "1" to Dn to clear interrupt n)|
|1C06||W||Interrupt Set (write "1" to Dn to set interrupt n)|
|1C08||R/W||Timer Period (32.768 kHz Periods)|
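The interrupt registers in the table above can be modelled with a short Python sketch. The bit order follows the table (D0 = TMR through D7 = GPI3). The rule that a request is raised whenever any unmasked interrupt bit is set is our assumption about how the controller combines the bits, and the class and constant names are ours.

```python
# Sketch: behavioural model of the interrupt registers at 0x1C00-0x1C06.
# The irq rule (any set bit that is also enabled) is an assumption.

INT_BITS, INT_MASK, INT_RESET, INT_SET = 0x1C00, 0x1C02, 0x1C04, 0x1C06
TMR, TXD, SAD, INTG, INTA, GPI1, GPI2, GPI3 = range(8)


class InterruptModel:
    def __init__(self):
        self.bits = 0    # pending interrupts, read at 0x1C00
        self.mask = 0    # enabled interrupts, read or written at 0x1C02

    def write(self, addr, value):
        value &= 0xFF
        if addr == INT_MASK:
            self.mask = value
        elif addr == INT_RESET:
            self.bits &= ~value & 0xFF   # writing 1 to Dn clears interrupt n
        elif addr == INT_SET:
            self.bits |= value           # writing 1 to Dn sets interrupt n

    def read(self, addr):
        if addr == INT_BITS:
            return self.bits
        if addr == INT_MASK:
            return self.mask
        return 0                         # write-only locations read as zero

    @property
    def irq(self):
        """Request to the CPU: some pending interrupt is also enabled."""
        return (self.bits & self.mask) != 0


if __name__ == "__main__":
    ic = InterruptModel()
    ic.write(INT_SET, 1 << TMR)      # raise the timer interrupt by software
    print(ic.irq)                    # False, the timer is still masked
    ic.write(INT_MASK, 1 << TMR)
    print(ic.irq)                    # True
    ic.write(INT_RESET, 1 << TMR)
    print(ic.irq)                    # False
```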
We add "LD HL,PC" and "LD PC,HL" to permit us to implement relative jumps, although at the cost of copying the program counter, adding to it in the accumulator, and over-writing the program counter. We correct bugs in our implementation of PUSH and POP, and move incrementing and decrementing IX, IY, and SP into combinatorial logic to accelerate indexed read and write operations. The new code takes 1129 LUTs.
[24-NOV-20] We eliminate the separate address incrementer in our CPU and perform incrementing and decrementing for SP, IX, and IY directly, which does not increase the code size. We compose our own eight-bit adder. The registers in P3035A_CPU.vhd are now global signals to permit us to refer to them outside the main CPU process. As written now, the CPU uses a variable called opcode to present the current value of the program data during an opcode read state, and to store this current value for use in other states. We are using this variable to control the eight-bit adder in combinatorial logic, assuming it will be valid immediately after the rising edge of CK. But this is not the case: outside the rising-edge if-clause, the opcode variable will be updated only upon rising edges, and no such update can respond to program data that updates on the same rising edge. The result is that we need two "inc" instructions, one after the other to perform an increment, but each "dec" instruction works fine. That's because "dec" is the default.
[25-NOV-20] Quiescent current consumption of the CPU stuck in a loop with no sensor or transmitter activity is 136 μA. We add to the firmware all logic needed for self-calibration of the ring oscillator. The CPU writes a value to the fast clock divider. We increase the divider from 9 to 16 and reprogram the A3035A for each. The VHDL logic remains identical, only the CPU program changes. Thus the ring oscillator is not re-compiled and its frequency remains the same. We plot TCK period versus divider below, and compare to the performance we obtained by adjusting a VHDL constant and re-compiling logic.
The CPU can set a self-calibrate bit, which turns on the ring oscillator. The clock controller counts TCK cycles for one half RCK period. If TCK is exactly 5 MHz, we will see 76 periods. The CPU reads out the count and can adjust the clock calibration constant to bring TCK to 5 MHz. With this code included in our P3035A logic, we are now at 1120 LUTs total. We are able to read out and transmit our TCK counter, and set the FCK divisor with software. Our increment, decrement, and addition instructions are working, as well as conditional jumps, pop and push.
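The expected count of 76 and the choice of divider can be sketched in Python. The sketch assumes TCK equals the ring oscillator frequency divided by the divider value and that the counter records completed TCK cycles in half of one 32.768 kHz period. The ring frequency of 60 MHz used in the example is hypothetical, and the function names are ours.

```python
# Sketch: TCK self-calibration against the 32.768 kHz reference clock.
# The 60 MHz ring frequency in the example is a hypothetical value.

RCK_HZ = 32768
TARGET_TCK_HZ = 5_000_000


def tck_count(tck_hz, rck_hz=RCK_HZ):
    """Completed TCK cycles in one half period of RCK."""
    return int(tck_hz // (2 * rck_hz))


def calibrate(ring_hz, dividers=range(9, 17)):
    """Pick the divider whose count is nearest the 5 MHz target count."""
    target = tck_count(TARGET_TCK_HZ)
    return min(dividers,
               key=lambda d: abs(tck_count(ring_hz / d) - target))


if __name__ == "__main__":
    print(tck_count(TARGET_TCK_HZ))   # 76
    print(calibrate(60_000_000))      # 12
```

A count above 76 means TCK is too fast and the divider should increase. A count below 76 means the divider should decrease.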