DMA upgrade on Dynamixel Teensy 3.2 Driver

After doing some more research, I decided to jump into configuring my DARwIn-OP’s Dynamixel Teensy 3.2 Driver Prototype to get rid of the retransmitting loop code:

void loop() 
{
  if (Serial1.available())
  {
    uint8_t c = Serial1.read();
    Serial1.write(c);
  }
}

This loop just waits for a byte to be received by the UART and retransmits it through the same UART, which is configured with a hardware direction pin (RTS driven by the transmitter).

If I could use Teensy’s DMA (Direct Memory Access) controller to do the UART retransmission, I could empty the loop so the Teensy could be used for other tasks without affecting the retransmission performance.

The DMA requirements are very simple:

  • To trigger a DMA request when a byte is received by the UART.
  • The DMA transfer has to be 1 byte.
  • The DMA transfer has to read from the UART DATA register (received data).
  • The DMA transfer has to write into the UART DATA register (for retransmission).
  • Do not involve the CPU in the process, no interrupt events.

Looking at the DMA capabilities of the Teensy’s ARM MK20DX256VLH7 CPU, it looked feasible, and after a few tries I got it working.

Now the retransmission has a smaller latency and a much smoother timing:

Teensy with UART managed by DMA

The signals in the chart are:

  • RX into the Teensy (from Odroid’s TX).
  • TX from the Teensy.
  • Hardware Direction Pin.

The code now needs a longer setup to configure the DMA and the UART (Teensy’s interrupt-driven serial API is no longer useful), but the loop() function is now empty:

#define UART_TXRTSE (2)
#define UART_TXRTSPOL (4)

#define BAUD_RATE (1000000)

void setup() 
{
  int divisor = BAUD2DIV(BAUD_RATE);  

  // DMA:
  
  // p 415 source address = uart data register
  DMA_TCD1_SADDR = &UART0_D;

  // p 415 source address offset
  DMA_TCD1_SOFF = 0;

  // p 416 transfer attributes: 8 bits
  DMA_TCD1_ATTR = 0;

  // p 417 minor byte count = 1 byte
  DMA_TCD1_NBYTES_MLNO = 1;

  // p 420 last source address adjustment = 0
  DMA_TCD1_SLAST = 0;

  // p 420 destination address = uart data register
  DMA_TCD1_DADDR = &UART0_D;

  // p 421 destination address offset
  DMA_TCD1_DOFF = 0;

  // p 423 channel link disabled
  DMA_TCD1_CITER_ELINKNO = 1;

  // p 423 last destination address adjustment = 0
  DMA_TCD1_DLASTSGA = 0;

  // p 427 channel link disabled
  DMA_TCD1_BITER_ELINKNO = 1;
  
  // p 424 control and status = 8 cycle stall, active
  DMA_TCD1_CSR = DMA_TCD_CSR_BWC(3) | DMA_TCD_CSR_ACTIVE;

  // p 402 enable DMA REQ channel 1.
  DMA_SERQ = DMA_SERQ_SERQ(1);

  // clock setup
  // p 252-259 system clock gating
  SIM_SCGC6 |= SIM_SCGC6_DMAMUX;
  SIM_SCGC7 |= SIM_SCGC7_DMA;
  SIM_SCGC4 |= SIM_SCGC4_UART0;
  
  // wait for clocks to become stable.
  delay(500);

  // p366 dma mux channel configuration  
  DMAMUX0_CHCFG1 = DMAMUX_ENABLE | DMAMUX_SOURCE_UART0_RX;

  // UART:
  
  // p 1222 UART0 Control Register 5 request DMA on receiver full
  UART0_C5 = UART_C5_RDMAS;

  // RX TX pins
  CORE_PIN0_CONFIG = PORT_PCR_PE | PORT_PCR_PS |
                     PORT_PCR_PFE | PORT_PCR_MUX(3);

  CORE_PIN1_CONFIG = PORT_PCR_DSE | PORT_PCR_SRE |
                     PORT_PCR_MUX(3);

  // p 1208 uart0 baud rate  
  UART0_BDH = (divisor >> 13) & 0x1F;
  UART0_BDL = (divisor >> 5) & 0xFF;
  UART0_C4 = divisor & 0x1F;

  UART0_C1 = UART_C1_ILT;
  UART0_TWFIFO = 2; // tx watermark
  UART0_RWFIFO = 1; // rx watermark
  UART0_PFIFO = UART_PFIFO_TXFE | UART_PFIFO_RXFE;

  UART0_C2 = UART_C2_TE | UART_C2_RE | UART_C2_RIE;

  // enable PIN 6 as hardware transmitter RTS with active HIGH.
  CORE_PIN6_CONFIG = PORT_PCR_MUX(3);
  UART0_MODEM = UART_TXRTSE | UART_TXRTSPOL; 
}

void loop() 
{
}

Actually, I am now running the typical ‘blink’ in the loop() function just so I know the Teensy is running.
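
As a sanity check on the baud rate lines in the setup above, here is a small host-side C++ sketch (it runs on a PC, not on the Teensy). It assumes that BAUD2DIV yields the divisor in 1/32 steps, that is 2 * clock / baud rounded to the nearest integer, and that UART0 is clocked at the 72 MHz CPU clock. The names baudToDivisor, splitDivisor and actualBaud are hypothetical helpers for illustration only.

```cpp
#include <cstdint>

// Assumption: the divisor is expressed in 1/32 steps, i.e. 2 * clock / baud,
// rounded to the nearest integer (my reading of what BAUD2DIV computes).
inline uint32_t baudToDivisor(uint32_t clockHz, uint32_t baud) {
    return (clockHz * 2u + (baud >> 1)) / baud;
}

// Register values as written in setup(): the 13-bit integer part goes to
// BDH/BDL and the 5-bit fraction (1/32 steps) goes to C4.
struct BaudRegisters {
    uint8_t bdh;  // upper 5 bits of the integer part
    uint8_t bdl;  // lower 8 bits of the integer part
    uint8_t c4;   // fractional part, in 1/32 steps
};

inline BaudRegisters splitDivisor(uint32_t divisor) {
    BaudRegisters r;
    r.bdh = static_cast<uint8_t>((divisor >> 13) & 0x1F);
    r.bdl = static_cast<uint8_t>((divisor >> 5) & 0xFF);
    r.c4  = static_cast<uint8_t>(divisor & 0x1F);
    return r;
}

// Baud rate implied by the registers: clock / (16 * (integer + fraction/32)),
// which simplifies to 2 * clock / (32 * integer + fraction).
inline double actualBaud(uint32_t clockHz, const BaudRegisters& r) {
    uint32_t integerPart = (static_cast<uint32_t>(r.bdh) << 8) | r.bdl;
    return (2.0 * clockHz) / (integerPart * 32u + r.c4);
}
```

Under these assumptions, 1 Mbps at 72 MHz gives a divisor of 144, which splits into BDH = 0, BDL = 4 and C4 = 16, and those values map back to exactly 1,000,000 baud.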

New Dynamixel Driver with a Teensy 3.2

I replaced my initial TTL prototype with a Teensy 3.2. This is a development board with a 32-bit 72MHz ARM CPU on a small 35×18 mm board.

Odroid-XU4, Teensy 3.2 and Servo MX-28 Setup

I selected this board because it has several serial interfaces (UART) supporting:

  • 1 and 3 Mbps, baud rates that can be used with the MX-28 servo.
  • A hardware direction pin. Its hardware RTS pins can signal when the UART is transmitting, by configuring RTS to be driven by the transmitter instead of the receiver part of the UART. Sadly, the Odroid-XU4 (like several other boards) does not support this option, from what I gather from its available CPU documentation.

Also Teensy’s site documentation seems good enough.

Robotis documents this setup to interface a UART to the Dynamixel bus. It requires a UART and a direction pin with 5V logic:

Robotis Circuit Interface to Dynamixel Bus

So by placing a Teensy 3.2 between the Odroid-XU4 and the Dynamixel bus I can generate the direction pin by hardware instead of a delay-prone software implementation.

This is a simplified schematic of the setup. I replaced the receive buffer with an OR gate in order to avoid a pull-up resistor:

Odroid-XU4, Teensy 3.2, Dynamixel Schematic

This setup only uses 1 UART on the Teensy. Teensy’s function is only to retransmit to the Dynamixel Bus and to generate the direction pin. Meanwhile the feedback from the Bus goes directly back to the Odroid, there is no need to pass it through the Teensy. This way, this setup can be used with other micro-controllers that only have 1 UART and there is no extra delay in the feedback. The Odroid provides 1.8V and 5V that power the Teensy and level shifters, and the Teensy provides 3.3V that also powers the level shifters.

My final setup will probably use 2 UARTs in the Teensy, so it can generate feedback to the Odroid and participate as another device on the Dynamixel bus (under its protocol) and have an extra function, like PWM or analog I/O. It will depend on whether there is enough idle time on the bus to add more commands, since the current 8ms control cycle in the DARwIn-OP software is very limited.

The Teensy site’s documentation for the UART is straightforward. The board is programmed with an add-on to the Arduino IDE called Teensyduino.

The following program (flawed, as pointed out later) is easily derived to retransmit through the serial interface with a direction pin:

void setup() 
{
  Serial1.begin(1000000);
  Serial1.transmitterEnable(6);
}
void loop() 
{
  if (Serial1.available()) 
  {
    uint8_t c = Serial1.read();
    Serial1.write(c);
  }
}

In this example, pin 6 is set up as the direction pin to signal when a transmission is in progress.

On the Odroid-XU4 side, the standard Dynamixel library can be used. The only change it needs is the name of the serial device, which is /dev/ttySAC0 for the UART exposed in the Odroid’s expansion connector 10.

This configuration worked at 1Mbps when interacting with an MX-28 servo. But 2 things didn’t work as planned:

  • There was a delay of about 5 bytes (50 us) in the retransmission. I was expecting over 1 byte, but not that much.
  • The direction pin 6 did not work properly all the time (this was not noticeable right away).
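
The "5 bytes (50 us)" figure follows from the framing: with 8 data bits, no parity and 1 stop bit, each byte takes 10 bit times on the wire. A minimal host-side C++ sketch of that arithmetic, where wireTimeUs is a hypothetical helper name and the 8N1 framing is the one used on the Dynamixel bus:

```cpp
#include <cstdint>

// Bits on the wire per byte with 8N1 framing: 1 start + 8 data + 1 stop.
constexpr uint32_t kBitsPerFrame = 10;

// Time in microseconds to send `bytes` bytes back to back at `baud` bits/s.
inline double wireTimeUs(uint32_t bytes, uint32_t baud) {
    return bytes * kBitsPerFrame * 1e6 / baud;
}
```

At 1 Mbps this gives 10 us for one byte and 50 us for five, so a retransmission lag of "about 5 bytes" matches the 50 us seen on the analyzer.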

Retransmission delay

Teensy Retransmission Delay

The Teensy’s ARM MK20DX256VLH7 CPU documentation describes the UART interface in chapter 47. Section 47.3.21 describes the UART_RWFIFO register, which configures the receive buffer threshold that must be reached before the CPU is interrupted. Its value is 1 after reset.

By checking the Serial1.begin() library source code I noticed that this UART_RWFIFO threshold is increased to 4. This allows for lower CPU usage when handling received data, but it adds latency. Also, the library code handles the UART through hardware interrupt events. So the CPU actually learns that data was received only after the first 4 bytes have been received (if fewer than 4 bytes are transmitted, the CPU will also be notified by an idle interrupt event). The Serial1.available() function does not query the UART; it only checks software buffers that are filled by the interrupt handler.
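
To make the effect of the threshold concrete, here is a toy host-side C++ model of a receive FIFO with a watermark. It is only a sketch of the behavior described above, not the real hardware or the library code: RxFifoModel and bytesUntilFirstNotification are hypothetical names, and the model assumes the handler drains the whole FIFO each time it is notified.

```cpp
#include <cstdint>

// Toy model: the CPU is notified only when the fill level reaches the
// watermark. Assumption: the handler then empties the FIFO completely.
class RxFifoModel {
public:
    explicit RxFifoModel(uint32_t watermark) : watermark_(watermark), level_(0) {}

    // Feed one received byte; returns true if this byte triggers a notification.
    bool receiveByte() {
        ++level_;
        if (level_ >= watermark_) {
            level_ = 0;  // handler drains the FIFO
            return true;
        }
        return false;
    }

private:
    uint32_t watermark_;
    uint32_t level_;
};

// How many bytes of a burst must arrive before the first notification.
// Returns 0 if the burst ends first (the real UART would then rely on the
// idle interrupt, which this model does not cover).
inline uint32_t bytesUntilFirstNotification(uint32_t watermark, uint32_t burst) {
    RxFifoModel fifo(watermark);
    for (uint32_t n = 1; n <= burst; ++n) {
        if (fifo.receiveByte()) {
            return n;
        }
    }
    return 0;
}
```

With a watermark of 4 the first byte of a message is not seen until 3 more bytes have arrived, which at 1 Mbps is roughly 30 us of extra waiting before any interrupt handling even starts. With a watermark of 1 every byte is reported as it arrives.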

Since now I am only using the Teensy to retransmit, I lowered the threshold back to 1 byte by modifying the setup function in my code:

void setup() 
{
  Serial1.begin(1000000);
  Serial1.transmitterEnable(6);

  // set receiver buffer threshold for interrupt back to 1.
  uint8_t c2 = UART0_C2;
  UART0_C2 = c2 & ~UART_C2_RE; // disable C2[RE] (receiver enable)
  UART0_RWFIFO = 1;            // set receiver threshold
  UART0_C2 = c2;               // restore C2[RE]
}
void loop() 
{
  if (Serial1.available())
  {
    uint8_t c = Serial1.read();
    Serial1.write(c);
  }
}

UART0_C2 and UART0_RWFIFO point to hardware configuration registers and are defined in Teensy’s library header files. The CPU’s hardware UART #0 maps to the library’s Serial1 C++ object.

Flawed Direction Pin

After stress testing, 1 in every 30 to 100 commands to the servo would result in a timeout waiting for the servo’s response. So after several attempts I captured some cases where the direction pin worked incorrectly.

Direction Pin Failure 1 – Servo still responds, but there is an extra initial return byte.
Direction Pin Failure 2 – Servo gets byte 4 corrupted, no response.

In the digital probe chart, the signals are:

  1. TX from Odroid-XU4
  2. TX from Teensy 3.2 retransmission
  3. Direction Pin from Teensy 3.2
  4. Dynamixel Bus
  5. RX back to Odroid-XU4

Normally, the direction pin works okay, but sometimes it deactivates during transmission. Failure chart 1 shows a case that does not affect the message to the servo, but the Odroid receives back an extra initial 0xFF byte. Failure chart 2 shows a message being corrupted: the 4th byte on the Dynamixel bus has value 63 but should have value 5.
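
A single corrupted byte is enough for the servo to stay silent, because it discards packets that fail validation. The following host-side C++ sketch shows the idea for the Dynamixel protocol 1.0 framing (0xFF 0xFF, ID, LENGTH, INSTRUCTION, parameters, checksum) as I understand it. The function names are hypothetical, and the example packet in the note below is only illustrative, not the one captured in the chart.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Checksum: inverted low byte of the sum of every byte after the two
// 0xFF header bytes, excluding the checksum byte itself.
inline uint8_t dxlChecksum(const std::vector<uint8_t>& packet) {
    uint8_t sum = 0;
    for (std::size_t i = 2; i + 1 < packet.size(); ++i) {
        sum = static_cast<uint8_t>(sum + packet[i]);
    }
    return static_cast<uint8_t>(~sum);
}

// A packet is accepted only if the header, LENGTH field and checksum agree.
inline bool dxlPacketValid(const std::vector<uint8_t>& packet) {
    if (packet.size() < 6) return false;                       // shortest packet
    if (packet[0] != 0xFF || packet[1] != 0xFF) return false;  // header
    // LENGTH (4th byte) counts the bytes that follow it.
    if (static_cast<std::size_t>(packet[3]) != packet.size() - 4) return false;
    return packet.back() == dxlChecksum(packet);
}
```

For example, the illustrative packet FF FF 01 05 03 1E 00 02 D6 has 5 in its 4th byte and validates. Changing that byte to 63 breaks both the length check and the checksum, so a receiver following these rules would ignore the packet, which is consistent with the timeout seen in failure chart 2.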

During my earlier look at the Serial1 library source code I noticed that the transmitterEnable functionality is actually implemented in software. It does not use the hardware RTS feature of the UART. From the signal analyzer probing, it is obvious that this software implementation is flawed. Since I am interested in a hardware solution, I did not try to fix the library source code, but I did notice at least one race condition that is not properly handled.

So after reviewing Teensy’s ARM MK20DX256VLH7 CPU documentation again, I found the hardware solution in section 47.3.14: the UART_MODEM configuration register describes how to configure RTS to signal when the UART transmitter is active.

Also, this other ARM K20 document describes the multiplexed configuration of the hardware pins. Chapter 8.1 lists how the different internal hardware signals can be multiplexed to the external CPU pins. In particular, CPU pins 25, 37, and 61 can be configured as RTS for UART 0. These are CPU pins, not Teensy board pins. This schematic shows that only 2 of those are available as pins on the Teensy board: pin 6 (connected to CPU pin 61) and pin 19 (connected to CPU pin 37). After digging around in some other code, I found how to program the configuration of a pin, in particular pin 6 as RTS (ALT3 functionality).

This is the final version of the setup, using a hardware-controlled direction pin. The call to the flawed Serial1.transmitterEnable() library API was removed.

#define UART_TXRTSE (2)
#define UART_TXRTSPOL (4)

void setup() 
{
  Serial1.begin(1000000);
  
  // set receiver buffer threshold for interrupt back to 1.
  uint8_t c2 = UART0_C2;
  UART0_C2 = c2 & ~UART_C2_RE; // disable C2[RE] (receiver enable)
  UART0_RWFIFO = 1;            // set receiver threshold
  UART0_C2 = c2;               // restore C2[RE]

  // enable PIN 6 as hardware transmitter RTS with active HIGH.
  CORE_PIN6_CONFIG = PORT_PCR_MUX(3);
  UART0_MODEM = UART_TXRTSE | UART_TXRTSPOL; 
}
void loop() 
{
  if (Serial1.available())
  {
    uint8_t c = Serial1.read();
    Serial1.write(c);
  }
}

The direction pin works flawlessly and has a more consistent timing:

Teensy Driver Working Fine

Now I need to shrink this prototype to a circuit board under the Teensy 3.2.

Testing new Dynamixel Driver

As I am using an Odroid-XU3 (and now I’ll be upgrading to an Odroid-XU4) with a USB2AX adapter to interface with the Dynamixel MX-28 servos, I am getting annoyed by the USB delay of the adapter.

The USB2AX is a USB 1.1 device, and USB 1.1 uses 1ms frames for transmitting and receiving data, so any Dynamixel command takes at least 2ms, which is unacceptable. Even using bulk read and write instructions (which take longer as I connect more servos), I would like to comply with the 8ms control loop in the original DARwIn-OP programming, so I am getting too close to the limit.
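
To put a number on "too close to the limit", here is a tiny host-side C++ sketch of the budget. It assumes each command and its response cost one 1ms USB frame in each direction, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Assumption: one 1 ms USB 1.1 frame to send the command and one more
// to get the response back.
constexpr uint32_t kUsbFrameUs = 1000;

// Minimum time for one command/response exchange through the adapter.
inline uint32_t minRoundTripUs() {
    return 2 * kUsbFrameUs;
}

// Upper bound on exchanges that fit in one control cycle, ignoring the
// time the servos and the bus themselves need.
inline uint32_t maxRoundTripsPerCycle(uint32_t cycleUs) {
    return cycleUs / minRoundTripUs();
}
```

Under that assumption an 8ms control cycle leaves room for at most 4 exchanges, before counting any bus time for the actual servo data.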

Luckily, as described in the Odroid forum, the Odroid’s UART interface can be set up to match the Dynamixel bus specifications (8 bit, 1 stop, No Parity) at 1Mbps and all the way up to 3Mbps!

So I am working on a driver to implement the Dynamixel’s TTL half-duplex serial interface with just the RX and TX pins from the Odroid (without any control pin to command the half-duplex direction). This way it can be used with simple UARTs, such as the one found on the Odroid, which do not have a hardware output to signal that a transmission is in progress (and I don’t want to use a software-controlled GPIO, to avoid timing mismatches).

So far I have built a TTL prototype (Odroid’s UART is 1.8V while the Dynamixel bus is 5V), but I’ll continue by shrinking this to a micro-controller soon.

TTL servo driver prototype

Before this I was working on using an SPI UART to replace the USB2AX, but that is going to the trash now.

Pause 2

Again, after a summer pause, I am resuming this DARwIn-OP cloning project.

Right now I am upgrading my Sherline mill with digital RPM control and replacing the Y thread, which was worn out; the wear was becoming visible in the last cuts.