
OMAP-L137: Ethernet issue

Hi, my customer currently has a production stop and contacted me with the following questions about the OMAP-L137. Please have a look at the customer's questions below.

I am currently investigating an issue with a motion board that uses the TI OMAP-L137 (OMAPL137DZKB3). During this investigation we encountered some unexpected behavior from the OMAP and would like your support in finding the root cause and a solution.

Background information:
On the board, the OMAP processor is connected to an Ethernet PHY (KSZ8041, from Microchip Technology) via the RMII bus. On the line side, the PHY is connected directly to a laptop (the host) via a 1.5 m UTP cable. The laptop runs test software that emulates a single, simple API call to the motion board over Ethernet.

Problem introduction:
Our end customer reported timeout errors on software API calls. The OMAP does not respond to the host within the call's timeout period (8 seconds), and a timeout error is logged on the host. As a workaround, the timeout has been increased from 8 seconds to 3 minutes, because the call always completes eventually (typically after about 1 min 50 s).

Inhouse Investigation:

  • We can reproduce this issue on every board we have tested so far.
  • We found that the problem is in the communication path from the OMAP to the host (via Ethernet):
    • At application level, the response is formulated and pushed into the TCP socket within milliseconds, but the host receives the reply only after 1 min 50 s.
    • With Wireshark we see network errors and the resulting retransmissions during the timeout period.

  • The issue reproduces more easily on one board than on the others, and it is temperature related: the number of timeouts increases at higher OMAP temperatures (tested up to 45 °C).
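The delay pattern (milliseconds at application level, about 1 min 50 s at the host, with retransmissions in Wireshark) is consistent with TCP retransmission backoff stretching a single reply. Below is a minimal sketch of that arithmetic, assuming the OMAP runs a Linux TCP stack with the typical 200 ms minimum retransmission timeout (RTO), which doubles after each unanswered retransmission and is capped at 120 s. The real initial RTO depends on the measured round-trip time, so the numbers are illustrative only, and `backoff_schedule` is a hypothetical helper rather than part of any existing tool.

```python
def backoff_schedule(initial_rto=0.2, max_rto=120.0, retries=10):
    """Return (retransmission number, cumulative send time in seconds) pairs.

    Assumption: the original segment is sent at t = 0 and lost, and every
    retransmission is sent one RTO after the previous attempt, with the RTO
    doubling each time (exponential backoff) up to max_rto.
    """
    schedule = []
    rto = initial_rto
    elapsed = 0.0
    for n in range(1, retries + 1):
        elapsed += rto                    # wait one RTO, then retransmit
        schedule.append((n, round(elapsed, 1)))
        rto = min(rto * 2, max_rto)       # back off for the next attempt
    return schedule


if __name__ == "__main__":
    for n, t in backoff_schedule():
        print(f"retransmission {n:2d} sent at ~{t:6.1f} s")
```

Under these assumptions, if the original segment and its first five retransmissions are all lost, the next attempt is not sent until about 12.6 s, already past the original 8 s host timeout, and the ninth retransmission goes out at about 102 s, the same order of magnitude as the observed 1 min 50 s. This does not identify a root cause; it only shows that repeated frame loss on the OMAP transmit path would be enough to produce delays of this size.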

Observations:

  • If we heat the OMAP processor slightly with a heat gun, the delay of the API call grows to the order of one minute or more.
  • If we ping the OMAP processor during the timeout period, the delay drops to the order of seconds, but we do not know why (possibly it is related to the TCP congestion mechanism in the kernel).
  • Each time the issue occurs, the RX packet loss error counter on the host increases by 1.
  • On an oscilloscope we see that the timing between the RMII_CLK and RMII_TXEN signals changes when the timeout errors occur (please check the attached scope plots). We see no change in the relationship between RMII_CLK and the RMII_D0/D1 signals, and RMII_CLK itself appears stable.
  • We also checked the supply noise and the RMII timing, and performed an Ethernet compliance test. All are within specification.

Question 1:

Is there any known dependency of the RMII TXEN timing on an OMAP input (power supply, crystal, etc.), or on decoupling or PCB routing? What could cause the temperature sensitivity of the EMAC subsystem?

 

Question 2:

Which specific supply pins supply the EMAC block? The OMAP's PLL supply was measured and is within the OMAP specification, but we would like to measure the supply noise as close to the EMAC block as possible.

[Scope plot: OMAP_RMII_CLK_VS_RMII_TX_EN_ERRORS_during_OMAP_heating.png, RMII_CLK vs. RMII_TXEN with errors, during OMAP heating]

[Scope plot: OMAP_RMII_CLK_VS_RMII_TX_EN_NO_ERRORS.png, RMII_CLK vs. RMII_TXEN without errors]

Thanks in advance, Patrick

