Aceshardware Forum Index Aceshardware
(not so) temporary home for the aceshardware community
 

P4 stalls for >240 uSec

 
stevek999



Joined: 14 Jul 2008
Posts: 3

Posted: Mon Jul 14, 2008 2:11 pm    Post subject: P4 stalls for >240 uSec

Background: We have an in-house realtime OS that has been developed from the 386 through the 486 and Pentium upwards as the available PICMG boards have changed.

Problem: We are evaluating a P4 board because our previous Celeron (Socket 370) board is going EOL. When we try to use the COM2 port at 115K it misses characters. I've tracked it down to the input (IN) instruction sometimes taking over 240 microseconds. Interrupts are disabled, since we use this feature to download new software versions serially.

Diagnostics: I have ended up with a small piece of test ASM86 code on a floppy Master Boot Record which times the COM2 status register read using the RDTSC instruction. Interrupts are disabled. This seems to prove that the problem exists. I've then tried booting from this floppy on various desktop-type machines (single cores, dual cores, whatever) and most of them seem to exhibit the same symptoms, although the stall time is different. The problem doesn't seem to occur on AMD processors.

I can't believe that this is a problem with the Intel processors as other people using realtime OSs would surely have spotted it as it would affect interrupt latency

Has anyone heard of a problem like this?

I've found many forums/threads where people talk about cache latency and such but 240+ MICROSECONDS seems a bit much
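For scale, here is a rough back-of-the-envelope sketch of why a stall of that size costs characters. It assumes 8N1 framing (10 bits per character) and an exact line rate of 115200 baud for "115K"; neither is stated above, so treat the numbers as illustrative only.

```python
# Rough arithmetic: how many character times fit into one CPU stall?
# Assumptions (not stated in the post): 8N1 framing = 10 bits per character,
# and an exact line rate of 115200 baud for "115K".

BAUD = 115_200          # bits per second (assumed)
BITS_PER_CHAR = 10      # 1 start + 8 data + 1 stop (assumed 8N1)
STALL_US = 240.0        # stall reported in the post, in microseconds


def char_time_us(baud: int, bits_per_char: int) -> float:
    """Time on the wire for one character, in microseconds."""
    return bits_per_char * 1_000_000 / baud


def chars_arriving_during(stall_us: float, baud: int, bits_per_char: int) -> float:
    """Number of character times that elapse during a stall."""
    return stall_us / char_time_us(baud, bits_per_char)


if __name__ == "__main__":
    t = char_time_us(BAUD, BITS_PER_CHAR)
    n = chars_arriving_during(STALL_US, BAUD, BITS_PER_CHAR)
    print(f"one character takes {t:.1f} us")
    print(f"a {STALL_US:.0f} us stall spans {n:.2f} character times")
```

Under those assumptions a polled receive loop that is frozen for 240 us has two to three characters arrive while it is not looking, which would be enough to overrun a UART that buffers only a single received byte.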

Thanks
HenryWince



Joined: 07 Sep 2007
Posts: 15

Posted: Tue Jul 15, 2008 10:06 am

240 us indeed seems a tad high. Are you sure your measuring code is correct?

I would expect IN/INS to take a couple of thousand cycles at most. Those instructions are not fast as they perform bus serialization. I.e. IN is delayed until all pending stores have completed.
who?



Joined: 01 Sep 2007
Posts: 540

Posted: Tue Jul 15, 2008 5:50 pm

Did you suspend the interrupts? All of them?
Usually, what you see is probably the result of Non-Maskable Interrupts (NMI).
One of your hardware services is interrupting your program. Since you are booting from a floppy, many of your devices are not programmed ... the USB subsystem, for example.
I saw many people getting confused with this before, you may want to look into this.




who?
stevek999



Joined: 14 Jul 2008
Posts: 3

Posted: Tue Jul 15, 2008 11:31 pm

who?,

As you can see from the following code, I used the CLI instruction, which disables all maskable interrupts. As for the NMI, the first symptom occurred when downloading new software serially (a loop that receives a byte and stores it in the next memory location), and at that time there was no handler for NMI or any other interrupt, so the processor would have double faulted if one had occurred.

Here's the ASM86 code from my test MBR


; TEST CODE
; $$$$$$$$$$$$$$$$$$


CLI ; DISABLE INT'S

MOV EAX,0
DEC EAX
PUSH EAX ; LOOP COUNTER

; LOOP
; ----

L001:

; CHECK IF TIME TO OUTPUT A '-'
; -----------------------------

POP EAX
INC EAX
PUSH EAX
AND EAX,0FFFFFFH
JNZ L002

MOV AL,'-'
MOV AH,0EH
MOV BX,00007H
INT 10H

L002:


; EXECUTE 'CPUID' INSTRUCTION TO ENSURE PREVIOUS
; INSTRUCTIONS ARE ALL COMPLETE
; ----------------------------------------------

MOV EAX,0
; CPUID
DB 00FH,0A2H

; GET THE CPU CLOCK COUNTER VALUE IN EDX:EAX
; ------------------------------------------

; RDTSC
DB 00FH,031H

; SAVE ON STACK
; -------------

PUSH EAX
PUSH EDX

; GET COM2 STATUS (TEST INSTRUCTION)
; ----------------------------------

MOV EDX,2FDH
IN AL,DX

; EXECUTE 'CPUID' INSTRUCTION TO ENSURE PREVIOUS
; INSTRUCTIONS ARE ALL COMPLETE
; ----------------------------------------------

MOV EAX,0
; CPUID
DB 00FH,0A2H

; GET THE CPU CLOCK COUNTER VALUE IN EDX:EAX
; ------------------------------------------

; RDTSC
DB 00FH,031H

; RESTORE START CPU CLOCK COUNTER VALUE IN ECX:EBX
; ------------------------------------------------

POP ECX
POP EBX

; CALCULATE NETT CPU CLOCK COUNT
; ------------------------------

SUB EAX,EBX
SBB EDX,ECX

; CHECK IF COUNT IS EXCESSIVE OR FIRST TIME
; 20000 = 7 MICROSECONDS FOR 3.0 GHZ PROCESSOR
; OR 13 MICROSECONDS FOR 1.5 GHZ PROCESSOR
; OR 20 MICROSECONDS FOR 1.0 GHZ PROCESSOR
; ---------------------------------------------

POP ECX
OR ECX,ECX
PUSH ECX
JZ L010

CMP EAX,20000
JB L001

L010:

; OUTPUT EAX AS DECIMAL
; ---------------------


.....

This code was written to prove a point; it is not meant to be real code. It was only written after the problem had manifested itself in other areas.
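The microsecond figures in the threshold comments of the listing can be checked with a little arithmetic. This is only a sketch: it assumes the time stamp counter ticks at the nominal core clock, and the function names are made up for illustration.

```python
# Convert between TSC cycle counts and microseconds.
# Assumption: the TSC increments once per core clock at the nominal frequency.


def cycles_to_us(cycles: int, cpu_mhz: float) -> float:
    """Cycles divided by MHz gives microseconds."""
    return cycles / cpu_mhz


def us_to_cycles(us: float, cpu_mhz: float) -> int:
    """Microseconds times MHz gives cycles (rounded to a whole count)."""
    return round(us * cpu_mhz)


if __name__ == "__main__":
    # The CMP EAX,20000 threshold from the listing at the three clock speeds
    # named in its comments.
    for mhz in (3000.0, 1500.0, 1000.0):
        print(f"20000 cycles at {mhz:.0f} MHz = {cycles_to_us(20000, mhz):.2f} us")
    # The stall reported in the thread, expressed in cycles on a 3.0 GHz part.
    print(f"240 us at 3000 MHz = {us_to_cycles(240, 3000.0)} cycles")
```

The comments in the listing round these values to 7, 13 and 20 microseconds. A 240 us stall on a 3.0 GHz part is far above the 20000-cycle threshold, so the comparison against the low dword of the difference is enough to catch it.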

Does anyone out there have the facilities to code this sequence into a DOS program and run it on a P4 for me?

One observation I have is that this code is I/O intensive and loops about 500,000 times a second. Since a dual core machine I tried it on was only showing the problem about once a second, how often would that equate to on a normally loaded XP machine?
andikleen



Joined: 10 Sep 2007
Posts: 59

Posted: Thu Jul 17, 2008 4:13 pm    Post subject: Re: P4 stalls for >240 uSec

It could be SMM code causing the latency spike.

Some systems have a periodic SMM handler doing some work for the BIOS. Normally you see this more on server platforms though. Another area
that can cause latencies is power management, but you don't seem to do
that.

The only good way to diagnose SMM is unfortunately to run it under
a hardware probe.
Groo



Joined: 22 Jul 2007
Posts: 178

Posted: Tue Jul 22, 2008 12:20 am

Stupid question, but is HT off in the BIOS?

-Charlie
stevek999



Joined: 14 Jul 2008
Posts: 3

Posted: Wed Jul 23, 2008 11:20 pm

After another week's digging....

1) The problem is not restricted to an input instruction. I think I believed it was because it originally showed up using a COM port, and also because in any loop an I/O instruction takes the longest time and is therefore exposed to the problem for a larger proportion of the loop time

2) The problem is demonstrable on machines ranging from an old Dell Dimension 4400 from 2002 up to an Intel Core 2 Duo (E6550) from 2007

3) The stall time varies from PC model to PC model but 2 examples of one model I tried had the same value

4) The stall occurs at multiples of some interval. I'm still working on timing that interval, at the moment I use a loop counter so obviously the count is different for each PC

5) I have found a way of removing the problem on 4 different PCs. Many PC BIOSes allow you to disable the USB and/or the USB legacy mode. I've found that disabling the USB legacy mode works or, on those PCs without that option, disabling USB completely works. Unfortunately the only board where this doesn't fix the problem completely is the PICMG board that I am evaluating. There, disabling the USB only reduces the problem

Does anyone know of any published errata that may cover this 'feature' of the P4 and its chipsets? I could code a workaround into our realtime O.S if I knew what it was

BTW our realtime O.S doesn't use USB so disabling it is acceptable to us
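As a sketch of how the interval in point 4 could be timed independently of the loop count: record the RDTSC value each time a stall is detected, then work on the differences offline. The helper names and the sample numbers below are invented purely to exercise the arithmetic; they are not measurements from any of these machines, and the conversion assumes the TSC runs at the nominal core clock.

```python
# Turn a list of TSC timestamps (one per detected stall) into gaps in
# microseconds, then express each gap as a multiple of a candidate interval.
# Assumption: the TSC ticks at cpu_mhz cycles per microsecond.


def stall_gaps_us(stall_tsc: list[int], cpu_mhz: float) -> list[float]:
    """Gaps between consecutive stall timestamps, in microseconds."""
    return [(b - a) / cpu_mhz for a, b in zip(stall_tsc, stall_tsc[1:])]


def as_multiples(gaps_us: list[float], base_us: float) -> list[int]:
    """Nearest whole multiple of the candidate base interval for each gap."""
    return [round(g / base_us) for g in gaps_us]


if __name__ == "__main__":
    # Synthetic timestamps (cycles) on a hypothetical 1000 MHz CPU.
    samples = [10_000, 60_000, 160_000, 210_000]
    gaps = stall_gaps_us(samples, 1000.0)
    base = min(gaps)  # crude guess: the smallest gap is one interval
    print("gaps (us):", gaps)
    print("multiples of", base, "us:", as_multiples(gaps, base))
```

Taking the smallest observed gap as the base interval is a crude choice. It only holds if at least one pair of consecutive stalls was captured without a missed one in between.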
no@spam.com



Joined: 07 Oct 2007
Posts: 69

Posted: Thu Jul 24, 2008 6:39 am

http://www.usb-programming.com/interrupt-transfers.html

Take a look at your interrupt rate related to USB... and weep.
who?



Joined: 01 Sep 2007
Posts: 540

Posted: Thu Jul 24, 2008 7:49 pm

That's what I guessed too: an NMI is probably issued by one of the devices. USB, floppy ... DVD drives? It is one of them.

who?
All times are GMT + 1 Hour