C++ timers

stevf

Senior member
Jan 26, 2005
290
0
0
Hello, I'm looking to time sorting methods for a class. Using clock_t works well enough, but it doesn't work well once you get below 10 to 15 milliseconds, as it then often reports zero. I don't need anything overly accurate, but I was wondering whether there are other timers in C++ with better resolution. I see there is a Win32 API call I can use, QueryPerformanceCounter, and I will try that tonight, but does anyone know of a portable method?


Thanks

Steve
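
For readers on a C++11 or later compiler, one portable option is std::chrono::steady_clock from the standard library. The sketch below is only an illustration, not something proposed in the thread: the helper names time_ms and time_sort_example are hypothetical, and the actual resolution still depends on the platform's underlying clock.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: runs `work` once and returns the elapsed time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Example workload (an assumption for illustration): sort 10,000 ints
// that start out in descending order.
inline double time_sort_example() {
    std::vector<int> data(10000);
    for (int i = 0; i < 10000; ++i) data[i] = 10000 - i;
    const double ms = time_ms([&] { std::sort(data.begin(), data.end()); });
    assert(std::is_sorted(data.begin(), data.end()));
    return ms;
}
```

steady_clock is used rather than system_clock because it is monotonic, so an adjustment of the system time during the run cannot produce a negative interval.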
 

EagleKeeper

Discussion Club Moderator
Elite Member
Staff member
Oct 30, 2000
42,591
5
0
You will need OS-specific timers once you get into the 10 ms range.
 

stevf

Senior member
Jan 26, 2005
290
0
0
Thanks - it really doesn't matter for this project. I can try the Win32 API I found, or I can increase the size of the data set I sort, or just not worry about it in this case.



Steve
 

Markbnj

Elite Member
Moderator Emeritus
Moderator
Sep 16, 2005
15,682
13
81
www.markbetz.net
The high performance multimedia timers are usually the ones people use to get hi-res timing on Windows. Check out the multimedia timer API for more information.
 
Sep 29, 2004
18,665
67
91
Originally posted by: Markbnj
The high performance multimedia timers are usually the ones people use to get hi-res timing on Windows. Check out the multimedia timer API for more information.

This stuff is golden to know about. People should experiment with them just to reinforce the fact that they exist.

They actually have real-world applications when you are trying to determine whether an algorithm will run fast enough on your target system. At least you can determine up front whether something is a risk item, which is big in the real world.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
rdtsc - x86 instruction to read the tick register. Highest resolution clock in the system.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Originally posted by: degibson
rdtsc - x86 instruction to read the tick register. Highest resolution clock in the system.

Isn't that what the QueryPerformanceCounter and QueryPerformanceFrequency functions tell you?
 

stevf

Senior member
Jan 26, 2005
290
0
0
I tested QueryPerformanceCounter and QueryPerformanceFrequency tonight. Before, timing quick sort would give me either 0, 15, or 16 milliseconds; using this timer I get around 5.3 milliseconds.

Steve
 

Sc4freak

Guest
Oct 22, 2004
953
0
0
Originally posted by: Crusty
Originally posted by: degibson
rdtsc - x86 instruction to read the tick register. Highest resolution clock in the system.

Isn't that what the QueryPerformanceCounter and QueryPerformanceFrequency functions tell you?
Not necessarily. QueryPerformanceCounter will use the best timer available on the system. On most, that'll be the rdtsc instruction. But you should use QueryPerformanceCounter instead of the asm instruction.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: Sc4freak
Originally posted by: Crusty
Originally posted by: degibson
rdtsc - x86 instruction to read the tick register. Highest resolution clock in the system.

Isn't that what the QueryPerformanceCounter and QueryPerformanceFrequency functions tell you?
Not necessarily. QueryPerformanceCounter will use the best timer available on the system. On most, that'll be the rdtsc instruction. But you should use QueryPerformanceCounter instead of the asm instruction.

Most folks will prefer to wrap the timer with a function call. It depends on what is measured, however: when measuring very fine-grained events, the overhead of just about any wrapper that isn't an ASM macro is too high. Since the rest of this thread seems to be mostly about millisecond-range events, using rdtsc directly is probably overkill, and the timer could stand a wrapper layer or two.
 

stevf

Senior member
Jan 26, 2005
290
0
0
Thanks all for a good discussion on this topic - it helped me learn interesting details. Does anyone know off the top of their head the Linux equivalent, or is that too variable depending on distro?

EDIT: looks like gettimeofday() is the Linux version, declared in sys/time.h


Thanks


Steve
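
As a rough illustration of the gettimeofday() approach mentioned in the edit above, here is a minimal sketch. The helper names now_us and elapsed_ms are hypothetical, and the code assumes a POSIX system (the header is not available with MSVC).

```cpp
#include <cassert>
#include <sys/time.h>  // POSIX only

// Hypothetical helper: current wall-clock time in microseconds since the epoch.
inline long long now_us() {
    timeval tv;
    gettimeofday(&tv, nullptr);
    return static_cast<long long>(tv.tv_sec) * 1000000LL + tv.tv_usec;
}

// Hypothetical helper: elapsed milliseconds between two now_us() readings.
inline double elapsed_ms(long long start_us, long long stop_us) {
    return (stop_us - start_us) / 1000.0;
}
```

Typical use is to call now_us() before and after the sort and pass both readings to elapsed_ms(). Note that gettimeofday() reports wall-clock time, so the reading can jump if the system clock is adjusted mid-run; clock_gettime() with CLOCK_MONOTONIC is the usual POSIX alternative when that matters.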
 

EagleKeeper

Discussion Club Moderator
Elite Member
Staff member
Oct 30, 2000
42,591
5
0
There is a high-performance timer that can be added to the Windows OS that will give you microseconds.
 
Sep 29, 2004
18,665
67
91
Originally posted by: Sc4freak
Originally posted by: Crusty
Originally posted by: degibson
rdtsc - x86 instruction to read the tick register. Highest resolution clock in the system.

Isn't that what the QueryPerformanceCounter and QueryPerformanceFrequency functions tell you?
Not necessarily. QueryPerformanceCounter will use the best timer available on the system. On most, that'll be the rdtsc instruction. But you should use QueryPerformanceCounter instead of the asm instruction.

True, the MSDN license actually states that which timer is being used is undefined. That is the whole reason that the QueryPerformanceFrequency() function is needed.

I forget what the resolution is that I usually get with the high performance timer. I do remember that it was much better than 1 ms resolution though.
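
To make the role of QueryPerformanceFrequency() concrete, here is a hedged sketch: the counter returns raw ticks, and only the reported frequency says how many ticks make up a second. The helper names (ticks_to_ms, read_ticks, ticks_per_second) are hypothetical, and the non-Windows branch is purely an assumption added so that the sketch compiles elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#else
#include <chrono>
#endif

// Converts a tick delta to milliseconds, given the timer's ticks per second.
inline double ticks_to_ms(std::int64_t ticks, std::int64_t ticks_per_sec) {
    assert(ticks_per_sec > 0);
    return 1000.0 * static_cast<double>(ticks) / static_cast<double>(ticks_per_sec);
}

#ifdef _WIN32
// Raw counter value; its unit is unspecified until divided by the frequency.
inline std::int64_t read_ticks() {
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t);
    return t.QuadPart;
}
inline std::int64_t ticks_per_second() {
    LARGE_INTEGER f;
    QueryPerformanceFrequency(&f);
    return f.QuadPart;
}
#else
// Assumed fallback for non-Windows builds: steady_clock expressed in nanoseconds.
inline std::int64_t read_ticks() {
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               std::chrono::steady_clock::now().time_since_epoch()).count();
}
inline std::int64_t ticks_per_second() { return 1000000000; }
#endif
```

To time something, read the counter before and after the work and pass the difference, together with ticks_per_second(), to ticks_to_ms().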
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
I've been able to get measurements in the .0X ms range using QueryPerformanceCounter.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: Crusty
I've been able to get measurements in the .0X ms range using QueryPerformanceCounter.

Sounds to me like that is not the cycle counter then. The cycle counter should have units of picoseconds.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Originally posted by: degibson
Originally posted by: Crusty
I've been able to get measurements in the .0X ms range using QueryPerformanceCounter.

Sounds to me like that is not the cycle counter then. The cycle counter should have units of picoseconds.

I never attempted to time it in an empty loop, I've always had a light load in the threads using the performancecounter for timing. I'll test it tomorrow at work
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: Crusty
Originally posted by: degibson
Originally posted by: Crusty
I've been able to get measurements in the .0X ms range using QueryPerformanceCounter.

Sounds to me like that is not the cycle counter then. The cycle counter should have units of picoseconds.

I never attempted to time it in an empty loop, I've always had a light load in the threads using the performancecounter for timing. I'll test it tomorrow at work

For comparison, the tightest back-to-back rdtsc I've managed to achieve was either 3 or 7 cycles -- I don't recall which. I think rdtsc might flush the pipe, which would account for why it wasn't 1.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
So testing in VMWare I can get resolution down to the nanosecond, I'm sure it would be faster if this was a native windows install.
 

stevf

Senior member
Jan 26, 2005
290
0
0
So I timed all the sorts I was required to time, and for fun I decided to add a few extras. All of them sorted 10,000 ints in an array except for one, which sorted an unordered list. One of the for-fun sorts I tried was the built-in sort function from list in the STL. The performance on this one was horrible, worse than a bubble sort. Bubble sort took about half a second and list.sort() took almost a second. Any ideas/thoughts why? From some simple research it looks like that function does a quick-sort kind of sort, but timing it doesn't back that up.

 

tfinch2

Lifer
Feb 3, 2004
22,114
1
0
Originally posted by: Crusty
So testing in VMWare I can get resolution down to the nanosecond, I'm sure it would be faster if this was a native windows install.

Never trust timing results observed in a virtual machine, especially at that resolution.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: tfinch2
Originally posted by: Crusty
So testing in VMWare I can get resolution down to the nanosecond, I'm sure it would be faster if this was a native windows install.

Never trust timing results observed in a virtual machine, especially at that resolution.

rdtsc isn't privileged, and pre-nehalem HW can't virtualize it. It should be identical on VMs and on raw hardware.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: stevf
So I timed all the sorts I was required to time, and for fun I decided to add a few extras. All of them sorted 10,000 ints in an array except for one, which sorted an unordered list. One of the for-fun sorts I tried was the built-in sort function from list in the STL. The performance on this one was horrible, worse than a bubble sort. Bubble sort took about half a second and list.sort() took almost a second. Any ideas/thoughts why? From some simple research it looks like that function does a quick-sort kind of sort, but timing it doesn't back that up.

The list makes a big difference. Unordered linked list = pointer manipulations, and every single one is going to be at least an L1D miss, probably an L2 miss to boot. Furthermore, out-of-order (OOO) cores can't exploit instruction- or memory-level parallelism (ILP/MLP) on lists, but can on vectors.
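
One way to check this on your own machine is to sort the same data in both containers and time each. The sketch below is an illustration under stated assumptions: SortTimings and compare_sorts are hypothetical names, and the data set is generated here rather than taken from the assignment code. A freshly built list often has its nodes allocated close together, so this may understate the cache effects seen with a long-lived, scattered list.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <list>
#include <random>
#include <vector>

// Hypothetical result type: elapsed milliseconds for each container.
struct SortTimings {
    double vector_ms;
    double list_ms;
};

// Sorts the same n pseudo-random ints as a std::vector and as a std::list,
// timing each sort with std::chrono::steady_clock.
inline SortTimings compare_sorts(std::size_t n, unsigned seed) {
    using steady = std::chrono::steady_clock;
    using millis = std::chrono::duration<double, std::milli>;

    std::mt19937 rng(seed);
    std::vector<int> v(n);
    for (int& x : v) x = static_cast<int>(rng() % 1000000);
    std::list<int> l(v.begin(), v.end());

    const auto t0 = steady::now();
    std::sort(v.begin(), v.end());
    const auto t1 = steady::now();
    l.sort();  // member sort; std::sort requires random-access iterators
    const auto t2 = steady::now();

    // Both containers should now hold the same sorted sequence.
    assert(std::equal(v.begin(), v.end(), l.begin()));
    return SortTimings{millis(t1 - t0).count(), millis(t2 - t1).count()};
}
```

For example, compare_sorts(10000, 1) mirrors the 10,000-int runs described above. Build with optimizations enabled when comparing, since debug builds of the standard library can distort such measurements considerably.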
 

tfinch2

Lifer
Feb 3, 2004
22,114
1
0
Originally posted by: degibson
Originally posted by: tfinch2
Originally posted by: Crusty
So testing in VMWare I can get resolution down to the nanosecond, I'm sure it would be faster if this was a native windows install.

Never trust timing results observed in a virtual machine, especially at that resolution.

rdtsc isn't privileged, and pre-nehalem HW can't virtualize it. It should be identical on VMs and on raw hardware.

Not necessarily as VMware virtualizes the TSC.

To build on this, you can still obtain the value of the hardware TSC. To do this, you must enable performance counters in your vmx file, and use the rdpmc instruction. I believe we have a whitepaper out there somewhere on this.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: tfinch2
Originally posted by: degibson
Originally posted by: tfinch2
Originally posted by: Crusty
So testing in VMWare I can get resolution down to the nanosecond, I'm sure it would be faster if this was a native windows install.

Never trust timing results observed in a virtual machine, especially at that resolution.

rdtsc isn't privileged, and pre-nehalem HW can't virtualize it. It should be identical on VMs and on raw hardware.

Not necessarily as VMware virtualizes the TSC.

To build on this, you can still obtain the value of the hardware TSC. To do this, you must enable performance counters in your vmx file, and use the rdpmc instruction. I believe we have a whitepaper out there somewhere on this.

1) If you get a chance, please point me at the whitepaper, or something more technical would be even better.

2) My understanding of x86 in the 'pentium' era was that there was not a way to trap on rdtsc. There are ways to write TSC, so a VM could write the tick counter whenever a guest OS is entered, but it wouldn't have a way to interfere with two consecutive reads of TSC unless it was single-stepping or just got lucky with a timer interrupt.

3) I could be totally wrong about 2. x86 never fails to frighten me.
 

stevf

Senior member
Jan 26, 2005
290
0
0
One of the sorts I perform in this test is a merge sort based on an unordered list. The sort and list code was provided to me, but it certainly is nothing special. That sort is slightly faster than the quick sort on an array. The data sorted is the same for each run, as I loop through all the sorts 10 times and use the value of my loop counter for the seed. It is just the STL list that is horrible. I am going to try a few other things from the STL and see what happens. I have done far more than the assignment requires already, but I am having fun throwing extra things in there and seeing what happens.


Also, any suggestions for more tests? I was required to do bubble, insertion, selection, quick, and merge sorts using the code that was provided. I have added a gnome sort (just because of the silly name) and the STL sort. I was going to cast one set to double and run that through one or two of the sorts to see if that makes a difference. I may add radix to it too. These should often be worst-case scenarios, as the data is random.
 