C++: The cost of pointers

Carlis

Senior member
May 19, 2006
237
0
76
Hi
I am planning to solve a physics problem through a special simulation technique.
I am very fond of pointers, and in this problem they would make life a lot easier.
In this problem, the 'core' of the task would consist of manipulating pointers, and thus the way different objects are related to each other.

I have the impression that there is a price to pay for this in terms of computational performance. How can I estimate the performance loss? Does it result from poor utilisation of the cache, or what is the story?

Best regards,
Carlis
 

Cogman

Lifer
Sep 19, 2000
10,283
134
106
Interesting question, and you are right to wonder about cache, because ultimately the cost of pointers is going to be closely tied to how often you are hitting/missing the cache.

I would strongly suggest that you use a profiler such as gprof. You'll often find that IO and algorithmic complexity trump the cost of individual operations. At the very least, you'll be able to focus your efforts on the things that matter most.
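
A rough sketch of the gprof workflow (assuming g++ on Linux; sim.cpp is just a placeholder name):

Code:
g++ -O2 -pg sim.cpp -o sim         # build with gprof instrumentation
./sim                              # run normally; this writes gmon.out
gprof sim gmon.out > profile.txt   # flat profile + call graph of where the time went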
 

LevelSea

Senior member
Jan 29, 2013
942
53
91
Interesting question, and you are right to wonder about cache, because ultimately the cost of pointers is going to be closely tied to how often you are hitting/missing the cache.

I would strongly suggest that you use a profiler such as gprof. You'll often find that IO and algorithmic complexity trump the cost of individual operations. At the very least, you'll be able to focus your efforts on the things that matter most.

Bogosort FTL
 

nickbits

Diamond Member
Mar 10, 2008
4,122
1
81
Using pointers is probably going to be quicker than any other solution you come up with. If using them requires a lot of nested loops and recursive functions, though, you may pay a penalty.
 

TheRyuu

Diamond Member
Dec 3, 2005
5,479
14
81
See Agner's "Optimizing software in C++" optimization guide [1]. Pointers (and references) are addressed on pages 36-38.

Basically you shouldn't have to worry about performance because of how microprocessors are constructed. Also, a lot of what you're already doing in C++ is being done through pointers anyway (e.g. accesses relative to the stack pointer, or through the implicit "this" pointer).
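
As a small illustration (my own sketch, not from the guide): a member-function call already goes through a pointer, so an explicit pointer adds essentially nothing on top of it.

Code:
struct Particle {
    double x = 0.0;
    void step(double dx) { x += dx; }  // the body is effectively this->x += dx
};

int main() {
    Particle p;
    Particle* pp = &p;
    p.step(0.1);    // access through the implicit "this" pointer
    pp->step(0.1);  // access through an explicit pointer; same kind of load/store
}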

The optimization manual should cover all the cases you'd want to know about when dealing with pointers. There are also others which deal with a few other topics (such as optimizing stuff in assembly[2]).

Edit:
If using them requires a lot of nested loops and recursive functions, though, you may pay a penalty.

It would only be a problem if the value had to be re-read from memory on every iteration (i.e. it couldn't be kept in a register).
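
A minimal sketch of the difference (the function and variable names are made up): if the pointed-to value can sit in a register for the whole loop, the dereference is essentially free. Hoisting it out of the loop makes that explicit.

Code:
double sum_scaled(const double* data, const double* scale, int n) {
    double k = *scale;        // dereference once, outside the loop
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += k * data[i];     // k stays in a register for every iteration
    return s;
}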

[1] http://www.agner.org/optimize/optimizing_cpp.pdf
[2] http://www.agner.org/optimize/
 
Last edited:

Carlis

Senior member
May 19, 2006
237
0
76
Thank you all for your input. I think a good starting point is to read the guide referred to above.
Perhaps I can do some small-scale experiments on a selected problem to see what happens.
Thanks a lot
//
Carlis
 

Merad

Platinum Member
May 31, 2010
2,586
19
81
Pointers add a layer of indirection, so of course there is a (very small) amount of overhead associated with their use. This is basically irrelevant, because it's an unavoidable cost of developing software.

Cache thrashing should only be a problem if your data set is very large (much too large for the d-cache) and your data access patterns are bad. The classic example is accessing a 2D array, which C++ stores in row-major order. If you iterate down the columns of the array, every data access causes a large jump in memory, destroying the effectiveness of the d-cache (aka cache thrashing). Iterating across each row before moving on to the next row keeps memory accesses sequential and maximizes the effectiveness of the d-cache. On a large array, a column-first algorithm can easily be many times slower than a row-first algorithm that does the exact same thing.
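
A sketch of the two access patterns (the array size is arbitrary; this assumes the usual row-major layout):

Code:
const int N = 4096;
static double a[N][N];              // ~128 MB, far larger than any d-cache

double sum_row_first() {            // inner loop walks contiguous memory
    double s = 0.0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            s += a[i][j];
    return s;
}

double sum_column_first() {         // inner loop jumps N*sizeof(double) bytes each step
    double s = 0.0;
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            s += a[i][j];
    return s;
}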

In short, don't make blatantly bad choices in your algorithm; then, if performance is not acceptable, run a profiler to identify the problems.
 

sao123

Lifer
May 27, 2002
12,650
203
106
If I recall correctly, the cost of dereferencing a pointer and subsequently accessing the data pointed to depends on the size of the object which needs to be referenced.
Built-in types should be extremely fast; very large objects might be somewhat slower.
 

Merad

Platinum Member
May 31, 2010
2,586
19
81
If I recall correctly, the cost of dereferencing a pointer and subsequently accessing the data pointed to depends on the size of the object which needs to be referenced.
Built-in types should be extremely fast; very large objects might be somewhat slower.

The size of the object shouldn't matter most of the time. If the object is larger than a single cache line, you increase the odds of a cache miss, since part of the object could potentially be evicted from cache while you're still using it. In a modern CPU, however, caches are large enough that this seems unlikely, and even if part of the object were evicted from L1 it would almost certainly be re-loaded from L2 or L3, so the penalty is minimal.
 

sao123

Lifer
May 27, 2002
12,650
203
106
I'm not specifically talking about the cache penalty...

Dereferencing a pointer is a very quick operation, since it merely involves obtaining a memory address.

What I was addressing was that it is faster to perform an operation (read/copy/move) on a single int/float than on an array of 100 items or a class with 50 members.
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
I'm not specifically talking about the cache penalty...

Dereferencing a pointer is a very quick operation, since it merely involves obtaining a memory address.

What I was addressing was that it is faster to perform an operation (read/copy/move) on a single int/float than on an array of 100 items or a class with 50 members.

I'm confused as to what you are asking.

At the very least, adding in a pointer dereference is costing you an additional operation. If your access patterns are bad, then you'll also be hitting main memory and missing the cache.

If you're asking whether it's faster to do:
int x1;
int x2;
... up to int x50;

or int x[50];

or a class with 50 member variables...

Well, it depends. With an optimizing compiler, all 3 might be equal.
Otherwise, I'd expect the array to be the fastest and the class with 50 member variables to be the slowest. Each member variable would require a dereference of a memory location, followed by the operation you want to do. With an optimizing compiler, it could create a new stack frame to handle the class members, or otherwise realize that they're all contiguous for accesses.
 

Cogman

Lifer
Sep 19, 2000
10,283
134
106
I'm confused as to what you are asking.

At the very least, adding in a pointer dereference is costing you an additional operation. If your access patterns are bad, then you'll also be hitting main memory and missing the cache.

If you're asking whether it's faster to do:
int x1;
int x2;
... up to int x50;

or int x[50];

or a class with 50 member variables...

Well, it depends. With an optimizing compiler, all 3 might be equal.
Otherwise, I'd expect the array to be the fastest and the class with 50 member variables to be the slowest. Each member variable would require a dereference of a memory location, followed by the operation you want to do. With an optimizing compiler, it could create a new stack frame to handle the class members, or otherwise realize that they're all contiguous for accesses.

A class with 50 member variables is going to be just as fast as an array with 50 elements. In either case you are dereferencing a pointer to get to a memory location, and in both cases the memory is going to be one contiguous block (cache effective). This is before talking about optimizations.
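
A scaled-down sketch of what that looks like (3 members instead of 50, purely for brevity): both cases are just loads at fixed offsets from one base address.

Code:
struct S { int a, b, c; };   // laid out contiguously, like int[3]

int sum_struct(const S* s)  { return s->a + s->b + s->c; }   // three loads at consecutive offsets
int sum_array(const int* x) { return x[0] + x[1] + x[2]; }   // same pattern of loads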

Talking about optimizations, a class can be much faster than an array. This is because you can put access constraints on the class that can help the compiler out. For example, making a class immutable allows the compiler to do a whole lot of tricks with method calls and member accesses that it can't do with an array.
 

Merad

Platinum Member
May 31, 2010
2,586
19
81
im not specifically talking about cache penalty...

dereferencing a pointer is a very quick operation, since it merely involves obtaining a memory location address.

What I was addressing was, it is faster to (perform operation) read/copy/move on a single int/float size, than a array of 100 items, or a class with 50 members,

The array operation can potentially be optimized with SSE instructions. Other than that, there is no difference.
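
For example, a plain loop over an array like the one below is the kind of thing compilers can auto-vectorize with SSE/AVX at -O2/-O3 (a sketch of my own, not taken from anyone's code in this thread):

Code:
void scale(float* a, int n, float k) {
    for (int i = 0; i < n; ++i)
        a[i] *= k;          // candidate for SSE/AVX auto-vectorization
}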
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
When I wonder what a particular piece of code is costing me, I just put it in a loop of one million or so iterations and then use the performance counter before and after the loop to tell me how long it takes. I've tested many simple things like function calls and type casts. Needless to say, they all cost pretty much nothing. But I've never seen an instance where using a pointer is slower. It has always been either the same or faster. What really costs time is anything to do with strings (other than simple char arrays), with CString being the biggest offender. It actually takes 5000 times more CPU cycles just to assign a value to a CString than to a char* type string.
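
A minimal sketch of that kind of timing loop using std::chrono (the loop body is just a placeholder for whatever is being measured):

Code:
#include <chrono>
#include <cstdio>

int main() {
    volatile int sink = 0;                         // volatile keeps the loop from being optimized away
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < 1000000; ++i)
        sink = sink + i;                           // the code under test goes here
    auto stop = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
    std::printf("%lld ns total, %.2f ns per iteration\n", (long long)ns, ns / 1000000.0);
}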
 

gdansk

Platinum Member
Feb 8, 2011
2,896
4,387
136
Code:
std::string a;
a.size();

Code:
std::string a;
std::string *b = &a;
b->size();

produce functionally identical assembly on my machine.

The true cost of pointers comes in when the data items are not close to each other in memory, for example in linked lists. If the address is not in the cache, the processor is forced to load a new cache line from memory. TheRyuu previously linked Agner's guide. Read that.
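
A sketch of that contrast (assuming the same element count in both containers):

Code:
#include <list>
#include <vector>

double sum_vector(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) s += x;   // contiguous memory: sequential, prefetch-friendly
    return s;
}

double sum_list(const std::list<double>& l) {
    double s = 0.0;
    for (double x : l) s += x;   // each step follows a node pointer: likely cache misses
    return s;
}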
 
Last edited:

Merad

Platinum Member
May 31, 2010
2,586
19
81
When I wonder what a particular piece of code is costing me, I just put it in a loop of one million or so iterations and then use the performance counter before and after the loop to tell me how long it takes.

That's not a very reliable way to test performance FYI. For example, a piece of code that normally only runs a few times might have a very poor (say 50%) branch prediction rate. Loop the same code a million times so that the branch predictor has a chance to warm up and it might easily hit a 95+% prediction rate. Same goes for cache performance. IOW, the first few iterations of the loop are likely to perform drastically worse than the overall average. A profiler is really the only way to accurately test.
 

Merad

Platinum Member
May 31, 2010
2,586
19
81
Code:
std::string a;
a.size();

Code:
std::string a;
std::string *b = &a;
b->size();

produce functionally identical assembly on my machine.

I don't know what settings you compiled with, but it appears that the compiler has optimized away everything. Your main, if unoptimized, should contain at least 3 function calls: string::string(), string::size(), and string::~string().

An easier example is in C, compiled with -m32 for easier readability (IMO):

Code:
#include <stdio.h>

 int main()
 {
   int x = 10;
   printf("%d\n", x);
 }

----

080483c4 <main>:
80483c4: 55                    push   %ebp
80483c5: 89 e5                 mov    %esp,%ebp
80483c7: 83 e4 f0              and    $0xfffffff0,%esp
80483ca: 83 ec 20              sub    $0x20,%esp
// set the value of x in memory
80483cd: c7 44 24 1c 0a 00 00  movl   $0xa,0x1c(%esp)
80483d5: b8 b4 84 04 08        mov    $0x80484b4,%eax
// load x from memory
80483da: 8b 54 24 1c           mov    0x1c(%esp),%edx
// places copy of x on the stack as the printf arg  
80483de: 89 54 24 04           mov    %edx,0x4(%esp)  
80483e2: 89 04 24              mov    %eax,(%esp)
80483e5: e8 0a ff ff ff        call   80482f4 <printf@plt>
80483ea: c9                    leave
80483eb: c3                    ret

Code:
#include <stdio.h>

 int main()
 {
   int x = 10;
   int* y = &x;
   printf("%d\n", *y);
 }

----

080483c4 <main>:
80483c4: 55                    push   %ebp
80483c5: 89 e5                 mov    %esp,%ebp
80483c7: 83 e4 f0              and    $0xfffffff0,%esp
80483ca: 83 ec 20              sub    $0x20,%esp
// store x in memory
80483cd: c7 44 24 18 0a 00 00  movl   $0xa,0x18(%esp)
// get the address of x
80483d5: 8d 44 24 18           lea    0x18(%esp),%eax
// store the address of x in y
80483d9: 89 44 24 1c           mov    %eax,0x1c(%esp)
// load y from memory
80483dd: 8b 44 24 1c           mov    0x1c(%esp),%eax
// load *y from memory
80483e1: 8b 10                 mov    (%eax),%edx
80483e3: b8 c4 84 04 08        mov    $0x80484c4,%eax
// copy *y on the stack as a printf arg
80483e8: 89 54 24 04           mov    %edx,0x4(%esp)
80483ec: 89 04 24              mov    %eax,(%esp)
80483ef: e8 00 ff ff ff        call   80482f4 <printf@plt>
80483f4: c9                    leave
80483f5: c3                    ret
 

neocpp

Senior member
Jan 16, 2011
490
0
71
In my experience the cost of pointers themselves is pretty small compared to the cost of the bad access patterns they can lead you into. Only a few times, when doing some numerically intensive work, has there been a small but noticeable performance hit (depending on the application, you may be able to mitigate it with templates). Out of curiosity, what sort of problem are you planning on solving?

Also, don't forget to turn on optimization when benchmarking, since I'm assuming you do that when actually running your application.
 