For those who do not know: in mainstream system architectures, cores do not really communicate with each other directly; the x86-64 ISA, for example, has no instructions for one core to "talk" to another. A process executes threads on the cores, and the threads may communicate through shared memory. This puts a burden on the system to keep the contents of all the caches coherent, so that the threads can agree on a consistent view of memory. The rules that threads need to follow, and the guarantees that the system provides if they do, are called a memory model.
Hence, what is meant by "core-to-core latency" is actually memory-access latency, taking into account the penalty of the cache-coherency protocol, which in turn is heavily affected by the amount of sharing (concurrent reading and writing of the same cache lines, in particular).
Here is a great lecture on CPU caches and their effect on software performance. Spoiler: if a program's performance is limited by "core-to-core latency", the programmer has probably done something really bad, either by accident or out of ignorance. Note that the final slides have great references to further in-depth material on this topic.