Keep in mind that aliasing is the result of generating images for (and mapping them to) a display made of discrete elements (pixels), which are then shown to beings with continuous vision.
Humans do not actually have continuous vision, though. They have a discrete, countable number of cones and rods arranged in an (imperfect) pattern.
Would a game running on a 'retina' display at native resolution receive no benefit from AA?
AA is an optical illusion created by blending nearby pixels to mask the fact that an image is rendered onto pixels large enough for a human to distinguish them individually.
Being projected onto a retina is not a magic bullet for solving this issue, although it does make things simpler in some regards (and more difficult in others; see below).
Ultimately the resolution/dot pitch is what matters most for your ability to detect jaggedness.
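To make the "blending nearby pixels" idea concrete, here is a minimal sketch of supersampling-style AA. The shape, image size and sample counts are made up purely for illustration; real games use more sophisticated schemes (MSAA, FXAA, temporal AA, etc.).

```python
import numpy as np

def render(width, height, inside, samples=1):
    """Render a binary shape into a grayscale image.

    With samples=1 each pixel is fully on or off (hard, jagged edges).
    With samples>1 the pixel value is the fraction of sub-samples that
    fall inside the shape, i.e. simple supersampling anti-aliasing.
    """
    img = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            hits = 0
            for sy in range(samples):
                for sx in range(samples):
                    # sample positions spread evenly inside the pixel
                    px = x + (sx + 0.5) / samples
                    py = y + (sy + 0.5) / samples
                    hits += inside(px, py)
            img[y, x] = hits / samples**2
    return img

# a slanted half-plane: the classic source of "jaggies"
edge = lambda x, y: y > 0.3 * x + 2

aliased  = render(16, 8, edge, samples=1)  # hard 0/1 steps along the edge
smoothed = render(16, 8, edge, samples=4)  # edge pixels get in-between grays
```

The smaller the pixels are relative to what your eye can resolve, the less this blending matters, which is the point of the rest of this answer.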
The human retina contains about 120 million rod cells (color-blind; sensitive to low light and motion; primarily responsible for peripheral vision) and about 5 million cone cells (three types for three colors; require bright light; concentrated at the center of the eye).
Because the brain processes this data organically, it probably does NOT treat each receptor cell as a pixel, so the distinguishable resolution is somewhat lower. I'm not quite sure whether it uses rods to augment resolution, but if it does, it is not by much, considering how poor peripheral vision is.
However, consider the distribution of receptors across the retina: http://hyperphysics.phy-astr.gsu.edu/hbase/vision/rodcone.html
As you can see, cones are extremely dense in the center of the eye, so you do not simply have a color resolution of 5 megapixels; you have a focus resolution of roughly 5 megapixels (or, more likely, somewhat under that due to the organic processing in the brain). Rod cells are color-blind and make up primarily (but not exclusively) the peripheral vision.
At some point the resolution would be high enough that the human eye could not distinguish jaggedness, and AA would not be needed at all, whether the image is projected onto the retina or displayed on a monitor.
I would say, then, that a display should provide about 5 megapixels per area of focus.
So you need to measure a human's focus area, compare it to the distance they sit from the display, and give the display enough pixels that no matter where on it you focus you cannot distinguish jaggedness.
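Another way to frame the same measurement is in terms of visual angle rather than cell counts. As a rough sketch, assuming the commonly quoted one-arcminute acuity for normal (20/20) vision, which is a rule of thumb I'm introducing here and not something derived from the figures above:

```python
import math

def required_ppi(viewing_distance_in, arcmin_per_pixel=1.0):
    """Pixel density (pixels per inch) at which one pixel subtends the
    given visual angle at the given viewing distance.

    arcmin_per_pixel=1.0 is a commonly quoted rule of thumb for the
    smallest detail normal vision can resolve; treat it as an
    assumption, not a measured figure.
    """
    pixel_pitch_in = viewing_distance_in * math.tan(math.radians(arcmin_per_pixel / 60))
    return 1.0 / pixel_pitch_in

print(required_ppi(24))   # ~143 PPI for a desktop monitor at 24 inches
print(required_ppi(12))   # ~286 PPI for a phone held at 12 inches
```

This is essentially how the "retina display" marketing figure (~300 PPI at phone distance) is derived.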
A retinal projection would have the advantage of being a high-resolution central display aimed directly at the center of the eye, plus a secondary low-resolution display for the peripheral vision. This would greatly reduce the total number of pixels needed to eliminate the need for AA.
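Here is a hedged back-of-the-envelope comparison of the two approaches. The peripheral density is a pure guess, and the other figures are the same hypothetical ones used in the maths below:

```python
import math

# Uniform display vs. a foveated retinal projection that only puts
# full density where the eye is pointed. All numbers are illustrative.
focus_density  = 0.0509   # megapixels per cm^2 (from the 4 MP / 10 cm circle guess below)
periph_density = 0.005    # hypothetical: peripheral vision resolves far less detail
display_area   = 1670.0   # cm^2, roughly a 24" 16:10 panel
fovea_area     = math.pi * 5**2   # the 10 cm diameter focus circle

uniform  = display_area * focus_density
foveated = fovea_area * focus_density + (display_area - fovea_area) * periph_density

print(uniform)   # ~85 MP if the whole panel matches foveal density
print(foveated)  # ~12 MP if only the tracked focus area gets full density
```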
Some maths.
If the human focus area (at the distance you sit from a monitor; this varies, and can be controlled better with a retinal projection display) is a circle 10 cm in diameter (area = 25π ≈ 78.5 cm²) and can distinguish 4 megapixels (hypothetical numbers, I am not sure if they are true), then it has a pixel density of about 0.0509 megapixels/cm².
A 16:10 display that is 24 inches diagonally is 20.35 in by 12.72 in, i.e. about 51.7 cm by 32.3 cm, or roughly 1670 cm². In the above scenario it would need about 85 megapixels for its pixels to correspond 1 to 1 to cells in the eye and be indistinguishable, making AA obsolete.
4K displays (about 8.3 megapixels) are still well short of that, but since I just guessed at the focal area and resolution earlier on, there is no way of telling what the correct figures are without proper research.
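For anyone who wants to redo these back-of-the-envelope numbers with better inputs, here is the same calculation in code form; every input is still just a guess from the text above:

```python
import math

# All inputs are hypothetical figures from the answer, not measured values.
focus_diameter_cm = 10.0   # guessed diameter of the focus circle on the screen
focus_megapixels  = 4.0    # guessed detail the fovea can resolve in that circle

focus_area_cm2     = math.pi * (focus_diameter_cm / 2) ** 2   # ~78.5 cm^2
density_mp_per_cm2 = focus_megapixels / focus_area_cm2        # ~0.0509 MP/cm^2

# 24" 16:10 panel: 20.35 in x 12.72 in, converted to centimetres
width_cm  = 20.35 * 2.54   # ~51.7 cm
height_cm = 12.72 * 2.54   # ~32.3 cm
display_area_cm2 = width_cm * height_cm                       # ~1670 cm^2

needed_mp = display_area_cm2 * density_mp_per_cm2             # ~85 MP
print(density_mp_per_cm2, display_area_cm2, needed_mp)

# For comparison, a 4K (3840 x 2160) panel:
print(3840 * 2160 / 1e6)   # ~8.3 MP
```

Swap in better estimates for the focus diameter and resolvable megapixels and the rest of the arithmetic stays the same.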