Development of Swimming Athlete image recognition system

steppinthrax · May 3, 2016

So, I'm a newly certified swimming official. It's funny, there is a job called Stroke and Turn official. Pretty much you stand at the end of the pool and analyze the swimmers' stroke patterns, verify that they touch a certain way, verify the way their arms move, etc.

I put my eng cap on and I'm thinking of a computer system (camera on each end) that can build a wireframe model of the swimmer and make an assessment if they are swimming the stroke properly. The most important thing is the ability for the system to be able to see whether a body part is above or below water!

I'm just curious (from the get go), if this is a tall task, or too sci fi to even think about with today's tech. The other thing is I would want to use as many pre-packaged algorithms as possible.

Is this possible with today's systems?

cbrunny · May 3, 2016

so I swim. Not an eng and not a programmer.

It is true there is a "best form" but every individual will have a slightly different form that is best for that individual.

Even in a more controlled sport like cycling where you have five guaranteed points of contact and a clear understanding of how the power generated in the legs translates into wattage at the pedals, each individual has different muscle composition and therefore a different dynamic that generates the most power. E.g. different seat angles, height, crank length, etc.

Swimming has zero points of contact and exists within moving fluid. If the best form is equal to that which generates most power (which is debatable depending on your stance on efficiency vs. endurance vs. speed), how do you measure how much power is generated?

Gryz · May 3, 2016

Have you ever written a computer program yourself ?

It is my experience that people who have no clue about programming often think that the most complex issues are kinda easy to program. They think they only lack a little programming experience ...

In reality, writing software is always more complex than you think. And would probably require man-years to build.

Now if you could use off-the-shelf software libraries, that would make things more feasible. But writing pattern-recognition software from scratch ? Good luck with that.

purbeast0 · May 3, 2016

yes this is an extremely tall task. i don't know anything about it, but i would assume you would want something like mo-cap technology used in video games, where they wear those dots all over them with special suits, and then that transitions all of those points to a computer, where you can then make a model out of it. obviously that wouldn't work with swimming since it's underwater.

i'm also not sure where you would want to capture video from. swimmers are like half above water and half under water, so wherever your camera is, you'd be capturing like 1/2 of what's going on. i'd think you would almost want to have a drone overtop that is pointed down and following right overhead the swimmer. and then you'd have to use some image recognition software or like bio scanning software.

either way this is by no means a simple task and is most likely way out of the realm of your skill set, especially to just do in your spare time.

ControlD · May 4, 2016

Yeah, this sounds neat but nearly impossible.

I have been a licensed swim official for some time now myself, and one thing you quickly learn is that every swimmer's stroke is a little different. I think it would take a LONG time to build a database that could take all of those minute differences into account.

I think there are plenty of more useful applications that could help officials out that are more feasible than this.

Ken g6 · May 4, 2016

The other problem I see with this plan is that water distorts light. If your camera is just above the waterline, anything below will be garbled. If your camera is just below the waterline, anything above will be invisible. Plus, waves move the waterline.

Broheim · May 5, 2016

it's possible to do as a research project at a university with current tech, I'm not sure how good the system would be however.

this is a problem best solved with machine learning imo, and with machine learning in its infancy there isn't any off the shelf tech that can do what you're asking. This would take funding, time and talent that I imagine are far beyond the scope of your idea.

in 5 years, however, I would not be surprised if you could get Azure to do what you're asking. Microsoft and others have been ramping up development in the area of machine learning and AI and making it more accessible.

slugg · Jun 3, 2016

Hey there. I have machine vision and image processing experience. Some relevant projects include vision-based tide and wave extrapolation, autonomous camera network calibration for 3D scene capturing, vision-based avionics, geolocation via skyline and horizon recognition, and cellular microscopy recognition in low signal to noise ratio images. It's been years, but I have the background.

What you're describing actually is possible using current techniques and equipment, but it won't work as you've described it. Let's go on a quick mental adventure.

There is no reason to build a wireframe; that's a human-ism, probably in line with what you'd see in a movie. A computer does not know or care about wire frames, and neither should you. Your goal is to automatically grade the quality of the swimmer's technique.

A computer cares about data and algorithms. Feed the computer a bunch of data, run it through an algorithm, and it gives you a result. The result could be a simple "yes" or "no," but it could also even be another algorithm altogether. Those results could be combined using another algorithm to get another result. And so on. Basically, the point is, computers are very good at taking in a bunch of data, then crunching it.

Now put yourself in that mentality. From that perspective, a camera is not a visual device; it's a two-dimensional data capture device. A video camera is a three-dimensional data capture device (X, Y, and time). Now think about 1080p video at 30 frames per second for 10 seconds; that's over 600 million pixels of information, each of which has 6 dimensions (red, green, blue, X coordinate, Y coordinate, time). Tons of data!
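To check the arithmetic, here is a quick sketch. It assumes "1080p" means 1920 x 1080 pixels per frame; the variable names are just for illustration.

```python
# Assumed values taken from the paragraph above: 1080p, 30 fps, 10 seconds.
WIDTH, HEIGHT = 1920, 1080
FPS = 30
SECONDS = 10

pixels_per_frame = WIDTH * HEIGHT   # pixels in one frame
frames = FPS * SECONDS              # number of frames captured
total_pixels = pixels_per_frame * frames

print(f"{total_pixels:,} pixels")   # 622,080,000 pixels
```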

Alone, the data is meaningless. To teach the computer to "see" points of interest, we need to teach it to identify patterns in the data. Various algorithms are used to represent the same data in a different way. For example, I could describe a picture of a swimmer in a pool by listing out every single pixel, or I could say "dark blob inside of a light blue blob." Of course, the latter option makes us lose a ton of data, but it still conveys information. Information and data are not the same. There are literally thousands of features we could calculate for a single frame of video, ranging from "there is a lot of blue" to a statistical distribution of a histogram of gradients. By this point, we've gone from having an insurmountable amount of useless data to an insurmountable amount of potentially useful information. There's a big problem, though. If you asked the computer to tell you how the swimmer is doing at this stage, it would give you over a thousand descriptions that are all true, but you would have no realistic way of knowing which ones actually matter.
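As a rough illustration of turning pixels into features, here is a minimal sketch that computes two of the kinds of features mentioned above on a tiny synthetic frame: "how much blue is there" and a small histogram of gradient orientations. The function names, the 6x6 frame, and the colors are all made up for the example; a real system would use a proper vision library and far richer features.

```python
import math

def blue_fraction(frame):
    """Share of pixels whose blue channel dominates red and green."""
    pixels = [px for row in frame for px in row]
    blue = sum(1 for r, g, b in pixels if b > r and b > g)
    return blue / len(pixels)

def gradient_histogram(gray, bins=4):
    """Gradient orientations over [0, pi), weighted by gradient magnitude."""
    hist = [0.0] * bins
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]   # horizontal change
            gy = gray[y + 1][x] - gray[y - 1][x]   # vertical change
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue                           # flat region, no edge
            angle = math.atan2(gy, gx) % math.pi
            idx = min(int(angle / math.pi * bins), bins - 1)
            hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist

# Synthetic frame: light blue "pool" with a dark 2x2 "swimmer" blob.
POOL, SWIMMER = (80, 160, 230), (30, 30, 30)
frame = [[SWIMMER if 2 <= y <= 3 and 2 <= x <= 3 else POOL
          for x in range(6)] for y in range(6)]
gray = [[(r + g + b) / 3 for r, g, b in row] for row in frame]

# One frame collapsed into five numbers.
features = [blue_fraction(frame)] + gradient_histogram(gray)
print(features)
```

The 36 pixels (108 color values) become 5 numbers: lots of data thrown away, but what remains still says "mostly blue, with some edges in it."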

Each piece of extracted information is called a feature. The collection of all features is called a feature vector. The dimensionality of the feature vector is literally just the number of features. What we need to do is shrink the thousands of dimensions down to just two or three. This process is a combination of three major areas: feature selection, dimensionality reduction, and modeling. The first one is dead simple: figure out which features you just want to blatantly ignore or remove. The last one is not so simple, but pretty straightforward: come up with a formula that takes in the features and gives you a quality of the swimmer's technique. But what about that ridiculously pompous sounding one in the middle?
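Before getting to the middle one, here is what the "dead simple" first step might look like. This is only a sketch of one possible selection rule (drop features that never change across frames); the feature names, the sample values, and the `min_variance` threshold are hypothetical.

```python
from statistics import pvariance

def select_features(vectors, names, min_variance=1e-6):
    """Keep only the features whose values actually vary across frames."""
    columns = list(zip(*vectors))                 # one column per feature
    keep = [i for i, col in enumerate(columns)
            if pvariance(col) > min_variance]
    kept_names = [names[i] for i in keep]
    kept_vectors = [[v[i] for i in keep] for v in vectors]
    return kept_names, kept_vectors

# Hypothetical feature vectors, one per video frame.
names = ["blue_fraction", "frame_width", "arm_angle"]
vectors = [
    [0.88, 1920, 35.0],
    [0.86, 1920, 72.0],
    [0.90, 1920, 110.0],
]

kept_names, kept_vectors = select_features(vectors, names)
print(kept_names)   # frame_width is constant, so it is dropped
```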

Dimensionality reduction is the key to solving low SNR (signal to noise ratio, or "quality") machine vision problems. Video of a swimmer moving through water, who may be fully or partially submerged and lit by all sorts of varying lighting conditions, is definitely low quality imagery. By contrast, you could say that a scanned text document is a high SNR image, since there are basically just two colors and no variance in lighting. So how does it work and how do we do it?

Think about a cube. That's easy to do, as it's only 3 dimensions. Now think about a 1000 dimension object. It ties your brain in a knot, right? But it's actually fairly simple in concept (in practice, you'd better have deep pockets for consulting fees). Draw a cube on a piece of paper. You will notice that despite the picture being only two dimensions, you can comprehend a three dimensional object. Another way of stating that is that you've represented 3 dimensional data with 2 dimensional information. Notice the distinction between data and information, again. So if you can represent 3D data with 2D information, couldn't you represent 4D data with 3D information? And so on?
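The cube drawing can be written down directly. Here is a minimal sketch that assumes an oblique projection (the kind you'd doodle on paper); `project` and `depth_scale` are illustrative names, not a standard API.

```python
from itertools import product

def project(point, depth_scale=0.5):
    """Oblique projection: slide x and y by a fraction of z, then drop z."""
    x, y, z = point
    return (x + depth_scale * z, y + depth_scale * z)

cube = list(product((0, 1), repeat=3))    # 8 corners, 3 numbers each
drawing = [project(p) for p in cube]      # 8 corners, 2 numbers each

for corner, dot in zip(cube, drawing):
    print(corner, "->", dot)
```

Every corner lands on a distinct spot on the paper, so the 2D drawing still tells the corners apart even though one coordinate is gone.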

The answer is yes. You can represent that 1000-dimension feature vector with 999 dimensions. You can keep going all the way down to 2 or 3 if you wanted to. You would be amazed as to how we can represent the scene. Expand your definition of the real world for a moment. Is the swimmer moving within the water, or is he using the water to move the pool around him? They're two different ways of explaining the same thing. But since we don't care about the pool and only care about the swimmer, we can use the latter representation. By doing this, we eliminate the dimension of "position", since now through this perspective, the swimmer suddenly stops moving and the idea of a position is no longer relevant. Keep up this mentality, only you need to do it mathematically.
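Doing the "pool moves around the swimmer" trick mathematically can be as simple as subtracting the swimmer's centroid from every tracked point. This is a sketch under assumptions: the tracked points and the `center_on_swimmer` helper are hypothetical, and it only removes position, nothing more.

```python
def center_on_swimmer(points):
    """Re-express points relative to their centroid, discarding position."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

# Hypothetical tracked points (say hand, head, hip) in pool coordinates.
frame_a = [(2.0, 1.0), (2.5, 1.2), (3.0, 1.0)]      # early in the lap
frame_b = [(12.0, 1.0), (12.5, 1.2), (13.0, 1.0)]   # same pose, 10 m on

pose_a = center_on_swimmer(frame_a)
pose_b = center_on_swimmer(frame_b)
print(pose_a)
print(pose_b)   # matches pose_a: where the swimmer is no longer matters
```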

Eventually, you will arrive at a smaller, manageable feature vector. You could then plug those numbers into a formula that gives you a quality rating for the technique of the swimmer. Figuring out this formula is called modeling. The short version is that we have known shapes, known formulas for those shapes, and a history of known swimmer data. Now find the shape that most closely matches your historic swimmer data, and there's your model. Plug the numbers in and it'll tell you how close it is to the theoretically perfect shape. The closer it is, the better the swimmer's technique.
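A heavily simplified sketch of that last step: instead of searching through families of shapes, it assumes the "model" is just the average of historic good-technique feature vectors and scores a new swimmer by distance from it. The functions, the two-feature vectors, and the scoring formula are all assumptions for illustration, not a recommended model.

```python
import math

def fit_reference(history):
    """Average historic 'good technique' vectors into one reference vector."""
    n = len(history)
    return [sum(col) / n for col in zip(*history)]

def technique_score(features, reference):
    """Score in (0, 1]; 1.0 means the features match the reference exactly."""
    return 1.0 / (1.0 + math.dist(features, reference))

# Hypothetical reduced feature vectors from swimmers judged to be legal.
history = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]]
reference = fit_reference(history)

close_score = technique_score([1.05, 0.05], reference)
far_score = technique_score([2.0, 1.0], reference)
print(close_score, far_score)   # the closer vector gets the higher score
```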

Recap:

Collect data (video), process it into information, reduce the information to the most important components, and fit it to a model.

There you go! You have an automated swim coach!

Easy, right?
