High School Student Blog: The Role of AI in Fencing (Refereeing)

A Rare Look at Computer Vision and Action Classification in the Sport of Fencing

Fencing isn’t exactly known for being at the leading edge of sports technology. Sabre, the last of the three weapons to be electrified, didn’t get an electric scoring system until the late 1980s. However, fencing presents a unique challenge that should be perfect for artificial intelligence.

The Why

Unlike referees in many other sports, the referee in fencing has a huge influence over the outcome of a bout. The ref has the power to determine who gets the point after a touch. It’s not like basketball, where if the ball goes in the hoop, you’re guaranteed the points. There are complex rules of right of way that dictate which fencer scores the point when both land a hit, and many of these rules are up to the referee’s interpretation.

Good referees are also hard to come by. The fencing community is already relatively small, and as you can imagine, the refereeing community is even smaller. Referees often work 10 hours a day during tournaments.

This is where artificial intelligence can hopefully step in and help. The obvious advantages are that there is no limit to how many bouts a software algorithm can ref, and that an algorithm is consistent in its output. Realistically, an artificial intelligence referee wouldn’t be able to replace a human one; it would be there to assist the human referee. This could be especially useful when a fencer challenges a referee’s call, something that is currently limited to very high-level fencing where more than one ref is at hand.

If you want to catch up on how right of way works in fencing, Ninh Ly has a great and simple clip on it.

The Concept

Our task here is a binary classification problem: either touch left or touch right (we’re excluding simultaneous touches to keep the task simpler). We’re fortunate to have a vast library of fencing videos on YouTube that we can use to train the model. All we need to do is cut the videos into smaller clips and label each one as touch left or touch right, which we will cover in the next post. After that, we can treat this as a human action, or video, classification problem.
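To make the setup concrete, here is a minimal sketch of what the labeled data boils down to; the file names and the exact encoding are hypothetical, not taken from the project:

# Hypothetical label encoding for the binary task:
# 0 = touch awarded to the left fencer, 1 = touch awarded to the right fencer.
labeled_clips = [
    ("clip_0001.mp4", 0),  # left fencer gets the touch
    ("clip_0002.mp4", 1),  # right fencer gets the touch
]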

The Roadmap

What to expect in upcoming posts.

Data Collection and Preprocessing

We automatically collect and label fencing videos using sholtodouglas’s amazing repository and perform data augmentation.

OpenPose

As an alternative to convolutional networks, we use OpenPose software to help us extract relevant features to feed into the model.

Fitting the Model

Let’s see how LSTMs, GRUs, CNNs, and TCNs perform on the data and what challenges this particular dataset has.

Pose Estimation and Preprocessing for an AI Fencing Referee

Using OpenPose to Process and Extract Features from Fencing Video Data

In the last blog post, we collected our dataset of 2-second clips, each labeled either left touch or right touch. Unfortunately, we can’t feed this directly into a model to start training. Typically with action classification, some sort of CNN is used on the raw video to extract features. However, since in our case we are looking for the body positions of each fencer, we can use a pose estimation model to extract features in place of a CNN. We’ll use OpenPose, a popular and effective pose estimator.

OpenPose picks out the joints of each person fairly well. However, OpenPose estimates a pose for everyone in the frame, and we only want the poses of our two fencers. Luckily, OpenPose comes with a built-in parameter, number_people_max, which does what the name suggests: it takes the two poses the model is most confident in and outputs only those.
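For reference, here is a minimal sketch of running the standard OpenPose demo binary on one clip with that flag; the input and output paths are placeholders, not the project’s actual layout:

import subprocess

# Run the OpenPose demo on one clip and write per-frame pose JSON.
subprocess.run([
    "./build/examples/openpose/openpose.bin",
    "--video", "clips/bout_001.mp4",    # input clip (hypothetical path)
    "--write_json", "poses/bout_001/",  # one JSON file per frame
    "--number_people_max", "2",         # keep only the two most confident poses
    "--display", "0",                   # run headless
    "--render_pose", "0",               # skip drawing the skeleton overlay
], check=True)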

It’s not a perfect workaround, since there are cases where the model will detect the wrong person, such as a referee. There’s no easy way to fix this short of training a fencer-only detection model from scratch, so for now, we’ll just have to filter these frames out manually.

Besides rendering video files, OpenPose can also output JSON files with the coordinates of each pose, which is what we will use to train our model. It acts as a great feature extractor, removing the background and leaving only the fencers’ poses behind. Running OpenPose on our entire dataset takes about 24 hours. Afterwards, each clip is represented by 30 JSON files, one per frame. We’ll convert these to 30 lists to make things easier and convert them back to a NumPy array later.
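Here is a minimal sketch of that loading step, assuming recent OpenPose JSON output (which stores each detection under the pose_keypoints_2d key); the directory layout is an assumption:

import json
from pathlib import Path

def load_clip(clip_dir):
    # One JSON file per frame; "people" holds each detected pose as a flat
    # [x1, y1, c1, x2, y2, c2, ...] list of joint coordinates and confidences.
    frames = []
    for path in sorted(Path(clip_dir).glob("*_keypoints.json")):
        with open(path) as f:
            frame = json.load(f)
        frames.append([person["pose_keypoints_2d"] for person in frame["people"]])
    return frames  # ideally 30 frames, each with 2 poses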

We’re not done yet, however. OpenPose outputs a confidence score for each joint, which we don’t need, so we can throw those out. It also outputs feet and head joints, which, while cool to look at, will likely cause the model to overfit, so we can throw those out too: the feet and head are not vital when determining right of way.

Another important factor to keep in mind is that for our model, order matters. OpenPose doesn’t track people across frames, meaning the left fencer might appear first in the list for one frame and second for another. This doesn’t matter when we’re visualizing the data, but it will mess the model up. To fix this, we’ll say that whichever pose is further to the left (has a smaller x-value) appears first in the list, and we’ll do the same for each individual arm and leg.

Next, we’ll handle mistakes by the OpenPose model by removing outliers. We’ll use a general rule of thumb from statistics and say that any point more than 1.5 times the IQR beyond the first or third quartile is an outlier and should be removed. We’ll also handle the occasional frame, like the referee case above, where someone other than the two fencers is detected: if the center of the fencers in the current frame is more than 60 pixels from the previous frame’s center, it is probably incorrect. Finally, we’ll do our best to resize the fencers. We’ll first center them vertically in the frame using their median y-value and then scale them so that their height within the frame is consistent (target height / median height). A sketch of the ordering and outlier steps follows below.
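Here is a minimal sketch of those two steps, assuming each frame has been packed into a NumPy array of shape (2, n_joints, 2) holding (x, y) pairs for the two poses; the function names are illustrative:

import numpy as np

def order_fencers(frame):
    # frame: array of shape (2, n_joints, 2) with an (x, y) pair per joint.
    # Put the pose with the smaller median x first so the left fencer is
    # always index 0, no matter which order OpenPose detected them in.
    if np.median(frame[0, :, 0]) > np.median(frame[1, :, 0]):
        return frame[::-1].copy()
    return frame

def iqr_outlier_mask(values):
    # Standard 1.5 * IQR rule: flag anything beyond the quartile "whiskers".
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)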

All of this preprocessing removes noise from our data and lets the model focus only on the fencers’ actions. This way, the model doesn’t have to account for factors like distance from the camera and outliers. Testing shows that this preprocessing lets us decrease the complexity of our model significantly.

One final addition to our data is a signal for when each scoring light goes off. Using code similar to the code that originally collected our data, we’ll detect the lights and append a 1 or a 0 to each frame to signify whether each light is on. Since we have no way of detecting the blade, this should make it easier for the model to differentiate between a miss and an actual hit on the fencer.
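The original light-detection code isn’t shown here, but a minimal sketch of the idea might look like this; the box coordinates and brightness threshold are assumptions for illustration, not values from the project:

import numpy as np

def light_flags(frame, left_box, right_box, threshold=200):
    # frame: one RGB video frame as a NumPy array. Each box is a
    # (y1, y2, x1, x2) region of the image where a scoring light appears;
    # a region brighter than the threshold is treated as a lit lamp.
    flags = []
    for (y1, y2, x1, x2) in (left_box, right_box):
        flags.append(1 if frame[y1:y2, x1:x2].mean() > threshold else 0)
    return flags  # e.g. [1, 0] means only the left light is on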

After that, we’ll normalize our data to fit between 0 and 1 and round it to 3 decimal places to prevent overfitting.
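In NumPy, that last step is short; here is a minimal sketch, assuming a clip’s coordinates are stored in a single array:

import numpy as np

def normalize(coords):
    # Min-max scale all coordinates into [0, 1], then round to 3 decimals.
    lo, hi = coords.min(), coords.max()
    return np.round((coords - lo) / (hi - lo), 3)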

Jason Mo is a Student Ambassador in the Inspirit AI Student Ambassadors Program. Inspirit AI is a pre-collegiate enrichment program that exposes curious high school students globally to AI through live online classes. Learn more at https://www.inspiritai.com/.

