Author: Ilse Arwert

  • Novel Drone Controlling Methods

    Initial project goal

    Drones are a fascinating new technology and are becoming more commonplace by the day. They are usually controlled with a handheld joystick controller or a phone. In the initial project brief, I considered implementing drone controlling methods using body position (whole-body gesturing), eye-tracking or brainwave analysis.

    The steps I originally envisioned were:

    • Figure out which of these methods would be most interesting to implement.

    • Figure out which kind of drone to use that is both programmable and within my budget, then acquire said drone.

    • Implement the chosen method, documenting the process in an instructional manner to encourage others to try this.

    I quickly realized I wanted to do body position analysis, because I would not need any additional equipment for this and I felt that gesturing for a drone to move in specific ways was the most intuitive option on the list.

    I considered several drone options, and after some research I chose the DJI Tello drone because it is fairly cheap and programmable in Python through the DJITelloPy library, available at https://github.com/damiafuentes/DJITelloPy. The available commands can be found on pages 3-5 of the Tello SDK 2.0 User Guide (available via https://dl-cdn.ryzerobotics.com/downloads/Tello/Tello%20SDK%202.0%20User%20Guide.pdf).
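
    For reference, below is a minimal, hedged sketch of what flying the Tello through DJITelloPy looks like, based on the library's documented interface; the project itself ends up sending raw SDK command strings through its own tello.py wrapper, as described later.

    from djitellopy import Tello

    # connect over the Tello's Wi-Fi, hop up 30 cm and land again
    tello = Tello()
    tello.connect()
    print(tello.get_battery())  # quick sanity check that the connection works
    tello.takeoff()
    tello.move_up(30)
    tello.land()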

    Once this preliminary research was done and the drone arrived, I could start the implementation process.

    Body Position Analysis implementation

    I had never done anything with body position analysis before, so I began by researching several options. I ended up choosing MediaPipe and OpenCV to recognize the positions, due to their widespread use. I implemented this in such a way that I could use either pre-existing videos or webcam input:

    import cv2

    # open the laptop webcam if requested, otherwise a pre-existing video file
    if vidName == "webcam":
        cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    else:
        cap = cv2.VideoCapture(vidName)
    

    I defined several body positions as command positions:

    • arms out - neutral

    • arms angled up - go up

    • arms angled down - go down

    • left arm out, right arm down - go left

    • right arm out, left arm down - go right

    This was a fun mathematical exercise, as MediaPipe outputs the locations of specific landmarks (shoulder, elbow, wrist, …) in screen-relative coordinates, and I had to use that information to calculate the angles between, for example, the arm and the torso of the person in the video. I implemented a function that calculates the angle at a hinge joint given the two body coordinates on either side of it. For example, angle_LS is the angle at the left shoulder, calculated using the positions of the left hip (LH) and left elbow (LE). A sketch of such an angle function and the snippet that uses it are shown below.
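
    As an illustration of the angle helper itself, here is a minimal sketch; it is not the exact calcAngle from my code (that is on GitHub), and it assumes each landmark is passed as an (x, y) tuple of screen-relative coordinates.

    import math

    def calcAngle(a, b, hinge):
        # angle (in degrees) at the hinge joint, spanned by the segments hinge->a and hinge->b
        angle = math.degrees(
            math.atan2(a[1] - hinge[1], a[0] - hinge[0])
            - math.atan2(b[1] - hinge[1], b[0] - hinge[0])
        )
        angle = abs(angle)
        return 360 - angle if angle > 180 else angle  # always report the inner angle (0-180)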

    Using these angles, I then inferred which command should be sent.

    # get the angles at which the joints are bent right now
    angle_LS = pe.calcAngle(LH, LE, LS)  # left shoulder
    angle_LE = pe.calcAngle(LS, LW, LE)  # left elbow
    angle_RS = pe.calcAngle(RH, RE, RS)  # right shoulder
    angle_RE = pe.calcAngle(RS, RW, RE)  # right elbow

    if 110 < angle_LS < 180:          # left arm up
        if 110 < angle_RS < 180:      #   and right arm up
            if isFlying:
                return 'up 10cm'
            else:
                return 'takeoff'
    

    Note the isFlying check in the above code snippet. The drone needs to do different things depending on whether or not it is currently flying: if it is not flying and the user gestures up, we need to send a takeoff command; if it is already flying, we should send an 'up' movement command instead. Therefore we track the current state of the drone and choose commands based on both the current state and the pose.
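
    To give an idea of how that state tracking might look, here is a hypothetical sketch; the 'takeoff' string matches the snippet above, while the 'land' case and the updateState helper are illustrative rather than taken from my code.

    isFlying = False

    def updateState(command):
        # remember whether the drone is airborne, so the gesture interpreter can
        # choose between 'takeoff' and a movement command for the same pose
        global isFlying
        if command == 'takeoff':
            isFlying = True
        elif command == 'land':
            isFlying = False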

    One major challenge was figuring out how often to generate commands, as glitches in the body position analysis would occasionally fire random commands that the user did not intend. To address this and increase reliability, I implemented a best-out-of-three system to reduce the likelihood of such single-frame glitches affecting the commands being sent. Commands were added to a deque and only forwarded to the drone if the command was valid and matched the two most recent commands (see the code snippet below). This did introduce a minor reduction in drone responsiveness, as it now takes several frames of a new position before the command updates, but it improved the results enough that this was deemed an acceptable trade-off.

    # commandsTemp holds the three most recent detected commands,
    # e.g. initialized as commandsTemp = deque([None, None, None])
    command = getCommand()
    commandsTemp.append(command)
    commandsTemp.popleft()

    # only forward a command if the last three frames agree on it
    if commandsTemp[0] == commandsTemp[1] == commandsTemp[2] and commandsTemp[0] is not None:
        r = commandsTemp[0]
        telloResponse = tello.command(r)
    

    I should note that my implementation was helped massively by existing tutorials, mainly the one at https://techvidvan.com/tutorials/human-pose-estimation-opencv/, which got me started with MediaPipe.

    Connecting to the drone

    I connected to the DJI Tello drone using the method described in their documentation. This is done in the tello.py file, which is available on my GitHub: https://github.com/ilse-arwert/telloControl/tree/merging-Tello-and-PoseEstimator.
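
    The wrapper itself is best read on GitHub, but to give a rough idea of what the connection involves: per the Tello SDK guide, you open a UDP socket to the drone's fixed address and send the text command 'command' to switch it into SDK mode. The sketch below illustrates this and is not necessarily identical to my tello.py.

    import socket

    TELLO_ADDRESS = ('192.168.10.1', 8889)   # the Tello's IP and command port (from the SDK guide)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('', 8889))                    # replies come back to the port we send from
    sock.sendto(b'command', TELLO_ADDRESS)   # 'command' puts the Tello into SDK mode
    response, _ = sock.recvfrom(1024)        # the drone answers b'ok' on success
    print(response.decode())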

    Connecting the systems

    The final step was to use the commands generated with the approach described above to actually control the drone. At first, I simply sent the commands to the drone as fast as they were generated. This conflicted with the way the drone expects to receive commands: new commands were sent before the drone had finished executing the previous one, causing the commands to back up and the drone to stop responding properly. I fixed this by making the process wait until the Tello responded that the last command had finished.

    import time

    if commandsTemp[0] == commandsTemp[1] == commandsTemp[2] and commandsTemp[0] is not None:
        r = commandsTemp[0]
        telloResponse = tello.command(r)
        # do not proceed until the drone has answered 'ok' for this command
        while telloResponse != "ok":
            time.sleep(1)
    

    This worked, but did slow the command processing down due to all the waiting. In addition, because the Tello would, for example, go up 30 centimeters and then pause to wait for another command (which might still be 'up'), the flight pattern was quite jerky. However, by this point I had unfortunately run out of time for the project.

    Future Work

    In the future, it would be beneficial to find a way to send commands continuously, perhaps by cancelling the existing command before sending a new one, or by not defining a set distance (like up 30 cm) at all and simply commanding up, letting that run until a new command is received.
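
    The Tello SDK does offer a building block for this: the rc a b c d command sets continuous velocities on four channels (left/right, forward/back, up/down, yaw) instead of a fixed distance. Below is a hypothetical sketch of mapping gestures onto it; the speed values are made up and gestureToRc is not part of my current code.

    # hypothetical mapping from a recognized gesture to a continuous 'rc' command;
    # each channel takes a value in the range -100..100
    def gestureToRc(command):
        if command == 'up':
            return 'rc 0 0 50 0'     # climb at half speed until a new command replaces this
        elif command == 'down':
            return 'rc 0 0 -50 0'    # descend at half speed
        elif command == 'left':
            return 'rc -50 0 0 0'    # drift left
        elif command == 'right':
            return 'rc 50 0 0 0'     # drift right
        else:
            return 'rc 0 0 0 0'      # neutral pose: hover in place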

    It would also be great, if extra challenging, to implement the option to use the Tello's own camera instead of a laptop webcam. The obvious challenge is that this limits how far the drone can fly from the user before pose detection becomes unfeasible, but it would make the interaction feel more immersive than gesturing at a laptop to make the drone move.

    I think this work has definite potential as an intuitive drone controlling method, and could certainly be used for e.g. non-preprogrammed drone/dancer performances, which is where the original inspiration comes from. However, there is certainly still more work to be done before this will work well.

    What I learned

    Before this project, I had never flown a drone or done any body position analysis. This project also improved my skills in linking different systems together and working around their idiosyncrasies, such as the Tello drone wanting to finish its current command before receiving a new one, and how that interacts with a continuously updating system like MediaPipe.

    As part of the output of this project, I have put the instructions, code and snippets together into a blog post for other programmers to peruse, and have also put the full code base online on GitHub: https://github.com/ilse-arwert/telloControl/tree/merging-Tello-and-PoseEstimator. In this way, I fulfilled the stated project goals of

    1. generating new knowledge; and

    2. educating others on how to achieve this themselves.

    Unfortunately, due to the limited scope of this project, I was unable to evaluate how intuitive the current set of gestures is for the average user, which would be an interesting challenge for any future work! Luckily, changing the body position selection is fairly easy, so I just might do a mini-survey on my own time in the future. If I do, I'll make another blog post and share that information as well!

  • AI and ethics (forthcoming publication)

    An essay I wrote with a group of researchers will be published later this year as the leading paper in a book on AI and ethics.

    Title: Facial Recognition in the Public Space: Challenges and Perspectives

    Abstract: Due to technological advances, facial recognition and other AI systems are becoming more commonplace, a development which garners ethical and privacy-related concerns and has prompted the European Commission to develop the proposed AI Act of 2021. This study investigates whether current legislation, including the proposed AI Act, is enough to properly regulate the use of facial recognition systems in public spaces in the European Union. To this purpose, we outline the current status of EU legislation regarding these systems, examine several case studies of real-world use of such systems and the response thereto, and identify overarching ethical and legal issues that arise from these case studies. We find that currently, the ambiguous phrasing in legislation, combined with lack of enforcement of the need for consent, means that legislation is not sufficient to regulate the use of facial recognition in public spaces in the European Union. Therefore, based on the observations from the case studies, we make several recommendations that, when followed, encourage the use of facial recognition systems in public spaces in an ethical, legal and responsible way.

    More information will be added to this page as it becomes publicly available.


    Created as part of the Leiden University master Media Technology in conjunction with Münster University and Twente University

  • Inherently Flawed

    Robots and other artificial creatures are often made with a purpose in mind; either to do something humans cannot or do not want to do, or to do it better, or faster. But what if a robot is inherently flawed, built in such a way that it will never achieve its aim? Inherently Flawed is an exploration of trying to overcome an obstacle when the obstacle is you. We wanted to build a creature that, in the context of the Sisyphean myth, was Sisyphus and the mountain and the boulder.

    We realized this creature in a simple form, built mainly out of wood to emphasize its mechanical nature: governed by natural physics rather than a hi-tech approximation thereof. We wanted to keep the visuals fairly simple to emphasize the repetitiveness of the creature; even as it continuously struggles to surpass the obstacle it creates, it never varies its methods, meaning it will never succeed even though a simpler solution is not that far away.

    We built our creature out of simple, natural materials to emphasize the mechanical simplicity and balance of our creature.


    Created as part of the Leiden University master Media Technology

  • Swarming Simulators (publication)



    This research was originally conducted in collaboration with Imani Dap, Sotiris Piliouras and Jiaqi Li. The publication process was aided by Tessa Verhoeven and Ross Towns.

    Abstract: During recent years, the need to promote environmental knowledge and pro-environmental behaviors has become more evident. The efforts towards effective environmental education include the use of simulations that aim to bring human players closer to the natural world. In the specific case of swarm simulators, they are often constructed in a two-dimensional space and experienced from a third-person perspective, lacking the immersive benefits of seeing a game world through the character’s eyes. In this study, we developed schooling and flocking simulators to examine whether playing the simulators can affect people’s understanding of swarming behavior and feeling of connectedness to nature, made quantifiable through pre- and post-simulator questionnaires. Experiencing the simulations in first-person perspective was found to increase feelings of kinship with animals, raising connectedness to nature. Additionally, a positive correlation was observed between people’s engagement in the simulation and the increase of connectedness to nature. These findings could be useful insights in designing serious games for raising awareness or behavioral change related to environmental education.

    Links to the simulations (will run in web browser):

    The video below shows the background, method and results of our research.


    For more extensive information, please see the published paper.

    In addition to designing a poster and creating this video, we wrote a paper detailing this research and submitted it to the Games and Learning Alliance Conference of 2023. After being accepted, we presented the paper in Tampere, Finland. It was then published in the conference proceedings.


    Created as part of the Leiden University master Media Technology

  • Light Pong

    By Ilse Arwert, 19 February 2021


    Light Pong is an electronic game based on ping pong, with lights instead of a ball. This is an original concept designed, built and programmed by Ilse Arwert, Imani Dap and Daniel Simu.

    It has been featured on:

    Link to GitHub repo: https://github.com/ilse-arwert/pingpong


    Created as part of the Leiden University master Media Technology