
Cirium HackAI 2024 – A hackathon about evaluating large language models for use in the aviation industry


The two-day hackathon challenged students to improve a large language model’s performance on an aviation-specific benchmark.

By Alex Brooker, VP R&D and Konstantinos Maragkos, Lead Data Scientist at Cirium


Why benchmark Large Language Models on aviation tasks?

Organizations are increasing their adoption of Generative AI for both internal and client-facing use cases. The technology is highly disruptive and introduces efficiencies when leveraged properly. With it, however, come new challenges. For example, the output of such models is largely unstructured, which makes evaluating its quality a complex process. In domain-specific applications in particular, a model must not only demonstrate general knowledge and reasoning skills but also prove that it has a solid understanding of the industry. As we explore this new frontier at Cirium, it is crucial that our models are aviation experts and can support decision making in a valid, transparent and impactful manner.

What was the hackathon about?

Cirium, in collaboration with the AI Society at the University of Southampton, organized a two-day hackathon in which students were challenged to improve a large language model’s performance on an aviation-specific benchmark.

The participants were required to use open-access models (Mistral and Llama2) to answer a set of multiple-choice questions; the five teams with the best model scores became the finalists.
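As a rough illustration of how a multiple-choice benchmark of this kind can be scored, the sketch below computes plain accuracy over a list of questions. It is not the harness used at the hackathon: the `Question` class, the `score` function and the two sample questions are all hypothetical and are not taken from the Cirium benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """One multiple-choice item (hypothetical structure, for illustration only)."""
    text: str
    options: dict  # option label -> option text
    answer: str    # label of the correct option


def score(questions, predict):
    """Return the fraction of questions for which predict(question) gives the correct label."""
    if not questions:
        return 0.0
    correct = sum(
        1 for q in questions
        if predict(q).strip().upper() == q.answer
    )
    return correct / len(questions)


# Made-up sample items, not part of the real benchmark.
SAMPLE = [
    Question(
        "Which IATA code denotes London Heathrow?",
        {"A": "LGW", "B": "LHR", "C": "LCY"},
        "B",
    ),
    Question(
        "How many engines does a Boeing 747 have?",
        {"A": "2", "B": "3", "C": "4"},
        "C",
    ),
]


def always_b(question):
    """Trivial baseline standing in for a model call."""
    return "B"


baseline_accuracy = score(SAMPLE, always_b)
```

In practice `predict` would wrap a call to the language model; a trivial baseline such as `always_b` is useful mainly as a sanity check that a model beats guessing.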

Due to the nature of this challenge, the students were limited to the 7-billion-parameter versions of these models, quantized for further efficiency. While these versions suffer from reduced accuracy, reasoning ability and overall output quality, they left more room for experimentation and potential improvement. Furthermore, the participants were not allowed to fine-tune the models. Instead, they had to combine techniques such as prompt engineering and retrieval-augmented generation (RAG), and optimise them iteratively. To keep things interesting, a new set of questions was presented on the second day. These featured more multiple-choice options, and some purposefully offered only wrong answers.
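The combination of retrieval and prompt engineering described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any team's actual solution: the keyword-overlap `retrieve` function stands in for a real retrieval system, the prompt wording is invented, and all names (`retrieve`, `build_prompt`, `parse_answer`, the sample passages) are hypothetical. The `NONE` reply is one possible way to handle questions where every listed option is wrong.

```python
import re


def tokenize(text):
    """Lower-case a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=2):
    """Rank passages by token overlap with the question (a stand-in for a real retriever)."""
    q_tokens = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q_tokens & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, options, passages):
    """Assemble a prompt that grounds the model in retrieved text and allows a NONE reply."""
    context = "\n".join(f"- {p}" for p in passages)
    choices = "\n".join(f"{label}. {text}" for label, text in options.items())
    return (
        "Use the context to answer the aviation question.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n{choices}\n\n"
        "Reply with only the label of the correct option, "
        "or NONE if every option is wrong."
    )


def parse_answer(reply, options):
    """Return the leading option label or 'NONE'; return None if the reply is unusable."""
    match = re.match(r"\s*(NONE|[A-Z])\b", reply.upper())
    if not match:
        return None
    label = match.group(1)
    return label if label == "NONE" or label in options else None


# Made-up knowledge snippets, for illustration only.
PASSAGES = [
    "London Heathrow Airport has the IATA code LHR.",
    "London Gatwick Airport has the IATA code LGW.",
    "The Boeing 747 has four engines.",
]
QUESTION = "What is the IATA code for London Heathrow?"
OPTIONS = {"A": "LGW", "B": "LHR", "C": "LCY"}

top_passages = retrieve(QUESTION, PASSAGES, k=1)
prompt = build_prompt(QUESTION, OPTIONS, top_passages)
```

The resulting `prompt` string would be sent to the quantized model, and `parse_answer` applied to its reply. Keeping the parser strict (label first, nothing else accepted) means that a rambling reply counts as wrong, which makes prompt changes easier to compare from one iteration to the next.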

What were the key takeaways?

The teams came up with many interesting approaches, but all of them demonstrated one important point: effective prompt engineering, combined with a good retrieval system, can offer significant improvements, even when supposedly weaker versions of large language models are used to answer industry-specific questions.

What is the outcome of this event?

Cirium has generated an evaluation benchmark that will be made publicly available. We strongly believe it will be a significant contribution to the existing pool of benchmark datasets, especially since we know of no other test that focuses on the aviation domain. In addition, Cirium is best positioned to produce such a benchmark, leveraging hundreds of years of combined Subject Matter Expert experience and the most complete aviation data in the sector.

At Cirium we are also adopting Generative AI, including large language models. Naturally, this requires us to put our models to the test to ensure quality and transparency.

This benchmark is the fruit of such efforts. Last but not least, we can further enhance the benchmark with proprietary questions, allowing us to test in-house fine-tuned models on bespoke use cases.

Interested in the latest Generative AI technology?

Contact the team to receive the latest updates or register your interest to work with our labs team.
