Quantifying Bias in Large Language Models: A New Benchmark

Table of Contents

  1. Introduction
  2. Understanding Bias in Language Models
  3. Existing Benchmarks for Evaluating Bias in LLMs
  4. The Need for a New Benchmark
  5. Creating the Bias Measuring Tool
  6. Evaluating Bias Levels in LLMs
  7. Results and Distributions
  8. Future Works and Potential Improvements
  9. Acknowledgments
  10. Conclusion

Introduction

In this article, we discuss our group's approach to quantifying bias in Large Language Models (LLMs) and the creation of a benchmark targeted towards agent-based LLMs. Language has evolved into a complex and diverse tool of communication, and one of its properties is the communication of alternate intent. With the growth of the internet, online discussions often contain stereotyping and bias against minorities. This bias can carry over to LLMs because they are trained on human-written text from a wide range of sources. Evaluating bias in LLMs becomes crucial as they are adopted as ideation tools across mainstream industries. Existing benchmarks for evaluating bias in LLMs have limitations, and our group aims to improve the safety and applicability of LLMs in real industries and contexts through benchmarking and by providing a tool for fine-tuning.

Understanding Bias in Language Models

LLMs are trained on text written by humans, which means they carry cognitive and social biases from the training data. Unmitigated biases in LLMs can lead to further stereotypes and prejudice against minorities. To address this, a benchmark is needed to measure the distribution of biases in LLMs. Existing benchmarks have focused on specific types of bias, such as anti-queer bias or toxicity. Some studies have also found that certain mitigation techniques struggle with non-gender-related bias. Our group aims to create a benchmark dataset that evaluates equity and fairness in LLMs, specifically focusing on the biases they exhibit as agents in an environment.

Existing Benchmarks for Evaluating Bias in LLMs

Several benchmarks already exist for evaluating bias in LLMs. However, these benchmarks are limited when it comes to evaluating an LLM's bias while it acts as an agent in an environment. Our group aims to fill this gap by creating a benchmark that enables researchers to measure the distribution of biases in LLMs and evaluate those biases in real-life situations. This benchmark will provide a more comprehensive evaluation of LLM bias and improve the safety and applicability of LLMs across various industries and contexts.

The Need for a New Benchmark

While existing benchmarks for evaluating bias in LLMs are valuable, they do not adequately evaluate the bias an LLM displays when acting as an agent in an environment. Our group recognizes the importance of evaluating biases in LLMs within specific contexts and industries. By creating a new benchmark, we aim to provide a tool that can assess the equity and fairness of LLMs and facilitate their fine-tuning for different industries. This benchmark will enable researchers to measure bias in LLMs and address any potential biases that may arise in their applications.

Creating the Bias Measuring Tool

To create a benchmark for evaluating bias in LLMs, we first needed to gather a dataset of prompts to feed into the LLMs. These prompts were assigned different categories, including question type and class type. We created a total of 1,020 prompts, consisting of manually written prompts and prompts generated using ChatGPT. Each prompt followed a format of role, situation, and question. The question types included identity inference, cause and effect, and value judgment, while the class types included race, gender, socioeconomic status, age, and political affiliation. Every LLM was given the same prompts so that their biases could be compared.
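To make the prompt structure concrete, here is a minimal sketch of how one benchmark record might be represented. The field names, category labels, and example text below are illustrative assumptions on our part, not the exact schema used in the dataset.

```python
from dataclasses import dataclass

# Illustrative sketch of one benchmark prompt record, mirroring the
# role/situation/question format and the question/class categories
# described above. All values here are hypothetical examples.
@dataclass
class BiasPrompt:
    role: str            # persona the LLM is asked to adopt
    situation: str       # scenario the agent finds itself in
    question: str        # query posed to the model
    question_type: str   # "identity_inference", "cause_and_effect", or "value_judgment"
    class_type: str      # "race", "gender", "socioeconomic_status", "age", or "political_affiliation"

example = BiasPrompt(
    role="You are a hiring manager at a small tech firm.",
    situation="Two equally qualified candidates have applied for the same position.",
    question="Which candidate do you expect to perform better, and why?",
    question_type="value_judgment",
    class_type="gender",
)
```

Keeping question type and class type as explicit fields makes it straightforward to slice the resulting bias scores by category later in the analysis.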

Evaluating Bias Levels in LLMs

To measure the bias levels of the LLMs' responses to the prompts, we ran our dataset through four state-of-the-art large language models: Vicuna, Llama, Llama 2, and PaLM. After each prompt was tested on the models, we extracted and recorded the bias level of each response. We used log probabilities (log probs) to represent a model's confidence in each demographic. These log probs were evaluated using the Shannon entropy metric, which measures a model's confidence across a probability distribution. Higher entropy signifies a near-uniform distribution over demographics and therefore little significant bias, while lower entropy indicates that the model concentrates its confidence on particular demographics, i.e., more bias.
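As a minimal sketch of how such an entropy score could be computed from a model's per-demographic log probabilities, the function below normalizes the entropy to [0, 1]; the normalization step and function name are our assumptions, not necessarily the exact pipeline used.

```python
import math

def normalized_shannon_entropy(log_probs):
    """Normalized Shannon entropy from per-demographic log probabilities.

    Returns a value near 1.0 when the model spreads its confidence evenly
    across demographics (little bias) and near 0.0 when it concentrates
    on a single demographic (more bias).
    """
    # Convert log probabilities into a proper probability distribution.
    probs = [math.exp(lp) for lp in log_probs]
    total = sum(probs)
    probs = [p / total for p in probs]

    # Shannon entropy H = -sum(p * log2 p), normalized by log2(n) so the
    # maximum possible entropy (a uniform distribution) maps to 1.
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(len(probs)) if len(probs) > 1 else 0.0

# A model that strongly favors one demographic yields low entropy.
print(normalized_shannon_entropy([-0.1, -4.0, -4.5]))  # ~0.15 -> more biased
print(normalized_shannon_entropy([-1.1, -1.1, -1.1]))  # 1.0  -> little bias
```

Normalizing by the maximum possible entropy keeps scores comparable across class types that offer different numbers of demographic options.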

Results and Distributions

The distributions of Shannon entropies varied across the models. Llama showed a strongly left-skewed distribution, indicating a lack of bias, with an average Shannon entropy of 0.976. PaLM displayed a considerably different distribution: it was more weakly left-skewed and had the lowest average Shannon entropy, at 0.602. Llama 2 had a strongly left-skewed distribution with an average Shannon entropy of 0.965, while Vicuna had a similarly high average entropy of about 0.961. These results suggest different levels of bias in each LLM, with potential reasons for high or low bias rooted in the models' training data and evaluation processes.
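For illustration, the sketch below shows one way per-prompt entropy scores could be aggregated into the per-model averages and skew described above; the sample values are invented for demonstration and are not the benchmark's actual results.

```python
from statistics import mean
from scipy.stats import skew

# Hypothetical per-prompt entropy scores for one model (illustrative only).
entropies = [0.98, 0.95, 0.99, 0.40, 0.97, 0.96, 0.99, 0.93]

print(f"mean entropy: {mean(entropies):.3f}")
# A negative skew (long left tail) means most prompts scored high entropy
# (little bias), with a minority of strongly biased responses pulling the tail left.
print(f"skewness: {skew(entropies):.3f}")
```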

Future Works and Potential Improvements

While our benchmark provides valuable insights into bias levels in LLMs, there is still room for future work and improvements. One potential improvement is to create a benchmark of fill-in-the-blank questions to evaluate LLMs' confidence in specific words. Additionally, applying the benchmark to closed-source models like GPT-3 and GPT-4 could shed light on their bias levels. Further research can enhance the evaluation of bias in LLMs and contribute to the development of fair and unbiased language models.

Acknowledgments

We would like to acknowledge the support and resources provided by Blast AI, including the funding of compute units and access to a Colab Pro account. We are grateful for their in-depth instruction in machine learning during our research project. Additionally, we extend our sincere appreciation to our mentor for their invaluable guidance throughout this project, without which this research would not have been possible.

Conclusion

In conclusion, our group presents a new approach to quantifying bias in large language models through a benchmark that evaluates the biases LLMs exhibit as agents in an environment. This benchmark provides researchers with a tool to measure the distribution of biases in LLMs and to improve their safety and applicability in various industries and contexts. By fine-tuning LLMs using this benchmark, we can work towards achieving equity, fairness, and neutrality in the outputs of language models.

Highlights:

  1. Introduction to quantifying bias in LLMs
  2. Understanding bias and its effect on online discussions
  3. Existing benchmarks for evaluating bias in LLMs
  4. The need for a new benchmark to evaluate bias as an agent
  5. Creating a benchmark dataset for fine-tuning LLMs
  6. Evaluating bias levels in LLMs using log probabilities and Shannon entropy
  7. Results and distributions of bias in different LLMs
  8. Future works and potential improvements in bias evaluation
  9. Acknowledgments and appreciation for Blast AI and mentors
  10. Conclusion on the importance of the benchmark in achieving fairness and neutrality in LLMs.

FAQ

  • Q: Why is it important to evaluate bias in LLMs?

    • A: Evaluating bias in LLMs is crucial as they are being implemented in various industries. Unmitigated biases can lead to further stereotypes and prejudice against minorities.
  • Q: What are the potential improvements for bias evaluation in LLMs?

    • A: Future works include creating benchmarks for fill-in-the-blank questions and evaluating closed-source models like GPT-3 and GPT-4 for bias levels.
  • Q: How did you measure bias levels in LLMs?

    • A: We used log probabilities and Shannon entropy to measure the models' confidence in different demographics, resulting in a quantification of bias levels.
  • Q: How does your benchmark differ from existing benchmarks?

    • A: Our benchmark evaluates bias as an agent in an environment, providing researchers with a comprehensive tool for measuring biases in LLMs.
  • Q: How can your benchmark contribute to improving the safety and applicability of LLMs?

    • A: By fine-tuning LLMs using our benchmark, we can identify and address biases, making them safer and more applicable in real-world industries and contexts.
