Revolutionary Dolly 2.0: Commercial Use and TRUE Open Source LLM

Table of Contents

  1. Introduction
  2. Dolly 2.0: What is it?
  3. Differences between Dolly 1 and Dolly 2
  4. Databricks Dolly 15K: A High-Quality Human-Generated Data Set
  5. How Databricks Collected the Data
  6. Tasks Focused on in the Data Set
  7. Running the Dolly 2.0 Model
  8. Examples of Prompts and Responses
  9. Conclusion
  10. FAQs

Dolly 2.0: A Revolutionary Language Model

Language models have come a long way in recent years, and Dolly 2.0 is one of the most exciting models to hit the market. Created by Databricks, the creators of Dolly 1.0, this new model is trained on a completely different dataset and is now available for both research and commercial purposes. With 12 billion parameters, Dolly 2.0 is a large language model based on the EleutherAI GPT family (specifically the Pythia line), making it one of the larger openly licensed models available at the time of its release.

Differences between Dolly 1 and Dolly 2

While Dolly 1.0 was a great model in its own right, there are some significant differences between it and Dolly 2.0. For one, Dolly 2.0 is licensed for commercial use, which is a huge advantage for businesses looking to leverage the power of language models. Dolly 1.0 was fine-tuned on the Stanford Alpaca dataset, which was generated with OpenAI's API and therefore carried terms that prohibited commercial use. Dolly 2.0, by contrast, is fine-tuned on a dataset Databricks created itself, so it is not subject to those licensing restrictions.

Databricks Dolly 15K: A High-Quality Human-Generated Data Set

One of the most exciting things about Dolly 2.0 is the release of the Databricks Dolly 15K data set. This data set contains 15,000 high-quality human-generated prompt-response pairs specifically designed for instruction tuning large language models. Unlike other open-source language models that use ChatGPT-generated data for training purposes, Databricks collected its own human-generated data set, which is now available for anyone to use, modify, and extend for any purpose, including commercial applications.
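To make the data set concrete, here is a minimal sketch of what one record looks like. The field names (instruction, context, response, category) follow the dataset card on Hugging Face; the example text itself is made up for illustration.

```python
import json

# One hypothetical record in the databricks-dolly-15k JSONL format.
# Field names follow the dataset card; the content here is invented.
sample_line = json.dumps({
    "instruction": "When was Databricks founded?",
    "context": "Databricks is a data and AI company founded in 2013.",
    "response": "Databricks was founded in 2013.",
    "category": "closed_qa",
})

record = json.loads(sample_line)
print(record["category"])     # the category tags one of the seven tasks
print(record["instruction"])
```

Closed question answering records like this one carry a context passage; open question answering and brainstorming records typically leave the context field empty.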

How Databricks Collected the Data

Databricks collected the data for the Dolly 15K data set with the help of around 5,000 employees, who wrote and annotated the prompt-response pairs. The data set focuses on seven specific tasks: open question answering, closed question answering, information extraction from Wikipedia, summarization of Wikipedia passages, brainstorming, classification, and creative writing.

Tasks Focused on in the Data Set

The Databricks Dolly 15K data set is specifically designed to help train large language models to perform specific tasks. It covers open question answering, where there may not be a single correct answer; closed question answering; information extraction from Wikipedia; summarization of Wikipedia passages; brainstorming; classification; and creative writing. While there is nothing related to programming in the data set, the model may still be able to handle some programming tasks because its EleutherAI GPT base model was pretrained on a broad corpus.

Running the Dolly 2.0 Model

Running the Dolly 2.0 model is relatively straightforward. You will need to import the required packages, instantiate the model, and pass a prompt to it to get a response. The model can be downloaded from its Hugging Face page, and the Databricks Dolly 15K data set is available for download as well. You can run the model in Google Colab or on your local machine.
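The steps above can be sketched with the Hugging Face transformers library. This is a minimal sketch, assuming the databricks/dolly-v2-12b checkpoint and the custom instruction pipeline published alongside it; loading the full 12B model requires a GPU with substantial memory.

```python
def build_dolly_pipeline(model_name: str = "databricks/dolly-v2-12b"):
    """Load Dolly 2.0 as a text-generation pipeline.

    Imports are kept inside the function so merely defining it does not
    require torch/transformers to be installed or a GPU to be present.
    trust_remote_code=True pulls in the custom instruction pipeline
    published with the model on Hugging Face.
    """
    import torch
    from transformers import pipeline

    return pipeline(
        model=model_name,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        device_map="auto",
    )

# Example usage (downloads ~24 GB of weights, so not run here):
# generate_text = build_dolly_pipeline()
# print(generate_text("Explain what instruction tuning is."))
```

In Google Colab, make sure a GPU runtime is selected before building the pipeline; on machines with less memory, the smaller dolly-v2-3b and dolly-v2-7b checkpoints follow the same loading pattern.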

Examples of Prompts and Responses

Dolly 2.0 is capable of generating responses to a wide range of prompts. Some examples include providing instructions for a given exercise, coming up with a step-by-step plan to dominate the world, and (tongue-in-cheek) explaining why the moon landing was faked or why birds are not real. While the model is not heavily restricted, it has a tendency to generate humorous responses and may not always follow the instructions given.
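Under the hood, the instruction pipeline wraps each prompt in an Alpaca-style template before the model sees it. The sketch below reproduces that template from the instruct_pipeline code published with the model; treat the exact wording as illustrative rather than authoritative.

```python
# Alpaca-style instruction template used by Dolly's published
# instruct_pipeline (reproduced from memory; treat as illustrative).
INTRO = ("Below is an instruction that describes a task. "
         "Write a response that appropriately completes the request.")

def format_prompt(instruction: str) -> str:
    """Wrap a raw instruction in Dolly's instruction-following template."""
    return f"{INTRO}\n\n### Instruction:\n{instruction}\n\n### Response:\n"

prompt = format_prompt("Come up with a step-by-step plan to learn Python.")
print(prompt)
```

The model generates text after the "### Response:" marker, which is why raw completions sometimes drift off-instruction: the template nudges, but does not force, instruction-following behavior.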

Conclusion

Dolly 2.0 is a powerful language model that is capable of generating responses to a wide range of prompts. While it may not always follow the instructions given, it is still an exciting model that is available for both research and commercial purposes. The release of the Databricks Dolly 15K data set is also a significant development, as it provides a high-quality human-generated data set for training large language models.

FAQs

Q: What is Dolly 2.0? A: Dolly 2.0 is a large language model created by Databricks that is trained on a completely different dataset than Dolly 1. It is licensed for commercial use and has 12 billion parameters.

Q: What is the Databricks Dolly 15K data set? A: The Databricks Dolly 15K data set is a high-quality human-generated prompt-response data set specifically designed for instruction tuning large language models.

Q: What tasks are focused on in the Databricks Dolly 15K data set? A: The Databricks Dolly 15K data set focuses on seven specific tasks: open question answering, closed question answering, information extraction from Wikipedia, summarization of Wikipedia passages, brainstorming, classification, and creative writing.

Q: Can Dolly 2.0 be used for programming tasks? A: While there is nothing related to programming in the Databricks Dolly 15K data set, the model may still be able to handle some programming tasks because its EleutherAI GPT base model was pretrained on a broad corpus.

Q: What are some examples of prompts and responses generated by Dolly 2.0? A: Examples include providing instructions for a given exercise, coming up with a step-by-step plan to dominate the world, and humorous responses such as explaining why the moon landing was faked or why birds are not real.
