Salesforce / dialogstudio-t5-base-v1.0

Model's Last Updated: September 08 2023
text2text-generation

Model Card for DialogStudio-T5 base

Table of Contents

  1. TL;DR
  2. Model Details
  3. Usage
  4. Uses
  5. Bias, Risks, and Limitations
  6. Training Details
  7. Evaluation
  8. Environmental Impact
  9. Citation
  10. Model Card Authors

TL;DR

If you already know T5 and Flan-T5, DialogStudio-T5 improves on them in many tasks. With the same number of parameters, the models are fine-tuned on a selected set of dialogues from DialogStudio and on 1000 additional tasks.

Disclaimer: Content in this model card is modified from content written by the Hugging Face team, and parts of it were copied from the T5 model card and the Flan-T5 model card.

Follow the DialogStudio GitHub repository for latest information.

Model Details

Data

We sample a small number of dialogues from each commercially supported dataset under three categories of DialogStudio, i.e., KG-Dial, TOD, and Open-Domain dialogues. Additionally, we sample at most 150 examples for each non-translation task from FLAN.

Note that this model version 1.0 does not incorporate datasets utilized for training large-scale models (>=7B) like Alpaca, ShareGPT, GPT4ALL, UltraChat from OpenAI's 'GPT-3.5/4', or other datasets such as OASST1 and WizardCoder.

Model Description
  • Model type: Language model
  • Language(s) (NLP): English, Spanish, Japanese, Persian, Hindi, French, Chinese, Bengali, Gujarati, German, Telugu, Italian, Arabic, Polish, Tamil, Marathi, Malayalam, Oriya, Panjabi, Portuguese, Urdu, Galician, Hebrew, Korean, Catalan, Thai, Dutch, Indonesian, Vietnamese, Bulgarian, Filipino, Central Khmer, Lao, Turkish, Russian, Croatian, Swedish, Yoruba, Kurdish, Burmese, Malay, Czech, Finnish, Somali, Tagalog, Swahili, Sinhala, Kannada, Zhuang, Igbo, Xhosa, Romanian, Haitian, Estonian, Slovak, Lithuanian, Greek, Nepali, Assamese, Norwegian
  • License: Apache 2.0
  • Related Models: All DialogStudio-T5 Checkpoints
  • Resources for more information:
  • Maximum model length:
    • Maximum input length: 1200
    • Maximum output length: 256
  • Training formats:
    • We process dialogue data into the following input formats:
      • With instruction and external knowledge: Instruction: your instruction <USER> user utterance 1 <SYSTEM> system utterance 1 ... <USER> user utterance N <EXTERNAL KNOWLEDGE> your external knowledge
      • Without instruction: <USER> user utterance 1 <SYSTEM> system utterance 1 ... <USER> user utterance N <EXTERNAL KNOWLEDGE> your external knowledge
      • Without external knowledge: Instruction: your instruction <USER> user utterance 1 <SYSTEM> system utterance 1 ... <USER> user utterance N
      • Without both: <USER> user utterance 1 <SYSTEM> system utterance 1 ... <USER> user utterance N
      • Note: the output is the final system response; <USER>, <SYSTEM>, and <EXTERNAL KNOWLEDGE> are special tokens.
    • For sampled FLAN data:
      • We follow their original data format, i.e., we did not set special tokens to separate in-context learning examples.
    • In summary:
      • We recommend that you use our format and add our special tokens (such as <USER> and <SYSTEM>) to get better performance. However, you do not necessarily need to follow our format exactly if you do not observe random behaviors.
      • We found that T5-series models such as Flan-T5 and DialogStudio-T5 may generate repetitive tokens during inference. If you encounter such repetition, you can set repetition_penalty in model.generate(), for example to 1.5, to mitigate it. Note that repetition_penalty=1.0 by default.
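
The input formats listed above can be assembled with a small helper. The following is a minimal sketch, not the authors' preprocessing code: the function name `build_dialog_input`, the `(speaker, utterance)` turn representation, and the single-space separators are assumptions made for illustration. Only the `Instruction:` prefix and the three special tokens come from the formats above.

```python
from typing import List, Optional, Tuple

# Special tokens named in the training formats.
USER = "<USER>"
SYSTEM = "<SYSTEM>"
KNOWLEDGE = "<EXTERNAL KNOWLEDGE>"


def build_dialog_input(
    turns: List[Tuple[str, str]],
    instruction: Optional[str] = None,
    knowledge: Optional[str] = None,
) -> str:
    """Join dialogue turns into one input string (hypothetical helper).

    `turns` is a list of (speaker, utterance) pairs where speaker is
    "user" or "system". The last turn must be a user turn, because the
    model is expected to produce the next system response.
    Joining parts with single spaces is an assumption.
    """
    if not turns or turns[-1][0] != "user":
        raise ValueError("the dialogue must end with a user turn")

    parts = []
    if instruction:
        parts.append(f"Instruction: {instruction}")
    for speaker, utterance in turns:
        if speaker not in ("user", "system"):
            raise ValueError(f"unknown speaker: {speaker!r}")
        tag = USER if speaker == "user" else SYSTEM
        parts.append(f"{tag} {utterance}")
    if knowledge:
        parts.append(f"{KNOWLEDGE} {knowledge}")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_dialog_input(
        [("user", "Hi"), ("system", "Hello"), ("user", "Book a table for two")],
        instruction="Respond as a helpful restaurant assistant.",
        knowledge="The restaurant is open from 9 to 5.",
    ))
```

Omitting `instruction` or `knowledge` yields the "without instruction", "without external knowledge", and "without both" variants.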

Usage

The example scripts below show how to use the model with transformers:

Using the PyTorch model
Running the model on a CPU

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/dialogstudio-t5-base-v1.0")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/dialogstudio-t5-base-v1.0")

input_text = "Answer the following yes/no question by reasoning step-by-step. Can you write 200 words in a single tweet?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Running the model on a GPU
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/dialogstudio-t5-base-v1.0")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/dialogstudio-t5-base-v1.0", device_map="auto")

input_text = "Answer the following yes/no question by reasoning step-by-step. Can you write 200 words in a single tweet?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Running the model on a GPU using different precisions
FP16
# pip install accelerate
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Salesforce/dialogstudio-t5-base-v1.0")
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/dialogstudio-t5-base-v1.0", device_map="auto", torch_dtype=torch.float16)

input_text = "Answer the following yes/no question by reasoning step-by-step. Can you write 200 words in a single tweet?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
INT8
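# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained("Salesforce/dialogstudio-t5-base-v1.0")
# Recent transformers releases take 8-bit settings through quantization_config;
# older releases also accepted load_in_8bit=True passed directly to from_pretrained.
model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/dialogstudio-t5-base-v1.0", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_8bit=True))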

input_text = "Answer the following yes/no question by reasoning step-by-step. Can you write 200 words in a single tweet?"
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
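
The scripts above use a plain instruction prompt. As a hedged sketch, the recommended dialogue format and the limits from the Model Description (maximum input length 1200, maximum output length 256, optional repetition_penalty) could be combined as follows. The names `generate_response`, `MAX_INPUT_LENGTH`, and `GENERATION_KWARGS` are illustrative, and the value 1.5 is only the example value mentioned earlier in this card, not a tuned setting.

```python
MODEL_NAME = "Salesforce/dialogstudio-t5-base-v1.0"

# Limits taken from the model card; truncating the input is an assumption
# about how to stay within the maximum input length.
MAX_INPUT_LENGTH = 1200
GENERATION_KWARGS = {
    "max_new_tokens": 256,      # maximum output length from the card
    "repetition_penalty": 1.5,  # example value; the default is 1.0
}


def generate_response(prompt: str, model_name: str = MODEL_NAME) -> str:
    """Generate the next system response for a formatted dialogue prompt."""
    # Imported here so the constants above can be reused without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Truncation drops tokens from the end of the prompt, so very long
    # dialogues may need to be shortened by hand (e.g. dropping old turns).
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_INPUT_LENGTH,
    )
    outputs = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


if __name__ == "__main__":
    prompt = (
        "Instruction: Respond as a helpful restaurant assistant. "
        "<USER> Hi <SYSTEM> Hello, how can I help? "
        "<USER> I would like to book a table for two."
    )
    print(generate_response(prompt))
```

Loading the model inside the function keeps the sketch short; in practice the tokenizer and model would be loaded once and reused across calls.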

Uses

Direct Use and Downstream Use

The primary use is research on language models, including research on zero-shot and in-context few-shot NLP tasks (such as dialogue response generation, reasoning, and question answering), advancing fairness and safety research, and understanding the limitations of current large language models.

Out-of-Scope Use

More information needed.

Bias, Risks, and Limitations

The information in this section is copied and modified from Flan-T5's model card:

Language models, including DialogStudio-T5, can potentially be used for language generation in a harmful way, according to Rae et al. (2021). DialogStudio-T5 should not be used directly in any application, without a prior assessment of safety and fairness concerns specific to the application.

Ethical considerations and risks

DialogStudio-T5 is fine-tuned on a large corpus of text data that was not filtered for explicit content or assessed for existing biases. As a result the model itself is potentially vulnerable to generating equivalently inappropriate content or replicating inherent biases in the underlying data.

Known Limitations

DialogStudio-T5 has not been tested in real world applications.

Sensitive Use:

DialogStudio-T5 should not be applied for any unacceptable use cases, e.g., generation of abusive speech.

Training Details

Training Data

We sample a small number of dialogues from each commercially supported dataset under three categories of DialogStudio, i.e., KG-Dial, TOD, and Open-Domain dialogues. Additionally, we sample at most 150 examples for each non-translation task from FLAN.
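
To illustrate what a per-task cap such as "at most 150 examples" means in practice, here is a minimal sketch of capped sampling. It is not the authors' sampling code: the card does not say whether examples were drawn randomly or how they were seeded, so the uniform random draw, the `seed` parameter, and the function name `sample_per_task` are assumptions.

```python
import random
from typing import Dict, List, Sequence


def sample_per_task(
    tasks: Dict[str, Sequence[str]],
    cap: int = 150,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Draw at most `cap` examples from each task without replacement."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sampled = {}
    for name, examples in tasks.items():
        k = min(cap, len(examples))  # small tasks are kept whole
        sampled[name] = rng.sample(list(examples), k)
    return sampled


if __name__ == "__main__":
    toy_tasks = {
        "large_task": [f"example {i}" for i in range(1000)],
        "small_task": ["only", "two"],
    }
    subset = sample_per_task(toy_tasks)
    print({name: len(examples) for name, examples in subset.items()})
```

Capping per task keeps a few very large tasks from dominating the mixture, at the cost of discarding most of their examples.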

Note:

Model version 1.0 is built on small-scale pre-trained models. This version does not incorporate datasets utilized for training large-scale models (>=7B) like Alpaca, ShareGPT, GPT4ALL, UltraChat from OpenAI's 'GPT-3.5/4', or other datasets such as OASST1 and WizardCoder. As a result, it has certain limitations in terms of writing and creative capabilities. Our initial focus is to update the model versions to enhance existing abilities. Further improvements, including expansion of other capabilities, are part of our roadmap and will be responsive to community requests.

See "Training formats" above for details of the training formats.

Training Procedure

These models are based on Flan-T5 and are fine-tuned with instructions for better zero-shot and few-shot performance. There is one fine-tuned DialogStudio model per T5 model size.

The model was trained on 16 A100 GPUs, each with 40 GB of memory, using the public transformers codebase.

Evaluation

Testing Data, Factors & Metrics

The authors evaluated the model on several dialogue tasks and general tasks such as 0-shot/5-shot MMLU and 3-shot BBH.
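
For readers unfamiliar with the k-shot terminology, the sketch below shows how a 0-shot or k-shot prompt is typically assembled: k solved demonstrations are placed before the unanswered question. The `Q:`/`A:` template and the function name `build_few_shot_prompt` are assumptions made for illustration; the card does not state the exact templates used for MMLU or BBH.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demonstrations: List[Tuple[str, str]],
    question: str,
    k: int,
) -> str:
    """Prepend the first k (question, answer) demonstrations to a question.

    With k=0 the prompt contains only the unanswered question (zero-shot).
    No special tokens separate the demonstrations, only blank lines.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    blocks = [f"Q: {q}\nA: {a}" for q, a in demonstrations[:k]]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [
        ("What is 2 + 2?", "4"),
        ("What is the capital of France?", "Paris"),
        ("How many days are in a week?", "7"),
    ]
    print(build_few_shot_prompt(demos, "What is 3 + 5?", k=2))
```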

Results

For full results for DialogStudio, see the research paper.

Environmental Impact

More information needed.

Citation

BibTeX:

@misc{zhang2023dialogstudio,
      title={DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI}, 
      author={Jianguo Zhang and Kun Qian and Zhiwei Liu and Shelby Heinecke and Rui Meng and Ye Liu and Zhou Yu and Huan Wang and Silvio Savarese and Caiming Xiong},
      year={2023},
      eprint={2307.10172},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
