Salesforce / codet5-small

huggingface.co
Last updated: January 21, 2025
Pipeline tag: text2text-generation

Model details of codet5-small

CodeT5 (small-sized model)

Pre-trained CodeT5 model. It was introduced in the paper CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation by Yue Wang, Weishi Wang, Shafiq Joty, and Steven C.H. Hoi, and first released in the CodeT5 repository.

Disclaimer: The team releasing CodeT5 did not write a model card for this model, so this model card has been written by the Hugging Face team (more specifically, nielsr).

Model description

From the abstract:

"We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers. Our model employs a unified framework to seamlessly support both code understanding and generation tasks and allows for multi-task learning. Besides, we propose a novel identifier-aware pre-training task that enables the model to distinguish which code tokens are identifiers and to recover them when they are masked. Furthermore, we propose to exploit the user-written code comments with a bimodal dual generation task for better NL-PL alignment. Comprehensive experiments show that CodeT5 significantly outperforms prior methods on understanding tasks such as code defect detection and clone detection, and generation tasks across various directions including PL-NL, NL-PL, and PL-PL. Further analysis reveals that our model can better capture semantic information from code."

Intended uses & limitations

This repository contains the pre-trained model only, so you can use this model for masked span prediction, as shown in the code example below. However, the main use of this model is to fine-tune it for a downstream task of interest, such as:

  • code summarization
  • code generation
  • code translation
  • code refinement
  • code defect detection
  • code clone detection.

See the model hub to look for fine-tuned versions on a task that interests you. A minimal fine-tuning sketch is shown below.
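
As an illustration, here is a minimal fine-tuning sketch for code summarization. The toy (code, summary) pairs and hyperparameters are placeholders rather than the authors' setup; in practice one would train on a real dataset such as CodeSearchNet.

import torch
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-small')
model = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-small')

# toy (code, summary) pairs standing in for a real dataset
pairs = [
    ("def add(a, b): return a + b", "add two numbers"),
    ("def is_even(n): return n % 2 == 0", "check whether a number is even"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for code, summary in pairs:
        inputs = tokenizer(code, return_tensors="pt", truncation=True)
        labels = tokenizer(summary, return_tensors="pt", truncation=True).input_ids
        loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()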

How to use

Here is how to use this model:

from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-small')
model = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-small')

text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# generate a single sequence to fill in the masked span
generated_ids = model.generate(input_ids, max_length=10)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
# this prints "user: {user.name}"

Training data

The CodeT5 model was pretrained on CodeSearchNet (Husain et al., 2019). Additionally, the authors collected two datasets of C/C# from BigQuery so that every downstream task overlaps in programming language with the pre-training data. In total, around 8.35 million instances were used for pretraining.

Training procedure
Preprocessing

This model uses a code-specific BPE (Byte-Pair Encoding) tokenizer. One can prepare text (or code) for the model using RobertaTokenizer with the vocabulary files from this repository.
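
For illustration, here is how the tokenizer splits a short, made-up code snippet into subword tokens:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-small')

# inspect the subword tokens produced by the code-specific BPE vocabulary
tokens = tokenizer.tokenize("def hello_world(): pass")
print(tokens)

# convert tokens to vocabulary ids, as fed to the model
print(tokenizer.convert_tokens_to_ids(tokens))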

Evaluation results

For evaluation results on several downstream benchmarks, we refer to the paper.

BibTeX entry and citation info
@misc{wang2021codet5,
      title={CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation}, 
      author={Yue Wang and Weishi Wang and Shafiq Joty and Steven C. H. Hoi},
      year={2021},
      eprint={2109.00859},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Runs of Salesforce codet5-small on huggingface.co

Total runs: 35.7K
24-hour runs: 0
3-day runs: 41
7-day runs: 151
30-day runs: -8.8K

More information about the codet5-small model on huggingface.co

codet5-small is released under the Apache 2.0 license; see:

https://choosealicense.com/licenses/apache-2.0

codet5-small huggingface.co

codet5-small is an AI model hosted on huggingface.co, where it can be used instantly. huggingface.co supports a free trial of the codet5-small model and also provides paid use. The model can be called through an API from Node.js, Python, or plain HTTP; a sketch of such a call is shown below.
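
Below is a minimal sketch of calling the model through the Hugging Face Inference API. The endpoint URL follows the generic Inference API pattern and YOUR_HF_TOKEN is a placeholder; whether a hosted endpoint is currently enabled for this particular model is an assumption.

import requests

# generic Inference API endpoint pattern; hosted availability is an assumption
API_URL = "https://api-inference.huggingface.co/models/Salesforce/codet5-small"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # placeholder token

payload = {"inputs": "def greet(user): print(f'hello <extra_id_0>!')"}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())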

Salesforce codet5-small online free

huggingface.co is an online trial and API platform that integrates codet5-small, including API services, and provides a free online trial; you can try codet5-small online for free via the link below.

Free online trial URL for Salesforce codet5-small on huggingface.co:

https://huggingface.co/Salesforce/codet5-small

codet5-small install

The CodeT5 code is open source and available on GitHub, where any user can find and install it. At the same time, huggingface.co hosts the pretrained codet5-small model, which users can load directly for debugging and trial; API access is supported as well.

codet5-small URL on huggingface.co:

https://huggingface.co/Salesforce/codet5-small


Provider of codet5-small on huggingface.co

Salesforce (organization)
