The model is finetuned from IgBert-unpaired using paired antibody sequences from the Observed Antibody Space.
Use
The model and tokeniser can be loaded using the transformers library:
from transformers import BertModel, BertTokenizer
tokeniser = BertTokenizer.from_pretrained("Exscientia/IgBert", do_lower_case=False)
model = BertModel.from_pretrained("Exscientia/IgBert", add_pooling_layer=False)
The tokeniser is used to prepare batch inputs
# heavy chain sequences
sequences_heavy = [
"VQLAQSGSELRKPGASVKVSCDTSGHSFTSNAIHWVRQAPGQGLEWMGWINTDTGTPTYAQGFTGRFVFSLDTSARTAYLQISSLKADDTAVFYCARERDYSDYFFDYWGQGTLVTVSS",
"QVQLVESGGGVVQPGRSLRLSCAASGFTFSNYAMYWVRQAPGKGLEWVAVISYDGSNKYYADSVKGRFTISRDNSKNTLYLQMNSLRTEDTAVYYCASGSDYGDYLLVYWGQGTLVTVSS"
]
# light chain sequences
sequences_light = [
"EVVMTQSPASLSVSPGERATLSCRARASLGISTDLAWYQQRPGQAPRLLIYGASTRATGIPARFSGSGSGTEFTLTISSLQSEDSAVYYCQQYSNWPLTFGGGTKVEIK",
"ALTQPASVSGSPGQSITISCTGTSSDVGGYNYVSWYQQHPGKAPKLMIYDVSKRPSGVSNRFSGSKSGNTASLTISGLQSEDEADYYCNSLTSISTWVFGGGTKLTVL"
]
# The tokeniser expects input of the form ["V Q ... S S [SEP] E V ... I K", ...]
paired_sequences = []
for sequence_heavy, sequence_light in zip(sequences_heavy, sequences_light):
    paired_sequences.append(' '.join(sequence_heavy) + ' [SEP] ' + ' '.join(sequence_light))
tokens = tokeniser.batch_encode_plus(
    paired_sequences,
    add_special_tokens=True,
    padding=True,
    return_tensors="pt",
    return_special_tokens_mask=True
)
Note that the tokeniser adds a [CLS] token at the beginning of each paired sequence, a [SEP] token at the end of each paired sequence, and pads using the [PAD] token. For example, a batch containing the sequences V Q L [SEP] E V V and Q V [SEP] A L will be tokenised to [CLS] V Q L [SEP] E V V [SEP] and [CLS] Q V [SEP] A L [SEP] [PAD] [PAD].
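As an illustrative check (not part of the original card), the tokenised batch can be decoded back to token strings with the standard convert_ids_to_tokens method to confirm this layout:
# print the token strings for each paired sequence in the batch
for input_ids in tokens["input_ids"].tolist():
    print(" ".join(tokeniser.convert_ids_to_tokens(input_ids)))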
Sequence embeddings are generated by feeding tokens through the model
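A minimal sketch of that forward pass, assuming the standard transformers BertModel output in which last_hidden_state holds the per-residue embeddings:
# feed the tokenised batch through the model
output = model(
    input_ids=tokens["input_ids"],
    attention_mask=tokens["attention_mask"]
)

# per-residue embeddings, shape (batch_size, sequence_length, hidden_size)
residue_embeddings = output.last_hidden_state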
To obtain a sequence representation, the residue tokens can be averaged over like so
import torch
# mask special tokens before summing over embeddings
residue_embeddings[tokens["special_tokens_mask"] == 1] = 0
sequence_embeddings_sum = residue_embeddings.sum(1)
# average embedding by dividing sum by sequence lengths
sequence_lengths = torch.sum(tokens["special_tokens_mask"] == 0, dim=1)
sequence_embeddings = sequence_embeddings_sum / sequence_lengths.unsqueeze(1)
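As a purely illustrative example of using these representations (not from the original card), the two paired sequences in the batch can be compared with cosine similarity:
# cosine similarity between the two paired-sequence embeddings
similarity = torch.nn.functional.cosine_similarity(
    sequence_embeddings[0], sequence_embeddings[1], dim=0
)
print(sequence_embeddings.shape, similarity.item())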
For sequence level fine-tuning the model can be loaded with a pooling head by setting add_pooling_layer=True and using output.pooler_output in the downstream task.
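A minimal sketch of that setup, reusing the tokenised batch from above (the variable names here are illustrative):
# reload the model with the BERT pooling head enabled
model = BertModel.from_pretrained("Exscientia/IgBert", add_pooling_layer=True)

output = model(
    input_ids=tokens["input_ids"],
    attention_mask=tokens["attention_mask"]
)

# pooled representation of each paired sequence, shape (batch_size, hidden_size)
pooled_embeddings = output.pooler_output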
Runs of Exscientia IgBert on huggingface.co
Total runs: 18.0K
24-hour runs: 566
3-day runs: 1.3K
7-day runs: 2.6K
30-day runs: 9.4K
More Information About IgBert huggingface.co Model
IgBert is an AI model hosted on huggingface.co that can be used instantly through the Exscientia IgBert model page. huggingface.co supports a free trial of IgBert as well as paid use, and the model can be called through an API from Node.js, Python, or plain HTTP.
huggingface.co is an online trial and API platform that integrates IgBert, provides API services, and offers a free online trial; you can try IgBert for free by clicking the link below.
Exscientia IgBert free online trial URL on huggingface.co:
IgBert is an open source model; any user can find IgBert on GitHub and install it free of charge. huggingface.co also hosts a ready-to-use installation of IgBert, so users can debug and trial the model directly on huggingface.co, and API access is likewise free to set up.