The GTE models are trained by Alibaba DAMO Academy. They are mainly based on the BERT framework and currently come in three sizes: GTE-large, GTE-base, and GTE-small. The GTE models are trained on a large-scale corpus of relevance text pairs covering a wide range of domains and scenarios, which enables them to be applied to various downstream text embedding tasks, including information retrieval, semantic textual similarity, and text reranking.
Metrics
We compared the performance of the GTE models with other popular text embedding models on the MTEB benchmark. For more detailed comparison results, please refer to the MTEB leaderboard.
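The reported numbers can be reproduced with the mteb Python package, which wraps the benchmark tasks and accepts any sentence-transformers model. The single task below is only an illustration (the full benchmark covers many more tasks); a minimal sketch, assuming the mteb package is installed:

from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("thenlper/gte-large")
# Evaluate on one STS task as an example; add more task names to broaden the run
evaluation = MTEB(tasks=["STSBenchmark"])
evaluation.run(model, output_folder="results/gte-large")

Use with transformers: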
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
def average_pool(last_hidden_states: Tensor, attention_mask: Tensor) -> Tensor:
    # Zero out padding positions, then average the remaining token embeddings
    last_hidden = last_hidden_states.masked_fill(~attention_mask[..., None].bool(), 0.0)
    return last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
input_texts = [
    "what is the capital of China?",
    "how to implement quick sort in python?",
    "Beijing",
    "sorting algorithms"
]
tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-large")
model = AutoModel.from_pretrained("thenlper/gte-large")
# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=512, padding=True, truncation=True, return_tensors='pt')
outputs = model(**batch_dict)
embeddings = average_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
# (Optionally) normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:1] @ embeddings[1:].T) * 100
print(scores.tolist())
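Because the embeddings are L2-normalized above, the matrix product is equivalent to cosine similarity, so the resulting scores directly rank the candidate passages against the first (query) text.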
Use with sentence-transformers:
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
sentences = ['That is a happy person', 'That is a very happy person']
model = SentenceTransformer('thenlper/gte-large')
embeddings = model.encode(sentences)
print(cos_sim(embeddings[0], embeddings[1]))
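For the retrieval and reranking scenarios mentioned above, the same embeddings can be used with sentence-transformers' semantic_search utility. The corpus and query below are illustrative placeholders, not part of the original model card; a minimal sketch:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('thenlper/gte-large')

# Illustrative corpus and query (placeholders)
corpus = [
    "Beijing is the capital of China.",
    "Quicksort is a divide-and-conquer sorting algorithm.",
    "The Great Wall is located in northern China."
]
query = "what is the capital of China?"

# Encode corpus and query into dense embeddings
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus passages by cosine similarity to the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
for hit in hits:
    print(corpus[hit['corpus_id']], round(hit['score'], 4))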
Limitation
This model is intended for English texts only, and any text longer than 512 tokens will be truncated to that maximum length.
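If it matters whether an input will be affected by this limit, the tokenizer can be used to count tokens before encoding. This check is only an illustration and not part of the original model card; a minimal sketch:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("thenlper/gte-large")
long_text = "..."  # placeholder for a document that may exceed 512 tokens

# Count tokens (including special tokens) to see whether truncation will occur
n_tokens = len(tokenizer(long_text)["input_ids"])
if n_tokens > 512:
    print(f"Input has {n_tokens} tokens; tokens beyond 512 will be dropped.")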
Citation
If you find our paper or models helpful, please consider citing them as follows:
@article{li2023towards,
title={Towards general text embeddings with multi-stage contrastive learning},
author={Li, Zehan and Zhang, Xin and Zhang, Yanzhao and Long, Dingkun and Xie, Pengjun and Zhang, Meishan},
journal={arXiv preprint arXiv:2308.03281},
year={2023}
}