cross-encoder/msmarco-MiniLM-L6-en-de-v1

huggingface.co
Total runs: 1.2K
Model last updated: December 12, 2024
text-classification


Cross-Encoder for MS MARCO - EN-DE

This is a cross-lingual Cross-Encoder model for EN-DE that can be used for passage re-ranking. It was trained on the MS Marco Passage Ranking task.

The model can be used for information retrieval: see SBERT.net Retrieve & Re-rank.

The training code is available in this repository; see train_script.py.

Usage with SentenceTransformers

With SentenceTransformers installed, you can use the model like this:

from sentence_transformers import CrossEncoder

model = CrossEncoder('cross-encoder/msmarco-MiniLM-L6-en-de-v1', max_length=512)

query = 'How many people live in Berlin?'
docs = ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
        'New York City is famous for the Metropolitan Museum of Art.']

pairs = [(query, doc) for doc in docs]
scores = model.predict(pairs)
print(scores)
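The predicted scores can then be used to re-rank the passages. A minimal sketch of that step, using illustrative placeholder scores rather than actual model outputs (in practice they come from model.predict(pairs)):

```python
# Re-rank passages by cross-encoder score, highest first.
# The score values below are illustrative placeholders, not real model
# outputs; in practice they are produced by model.predict(pairs).
docs = [
    'Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
    'New York City is famous for the Metropolitan Museum of Art.',
]
scores = [9.2, -4.3]  # placeholder logits, one per (query, doc) pair

ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
for score, doc in ranked:
    print(f"{score:.2f}\t{doc}")
```

The highest-scoring passages are the ones the cross-encoder considers most relevant to the query.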

Usage with Transformers

With the transformers library, you can use the model like this:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained('cross-encoder/msmarco-MiniLM-L6-en-de-v1')
tokenizer = AutoTokenizer.from_pretrained('cross-encoder/msmarco-MiniLM-L6-en-de-v1')

features = tokenizer(['How many people live in Berlin?', 'How many people live in Berlin?'],
                     ['Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.',
                      'New York City is famous for the Metropolitan Museum of Art.'],
                     padding=True, truncation=True, return_tensors='pt')

model.eval()
with torch.no_grad():
    scores = model(**features).logits
    print(scores)
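The model outputs raw logits, one per (query, passage) pair. If calibrated scores in (0, 1) are more convenient, a sigmoid can be applied; a minimal sketch using illustrative placeholder logits rather than actual model outputs:

```python
import torch

# Map raw cross-encoder logits to (0, 1) relevance scores via a sigmoid.
# The logit values below are illustrative placeholders; in practice `scores`
# is the `model(**features).logits` tensor computed above.
scores = torch.tensor([[9.2], [-4.3]])  # placeholder logits, one per pair
probs = torch.sigmoid(scores)
print(probs)
```

The sigmoid is monotonic, so it changes the score scale but never the ranking order.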

Performance

The performance was evaluated on three datasets:

  • TREC-DL19 EN-EN: The original TREC 2019 Deep Learning Track: given an English query and 1000 documents (retrieved by BM25 lexical search), rank the documents according to their relevance. We compute NDCG@10. BM25 achieves a score of 45.46; a perfect re-ranker can achieve a score of 95.47.
  • TREC-DL19 DE-EN: The English queries of TREC-DL19 were translated into German by a German native speaker. We rank the German queries against the English passages from the original TREC-DL19 setup. We compute NDCG@10.
  • GermanDPR DE-DE: The GermanDPR dataset provides German queries and German passages from Wikipedia. We indexed the 2.8 million paragraphs from German Wikipedia and retrieved the top 100 most relevant passages for each query using BM25 lexical search with Elasticsearch. We compute MRR@10. BM25 achieves a score of 35.85; a perfect re-ranker can achieve a score of 76.27.
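For reference, MRR@10 rewards placing the first relevant passage as high as possible in the top 10. A minimal sketch of the metric (my own illustration, not the evaluation code used for the numbers below):

```python
def mrr_at_10(ranked_relevance):
    """MRR@10: reciprocal rank of the first relevant result within the
    top 10, averaged over queries. `ranked_relevance` holds one list of
    booleans per query, ordered by the ranker's scores (best first)."""
    total = 0.0
    for rels in ranked_relevance:
        for rank, rel in enumerate(rels[:10], start=1):
            if rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevance)

# Two toy queries: first relevant hit at rank 1 and rank 3.
print(mrr_at_10([[True, False], [False, False, True]]))  # (1 + 1/3) / 2
```

NDCG@10 is computed analogously over the top 10 but uses graded relevance with a logarithmic position discount instead of only the first hit.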

We also evaluate bi-encoders with the same setup: the documents retrieved by BM25 lexical search are re-ranked using cosine similarity between query and passage embeddings. Bi-encoders can also be used for end-to-end semantic search.
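The bi-encoder re-ranking step can be sketched as follows. The embedding vectors here are illustrative placeholders; in practice they would come from a bi-encoder's encode() method:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings standing in for real bi-encoder outputs.
query_emb = np.array([0.9, 0.1, 0.0])
passage_embs = {
    'Berlin has a population of 3,520,031 registered inhabitants.': np.array([0.8, 0.2, 0.1]),
    'New York City is famous for the Metropolitan Museum of Art.': np.array([0.1, 0.9, 0.3]),
}

# Re-rank the BM25 candidates by similarity to the query embedding.
ranked = sorted(passage_embs.items(),
                key=lambda kv: cosine_sim(query_emb, kv[1]),
                reverse=True)
for passage, emb in ranked:
    print(f"{cosine_sim(query_emb, emb):.3f}\t{passage}")
```

Unlike a cross-encoder, the query and passages are embedded independently, which is what makes bi-encoders usable for end-to-end search over a pre-computed index.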

| Model Name | TREC-DL19 EN-EN | TREC-DL19 DE-EN | GermanDPR DE-DE | Docs / Sec |
| --- | --- | --- | --- | --- |
| BM25 | 45.46 | - | 35.85 | - |
| **Cross-Encoder Re-Rankers** | | | | |
| cross-encoder/msmarco-MiniLM-L6-en-de-v1 | 72.43 | 65.53 | 46.77 | 1600 |
| cross-encoder/msmarco-MiniLM-L12-en-de-v1 | 72.94 | 66.07 | 49.91 | 900 |
| svalabs/cross-electra-ms-marco-german-uncased (DE only) | - | - | 53.67 | 260 |
| deepset/gbert-base-germandpr-reranking (DE only) | - | - | 53.59 | 260 |
| **Bi-Encoders (re-ranking)** | | | | |
| sentence-transformers/msmarco-distilbert-multilingual-en-de-v2-tmp-lng-aligned | 63.38 | 58.28 | 37.88 | 940 |
| sentence-transformers/msmarco-distilbert-multilingual-en-de-v2-tmp-trained-scratch | 65.51 | 58.69 | 38.32 | 940 |
| svalabs/bi-electra-ms-marco-german-uncased (DE only) | - | - | 34.31 | 450 |
| deepset/gbert-base-germandpr-question_encoder (DE only) | - | - | 42.55 | 450 |

Note: Docs / Sec gives the number of (query, document) pairs we can re-rank within a second on a V100 GPU.
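A Docs / Sec figure of this kind can be measured roughly as shown below. `score_pairs` is a hypothetical stand-in for a real scorer such as CrossEncoder.predict, so the printed number reflects only this placeholder, not the actual model or a V100 GPU:

```python
import time

# Rough throughput measurement: (query, document) pairs scored per second.
def score_pairs(pairs):
    # Placeholder scorer; a real benchmark would call the model here.
    return [0.0 for _ in pairs]

pairs = [('How many people live in Berlin?', f'passage {i}') for i in range(1000)]

start = time.perf_counter()
score_pairs(pairs)
elapsed = max(time.perf_counter() - start, 1e-9)  # guard against timer resolution
throughput = len(pairs) / elapsed

print(f"{throughput:.0f} docs/sec")
```

For a meaningful number, a real benchmark would also warm up the GPU and average over several batches.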


License

The model is released under the Apache 2.0 license: https://choosealicense.com/licenses/apache-2.0

Model page: https://huggingface.co/cross-encoder/msmarco-MiniLM-L6-en-de-v1