allenai / aspire-contextualsentence-singlem-biomed

huggingface.co
Total runs: 136
24-hour runs: 0
7-day runs: -4
30-day runs: 89
Model's Last Updated: October 03 2022
feature-extraction

Model Details of aspire-contextualsentence-singlem-biomed

Overview

This model was introduced in a paper on modeling fine-grained similarity between documents:

Title : "Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity"

Authors : Sheshera Mysore, Arman Cohan, Tom Hope

Paper : https://arxiv.org/abs/2111.08366

Github : https://github.com/allenai/aspire

Note: In the context of the paper, this model is referred to as tsAspire and represents the paper's proposed multi-vector model for fine-grained scientific document similarity.

Model Card
Model description

This model is a BERT-based multi-vector model trained for fine-grained similarity of biomedical scientific papers. The model takes the title and abstract of a paper as input and represents the paper with contextual sentence vectors, obtained by averaging the token representations of each sentence; the whole title and abstract are encoded with cross-attention in the encoder block before the sentence embeddings are computed. The model is trained with a novel form of textual supervision that uses co-citation contexts to align the sentences of positive examples. At test time, documents are ranked by the smallest L2 distance between sentences of the two documents, or between a set of query sentences and a candidate document.

Training data

The model is trained on pairs of co-cited papers whose sentences are aligned by the co-citation context, in a contrastive learning setup. Training uses 1.2 million biomedical paper pairs, with negative examples for the contrastive loss obtained as random in-batch negatives. Co-citations are extracted from the full text of papers. For example, the papers cited in parentheses below are all co-cited, and each pair of them would be used as a training pair, with their abstract sentences aligned using the co-citation context. Here, the context notes why the cited papers are similar:

The idea of distant supervision has been proposed and used widely in Relation Extraction (Mintz et al., 2009; Riedel et al., 2010; Hoffmann et al., 2011; Surdeanu et al., 2012) , where the source of labels is an external knowledge base.
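The in-batch negative setup described above can be sketched as follows. This is an illustrative simplification, not the paper's exact objective: the similarity function (negative squared L2 distance) and the softmax cross-entropy form are assumptions made for the sketch.

```python
import numpy as np

def inbatch_contrastive_loss(query_vecs, pos_vecs):
    """Contrastive loss where, for each query i, pos_vecs[i] is the
    aligned positive and the other in-batch positives act as negatives.
    Similarity here is negative squared L2 distance (illustrative)."""
    # Pairwise squared L2 distances between all queries and positives.
    diffs = query_vecs[:, None, :] - pos_vecs[None, :, :]
    sims = -np.sum(diffs ** 2, axis=-1)            # higher = more similar
    # Softmax cross-entropy with the diagonal (true pair) as the label.
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

With matched pairs the diagonal similarity dominates and the loss is small; mismatched pairings raise it, which is the signal the contrastive setup exploits.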

Training procedure

The model was trained with the Adam optimizer and a learning rate of 2e-5, with 1000 warm-up steps followed by linear decay of the learning rate. Training convergence is monitored with the loss on a held-out dev set of co-cited paper pairs.
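The warm-up-then-linear-decay schedule above can be sketched as a small function. The total step count is an assumption for illustration; the model card does not state it.

```python
def lr_at_step(step, base_lr=2e-5, warmup_steps=1000, total_steps=100_000):
    """Linear warm-up to base_lr over warmup_steps, then linear decay
    to zero at total_steps (total_steps is an assumed value)."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)
```

In practice this corresponds to pairing `torch.optim.Adam` with a linear-warmup scheduler such as the one provided by the transformers library.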

Intended uses & limitations

This model is trained for fine-grained document similarity tasks in biomedical scientific text using multiple vectors per document. It supports fine-grained similarity by establishing sentence-to-sentence alignments between documents. The model is best suited to an aspect-conditional task formulation, where a query consists of a sentence in a query document and candidates must be retrieved along the aspect that sentence specifies. Here, a document is the title and abstract of a paper. With appropriate fine-tuning, the model can also be used for other tasks such as document- or sentence-level classification. Since the training data comes primarily from biomedicine, performance on other domains may be poorer.

How to use

This model can be used via the transformers library, together with some additional code to compute contextual sentence vectors.

View example usage in the model github repo: https://github.com/allenai/aspire#tsaspire
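The per-sentence averaging step behind the contextual sentence vectors can be sketched with mock inputs. Here `token_embs` and `sent_ids` are illustrative placeholders for the encoder's output over the whole title+abstract and a token-to-sentence mapping; see the repo above for the full pipeline.

```python
import numpy as np

def sentence_vectors(token_embs, sent_ids):
    """Average contextual token embeddings per sentence.
    token_embs: (num_tokens, hidden) encoder output for the whole
    title+abstract, so tokens already attend across sentence boundaries.
    sent_ids: (num_tokens,) sentence index of each token."""
    sents = np.unique(sent_ids)
    return np.stack([token_embs[sent_ids == s].mean(axis=0) for s in sents])
```

Because the averaging happens after full cross-attention over the title and abstract, each sentence vector is contextualized by the rest of the document.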

Variable and metrics

This model is evaluated on information retrieval datasets with document-level queries. Here we report performance on RELISH (biomedical/English) and TRECCOVID (biomedical/English). These are detailed on GitHub and in our paper. Both datasets pose an abstract-level retrieval task: given a query scientific abstract, the task requires retrieving relevant candidate abstracts. To use this sentence-level model for abstract-level retrieval, we rank documents by the minimal L2 distance between the sentences of the query and candidate abstracts.
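The minimal-L2 ranking described above can be sketched as follows (function and variable names are illustrative, not from the repo):

```python
import numpy as np

def min_l2_score(query_sents, cand_sents):
    """Score a candidate by the smallest L2 distance between any query
    sentence vector and any candidate sentence vector (lower = better)."""
    diffs = query_sents[:, None, :] - cand_sents[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min()

def rank_candidates(query_sents, candidates):
    """Return candidate indices sorted by ascending min-L2 score."""
    scores = [min_l2_score(query_sents, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i])
```

A single well-matched sentence pair is enough to rank a candidate highly, which is what makes the ranking fine-grained.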

Evaluation results

The released model aspire-contextualsentence-singlem-biomed is compared against allenai/specter, a bi-encoder baseline, and all-mpnet-base-v2, a strong non-contextual sentence-bert baseline trained on ~1 billion training examples. aspire-contextualsentence-singlem-biomed* denotes the performance reported in our paper, averaged over 3 re-runs of the model. The released model aspire-contextualsentence-singlem-biomed is the single best run among the 3 re-runs.

                                            TRECCOVID        RELISH
                                            MAP    NDCG%20   MAP    NDCG%20
all-mpnet-base-v2                           17.35  43.87     52.92  69.69
specter                                     28.24  59.28     60.62  77.20
aspire-contextualsentence-singlem-biomed*   26.24  56.55     61.29  77.89
aspire-contextualsentence-singlem-biomed    26.68  57.21     61.06  77.70

Alternative models:

Besides the models above, consider these alternative models also released in the Aspire paper:

aspire-contextualsentence-singlem-compsci: for computer science papers, trained to match a single sentence between documents.

aspire-contextualsentence-multim-biomed: for biomedical papers, trained to match multiple sentences between documents.

aspire-contextualsentence-multim-compsci: for computer science papers, trained to match multiple sentences between documents.

License

https://choosealicense.com/licenses/apache-2.0

Model URL

https://huggingface.co/allenai/aspire-contextualsentence-singlem-biomed


Provider

allenai
