This model is an improved version of the SecureBERT model. SecureBERT+ was trained on a corpus eight times larger than its predecessor's, using 8x A100 GPUs, and achieves an average improvement of 9% in Masked Language Model (MLM) performance. This gain is a substantial step toward stronger language understanding and representation learning in the cybersecurity domain.
SecureBERT is a domain-specific language model based on RoBERTa. It is trained on a large corpus of cybersecurity data and adapted to understand and represent cybersecurity text.
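SecureBERT+ is evaluated on the Masked Language Model task. To illustrate what that objective involves, the sketch below corrupts a sequence of token ids using the 80/10/10 rule from the original BERT recipe, which RoBERTa also uses: about 15% of tokens are selected, and of those 80% become the mask token, 10% become a random token, and 10% stay unchanged. The card does not say which masking settings SecureBERT+ was trained with, so `mlm_prob=0.15`, the 80/10/10 split, and the token ids in the example are assumptions, and `mask_tokens` is a hypothetical helper, not part of the SecureBERT code.

```python
import random


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(), mlm_prob=0.15, seed=None):
    """Corrupt a list of token ids for MLM training using the 80/10/10 rule.

    Returns (inputs, labels). labels holds the original id at every position
    the model must predict and -100 elsewhere (the index Hugging Face losses ignore).
    Hypothetical helper: masking settings are assumptions, not SecureBERT+'s recipe.
    """
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; select each remaining token with probability mlm_prob
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels


# Hypothetical token ids; 0 and 2 stand in for the <s> and </s> special tokens
ids = [0, 713, 16, 10, 15935, 2]
inputs, labels = mask_tokens(ids, mask_id=50264, vocab_size=50265, special_ids={0, 2}, seed=7)
print(inputs)
print(labels)
```

Because RoBERTa regenerates the mask each time a sequence is fed to the model (dynamic masking), a training loop would call a function like this per batch rather than once during data preparation.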
Dataset
Load Model
SecureBERT+ has been uploaded to the Hugging Face Hub and can be loaded with the transformers library:
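from transformers import RobertaTokenizer, RobertaModel
import torch
# Download the SecureBERT+ tokenizer and base encoder from the Hugging Face Hub
tokenizer = RobertaTokenizer.from_pretrained("ehsanaghaei/SecureBERT_Plus")
model = RobertaModel.from_pretrained("ehsanaghaei/SecureBERT_Plus")
inputs = tokenizer("This is SecureBERT Plus!", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
# One contextual embedding per token: (batch_size, sequence_length, hidden_size)
last_hidden_states = outputs.last_hidden_state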
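`last_hidden_state` holds one vector per token. A common way to get a single vector per sentence, for example to compare two threat descriptions, is to average the token vectors while ignoring padding positions. The card does not prescribe a pooling method, so mean pooling is an assumption here, and `mean_pool` and `cosine` are hypothetical helpers written in plain Python to show the arithmetic on a toy input. With PyTorch tensors `h` (hidden states) and `m` (attention mask), the same pooling is `(h * m.unsqueeze(-1)).sum(1) / m.sum(1, keepdim=True)`.

```python
import math


def mean_pool(hidden_states, attention_mask):
    """Average per-token vectors of one sequence, skipping padding (mask value 0)."""
    hidden_size = len(hidden_states[0])
    total = [0.0] * hidden_size
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            for j, x in enumerate(vec):
                total[j] += x
            count += 1
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-token sequence with hidden_size 2; the last position is padding
h = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vec = mean_pool(h, mask)
print(sentence_vec)  # [2.0, 3.0]
```

Excluding padding matters when sentences in a batch have different lengths; otherwise the padded positions would pull every sentence vector toward the padding embedding.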
Fill Mask (MLM)
Use the code below to predict the masked words in a sentence; mark each position to predict with the tokenizer's mask token (tokenizer.mask_token):
# !pip install transformers torch tokenizers
import torch
from transformers import RobertaTokenizerFast, RobertaForMaskedLM
tokenizer = RobertaTokenizerFast.from_pretrained("ehsanaghaei/SecureBERT_Plus")
model = RobertaForMaskedLM.from_pretrained("ehsanaghaei/SecureBERT_Plus")
def predict_mask(sent, tokenizer, model, topk=10, print_results=True):
    """Return the top-k predicted tokens for every mask token in `sent`."""
    token_ids = tokenizer.encode(sent, return_tensors="pt")
    # Positions of every mask token in the single input sequence
    masked_pos = (token_ids.squeeze(0) == tokenizer.mask_token_id).nonzero().flatten().tolist()
    with torch.no_grad():
        logits = model(token_ids).logits.squeeze(0)  # shape: (seq_len, vocab_size)
    predictions = []
    for index, mask_index in enumerate(masked_pos):
        # Vocabulary ids with the k highest scores at this position
        top_ids = torch.topk(logits[mask_index], k=topk, dim=0).indices
        words = [tokenizer.decode(i.item()).strip().replace(" ", "") for i in top_ids]
        predictions.append(words)
        if print_results:
            print(f"Mask {index + 1} predictions: {words}")
    return predictions
while True:
    sent = input("Text here (empty line to quit): \t")
    if not sent:
        break
    print("SecureBERT: ")
    predict_mask(sent, tokenizer, model)
    print("===========================\n")
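`predict_mask` ranks candidate tokens by their raw logits with `torch.topk`, which orders the predictions but gives no confidence score. Applying a softmax over the vocabulary at the mask position turns those logits into probabilities without changing the ranking. The sketch below shows that step in plain Python; `topk_with_probs` and the four-word vocabulary are hypothetical stand-ins for the model's real output at one mask position. In PyTorch the equivalent is `torch.softmax(logits[mask_index], dim=-1)` followed by `torch.topk`.

```python
import heapq
import math


def topk_with_probs(logits, vocab, k=3):
    """Return the k highest-scoring (token, probability) pairs for one mask position."""
    # Softmax over the whole vocabulary; subtracting the max avoids overflow in exp()
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    # Softmax is monotonic, so ranking by logit gives the same order as ranking by probability
    best = heapq.nlargest(k, range(len(logits)), key=lambda i: logits[i])
    return [(vocab[i], exps[i] / z) for i in best]


# Toy vocabulary and logits (hypothetical), standing in for one mask position's scores
vocab = ["malware", "phishing", "firewall", "banana"]
logits = [2.0, 1.0, 0.5, -3.0]
for token, p in topk_with_probs(logits, vocab, k=2):
    print(f"{token}: {p:.3f}")
```

Because the softmax denominator runs over the full vocabulary, the returned probabilities for the top k tokens generally sum to less than 1; a low top-1 probability is a hint that the model is unsure about that mask.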
@inproceedings{aghaei2023securebert,
title={SecureBERT: A Domain-Specific Language Model for Cybersecurity},
author={Aghaei, Ehsan and Niu, Xi and Shadid, Waseem and Al-Shaer, Ehab},
booktitle={Security and Privacy in Communication Networks:
18th EAI International Conference, SecureComm 2022, Virtual Event,
October 2022,
Proceedings},
pages={39--56},
year={2023},
organization={Springer} }