The model is a decoder-only transformer similar to the LLaMA (Touvron et al., 2023) architecture with the following modifications:
| Parameters | Hidden Size | Layers | Heads | Sequence Length |
|---|---|---|---|---|
| 1,644,417,024 | 2048 | 24 | 32 | 4096 |
Position Embeddings: Rotary Position Embeddings (Su et al., 2021) applied to the first 25% of head embedding dimensions for improved throughput, following Black et al. (2022); see the sketch after this list.
Biases: We remove all bias terms from the feed-forward networks and multi-head self-attention layers, except for the biases of the query, key, and value projections (Bai et al., 2023).
Tokenizer: We use Arcade100k, a BPE tokenizer extended from OpenAI's tiktoken.cl100k_base. We split digits into individual tokens following findings by Liu & Low (2023); a toy illustration of digit splitting follows this list.
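The partial rotary application and the bias layout can be illustrated with a short sketch. This is a minimal illustration under assumed shapes (hidden size 2048, 32 heads, so a head dimension of 64), not the model's actual implementation; `apply_partial_rope` and `Attention` are hypothetical names.

```python
import torch
import torch.nn as nn


def apply_partial_rope(x: torch.Tensor, rotary_frac: float = 0.25) -> torch.Tensor:
    """Rotate the first `rotary_frac` of each head's dims; pass the rest through.

    x: (batch, seq_len, n_heads, head_dim)
    """
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * rotary_frac)            # e.g. 16 of 64 dims
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]

    # Standard RoPE frequencies (Su et al., 2021), over the rotated slice only.
    seq_len = x.shape[1]
    inv_freq = 1.0 / (10000.0 ** (torch.arange(0, rot_dim, 2) / rot_dim))
    freqs = torch.outer(torch.arange(seq_len), inv_freq)  # (seq_len, rot_dim/2)
    cos = freqs.cos()[None, :, None, :]
    sin = freqs.sin()[None, :, None, :]

    x1, x2 = x_rot[..., ::2], x_rot[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return torch.cat([rotated.flatten(-2), x_pass], dim=-1)


class Attention(nn.Module):
    """Bias layout only: biases kept on Q/K/V projections, removed elsewhere."""

    def __init__(self, hidden: int = 2048, n_heads: int = 32):
        super().__init__()
        self.qkv = nn.Linear(hidden, 3 * hidden, bias=True)  # Q/K/V biases kept
        self.out = nn.Linear(hidden, hidden, bias=False)     # bias removed
```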
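Digit splitting can be emulated with a pre-tokenization pass that isolates each digit before BPE merges run. The snippet below is a toy illustration of the idea, not Arcade100k's actual rules.

```python
import re

# \d matches a single digit; any run of non-digits falls through as one chunk.
DIGIT_SPLIT = re.compile(r"\d|[^\d]+")

def pre_tokenize(text: str) -> list[str]:
    return DIGIT_SPLIT.findall(text)

print(pre_tokenize("Trained on 619B tokens"))
# ['Trained on ', '6', '1', '9', 'B tokens']
```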
Training
Training Dataset
The model is trained on a mixture of English and Arabic datasets comprising 619 billion English tokens and around 115 billion Arabic tokens.
Fine-tuning the base Arabic Stable LM 2 1.6B model for the user's downstream tasks is recommended; a minimal loading sketch follows.
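Loading the model for downstream fine-tuning follows the usual transformers pattern. This is a minimal sketch, assuming the hub id stabilityai/ar-stablelm-2-base and that the Arcade100k tokenizer ships as repository code (hence trust_remote_code); check the model card on the Hub for the exact invocation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/ar-stablelm-2-base"  # assumed Hugging Face hub id

# trust_remote_code is an assumption: custom tokenizers such as Arcade100k
# are often shipped as repository code rather than a built-in tokenizer class.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("العاصمة السعودية هي", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```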
Training Procedure
The model is a fine-tuned version of the Stable LM 2 1.6B model, trained with a learning-rate schedule that includes an early cool-down: 300k steps with a combined cosine and inverse square-root schedule, followed by 200k steps with a linear cool-down of the learning rate.
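One plausible reading of this two-phase schedule is sketched below. The peak learning rate, the reference step, and the exact blend of the cosine and inverse square-root components are not reported here, so every constant is an assumption.

```python
import math

MAIN_STEPS, COOL_STEPS = 300_000, 200_000
PEAK_LR = 1e-4     # assumed; not reported above
REF_STEP = 10_000  # assumed reference step for the inverse square-root branch

def lr_at(step: int) -> float:
    if step <= MAIN_STEPS:
        # Main phase: inverse square-root decay from the peak learning rate.
        # (The cosine component mentioned above is omitted for brevity.)
        return PEAK_LR * min(1.0, math.sqrt(REF_STEP / max(step, 1)))
    # Cool-down phase: linear decay to zero over the final 200k steps.
    frac = (step - MAIN_STEPS) / COOL_STEPS
    return lr_at(MAIN_STEPS) * max(0.0, 1.0 - frac)
```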
Training Infrastructure
Hardware: We use two nodes for training, each with 8 H100 GPUs and a micro-batch size of 6 per GPU. This yields a global batch of 6 × 2 × 8 = 96 sequences, or roughly 400K tokens per batch (96 × 4096 = 393,216 tokens). The full 500k-step run therefore consumes around 197B tokens.
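The batch arithmetic above can be checked directly:

```python
micro_batch, nodes, gpus_per_node, seq_len, steps = 6, 2, 8, 4096, 500_000

global_batch = micro_batch * nodes * gpus_per_node  # 96 sequences
tokens_per_batch = global_batch * seq_len           # 393,216 ≈ 400K tokens
total_tokens = tokens_per_batch * steps             # ≈ 196.6B ≈ 197B tokens
print(global_batch, tokens_per_batch, total_tokens)
```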
Software: We use a fork of gpt-neox (EleutherAI, 2021), train with 2D parallelism (data and tensor parallel) with ZeRO-1 (Rajbhandari et al., 2019), and rely on flash-attention as well as the SwiGLU and Rotary Embedding kernels from FlashAttention-2 (Dao et al., 2023).
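For reference, the fused kernel at the heart of FlashAttention-2 is exposed through the public flash_attn API as sketched below. This is illustrative usage under assumed tensor shapes, not the training code from the gpt-neox fork.

```python
import torch
from flash_attn import flash_attn_func

b, s, h, d = 2, 4096, 32, 64  # batch, sequence, heads, head dim (assumed)
q = torch.randn(b, s, h, d, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Causal self-attention in a single fused kernel; inputs must be fp16/bf16.
out = flash_attn_func(q, k, v, causal=True)  # (b, s, h, d)
```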
Use and Limitations
Intended Use
The model is intended to be used as a foundational base model for application-specific fine-tuning, for research purposes only. Users should evaluate the model's safety in their specific use case, apply the necessary safeguards, and fine-tune the model to facilitate safe performance in downstream applications.
Out-of-scope Use
Out-of-scope uses include use in any manner that violates applicable laws or regulations, Stability AI’s
Acceptable Use Policy
or license agreement, or use in languages outside of those explicitly supported by this model.
Limitations and Bias
As a base model, this model may exhibit unreliable or otherwise undesirable behaviors that should be corrected through evaluation and fine-tuning prior to deployment. Because each use case is unique, running a suite of tests tailored to the application may help ensure proper performance. Using this model requires guardrails around the user's inputs and outputs to ensure that any outputs returned are not harmful. Pairing this model with an input and output classifier may help prevent harmful responses; a sketch of this pattern follows. Users should exercise caution when using these models in production systems and should not use the models if they are unsuitable for the user's application.
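A hedged sketch of that guardrail pattern, with a hypothetical is_safe placeholder standing in for a real moderation classifier:

```python
def is_safe(text: str) -> bool:
    """Hypothetical placeholder; substitute a real moderation classifier."""
    return True

def guarded_generate(generate, prompt: str, refusal: str = "Request declined.") -> str:
    # Screen the user's input before it ever reaches the model.
    if not is_safe(prompt):
        return refusal
    output = generate(prompt)
    # Screen the model's output before returning it.
    return output if is_safe(output) else refusal
```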
How to Cite
@misc{alyafeai2024arabicstablelmadapting,
title={Arabic Stable LM: Adapting Stable LM 2 1.6B to Arabic},
author={Zaid Alyafeai and Michael Pieler and Hannah Teufel and Jonathan Tow and Marco Bellagente and Duy Phung and Nikhil Pinnaparaju and Reshinth Adithyan and Paulo Rocha and Maksym Zhuravinskyi and Carlos Riquelme},
year={2024},
eprint={2412.04277},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.04277},
}