Our pretraining procedure follows the training settings of BERT with some changes: we trained for 7M training steps with a batch size of 64, instead of 125K steps with a batch size of 4096. These models were trained with Google's ALBERT GitHub repository on a single TPU v3-8 provided free of charge by TFRC.
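For illustration, the sketch below shows one way such a run could be launched from a checkout of the google-research/albert repository. It is a minimal sketch, not the exact command used: the flag names are assumed to match that repository's run_pretraining.py, and all file paths, the config file, and the TPU name are placeholders; only the batch size (64) and step count (7M) come from the settings described above.

```python
# Hypothetical launcher for ALBERT pretraining on a TPU v3-8.
# Flag names are assumed from google-research/albert's run_pretraining.py;
# every path, the config file, and the TPU name below are placeholders.
import subprocess

pretraining_flags = {
    "input_file": "gs://my-bucket/pretraining_data/*.tfrecord",  # placeholder path
    "output_dir": "gs://my-bucket/albert_pretraining_output",    # placeholder path
    "albert_config_file": "albert_config.json",                  # placeholder config
    "do_train": "True",
    "train_batch_size": "64",        # batch size stated above
    "num_train_steps": "7000000",    # 7M training steps stated above
    "use_tpu": "True",
    "tpu_name": "my-tpu-v3-8",       # placeholder TPU name
}

# Build the command line and run the pretraining script from the repo checkout.
cmd = ["python", "run_pretraining.py"] + [
    f"--{name}={value}" for name, value in pretraining_flags.items()
]
subprocess.run(cmd, check=True)
```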