BERT can be used for both feature extraction and fine-tuning.
In feature extraction, BERT is used as a frozen pre-trained model to extract meaningful features from text data. Its pre-trained weights encode the input text into a fixed-length vector, which can then serve as input to a downstream task such as text classification or sentiment analysis. These encoded vectors, known as "embeddings," can also be used to compare the similarity of two texts.
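As a concrete illustration, here is a minimal sketch of the feature-extraction approach. It assumes the Hugging Face `transformers` library with a PyTorch backend and the `bert-base-uncased` checkpoint (illustrative choices, not specified above); the hidden state of the `[CLS]` token is taken as the fixed-length sentence embedding, and cosine similarity compares two texts.

```python
# Sketch: using frozen BERT as a feature extractor for text similarity.
# Assumes the Hugging Face `transformers` library and the `bert-base-uncased`
# checkpoint -- both illustrative assumptions, not prescribed by the text.
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def embed(texts):
    """Encode each text into a fixed-length vector via the [CLS] hidden state."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()  # no gradient updates: the pre-trained weights stay frozen
    with torch.no_grad():
        inputs = tokenizer(texts, padding=True, truncation=True,
                           return_tensors="pt")
        outputs = model(**inputs)
    # First position of the last hidden layer is the [CLS] token embedding.
    return outputs.last_hidden_state[:, 0, :].numpy()
```

Calling `embed(["The movie was great.", "I loved the film."])` yields one vector per text; `cosine_similarity` on a pair of those vectors gives a similarity score, with higher values indicating more similar texts.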
In fine-tuning, BERT is trained further on a specific task or dataset to adapt it to the task at hand. The pre-trained weights serve as a starting point, and the whole model is then updated on a labeled dataset specific to that task.
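The fine-tuning setup above can be sketched as follows, again assuming Hugging Face `transformers` with PyTorch and the `bert-base-uncased` checkpoint (illustrative assumptions); a classification head is placed on top of the pre-trained weights, and the full model is updated on labeled `(text, label)` pairs.

```python
# Sketch: fine-tuning BERT for text classification.
# Assumes Hugging Face `transformers` with a PyTorch backend and a small
# labeled dataset of (text, label) pairs -- all illustrative assumptions.

def make_label_maps(labels):
    """Build label<->id mappings used to configure the classification head."""
    unique = sorted(set(labels))
    label2id = {label: i for i, label in enumerate(unique)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def fine_tune(texts, labels, epochs=2):
    """Continue training from the pre-trained BERT weights on labeled data."""
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from transformers import AutoTokenizer, BertForSequenceClassification

    label2id, id2label = make_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    # Pre-trained weights are the starting point; the classification
    # head on top is newly initialized.
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=len(label2id), label2id=label2id, id2label=id2label,
    )
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    ids = torch.tensor([label2id[label] for label in labels])
    loader = DataLoader(
        TensorDataset(enc["input_ids"], enc["attention_mask"], ids),
        batch_size=8, shuffle=True,
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(epochs):
        for input_ids, attention_mask, batch_labels in loader:
            optimizer.zero_grad()
            out = model(input_ids=input_ids, attention_mask=attention_mask,
                        labels=batch_labels)
            out.loss.backward()  # task loss updates all BERT weights
            optimizer.step()
    return model
```

Unlike the frozen feature-extraction case, every parameter of BERT receives gradient updates here, which is what adapts the model to the downstream task.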
In a nutshell, BERT supports both feature extraction and fine-tuning, but in practice it is most commonly fine-tuned on a specific task.