Use Mistral LLM for Text Classification and Entity Recognition

Introduction to the Mistral model

Mistral 7B is a decoder-only language model with roughly 7 billion parameters, designed to deliver both efficiency and high performance in real-world applications.

Mistral 7B uses Sliding Window Attention, in which each token attends only to a fixed window of preceding tokens at each layer. The model was trained with an 8k context length and a fixed cache size, giving it a theoretical attention span of 128K tokens. It also uses Grouped Query Attention (GQA), which speeds up inference and reduces the cache size. Finally, its byte-fallback tokenizer ensures that every character can be represented, so no out-of-vocabulary tokens are needed.
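To make the attention-span figure concrete, the sketch below builds a causal sliding-window mask and computes the theoretical span as window size multiplied by the number of layers. The window size of 4,096 tokens and the 32 layers are assumptions taken from the Mistral 7B paper rather than from this guide, and the function names are illustrative only.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j] == True when token i may attend to token j.

    A token attends to itself and to at most `window - 1` earlier tokens.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def theoretical_span(window, n_layers):
    """Each layer lets information travel up to `window` tokens further back."""
    return window * n_layers


# Small example: 6 tokens, window of 3.
mask = sliding_window_mask(6, 3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))

# Assumed values from the Mistral 7B paper: window of 4096 tokens, 32 layers.
span = theoretical_span(4096, 32)
print(span)  # 131072, i.e. 128K tokens
```

Because the mask never allows more than `window` positions per row, the key-value cache for a layer can stay at a fixed size no matter how long the input grows.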

These design choices make Mistral 7B well suited to language comprehension and generation tasks. This guide shows how to use the Mistral LLM for text classification and named entity recognition.

Mistral Implementation in arcgis.learn

Install the model backbone

Follow these steps to download and install the Mistral model backbone:

Download the mistral model backbone.
Extract the downloaded zip file.
Open the Anaconda Prompt and navigate to the folder that contains arcgis_mistral_backbone-1.0.0-py_0.tar.bz2
Run:

conda install --offline arcgis_mistral_backbone-1.0.0-py_0.tar.bz2

Mistral with the TextClassifier model

Import the TextClassifier class from the arcgis.learn.text module

from arcgis.learn.text import TextClassifier

Initialize the TextClassifier model with a databunch

Prepare the databunch for the TextClassifier model using the prepare_textdata method in arcgis.learn.

from arcgis.learn import prepare_textdata
data = prepare_textdata("path_to_data_folder", task="classification", train_file="input_csv_file.csv",
                        text_columns="text_column", label_columns="label_column")
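As a rough illustration of the kind of input the call above points to, the sketch below writes a small CSV with one text column and one label column using only the standard library. The column names mirror the placeholders in the call; the sample rows and the temporary folder are invented for illustration, and the exact file layout that prepare_textdata accepts should be checked against the arcgis.learn documentation.

```python
import csv
import os
import tempfile

# Hypothetical sample rows as (text, label) pairs. Replace with real data.
rows = [
    ("It was a wonderful experience", "positive"),
    ("The customer support was unhelpful", "negative"),
]

# A temporary folder stands in for "path_to_data_folder".
data_folder = tempfile.mkdtemp()
csv_path = os.path.join(data_folder, "input_csv_file.csv")

with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text_column", "label_column"])  # header row
    writer.writerows(rows)

# Read the file back to confirm the layout.
with open(csv_path, newline="", encoding="utf-8") as f:
    records = list(csv.DictReader(f))

print(records[0]["text_column"], "->", records[0]["label_column"])
```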

Once the data is prepared, the TextClassifier model object can be instantiated as below with the following parameters:

data: The databunch created using the prepare_textdata method.

backbone: To use mistral as the model backbone, use backbone="mistral".

prompt: Text string describing the task and its guardrails. This is an optional parameter.

classifier_model = TextClassifier(
    data=data,
    backbone="mistral",
    prompt="Classify all the input sentences into the defined labels, do not make up your own labels."
)

Initialize the TextClassifier model without a databunch

A TextClassifier model with a mistral backbone can also be created without a large dataset using only a few examples.

Below are the parameters to be passed into TextClassifier:

backbone: To use mistral as the model backbone, use backbone="mistral".

examples: User-defined examples to provide to the mistral model, in Python dictionary format:

{
    "label_1": ["input_text_example_1", "input_text_example_2", ...],
    "label_2": ["input_text_example_1", "input_text_example_2", ...],
    ...
}

prompt: Text string describing the task and its guardrails. This is an optional parameter.

classifier_model = TextClassifier(
    data=None,
    backbone="mistral",
    prompt="Classify all the input sentences into the defined labels, do not make up your own labels.",
    examples={
        "positive": ["Good! it was a wonderful experience!", "i really adore your work"],
        "negative": ["The customer support was unhelpful", "I don't like your work"]
    }
)
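When the few-shot examples start out as a flat list of labeled sentences, they need to be grouped by label to match the dictionary format described above. The helper below is a minimal sketch of that grouping; build_classifier_examples is a hypothetical name and is not part of arcgis.learn.

```python
from collections import defaultdict


def build_classifier_examples(labeled_texts):
    """Group (text, label) pairs into {label: [text, ...]}."""
    examples = defaultdict(list)
    for text, label in labeled_texts:
        cleaned = text.strip()
        if cleaned:  # skip empty strings
            examples[label].append(cleaned)
    return dict(examples)


labeled_texts = [
    ("Good! it was a wonderful experience!", "positive"),
    ("The customer support was unhelpful", "negative"),
    ("i really adore your work", "positive"),
]
examples = build_classifier_examples(labeled_texts)
print(examples)
```

The resulting dictionary has the same shape as the value passed to the examples parameter.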

Classify text using the mistral model

To classify text using the mistral model, use the predict method from the TextClassifier class. The input to the method is a text string or a list of text strings.

classifier_model.predict("input_text")

Load the model

To load a saved mistral model, use the from_model method from the TextClassifier class.

classifier_model = TextClassifier.from_model(r'path_to_dlpk_file')

Save the model

The following method saves the model weights and creates a Deep Learning Package (.dlpk).

classifier_model.save("name_of_the_mistral_model")

Mistral with an EntityRecognizer model

Import the EntityRecognizer class from the arcgis.learn.text module

from arcgis.learn.text import EntityRecognizer

Initialize the EntityRecognizer model with a databunch

Prepare the databunch for the EntityRecognizer model using the prepare_textdata method in arcgis.learn.

from arcgis.learn import prepare_textdata
data = prepare_textdata("path_to_data_file", task="entity_recognition", dataset_type='ner_json')

Once the data is prepared, the EntityRecognizer model object can be instantiated with the following parameters:

data: The databunch created using the prepare_textdata method.

backbone: To use mistral as the model backbone, use backbone="mistral".

prompt: Text string describing the task and its guardrails. This is an optional parameter.

entity_recognizer_model = EntityRecognizer(
    data=data,
    backbone="mistral",
    prompt="Tag the input sentences in the named entity for the given classes, no other class should be tagged."
)

Initialize the EntityRecognizer model without a databunch

An EntityRecognizer model with a mistral backbone can also be created without a large dataset by using only a few examples.

Below are the parameters to be passed into EntityRecognizer:

backbone: To use mistral as the model backbone, use backbone="mistral".

examples: User-defined examples for the mistral model, in Python list format:

[
    ("input_text_sentence",
         {
             "class_1": ["Named entity", ...],
             "class_2": ["Named entity", ...],
             ...
         }
    ),
    ...
]

Note: The EntityRecognizer class, when used with the "mistral" backbone, needs at least six examples to work effectively.

prompt: Text string describing the task and its guardrails. This is an optional parameter.

entity_recognizer_model = EntityRecognizer(
    data=None,
    backbone="mistral",
    prompt="Tag the input sentences in the named entity for the given classes, no other class should be tagged.",
    examples=[
        (
            'Jim stays in London',
            {
                'name': ['Jim'],
                'location': ['London']
            }
        ),
        ...  # placeholder for the remaining examples
    ]
)
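Before passing a hand-written examples list to EntityRecognizer, it can help to check it against the format and the six-example note above. The sketch below does that with a hypothetical helper, validate_ner_examples, which is not part of arcgis.learn. It assumes that every listed entity should appear verbatim in its sentence, which this guide does not state explicitly. The second example is invented and deliberately wrong so that the check has something to report.

```python
def validate_ner_examples(examples, min_examples=6):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(examples) < min_examples:
        problems.append(
            f"expected at least {min_examples} examples, got {len(examples)}"
        )
    for index, (sentence, entities) in enumerate(examples):
        for class_name, names in entities.items():
            for name in names:
                # Assumption: an entity must occur verbatim in its sentence.
                if name not in sentence:
                    problems.append(
                        f"example {index}: {name!r} ({class_name}) not in sentence"
                    )
    return problems


examples = [
    ("Jim stays in London", {"name": ["Jim"], "location": ["London"]}),
    # Invented example with a deliberate mistake: "Tim" is not in the sentence.
    ("Jim stays in London", {"name": ["Tim"], "location": ["London"]}),
]
problems = validate_ner_examples(examples)
for problem in problems:
    print(problem)
```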

Extract entities using the mistral model

To extract named entities using the mistral model, use the extract_entities method from the EntityRecognizer class. The input to the method will be a text string or a list of text strings.

entity_recognizer_model.extract_entities("input_text")

Load the model

To load a saved mistral model, use the from_model method from the EntityRecognizer class.

entity_recognizer_model = EntityRecognizer.from_model(r'path_to_dlpk_file')

Save the model

This method saves the model weights and creates a Deep Learning Package (.dlpk).

entity_recognizer_model.save("name_of_the_mistral_model")

Conclusion

In this guide we demonstrated the steps to initialize and perform inference using the Mistral LLM as a backbone with the TextClassifier and EntityRecognizer models in arcgis.learn.

References

Mistral-7B HuggingFace: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
Mistral-7B MistralAI: https://mistral.ai/news/announcing-mistral-7b