Scan Settings
The following parameters define how your model is scanned.
| Field Name | Description | Values |
|---|---|---|
| Temperature | Controls the balance between predictability and creativity in the generated text. Lower values return more predictable, conservative responses; higher values return less predictable, more creative responses. | Range: 0 to 1. Default value: 0.7 |
| Top K | Controls the randomness and diversity of the generated text by restricting sampling at each step to the K most likely tokens. | Any integer value. Default value: 10 |
| Maximum Tokens | Maximum number of tokens generated by the Target LLM. | Any integer value. Default value: 512 |
| Top P | Controls the randomness and diversity of the generated text using nucleus (top-p) sampling: sampling is restricted to the smallest set of tokens whose cumulative probability reaches P. | Range: 0 to 1. Default value: 0.95 |
| Repetition Penalty | Reduces the likelihood of repetitive text in the generated output. A value of 1 applies no penalty; higher values increasingly discourage repeated tokens. | Float value between 1 and 2. Default value: 1.03 |
| Response Timeout | Timeout, in seconds, for a single inference request sent to the Target LLM. | Any integer value. Default value: 120 |
| Retry Attempts | Number of times a single inference request is attempted. | Default value: 5 |
| Parallelism | Number of requests processed at the same time during a model scan. Use it to control the scan rate and performance. | Range: 1 to 20. Default value: 5 |
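
The defaults and ranges in the table can be checked before a scan is started. The following is a minimal sketch, not part of the product: the `ScanSettings` class, its field names, and the `validate` helper are hypothetical, and the bounds are assumed to be inclusive. Fields with no documented range are not checked.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScanSettings:
    # Defaults copied from the table; the field names are hypothetical.
    temperature: float = 0.7
    top_k: int = 10
    max_tokens: int = 512
    top_p: float = 0.95
    repetition_penalty: float = 1.03
    response_timeout: int = 120  # seconds
    retry_attempts: int = 5
    parallelism: int = 5


# Ranges stated in the table, assumed inclusive.
# The remaining fields have no documented range and are left unchecked.
DOCUMENTED_RANGES = {
    "temperature": (0, 1),
    "top_p": (0, 1),
    "repetition_penalty": (1, 2),
    "parallelism": (1, 20),
}


def validate(settings: ScanSettings) -> List[str]:
    """Return one message per field that falls outside its documented range."""
    problems = []
    for name, (low, high) in DOCUMENTED_RANGES.items():
        value = getattr(settings, name)
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
    return problems


print(validate(ScanSettings()))                # []
print(validate(ScanSettings(parallelism=21)))  # ['parallelism=21 is outside 1 to 20']
```

Returning a list of messages instead of raising on the first failure lets a caller report every out-of-range field at once.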