File Insight Scan
QScanner collects information about the files associated with the scan target. By default, collection is driven by the built-in rules.
QScanner collects the following information based on the matched rule configuration.
- File path
- Permission
- File size in bytes
- Origin layer (applicable for image artifacts only)
- MIME type
- MD5 digest
- SHA1 digest
- SHA256 digest
- Capture groups
- Executable Info (available for binaries only)
  - Name (ELF - Executable and Linkable Format and PE - Portable Executable)
  - Version (ELF and PE)
  - PURL (ELF and PE)
  - Imported Libraries (ELF and PE)
  - Product Name (PE only)
  - Product Version (PE only)
  - Company Name (PE only)
  - Copyright (PE only)
  - Description (PE only)
  - Publisher (PE only)
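To make the basic attributes above concrete, the following is a minimal sketch of how path, permission, size, and the three digests could be computed for a single file. It is an illustration only and not QScanner's implementation; the function name `basic_file_stats` and the output keys are assumptions made for this example.

```python
import hashlib
import os
import stat


def basic_file_stats(path):
    """Collect path, permission, size, and digests for one file.

    Hypothetical helper for illustration; not part of QScanner.
    """
    st = os.stat(path)
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {
        "path": path,
        "permission": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--"
        "size": st.st_size,  # size in bytes
        **{name: digest.hexdigest() for name, digest in digests.items()},
    }
```

All three digests are updated in a single pass over the file, so the file is read only once.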
The File Insight scan does not collect files associated with the OS installation, so the final report does not include information about system-installed files. To collect information about all files, including system-installed files, run the File Insight scan on its own, without any other scan type.
./qscanner --mode inventory-only --format json --scan-types fileinsight image python
What is the FileInsight Config File?
The File Insight scan is governed by the config file specified using --fileinsight-config-file <path/to/config/file>.
The config file is composed of rules that define which files to scan and which attributes to extract from them. If this config file is not provided, QScanner uses the built-in config for the File Insight scan.
Built-in Rules Explained
The following are the built-in rules supported by QScanner's File Insight scan type.
| Rule Name | Category | Enabled | Description |
|---|---|---|---|
| yum_repos | OS Supplement | Yes | Content of all files under /etc/yum.repos.d/*.repo |
| dnf_modules | OS Supplement | Yes | Content of all files under /etc/dnf/modules.d/*.module |
| content_manifests | OS Supplement | Yes | Content of all files under /root/buildinfo/content_manifests/*.json |
| els_release | OS Supplement | Yes | Content of /etc/els-release file. |
| model_files_in_keras_model_path | AI Datapoint | Yes | Basic stats of AI Model files under **/.keras/models/ |
| model_files_in_pytorch_model_path | AI Datapoint | Yes | Basic stats of AI Model files under **/torch/hub/checkpoints/ |
| model_files_in_hf_model_path | AI Datapoint | Yes | Basic stats of AI Model files under **/huggingface/transformers/ |
| model_files_in_intel_openvino_deployment_tools | AI Datapoint | Yes | Basic stats of AI Model files under /opt/intel/openvino/deployment_tools/model_optimizer/ |
| model_files_in_mx_model_path | AI Datapoint | Yes | Basic stats of AI Model files under **/.mxnet/models/ |
What is the FileInsight Rule Structure?
A default built-in rule file contains the following parameters:
- id: UUID for this rule. Each rule must have a unique, non-empty ID.
- category: Defines the broad type of collection this rule is intended for.
- name: Display name for the rule. This must be non-empty.
- disabled: If true, this rule is not applied. Default value: false.
- filter: Filters decide which files to scan. The following filters are supported:
  - path_filter: A glob pattern that filters files based on their full path.
  - type_filter: The type of file to scan. The following options are supported:
    - any: Default option. Scans all files, such as text, binary, image, obj, and so on.
    - executable: Scans only ELF (application/x-sharedlib and application/x-executable) and PE (application/vnd.microsoft.portable-executable) files.
    - non-executable: Scans only non-executables.
- matcher: Matchers collect data from the file. They use regexes and store data as named capture groups. The following matchers are supported:
  - path_matchers: Add capture groups on the file path. Multiple path matchers can be provided for a single rule. Once one matches, the others are not evaluated.
  - content_matchers: Add capture groups on the file content. Multiple content matchers can be provided for a single rule. Once one matches, the others are not evaluated.
- attributes: Defines the attributes to collect for this file. If nothing is specified, only basic stats are collected. If all is provided, all supported attributes are collected.
For the list of supported attributes, refer to pkg/fileinsightconfig/attribute.go.
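The constraints on rule parameters described above can be sketched as a small check. This is a hedged illustration, not QScanner code: rules are modelled as plain Python dictionaries, which is an assumption (the real config file format is not shown here), and the helper names `validate_rules`, `enabled_rules`, and `passes_type_filter` are hypothetical.

```python
# MIME types that the text lists for the "executable" type_filter option.
EXECUTABLE_MIME_TYPES = {
    "application/x-sharedlib",
    "application/x-executable",
    "application/vnd.microsoft.portable-executable",
}


def validate_rules(rules):
    """Return a list of problems: empty or duplicate ids, empty names."""
    problems = []
    seen_ids = set()
    for index, rule in enumerate(rules):
        rule_id = rule.get("id", "")
        if not rule_id:
            problems.append(f"rule {index}: id must be non-empty")
        elif rule_id in seen_ids:
            problems.append(f"rule {index}: duplicate id {rule_id!r}")
        else:
            seen_ids.add(rule_id)
        if not rule.get("name", ""):
            problems.append(f"rule {index}: name must be non-empty")
    return problems


def enabled_rules(rules):
    """Keep rules whose 'disabled' flag is absent or false (the default)."""
    return [rule for rule in rules if not rule.get("disabled", False)]


def passes_type_filter(mime_type, type_filter="any"):
    """Apply the any / executable / non-executable options to a MIME type."""
    is_executable = mime_type in EXECUTABLE_MIME_TYPES
    if type_filter == "any":
        return True
    if type_filter == "executable":
        return is_executable
    if type_filter == "non-executable":
        return not is_executable
    raise ValueError(f"unsupported type_filter: {type_filter!r}")
```

For example, `validate_rules([{"id": "a", "name": "first"}, {"id": "a", "name": ""}])` reports both the duplicate id and the empty name for the second rule.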
What are Matchers?
PathMatchers and ContentMatchers collect named capture groups based on file path and file content, respectively. For ELF files, QScanner uses the .rodata section of the ELF for matching against the provided rules. For ELF, name, version, and release are special named capture groups that the scanner uses to identify the binary version (for example, [email protected]). Notable points:
- Capture group names may contain only alphanumeric and underscore characters [A-Z, a-z, 0-9, _].
- Multiple path and content matchers can be provided. Whichever matches first is used.
- If a capture group name overlaps between a path matcher and a content matcher, the latter takes precedence.
- If multiple rules use the same capture group name for the same file, the rule evaluated later replaces the existing value. Ensure that such rules don't exist in the config file.
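The first-match and precedence behaviour described in the points above can be sketched as follows. This is an illustration under stated assumptions: it uses Python's `re` syntax for named groups, which may differ from the regex dialect QScanner accepts, and the helper names, the sample path, and the sample patterns are all hypothetical.

```python
import re

# Allowed characters for capture group names, as listed in the text.
GROUP_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def is_valid_group_name(name):
    """Check a capture group name against the allowed character set."""
    return bool(GROUP_NAME.match(name))


def first_match(patterns, text):
    """Return named groups from the first pattern that matches, else {}."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            # Later patterns are not evaluated once one has matched.
            return {k: v for k, v in match.groupdict().items() if v is not None}
    return {}


def collect_groups(path, content, path_matchers, content_matchers):
    """Merge path and content groups; content groups win on name overlap."""
    groups = first_match(path_matchers, path)
    groups.update(first_match(content_matchers, content))
    return groups


# Hypothetical file path, content, and patterns for illustration.
example = collect_groups(
    "/usr/lib/libfoo.so.1.2.3",
    "foo version 1.2.4",
    [r"lib(?P<name>[a-z]+)\.so\.(?P<version>[\d.]+)$"],
    [r"foo version (?P<version>\d+\.\d+\.\d+)"],
)
```

Here both matchers define a `version` group, so the value from the content matcher replaces the one from the path matcher, while `name` from the path matcher is kept.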
What are Special Capture Groups?
When scanning ELF binaries, the following special capture groups are used to identify the binary name and version:
- name: Used for ExecutableInfo.Name. If name is not found in the capture groups, the filename is used as a fallback.
- version: Used for ExecutableInfo.Version.
- release: Optional. If found for a rule, it is used to populate the version in the format {$VERSION}-{$RELEASE}.
For ELF, the content matcher will match only if it has a non-empty name and version.
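The handling of the special capture groups can be sketched as below. This is a rough model of the rules stated above, not the scanner's actual logic: the function names are hypothetical, and taking the base name of the path as the "filename" fallback is an assumption.

```python
import os


def executable_info(file_path, groups):
    """Derive name and version for an ELF binary from capture groups."""
    # Fall back to the file name when no "name" group was captured.
    name = groups.get("name") or os.path.basename(file_path)
    version = groups.get("version", "")
    release = groups.get("release", "")
    if version and release:
        # Mirrors the {$VERSION}-{$RELEASE} format described in the text.
        version = f"{version}-{release}"
    return {"name": name, "version": version}


def content_match_is_usable(groups):
    """For ELF, a content match counts only with non-empty name and version."""
    return bool(groups.get("name")) and bool(groups.get("version"))
```

For instance, with the hypothetical groups `{"name": "foo", "version": "1.2", "release": "3"}`, `executable_info` yields the version string `1.2-3`.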