Indexer
claircore/indexer
The Indexer package performs Libindex's heavy lifting. It is responsible for retrieving Manifest layers, parsing the contents of each layer, and computing an IndexReport.
To perform this work in incremental steps, the Indexer is implemented as a finite state machine. At each state transition, the Indexer persists an updated IndexReport to its datastore.
States
The following diagram expresses the possible states of the Indexer:
stateDiagram-v2
    state if_indexed <<choice>>
    [*] --> CheckManifest
    CheckManifest --> if_indexed
    if_indexed --> [*]: Indexed
    if_indexed --> FetchLayers: Unindexed
    FetchLayers --> ScanLayers
    ScanLayers --> Coalesce
    Coalesce --> IndexManifest
    IndexManifest --> IndexFinished
    IndexFinished --> [*]
    %% These notes make the diagram unreadable :/
    %% note left of CheckManifest: Determine if this manifest has been indexed previously.
    %% note right of FetchLayers: Determine which layers need to be indexed and fetch them.
    %% note right of ScanLayers: Concurrently run needed Indexers on layers.
    %% note right of Coalesce: Compute the final contents of the container image.
    %% note right of IndexManifest: Associate all the discovered data.
    %% note right of IndexFinished: Persist the results.
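The state progression in the diagram can be sketched in Go as a small loop that persists a report after every transition. This is a minimal illustration, not the package's actual implementation: the `State` constants reuse the names from the diagram, while `Report`, `Store`, `memStore`, `next`, and `Run` are hypothetical names introduced here, and the real work done in each state is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// State enumerates the states from the diagram. Terminal stands in for
// the diagram's end marker and is an assumption of this sketch.
type State int

const (
	Terminal State = iota
	CheckManifest
	FetchLayers
	ScanLayers
	Coalesce
	IndexManifest
	IndexFinished
)

var stateNames = [...]string{
	"Terminal", "CheckManifest", "FetchLayers", "ScanLayers",
	"Coalesce", "IndexManifest", "IndexFinished",
}

func (s State) String() string {
	if s < 0 || int(s) >= len(stateNames) {
		return "Unknown"
	}
	return stateNames[s]
}

// Report is a hypothetical, heavily simplified stand-in for an IndexReport.
type Report struct {
	State string // name of the most recently completed state
}

// Store is a hypothetical persistence interface.
type Store interface {
	SetReport(Report) error
}

// memStore keeps every persisted report in memory, for demonstration only.
type memStore struct{ history []Report }

func (m *memStore) SetReport(r Report) error {
	m.history = append(m.history, r)
	return nil
}

// next returns the state that follows cur. The indexed flag is consulted
// only at CheckManifest, mirroring the choice node in the diagram.
func next(cur State, indexed bool) (State, error) {
	switch cur {
	case CheckManifest:
		if indexed {
			return Terminal, nil
		}
		return FetchLayers, nil
	case FetchLayers:
		return ScanLayers, nil
	case ScanLayers:
		return Coalesce, nil
	case Coalesce:
		return IndexManifest, nil
	case IndexManifest:
		return IndexFinished, nil
	case IndexFinished:
		return Terminal, nil
	}
	return Terminal, errors.New("no transition from state " + cur.String())
}

// Run drives the machine from CheckManifest until Terminal, persisting an
// updated report each time a state completes.
func Run(s Store, indexed bool) (Report, error) {
	var r Report
	for cur := CheckManifest; cur != Terminal; {
		n, err := next(cur, indexed)
		if err != nil {
			return r, err
		}
		r.State = cur.String()
		if err := s.SetReport(r); err != nil {
			return r, fmt.Errorf("persisting report after %v: %w", cur, err)
		}
		cur = n
	}
	return r, nil
}

func main() {
	store := &memStore{}
	if _, err := Run(store, false); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range store.history {
		fmt.Println(r.State)
	}
}
```

Because a report is written after each completed state, an interrupted run in this sketch leaves behind a record of the last state that finished.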
Data Model
The Indexer data model focuses on content-addressable hashes as primary keys, the deduplication of package/distribution/repository information, and the recording of scan artifacts. Scan artifacts are unique artifacts found within a layer, each of which points to a deduplicated general package/distribution/repository record.
The following diagram outlines the current Indexer data model.
%%{init: {"er":{"layoutDirection":"RL"}} }%%
erDiagram
    ManifestLayer many to 1 Manifest: ""
    ManifestLayer many to 1 Layer: ""
    ScannedLayer many to 1 Layer: ""
    ScannedLayer many to 1 Scanner: ""
    ScannedManifest many to 1 Manifest: ""
    ScannedManifest many to 1 Scanner: ""
    TYPE_ScanArtifact 1 to 1 Layer: ""
    TYPE_ScanArtifact 1 to 1 Scanner: ""
    TYPE_ScanArtifact 1 to 1 TYPE: ""
    ManifestIndex many to 1 Manifest: ""
    ManifestIndex 1 to zero or one TYPE: ""
    IndexReport 1 to 1 Manifest: "cached result"
Note that TYPE stands in for each of the Indexer types (e.g. Package, Repository, etc.).
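The two ideas in this data model, content-addressable keys and deduplicated records referenced by scan artifacts, can be illustrated with a small in-memory sketch in Go. Everything here is an assumption made for illustration: `Package`, `ScanArtifact`, `Index`, `Digest`, and `Record` are hypothetical names and do not reflect the actual database schema or the package's types.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Package is a hypothetical, simplified package record.
type Package struct {
	Name, Version string
}

// ScanArtifact ties one layer and one scanner to a deduplicated
// package record, loosely following TYPE_ScanArtifact in the diagram.
type ScanArtifact struct {
	LayerDigest string
	Scanner     string
	PackageID   int
}

// Index is an in-memory stand-in for the datastore.
type Index struct {
	packageIDs map[Package]int           // one ID per distinct package
	artifacts  map[ScanArtifact]struct{} // set of unique scan artifacts
}

func NewIndex() *Index {
	return &Index{
		packageIDs: map[Package]int{},
		artifacts:  map[ScanArtifact]struct{}{},
	}
}

// Digest derives a content-addressable key from raw bytes, so identical
// content always maps to the same key.
func Digest(content []byte) string {
	sum := sha256.Sum256(content)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// Record stores pkg at most once and links it to the layer and scanner
// that discovered it.
func (ix *Index) Record(layerDigest, scanner string, pkg Package) ScanArtifact {
	id, ok := ix.packageIDs[pkg]
	if !ok {
		id = len(ix.packageIDs) + 1
		ix.packageIDs[pkg] = id
	}
	a := ScanArtifact{LayerDigest: layerDigest, Scanner: scanner, PackageID: id}
	ix.artifacts[a] = struct{}{}
	return a
}

func main() {
	ix := NewIndex()
	base := Digest([]byte("layer one"))
	top := Digest([]byte("layer two"))
	pkg := Package{Name: "bash", Version: "5.1"}

	a := ix.Record(base, "rpm", pkg)
	b := ix.Record(top, "rpm", pkg)

	// One package record, two artifacts, both pointing at the same record.
	fmt.Println(len(ix.packageIDs), len(ix.artifacts), a.PackageID == b.PackageID)
}
```

In this sketch, the same package found in two layers yields a single package record and two scan artifacts that both reference it.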
HTTP Resources
Indexers as currently built may make network requests. This is an outstanding issue. The following are the URLs used.
- https://search.maven.org/solrsearch/select
- https://catalog.redhat.com/api/containers/
- https://security.access.redhat.com/data/metrics/repository-to-cpe.json
- https://security.access.redhat.com/data/metrics/container-name-repos-map.json