8.9. Elasticsearch internals

See also

Database Schema

Elasticsearch index schema definitions and details

PEAT Elasticsearch indices reference

Table of the Elasticsearch indices used by PEAT

Elasticsearch

Elasticsearch usage and other information.

8.9.1. Notes

  • PEAT follows the Elastic Common Schema (ECS), and any changes must adhere to the ECS (when possible)

  • All indices share the ECS Base and Agent field sets (refer to Database Schema)

  • ALL timestamps are in the UTC timezone

  • Field types (the “Type” column in the tables) are Elasticsearch datatypes (reference). When storing documents as plain JSON files, ensure the stored format either matches the corresponding ES format or can be coerced to it.

  • The document’s _id field is unique for each document. The format is: peat~<run-id>~<microsecond>, where <microsecond> is an integer.

  • Sub-fields are nested JSON objects. From the ECS Guidelines: “The document structure should be nested JSON objects. If you use Beats or Logstash, the nesting of JSON objects is done for you automatically. If you’re ingesting to Elasticsearch using the API, your fields must be nested objects, not strings containing dots.”
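To illustrate the nesting requirement quoted above, here is a minimal sketch that expands dotted keys into nested objects before a document is sent through the API. The helper name nest_dotted_keys and the sample field values are hypothetical and are not part of PEAT. The sketch does not handle conflicts where a key is both a leaf value and a parent object.

```python
from typing import Any


def nest_dotted_keys(flat: dict[str, Any]) -> dict[str, Any]:
    """Expand keys such as "host.ip" into nested dictionaries (hypothetical helper)."""
    nested: dict[str, Any] = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        node = nested
        # Walk (or create) one intermediate object per dotted segment
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


# Illustrative document only; the values are made up
flat_doc = {"host.name": "plc-01", "host.ip": "192.0.2.10", "message": "hello"}
nested_doc = nest_dotted_keys(flat_doc)
```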

8.9.2. Code documentation

8.9.2.1. Elastic

class PeatElasticSerializer[source]
default(data)[source]
Return type:

bool | str | float | int | list | None

class PeatOpenSearchSerializer[source]
default(data)[source]
Return type:

bool | str | float | int | list | None
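A serializer's default hook is typically called for values the JSON encoder cannot handle natively, and returns something it can. The following is only a sketch of that idea using the standard library's json module as a stand-in for the Elasticsearch/OpenSearch serializer base classes. The specific conversions shown (sets, datetimes, bytes) are assumptions and may not match what PEAT actually does.

```python
import json
from datetime import datetime, timezone


def default(data):
    """Convert values that json cannot encode into JSON-friendly ones (assumed rules)."""
    if isinstance(data, (set, frozenset)):
        return list(data)
    if isinstance(data, datetime):
        return data.isoformat()
    if isinstance(data, bytes):
        return data.decode("utf-8", errors="replace")
    # Fall back to the string representation for anything else
    return str(data)


doc = {"tags": {"plc"}, "when": datetime(2020, 1, 1, tzinfo=timezone.utc)}
encoded = json.dumps(doc, default=default)
```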

class Elastic(server_url='http://localhost:9200/')[source]

Wrapper for interacting with an Elasticsearch or OpenSearch database.

ECS_VERSION = '8.10.0'

Version of ECS PEAT currently follows.

property es: Elasticsearch | OpenSearch

Elasticsearch or OpenSearch client instance.

If it doesn’t exist yet, this will create a client object and connect to the server. Otherwise, will return the existing instance.

info()[source]

Information about the Elasticsearch/OpenSearch server/cluster.

Return type:

str

ping()[source]

Check if the server is online and the connection is working.

Return type:

bool

disconnect()[source]

Disconnect from the Elasticsearch/OpenSearch server.

Return type:

None

doc_exists(index, doc_id)[source]

Check if a document exists on an index.

Note: this won’t auto-resolve dated index names.

Return type:

bool

index_exists(index)[source]

Check if an Elasticsearch/OpenSearch index exists.

This method caches index existence checks to reduce the number of requests to the server.

Parameters:

index (str) -- Name of the index to check (this can be any valid index pattern)

Return type:

bool

Returns:

If the index exists
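The caching behaviour described above could look roughly like the sketch below. The class IndexExistenceCache and the check_remote callable are hypothetical stand-ins for the real client call. Caching only positive results is an assumption made here so that an index created later is still detected.

```python
from typing import Callable


class IndexExistenceCache:
    """Remember which indices are known to exist (hypothetical helper)."""

    def __init__(self, check_remote: Callable[[str], bool]) -> None:
        self._check_remote = check_remote  # stand-in for a request to the server
        self._known: set[str] = set()

    def exists(self, index: str) -> bool:
        if index in self._known:
            return True  # answered from the cache, no request made
        found = self._check_remote(index)
        if found:
            self._known.add(index)
        return found
```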

create_index(index, fields_limit=20000)[source]

Create an index in Elasticsearch/OpenSearch if it doesn’t already exist.

Parameters:
  • index (str) -- Name of the index to create

  • fields_limit (int) -- Elastic limits the number of fields in an index to 1000 by default, which is problematic for some devices that have protocol register mappings (e.g. DNP3, Modbus). To avoid this, we raise the limit by default for all PEAT indices. This option allows us to tweak that limit as needed for specific indices.

Return type:

bool

Returns:

If the index was successfully created
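The field limit mentioned above corresponds to the Elasticsearch index setting index.mapping.total_fields.limit. As a sketch, a request body that raises it at index creation could be built as follows. The function name build_index_settings is hypothetical, and how PEAT actually passes the setting to the client is not shown in this section.

```python
def build_index_settings(fields_limit: int = 20000) -> dict:
    """Build a create-index body that raises the total fields limit (hypothetical helper)."""
    return {
        "settings": {
            "index": {
                "mapping": {
                    # Elasticsearch defaults this to 1000
                    "total_fields": {"limit": fields_limit},
                }
            }
        }
    }


body = build_index_settings()
```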

search(index, query=None, body=None)[source]

Query for values from an index.

Note

By default, results are sorted in descending order by timestamp.

Parameters:
Return type:

list[dict]

Returns:

List of results, in descending order by timestamp (unless a custom body is provided with a custom “sort” argument). The list will be empty if there were no results or an error occurred.
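For reference, a search body that sorts by timestamp in descending order can be written in the Elasticsearch query DSL as sketched below. The helper build_search_body is hypothetical, and the use of the "@timestamp" field as the sort key is an assumption based on the ECS base fields. It is not taken from the PEAT source.

```python
from typing import Optional


def build_search_body(query: Optional[dict] = None) -> dict:
    """Build a search body sorted newest-first (hypothetical helper)."""
    return {
        # Match every document when no query is given
        "query": query if query is not None else {"match_all": {}},
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


default_body = build_search_body()
filtered_body = build_search_body({"term": {"host.name": "plc-01"}})
```

A caller who wants a different ordering would supply a body with its own "sort" entry, as the Returns description notes.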

Query for data in Elasticsearch.

Assumes you know what you’re doing and want direct access to the API.

Parameters:

search_args (dict) -- Arguments dict to pass directly to Elasticsearch.search as keyword arguments (aka “kwargs”)

Return type:

dict | None

Returns:

Raw response dict or None if an error occurred

gen_body(content)[source]

Generate the basic body of a document to be pushed to Elasticsearch, auto-populating standard fields such as “observer”, “@timestamp”, etc.

Return type:

dict[str, Any]

bulk_push(index, contents)[source]

Upload multiple docs to an Elasticsearch index.

Note

Index names will have a date appended, unless no_date=True. For example, peat-configs will become peat-configs-2020.01.01.

Parameters:
  • index (str) -- Name of the Elasticsearch Index to push to

  • contents (list[tuple[str, dict]]) -- Data to send, as a list of tuples, each containing a doc ID and a data payload

Return type:

bool

Returns:

True if the bulk push was successful for all docs, False if any docs failed or index creation failed.
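As a sketch of how the contents argument maps onto a bulk request, the function below turns (doc ID, payload) tuples into the action dictionaries accepted by the bulk helpers in the official Python clients. Whether PEAT uses those helpers internally is an assumption. The function name and the sample data are hypothetical.

```python
def build_bulk_actions(index: str, contents: list[tuple[str, dict]]) -> list[dict]:
    """Convert (doc_id, payload) tuples into bulk action dicts (hypothetical helper)."""
    return [
        {"_index": index, "_id": doc_id, "_source": payload}
        for doc_id, payload in contents
    ]


# Illustrative IDs and payloads only
contents = [
    ("peat~example~1", {"message": "first"}),
    ("peat~example~2", {"message": "second"}),
]
actions = build_bulk_actions("peat-configs-2020.01.01", contents)
```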

push(index, content, doc_id=None, no_date=False)[source]

Upload data to an Elasticsearch index.

Note

Index names will have a date appended, unless no_date=True. For example, peat-configs will become peat-configs-2020.01.01.

Parameters:
  • index (str) -- Name of the Elasticsearch Index to push to

  • content (dict) -- Data to be pushed (this is added to the body)

  • doc_id (str | None) -- Document ID to create or update. If None, an ID will be automatically generated and used instead.

  • no_date (bool) -- Don’t add a date to index name

Return type:

bool

Returns:

True if the push was successful, False if there was an error or if index creation failed.
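The index naming rule from the note above can be sketched as follows. The helper dated_index is hypothetical. Using the current UTC time when no date is given is an assumption, since this section does not say which date PEAT appends.

```python
from datetime import datetime, timezone
from typing import Optional


def dated_index(index: str, when: Optional[datetime] = None, no_date: bool = False) -> str:
    """Append a YYYY.MM.DD suffix to an index name unless no_date is set."""
    if no_date:
        return index
    when = when or datetime.now(timezone.utc)  # assumed default: current UTC time
    return f"{index}-{when.strftime('%Y.%m.%d')}"


name = dated_index("peat-configs", datetime(2020, 1, 1, tzinfo=timezone.utc))
```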

static bencode(blob)[source]

Encodes bytes into an Elastic-friendly Base64 string.

Return type:

str

static pickle(obj)[source]

Pickle a Python object into an Elastic-friendly Base64 string.

Return type:

str
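The two encoding helpers above, bencode and pickle, can be approximated with the standard library as shown in this sketch. The function name pickle_to_b64 is hypothetical (chosen to avoid shadowing the pickle module), and the use of standard Base64 with ASCII decoding is an assumption about what "Elastic-friendly" means here.

```python
import base64
import pickle


def bencode(blob: bytes) -> str:
    """Encode raw bytes as a Base64 text string."""
    return base64.b64encode(blob).decode("ascii")


def pickle_to_b64(obj: object) -> str:
    """Pickle an object, then encode the pickled bytes as Base64 text."""
    return bencode(pickle.dumps(obj))


encoded_blob = bencode(b"\x00\x01PEAT")
encoded_obj = pickle_to_b64({"registers": [1, 2, 3]})
```

Decoding reverses the steps with base64.b64decode and pickle.loads. Unpickling should only be done on data from a trusted source.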

classmethod convert_tstamp(tstamp)[source]

Converts a timestamp into a format compatible with Elasticsearch.

Return type:

str | None
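Since all timestamps are stored in UTC, a conversion of this kind might normalize a datetime to UTC and emit an ISO 8601 string, which the Elasticsearch date type accepts by default. The sketch below is not the PEAT implementation. The name to_es_timestamp is hypothetical, and treating naive datetimes as already being in UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_es_timestamp(tstamp: Optional[datetime]) -> Optional[str]:
    """Return an ISO 8601 UTC string, or None if no timestamp was given."""
    if tstamp is None:
        return None
    if tstamp.tzinfo is None:
        # Assumption: naive datetimes are already in UTC
        tstamp = tstamp.replace(tzinfo=timezone.utc)
    return tstamp.astimezone(timezone.utc).isoformat()


# Noon at UTC-7 corresponds to 19:00 UTC
local = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=-7)))
converted = to_es_timestamp(local)
```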

classmethod time_now()[source]
Return type:

str | None

static gen_id()[source]

Generate a unique string used in a document’s ‘_id’ field.

Return type:

str

Returns:

String in the format peat~<run-id>~<microsecond>~<random>, where <microsecond> and <random> are integers.
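A sketch of an ID generator that produces strings in this format is shown below. The run_id parameter, the use of microseconds since the Unix epoch for <microsecond>, and the range of the random integer are all assumptions. The real method takes no arguments and may derive these values differently.

```python
import random
import time


def gen_id(run_id: str) -> str:
    """Build an ID shaped like peat~<run-id>~<microsecond>~<random> (illustrative only)."""
    microsecond = time.time_ns() // 1000  # assumed: microseconds since the Unix epoch
    rand = random.randint(0, 9999)        # assumed range for the random component
    return f"peat~{run_id}~{microsecond}~{rand}"


example_id = gen_id("example-run")
```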

8.9.2.2. Index type mappings

Mappings (the Elastic “schema”) for the various PEAT Elasticsearch indices.

These encode the types defined in the Elastic schemas for integration of third-party tools and other Sandia capabilities. Schema reference: Database Schema

Note

Type is only required for individual fields, NOT documents

Data structure

  • Key: Name of the index (e.g. ot-device-hosts-timeseries).

  • Value: The field mapping for the index, including field types and other field configurations, such as tokenizers or filters.
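The data structure described above might look like the following sketch. The variable name INDEX_MAPPINGS and the fields listed are hypothetical examples in standard Elasticsearch mapping syntax. They are not the actual PEAT mappings, which are defined in the Database Schema.

```python
# Key: index name. Value: the field mapping for that index.
INDEX_MAPPINGS: dict[str, dict] = {
    "ot-device-hosts-timeseries": {
        "properties": {
            "@timestamp": {"type": "date"},
            # Sub-fields are nested objects with their own "properties"
            "host": {
                "properties": {
                    "name": {"type": "keyword"},
                    "ip": {"type": "ip"},
                }
            },
            # Extra field configuration, such as an analyzer, sits next to the type
            "message": {"type": "text", "analyzer": "standard"},
        }
    },
}

host_fields = INDEX_MAPPINGS["ot-device-hosts-timeseries"]["properties"]["host"]["properties"]
```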

Official Elasticsearch documentation and references