deephaven.experimental.s3

class S3Instructions(region_name=None, max_concurrent_requests=None, read_ahead_count=None, fragment_size=None, connection_timeout=None, read_timeout=None, access_key_id=None, secret_access_key=None, anonymous_access=False, endpoint_override=None, write_part_size=None, num_concurrent_write_parts=None)

Bases: JObjectWrapper

S3Instructions provides specialized instructions for reading from and writing to S3-compatible APIs.

Initializes the instructions.

Parameters:
  • region_name (str) – the region name for reading parquet files. If not provided, the default region will be picked by the AWS SDK from the 'aws.region' system property, the "AWS_REGION" environment variable, the {user.home}/.aws/credentials or {user.home}/.aws/config files, or from the EC2 metadata service, if running in EC2.

  • max_concurrent_requests (int) – the maximum number of concurrent requests for reading files. Defaults to 256.

  • read_ahead_count (int) – the number of fragments to send asynchronous read requests for while reading the current fragment. Defaults to 32, which means fetch the next 32 fragments in advance when reading the current fragment.

  • fragment_size (int) – the maximum size of each fragment to read, defaults to 64 KiB. If there are fewer bytes remaining in the file, the fetched fragment can be smaller.

  • connection_timeout (Union[Duration, int, str, datetime.timedelta, np.timedelta64, pd.Timedelta]) – the amount of time to wait when initially establishing a connection before giving up and timing out. Can be expressed as an integer in nanoseconds, a time interval string, e.g. “PT00:00:00.001” or “PT1s”, or other time duration types. Defaults to 2 seconds.

  • read_timeout (Union[Duration, int, str, datetime.timedelta, np.timedelta64, pd.Timedelta]) – the amount of time to wait when reading a fragment before giving up and timing out. Can be expressed as an integer in nanoseconds, a time interval string, e.g. “PT00:00:00.001” or “PT1s”, or other time duration types. Defaults to 2 seconds.

  • access_key_id (str) – the access key for reading files. Both access key and secret access key must be provided to use static credentials, else default credentials will be used.

  • secret_access_key (str) – the secret access key for reading files. Both access key and secret key must be provided to use static credentials, else default credentials will be used.

  • anonymous_access (bool) – use anonymous credentials. This is useful when the S3 policy has been set to allow anonymous access. Can’t be combined with other credentials. Defaults to False.

  • endpoint_override (str) – the endpoint to connect to. Callers connecting to AWS do not typically need to set this; it is most useful when connecting to non-AWS, S3-compatible APIs.

  • write_part_size (int) – the size of each part, in bytes, when writing to S3. Writes are done in parts (chunks). The default value is 10485760 (= 10 MiB) and the minimum allowed part size is 5 MiB. Setting a higher value may increase throughput, but may also increase memory usage. Note that the maximum number of parts allowed for a single file is 10,000. Therefore, for a 10 MiB part size, the maximum size of a single file that can be written is 100,000 MiB (about 98 GiB).

  • num_concurrent_write_parts (int) – the maximum number of parts that can be uploaded concurrently when writing to S3 without blocking, defaults to 64. Setting a higher value may increase throughput, but may also increase memory usage.
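
The sizing arithmetic behind write_part_size, fragment_size and read_ahead_count can be checked with a few lines of plain Python. This is a minimal sketch that uses only the limits and defaults stated in the parameter list above. The helper names (max_file_size, min_part_size_for, read_ahead_bytes) are hypothetical and are not part of the deephaven API, and the read-ahead figure is only a rough estimate of the bytes requested in advance, not a measured memory footprint.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

MAX_PARTS = 10_000        # maximum number of parts per file, as stated above
MIN_PART_SIZE = 5 * MIB   # minimum allowed write_part_size, as stated above


def max_file_size(write_part_size: int = 10 * MIB) -> int:
    """Largest file (in bytes) that fits in MAX_PARTS parts of the given size."""
    if write_part_size < MIN_PART_SIZE:
        raise ValueError("write_part_size must be at least 5 MiB")
    return write_part_size * MAX_PARTS


def min_part_size_for(file_size: int) -> int:
    """Smallest write_part_size (in bytes) that can hold a file of file_size bytes."""
    needed = -(-file_size // MAX_PARTS)  # ceiling division
    return max(needed, MIN_PART_SIZE)


def read_ahead_bytes(fragment_size: int = 64 * KIB, read_ahead_count: int = 32) -> int:
    """Rough estimate of the bytes requested ahead of the current fragment."""
    return fragment_size * read_ahead_count


if __name__ == "__main__":
    print(f"max file size at 10 MiB parts: {max_file_size() / GIB:.2f} GiB")
    print(f"part size needed for 500 GiB:  {min_part_size_for(500 * GIB)} bytes")
    print(f"read-ahead with defaults:      {read_ahead_bytes() // MIB} MiB")
```

With the defaults, the write limit works out to 100,000 MiB, just under 98 GiB. Files larger than that need a larger write_part_size, for example the value returned by min_part_size_for.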

Raises:

DHError – If unable to build the instructions object.
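
A possible way to construct the instructions and handle the DHError case is sketched below. It assumes the code runs inside a Deephaven Python session, which is why the imports are deferred into the function body. The function name build_s3_instructions and the parameter values (region, timeouts, read-ahead count) are illustrative assumptions, not recommended settings.

```python
def build_s3_instructions():
    """Build S3Instructions for anonymous reads.

    Must be called from a Deephaven Python session; the values below are
    illustrative only.
    """
    # Imported lazily because the deephaven package needs a running Deephaven JVM.
    from deephaven import DHError
    from deephaven.experimental import s3

    try:
        return s3.S3Instructions(
            region_name="us-east-1",    # example region (assumption)
            anonymous_access=True,      # cannot be combined with other credentials
            read_ahead_count=8,         # fetch 8 fragments ahead instead of the default 32
            connection_timeout="PT5s",  # time interval string, as described above
            read_timeout="PT10s",
        )
    except DHError as err:
        # Raised if the instructions object cannot be built.
        raise RuntimeError(f"could not build S3 instructions: {err}") from err
```

To use static credentials instead, drop anonymous_access and pass both access_key_id and secret_access_key, since the two must be provided together.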

j_object_type

alias of S3Instructions