weaviate.batch package

Module for uploading objects and references to Weaviate in batches.

class weaviate.batch.Batch(connection: Connection)

Bases: object

Batch class used to add multiple objects or object references to Weaviate at once. To add data to the Batch, use the methods add_data_object and add_reference. This object also stores two recommended batch sizes, one for objects and one for references. A recommended batch size is the number of data objects/references that the Weaviate server can receive and process within the creation_time interval (see the configure or __call__ method for how to set this value; by default it is set to 10). Its initial value is None, or batch_size if one is set, and it is updated by every batch create method. The values can be accessed with the getters recommended_num_objects and recommended_num_references. NOTE: If the UUID of one of the objects already exists, the existing object will be replaced by the new object.
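The recommended batch size described above is a throughput estimate. The sketch below shows one way to compute it. It assumes the server's recent rate (items per second) stays roughly constant. The function recommended_size and its floor-of-throughput rule are hypothetical, and the client's own update rule may differ.

```python
import math


def recommended_size(items_sent: int, elapsed_seconds: float, creation_time: float) -> int:
    """Hypothetical helper: how many items fit into one creation_time interval.

    Assumes the throughput observed for the last batch holds for the next one.
    This is an illustration, not the weaviate client's actual code.
    """
    if items_sent <= 0 or elapsed_seconds <= 0 or creation_time <= 0:
        raise ValueError("items_sent, elapsed_seconds and creation_time must be positive")
    throughput = items_sent / elapsed_seconds      # items processed per second
    return max(1, math.floor(throughput * creation_time))


# 100 objects took 4 s; with creation_time=10 the next batch may hold 250.
print(recommended_size(100, 4.0, 10))   # 250
# A slower server (100 objects in 20 s) shrinks the recommendation to 50.
print(recommended_size(100, 20.0, 10))  # 50
```

The max(1, ...) guard keeps the batch from shrinking to zero when the server is very slow.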

This class can be used in 3 ways:

Case I:

Everything is done by the user, i.e. the user adds the objects/object references and creates them whenever they want. To create the data, use these methods of this class: create_objects, create_references and flush. In this case the Batch instance’s batch_size is set to None (see the docs for the configure or __call__ method). Can be used in a context manager, see below.

Case II:

The Batch auto-creates when full. This is achieved by setting the Batch instance’s batch_size to a positive integer, with dynamic set to False (see the docs for the configure or __call__ method). In this case batch_size corresponds to the sum of added objects and references. The user does not need to create the batches, but can still do so. To create non-full batches (e.g. the last batch) that do not meet the auto-creation requirement, use the flush method. Can be used in a context manager, see below.

Case III:

Similar to Case II but uses dynamic batching, i.e. it auto-creates objects or references when their number reaches recommended_num_objects or recommended_num_references, respectively. See the docs for the configure or __call__ method for how to enable it.

Context-manager support: Can be used with the with statement. When it exits the context manager, it calls the flush method for you. Can be combined with the configure/__call__ method in order to set the desired Case.
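The three Cases differ only in when queued data is sent. The toy model below illustrates this. It is a hypothetical stand-in, not the weaviate client: ToyBatch, its send callback and its simplified method signatures are assumptions made for illustration, and it never contacts a server.

```python
from typing import Callable, List, Optional, Tuple


class ToyBatch:
    """Hypothetical model of the three batching Cases (no network access)."""

    def __init__(self, send: Callable[[str, List[dict]], None]) -> None:
        self._send = send                       # receives ("objects" | "references", items)
        self.objects: List[dict] = []
        self.references: List[dict] = []
        self.batch_size: Optional[int] = None   # None -> Case I (manual)
        self.dynamic = False                    # True -> Case III (dynamic)
        self.recommended_num_objects: Optional[int] = None
        self.recommended_num_references: Optional[int] = None

    def configure(self, batch_size: Optional[int], dynamic: bool = False) -> "ToyBatch":
        self.batch_size = batch_size
        self.dynamic = dynamic
        # Recommended sizes start at batch_size, as described above.
        self.recommended_num_objects = batch_size
        self.recommended_num_references = batch_size
        return self

    @property
    def shape(self) -> Tuple[int, int]:
        return len(self.objects), len(self.references)

    def add_data_object(self, obj: dict) -> None:
        self.objects.append(obj)
        self._auto_create()

    def add_reference(self, ref: dict) -> None:
        self.references.append(ref)
        self._auto_create()

    def create_objects(self) -> None:
        if self.objects:
            self._send("objects", self.objects)
        self.objects = []

    def create_references(self) -> None:
        if self.references:
            self._send("references", self.references)
        self.references = []

    def flush(self) -> None:
        # Objects first, so references can point at freshly created objects.
        self.create_objects()
        self.create_references()

    def _auto_create(self) -> None:
        if self.batch_size is None:             # Case I: never auto-create
            return
        if not self.dynamic:                    # Case II: threshold on the sum
            if sum(self.shape) >= self.batch_size:
                self.flush()
            return
        # Case III: separate thresholds for objects and references
        if len(self.objects) >= self.recommended_num_objects:
            self.create_objects()
        if len(self.references) >= self.recommended_num_references:
            self.create_references()

    def __enter__(self) -> "ToyBatch":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.flush()                            # leaving the with-block always flushes


sent: List[Tuple[str, int]] = []
batch = ToyBatch(lambda kind, items: sent.append((kind, len(items))))
batch.configure(batch_size=3, dynamic=False)    # Case II
batch.add_data_object({})
batch.add_reference({})
print(batch.shape)  # (1, 1)
batch.add_data_object({})                       # 2 objects + 1 reference = batch_size
print(batch.shape)  # (0, 0)
print(sent)         # [('objects', 2), ('references', 1)]
```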

Examples

Here are examples for each Case described above. Here client is an instance of weaviate.Client.

>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'
>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'

For Case I:

>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(2, 1)
>>> client.batch.create_objects()
>>> client.batch.shape
(0, 1)
>>> client.batch.create_references()
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> client.batch.shape
(1, 1)
>>> client.batch.flush()
>>> client.batch.shape
(0, 0)

Or with a context manager:

>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

For Case II:

>>> client.batch(batch_size=3, dynamic=False)
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(1, 1)
>>> client.batch.add_data_object({}, 'MyClass') # sum of data_objects and references reached
>>> client.batch.shape
(0, 0)

Or with a context manager and __call__ method:

>>> with client.batch(batch_size=3, dynamic=False) as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

Or with a context manager and setter:

>>> client.batch.batch_size = 3
>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

For Case III: Same as Case II but you need to configure or enable ‘dynamic’ batching.

>>> client.batch.configure(batch_size=3, dynamic=True) # 'batch_size' must be a valid int

Or:

>>> client.batch.batch_size = 3
>>> client.batch.dynamic = True

See the documentation of configure (or __call__) and the setters for more information on what you need to configure or set, and why, in order to use a particular Case.

Initialize a Batch class instance. This defaults to manual creation configuration. See docs for the configure or __call__ method for different types of configurations.

Parameters

connection : weaviate.connect.Connection

Connection object to an active and running weaviate instance.

add_data_object(data_object: dict, class_name: str, uuid: str | UUID | None = None, vector: Sequence | None = None, tenant: str | None = None) -> str

Add one object to this batch. NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.

Parameters

data_object : dict

Object to be added as a dict datatype.

class_name : str

The name of the class this object belongs to.

uuid : Optional[UUID], optional

The UUID of the object as a uuid.UUID object or str. It can be a Weaviate beacon or Weaviate href. If it is None, a UUIDv4 will be generated, by default None

vector : Sequence or None, optional

The embedding of the object that should be validated. Can be used when:

  • A class does not have a vectorization module.

  • The given vector was generated using the identical vectorization module that is configured for the class. In this case this vector takes precedence.

Supported types are list, numpy.ndarray, torch.Tensor and tf.Tensor, by default None.

tenant : str, optional

Name of the tenant, by default None.

Returns

str

The UUID of the added object. If one was not provided, a UUIDv4 will be generated.

Raises

TypeError

If an argument passed is not of an appropriate type.

ValueError

If ‘uuid’ is not of a proper form.
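The uuid and vector rules above accept several input forms. The sketch below shows how such arguments could be normalised. Both helpers, resolve_uuid and vector_to_list, are hypothetical and not client functions. It assumes that in a beacon or href the UUID is the last path segment, and that tensors expose tolist() (numpy, torch) or numpy() (eager TensorFlow).

```python
import uuid
from typing import Any, List, Optional, Union


def resolve_uuid(value: Optional[Union[str, uuid.UUID]]) -> str:
    """Hypothetical helper mirroring the documented `uuid` argument rules."""
    if value is None:
        return str(uuid.uuid4())                   # nothing given: random UUIDv4
    if isinstance(value, uuid.UUID):
        return str(value)
    if not isinstance(value, str):
        raise TypeError(f"uuid must be str or uuid.UUID, not {type(value).__name__}")
    # Assumption: in a beacon or href the UUID is the last path segment.
    candidate = value.rstrip("/").rsplit("/", 1)[-1]
    try:
        return str(uuid.UUID(candidate))
    except ValueError:
        raise ValueError(f"not a valid UUID: {value!r}") from None


def vector_to_list(vector: Any) -> Optional[List[float]]:
    """Hypothetical helper: turn a 1-D list/numpy/torch/tf vector into a list of floats."""
    if vector is None:
        return None
    if isinstance(vector, (list, tuple)):
        return [float(x) for x in vector]
    if hasattr(vector, "tolist"):                  # numpy.ndarray, torch.Tensor
        return [float(x) for x in vector.tolist()]
    if hasattr(vector, "numpy"):                   # eager tf.Tensor
        return [float(x) for x in vector.numpy().tolist()]
    raise TypeError(f"unsupported vector type: {type(vector).__name__}")


href = "/v1/objects/MyClass/154cbccd-89f4-4b29-9c1b-001a3339d89d"
print(resolve_uuid(href))        # 154cbccd-89f4-4b29-9c1b-001a3339d89d
print(vector_to_list((1, 2.5)))  # [1.0, 2.5]
```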

add_reference(from_object_uuid: str | UUID, from_object_class_name: str, from_property_name: str, to_object_uuid: str | UUID, to_object_class_name: str | None = None, tenant: str | None = None) -> None

Add one reference to this batch.

Parameters

from_object_uuid : UUID

The UUID of the object, as a uuid.UUID object or str, that should reference another object. It can be a Weaviate beacon or Weaviate href.

from_object_class_name : str

The name of the class that should reference another object.

from_property_name : str

The name of the property that contains the reference.

to_object_uuid : UUID

The UUID of the object, as a uuid.UUID object or str, that is actually referenced. It can be a Weaviate beacon or Weaviate href.

to_object_class_name : Optional[str], optional

The class name of the referenced object (the one with UUID to_object_uuid). It was introduced in Weaviate 1.14.0, where all objects are namespaced by class name, and it is STRONGLY recommended to set it with Weaviate >= 1.14.0. It will be required in future versions of Weaviate Server and Clients. Use the None value ONLY for Weaviate < v1.14.0, by default None

tenant : str, optional

Name of the tenant, by default None.

Raises

TypeError

If arguments are not of type str.

ValueError

If ‘uuid’ is not valid or cannot be extracted.
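The create_references example further down shows each stored reference as a pair of beacons. The sketch below shows how the add_reference arguments could map onto that shape. Note that reference_payload is hypothetical. The localhost host part comes from that listing, and it is an assumption whether the client sends exactly these keys, including where tenant goes. For brevity, the sketch accepts only str UUIDs.

```python
from typing import Dict, Optional


def reference_payload(
    from_object_uuid: str,
    from_object_class_name: str,
    from_property_name: str,
    to_object_uuid: str,
    to_object_class_name: Optional[str] = None,
    tenant: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical: build 'from'/'to' beacons like those in the create_references output."""
    for name, value in [("from_object_uuid", from_object_uuid),
                        ("from_object_class_name", from_object_class_name),
                        ("from_property_name", from_property_name),
                        ("to_object_uuid", to_object_uuid)]:
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a str")
    payload = {
        "from": f"weaviate://localhost/{from_object_class_name}/{from_object_uuid}/{from_property_name}",
        # With Weaviate >= 1.14.0 the target is namespaced by its class name.
        "to": (f"weaviate://localhost/{to_object_class_name}/{to_object_uuid}"
               if to_object_class_name else f"weaviate://localhost/{to_object_uuid}"),
    }
    if tenant is not None:
        payload["tenant"] = tenant   # assumption: tenant travels alongside the beacons
    return payload


print(reference_payload("254cbccd-89f4-4b29-9c1b-001a3339d89a", "ExistingClass",
                        "existsWith", "254cbccd-89f4-4b29-9c1b-001a3339d89b"))
```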

property batch_size: int | None

Setter and Getter for batch_size.

Parameters

value : Optional[int]

Setter ONLY: The new value for the batch_size. If it is NOT None, the Batch will try to auto-create the existing data if it meets the requirements. If the previous value was None, the batching type changes to auto-create with dynamic set to False. See the documentation for configure or __call__ for more info. If recommended_num_objects is None, it is initialized with the new value of the batch_size (same for references).

Returns

Optional[int]

Getter ONLY: The current value of the batch_size. It is NOT the current number of items in the Batch. See the documentation for configure or __call__ for more info.

Raises

TypeError

Setter ONLY: If the new value is not of type int.

ValueError

Setter ONLY: If the new value is non-positive.

configure(batch_size: int | None = 50, creation_time: numbers.Real | None = None, timeout_retries: int = 3, connection_error_retries: int = 3, weaviate_error_retries: weaviate.batch.crud_batch.WeaviateErrorRetryConf | None = None, callback: typing.Callable[[typing.List[dict]], None] | None = <function check_batch_result>, dynamic: bool = True, num_workers: int = 1, consistency_level: weaviate.data.replication.replication.ConsistencyLevel | None = None) -> Batch

Warnings

  • It has default values, so if you want to change only one of them, use a setter instead, or provide all the configurations, both the old and the new ones.

  • This method will return None in the next major release. If you are using the returned Batch object, you should start using the client.batch object instead.

Parameters

batch_size : Optional[int], optional

The batch size to be used. This value sets the Batch functionality: if batch_size is None, no auto-creation is done (callback and dynamic are ignored). If it is a positive number, auto-creation is enabled and the value represents: 1) if dynamic is False, the number of items in the Batch (sum of objects and references) at which to auto-create; 2) if dynamic is True, the initial value for both recommended_num_objects and recommended_num_references, by default 50

creation_time : Real, optional

How long it should take to create a Batch. Used ONLY for computing dynamic batch sizes. By default None

timeout_retries : int, optional

Number of retries to create a Batch that failed with ReadTimeout, by default 3

connection_error_retries : int, optional

Number of retries to create a Batch that failed with ConnectionError, by default 3

weaviate_error_retries : Optional[WeaviateErrorRetryConf], optional

How often batch elements with an error originating from Weaviate (for example, transformer timeouts) should be retried, and which errors should be ignored and/or included. See the documentation of WeaviateErrorRetryConf for details.

callback : Optional[Callable[[List[dict]], None]], optional

A callback function called on the results of each batch type (objects and references), by default weaviate.util.check_batch_result

dynamic : bool, optional

Whether to use dynamic batching or not, by default True

num_workers : int, optional

The maximum number of concurrent threads used to run the batch import. Used only for non-MANUAL batching, i.e. only with AUTO or DYNAMIC batching. By default (num_workers=1), multi-threading is disabled. Use with care so as not to overload your Weaviate instance.

Returns

Batch

Updated self.

Raises

TypeError

If one of the arguments is of a wrong type.

ValueError

If the value of one of the arguments is wrong.
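timeout_retries and connection_error_retries limit how often a failed batch request is re-sent. The sketch below shows that bookkeeping, assuming one separate counter per error kind. The function send_with_retries is hypothetical, and Python's built-in TimeoutError and ConnectionError stand in for the requests exceptions (ReadTimeout, ConnectionError) that the client actually deals with.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def send_with_retries(
    send: Callable[[], T],
    timeout_retries: int = 3,
    connection_error_retries: int = 3,
) -> T:
    """Hypothetical: call `send`, retrying each failure kind up to its own limit."""
    timeouts = connection_errors = 0
    while True:
        try:
            return send()
        except TimeoutError:
            timeouts += 1
            if timeouts > timeout_retries:
                raise                      # retries exhausted: surface the timeout
        except ConnectionError:
            connection_errors += 1
            if connection_errors > connection_error_retries:
                raise                      # retries exhausted: surface the network error


# A flaky sender that times out twice before succeeding.
attempts: List[int] = []


def flaky_send() -> str:
    attempts.append(1)
    if len(attempts) <= 2:
        raise TimeoutError("read timed out")
    return "ok"


print(send_with_retries(flaky_send, timeout_retries=3))  # ok
print(len(attempts))                                      # 3
```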

property connection_error_retries: int

Setter and Getter for connection_error_retries.

Parameters

value : int

Setter ONLY: The new value for connection_error_retries.

Returns

int

Getter ONLY: The connection_error_retries value.

Raises

TypeError

Setter ONLY: If the new value is not of type int.

ValueError

Setter ONLY: If the new value is non-positive.

property consistency_level: str | None
create_objects() -> list

Creates multiple objects at once in Weaviate. This does not guarantee that each batch item is added/created on the Weaviate server: a batch creation can succeed even though the creation of individual batch items fails. See the example below. NOTE: If the UUID of one of the objects already exists, the existing object will be replaced by the new object.

Examples

Here client is an instance of the weaviate.Client.

Add objects to the object batch.

>>> client.batch.add_data_object({}, 'NonExistingClass')
>>> client.batch.add_data_object({}, 'ExistingClass')

Note that ‘NonExistingClass’ is not present in the client’s schema, while ‘ExistingClass’ is present and has no properties. ‘client.batch.add_data_object’ does not raise an exception because the objects added meet the required criteria (see the documentation of the ‘weaviate.Batch.add_data_object’ method for more information).

>>> result = client.batch.create_objects()

The batch creation succeeds even though one data object is inconsistent with the client’s schema. We can find out which objects were successfully created by analyzing the ‘result’ variable.

>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "class": "NonExistingClass",
        "creationTimeUnix": 1614852753747,
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "properties": {},
        "deprecations": null,
        "result": {
            "errors": {
                "error": [
                    {
                        "message": "class 'NonExistingClass' not present in schema, class NonExistingClass not present"
                    }
                ]
            }
        }
    },
    {
        "class": "ExistingClass",
        "creationTimeUnix": 1614852753746,
        "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
        "properties": {},
        "vector": [
            -0.05244319,
            ...
            0.076136276
        ],
        "deprecations": null,
        "result": {}
    }
]

As can be seen, the first object from the batch was not added/created, but the batch was successfully created. A batch creation can be successful even if none of the objects were created. Check the status of the batch objects to find out which objects failed and why. Alternatively, use ‘client.data_object.create’ for object creation that throws an error if the data item is inconsistent or the creation/addition failed.

To check the results of batch creation when using the auto-creation Batch, use a ‘callback’ (see the docs for the configure or __call__ method for more information).
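A callback receives the list of per-item results for each created batch. The sketch below reports failed items using the result shapes from the create_objects and create_references listings in this documentation: an object item carries result.errors.error, and a reference item carries result.status. The names collect_failures and report_failures are hypothetical. The failure shape for references is an assumption, because only the "SUCCESS" status appears in this documentation.

```python
from typing import List, Optional


def collect_failures(results: Optional[List[dict]]) -> List[str]:
    """Hypothetical: one human-readable line per failed object or reference."""
    failures = []
    for item in results or []:
        result = item.get("result") or {}
        errors = (result.get("errors") or {}).get("error") or []
        for err in errors:
            label = item.get("id") or item.get("from")   # objects have "id", references "from"
            failures.append(f"{label}: {err.get('message')}")
        # Assumption: a reference result whose status is not "SUCCESS" also failed.
        status = result.get("status")
        if not errors and status is not None and status != "SUCCESS":
            failures.append(f"{item.get('from')}: status {status}")
    return failures


def report_failures(results: Optional[List[dict]]) -> None:
    """Matches the callback type Callable[[List[dict]], None]."""
    for line in collect_failures(results):
        print("batch item failed ->", line)


# With a real client (not executed here):
#     client.batch.configure(batch_size=100, callback=report_failures)
sample = [
    {"class": "NonExistingClass", "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
     "result": {"errors": {"error": [{"message": "class 'NonExistingClass' not present in schema"}]}}},
    {"class": "ExistingClass", "id": "b7b1cfbe-20da-496c-b932-008d35805f26", "result": {}},
]
report_failures(sample)
```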

Returns

list

A list with the status of every object that was created.

Raises

requests.ConnectionError

If the network connection to weaviate fails.

weaviate.UnexpectedStatusCodeException

If weaviate reports a non-OK status.

create_references() -> list

Creates multiple references at once in Weaviate. Adding references in batch is faster, but it skips validations such as class name and property name checks. As a result, reference creation reports SUCCESSFUL even for nonexistent object types and/or nonexistent properties. If the consistency of the references matters, use ‘client.data_object.reference.add’, which adds validation against the Weaviate schema. See the examples below.

Examples

Here client is an instance of the weaviate.Client.

Object that does not exist in weaviate.

>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'

Objects that exist in weaviate.

>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'
>>> client.batch.add_reference(object_1, 'NonExistingClass', 'existsWith', object_2)
>>> client.batch.add_reference(object_3, 'ExistingClass', 'existsWith', object_4)

Both references were added to the batch request without error because they meet the required criteria (See the documentation of the ‘weaviate.Batch.add_reference’ method for more information).

>>> result = client.batch.create_references()

As can be seen, the reference batch creation is successful (no error is thrown). Now we can inspect the ‘result’.

>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "from": "weaviate://localhost/NonExistingClass/154cbccd-89f4-4b29-9c1b-001a3339d89d/existsWith",
        "to": "weaviate://localhost/154cbccd-89f4-4b29-9c1b-001a3339d89c",
        "result": {
            "status": "SUCCESS"
        }
    },
    {
        "from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {
            "status": "SUCCESS"
        }
    }
]

Both references were added successfully, but one of them is corrupted: its source object belongs to a nonexistent class and has not been created. To make use of validation, create each reference individually (see the client.data_object.reference.add method).

Returns

list

A list with the status of every reference added.

Raises

requests.ConnectionError

If the network connection to weaviate fails.

weaviate.UnexpectedStatusCodeException

If weaviate reports a non-OK status.

property creation_time: Real

Setter and Getter for creation_time.

Parameters

value : Real

Setter ONLY: Set a new value for creation_time. The recommended_num_objects/references values are updated based on this new value. If batch_size is not None, the Batch will auto-create if the requirements are met.

Returns

Real

Getter ONLY: The creation_time value.

Raises

TypeError

Setter ONLY: If the new value is not of type Real.

ValueError

Setter ONLY: If the new value is non-positive.

delete_objects(class_name: str, where: dict, output: str = 'minimal', dry_run: bool = False, tenant: str | None = None) -> dict

Delete, in batch, the objects of class_name that match the where filter.

Parameters

class_name : str

The class name for which to delete objects.

where : dict

The content of the where filter used to match objects that should be deleted.

output : str, optional

Controls the verbosity of the output, by default “minimal”. Possible values:

  • “minimal”: The result only includes counts. Information about objects is omitted if the deletes were successful; an object is described only if an error occurred.

  • “verbose”: The result lists all affected objects with their ID and deletion status, including both successful and unsuccessful deletes.

dry_run : bool, optional

If True, objects will not be deleted yet, but merely listed, by default False

tenant : str, optional

Name of the tenant, by default None.

Examples

If we want to delete all the data objects that contain the word ‘weather’, we can do it like this:

>>> result = client.batch.delete_objects(
...     class_name='Dataset',
...     output='verbose',
...     dry_run=False,
...     where={
...         'operator': 'Equal',
...         'path': ['description'],
...         'valueText': 'weather'
...     }
... )
>>> import json
>>> print(json.dumps(result, indent=4))
{
    "dryRun": false,
    "match": {
        "class": "Dataset",
        "where": {
            "operands": null,
            "operator": "Equal",
            "path": [
                "description"
            ],
            "valueText": "weather"
        }
    },
    "output": "verbose",
    "results": {
        "failed": 0,
        "limit": 10000,
        "matches": 2,
        "objects": [
            {
                "id": "1eb28f69-c66e-5411-bad4-4e14412b65cd",
                "status": "SUCCESS"
            },
            {
                "id": "da217bdd-4c7c-5568-9576-ebefe17688ba",
                "status": "SUCCESS"
            }
        ],
        "successful": 2
    }
}

Returns

dict

The result/status of the batch delete.
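When output='verbose' is used, the response lists every matched object. The sketch below condenses such a response, using only the keys shown in the example above. The function summarize_delete is hypothetical. Calling it first on a dry_run=True response is one way to preview a delete before running it for real.

```python
from typing import Any, Dict


def summarize_delete(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: condense a delete_objects response into counts and IDs."""
    results = response.get("results") or {}
    objects = results.get("objects") or []   # listed with output='verbose' (or on errors)
    return {
        "dry_run": response.get("dryRun", False),
        "matches": results.get("matches", 0),
        "successful": results.get("successful", 0),
        "failed": results.get("failed", 0),
        # Any status other than SUCCESS is treated as "not deleted" here.
        "not_deleted_ids": [o.get("id") for o in objects if o.get("status") != "SUCCESS"],
    }


sample = {
    "dryRun": False,
    "output": "verbose",
    "results": {
        "failed": 0, "limit": 10000, "matches": 2, "successful": 2,
        "objects": [
            {"id": "1eb28f69-c66e-5411-bad4-4e14412b65cd", "status": "SUCCESS"},
            {"id": "da217bdd-4c7c-5568-9576-ebefe17688ba", "status": "SUCCESS"},
        ],
    },
}
print(summarize_delete(sample))
```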

property dynamic: bool

Setter and Getter for dynamic.

Parameters

value : bool

Setter ONLY: Enable/disable dynamic batching. If batch_size is None, the value is not set; otherwise dynamic is set to the new value and the Batch auto-creates if it meets the requirements.

Returns

bool

Getter ONLY: Whether dynamic batching is enabled.

Raises

TypeError

Setter ONLY: If the new value is not of type bool.

empty_objects() -> None

Remove all the objects from the batch.

empty_references() -> None

Remove all the references from the batch.

flush() -> None

Flush both objects and references to the Weaviate server and call the callback function if one is provided. (See the docs for configure or __call__ for how to set one.)

is_empty_objects() -> bool

Check whether the batch contains no objects.

Returns

bool

Whether the Batch object list is empty.

is_empty_references() -> bool

Check whether the batch contains no references.

Returns

bool

Whether the Batch reference list is empty.

num_objects() -> int

Get current number of objects in the batch.

Returns

int

The number of objects in the batch.

num_references() -> int

Get current number of references in the batch.

Returns

int

The number of references in the batch.

pop_object(index: int = -1) -> dict

Remove and return the object at index (default last).

Parameters

index : int, optional

The index of the object to pop, by default -1 (last item).

Returns

dict

The popped object.

Raises

IndexError

If batch is empty or index is out of range.

pop_reference(index: int = -1) -> dict

Remove and return the reference at index (default last).

Parameters

index : int, optional

The index of the reference to pop, by default -1 (last item).

Returns

dict

The popped reference.

Raises

IndexError

If batch is empty or index is out of range.

property recommended_num_objects: int | None

The recommended number of objects per batch. If None then it could not be computed.

Returns

Optional[int]

The recommended number of objects per batch. If None then it could not be computed.

property recommended_num_references: int | None

The recommended number of references per batch. If None then it could not be computed.

Returns

Optional[int]

The recommended number of references per batch. If None then it could not be computed.

property shape: Tuple[int, int]

Get current number of objects and references in the batch.

Returns

Tuple[int, int]

The number of objects and references, respectively, in the batch as a tuple, i.e. returns (number of objects, number of references).

shutdown() -> None

Shut down the BatchExecutor.

start() -> Batch

Start the BatchExecutor if it was closed.

Returns

Batch

Updated self.

property timeout_retries: int

Setter and Getter for timeout_retries.

Parameters

value : int

Setter ONLY: The new value for timeout_retries.

Returns

int

Getter ONLY: The timeout_retries value.

Raises

TypeError

Setter ONLY: If the new value is not of type int.

ValueError

Setter ONLY: If the new value is non-positive.

wait_for_vector_indexing(shards: List[Shard] | None = None, how_many_failures: int = 5) -> None

Wait for all the vectors of the batch-imported objects to be indexed.

Upon a network error, it retries getting the shards’ status up to how_many_failures times with exponential backoff (2**n seconds with n=0,1,2,…,how_many_failures).

Parameters

shards : Optional[List[Shard]], optional

The shards to check the status of. If None, it will check the status of all the shards of the imported objects in the batch.

how_many_failures : int, optional

How many times to try to get the shards’ status before raising an exception, by default 5.
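The retry schedule described above can be sketched as a generic polling loop. This is an illustration, not the client's implementation. The following are all hypothetical: wait_until_indexed, its check callable (standing in for the shard-status request), the fixed poll_interval, and the choice to raise after exactly how_many_failures network errors. The built-in ConnectionError stands in for the network error, and sleep is injectable so the schedule can be observed without waiting.

```python
import time
from typing import Callable, List


def wait_until_indexed(
    check: Callable[[], bool],
    how_many_failures: int = 5,
    poll_interval: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical: poll `check` until it returns True, backing off on network errors."""
    failures = 0
    while True:
        try:
            if check():
                return                       # all vectors indexed
        except ConnectionError:
            failures += 1
            if failures >= how_many_failures:
                raise                        # give up after how_many_failures errors
            sleep(2 ** (failures - 1))       # 1, 2, 4, ... seconds
            continue
        sleep(poll_interval)                 # not indexed yet: poll again shortly


delays: List[float] = []
outcomes = iter([ConnectionError("shard status unavailable"), False, True])


def fake_check() -> bool:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


wait_until_indexed(fake_check, sleep=delays.append)
print(delays)  # [1, 0.25]
```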

class weaviate.batch.Shard(class_name: str, tenant: str | None = None)

Bases: object

class_name: str
tenant: str | None = None
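When shards is None, wait_for_vector_indexing checks all shards of the imported objects. The sketch below shows how such a shard list could be derived from the queued objects. ShardStandIn mirrors the two documented fields of Shard so that the snippet runs without the client. The function shards_of is hypothetical, and the "class"/"tenant" keys of the object dicts are assumptions. With the real client, an explicit list would look like client.batch.wait_for_vector_indexing(shards=[Shard(class_name='MyClass')]).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ShardStandIn:
    """Local stand-in with the same two fields as weaviate.batch.Shard."""
    class_name: str
    tenant: Optional[str] = None


def shards_of(objects: Iterable[dict]) -> List[ShardStandIn]:
    """Hypothetical: distinct (class, tenant) pairs of imported objects, in first-seen order."""
    seen: Dict[ShardStandIn, None] = {}      # dict keeps insertion order
    for obj in objects:
        seen.setdefault(ShardStandIn(obj["class"], obj.get("tenant")), None)
    return list(seen)


imported = [{"class": "A"}, {"class": "B", "tenant": "t1"}, {"class": "A"}]
print(shards_of(imported))
```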
class weaviate.batch.WeaviateErrorRetryConf(number_retries: int = 3, errors_to_exclude: List[str] | None = None, errors_to_include: List[str] | None = None)

Bases: object

Configures how often objects should be retried when Weaviate returns an error and which errors should be included or excluded. By default, all errors are retried.

Parameters

number_retries : int

How often a batch that includes objects with errors should be retried. Must be >=1.

errors_to_exclude : Optional[List[str]]

Which errors should NOT be retried. All other errors will be retried. An object will be skipped when the given string is part of the Weaviate error message.

Example: errors_to_exclude=[“string1”, “string2”] will match the error with message “Long error message that contains string1”.

errors_to_include : Optional[List[str]]

Which errors should be retried. All other errors will NOT be retried. An object will be included when the given string is part of the Weaviate error message.

Example: errors_to_include=[“string1”, “string2”] will match the error with message “Long error message that contains string1”.

errors_to_exclude: List[str] | None = None
errors_to_include: List[str] | None = None
number_retries: int = 3
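The include/exclude rules above reduce to substring matching against the error message. The sketch below shows that decision. The function should_retry is hypothetical. Rejecting the use of both lists at once is an assumption made here, not documented behaviour. With the real client, the configuration itself would be passed as client.batch.configure(weaviate_error_retries=WeaviateErrorRetryConf(number_retries=3, errors_to_include=["timeout"])).

```python
from typing import List, Optional


def should_retry(
    message: str,
    errors_to_exclude: Optional[List[str]] = None,
    errors_to_include: Optional[List[str]] = None,
) -> bool:
    """Hypothetical: decide whether an object that failed with `message` is retried."""
    if errors_to_exclude is not None and errors_to_include is not None:
        # Assumption: the two lists are alternatives, so setting both is rejected here.
        raise ValueError("set either errors_to_exclude or errors_to_include, not both")
    if errors_to_exclude is not None:
        return not any(s in message for s in errors_to_exclude)
    if errors_to_include is not None:
        return any(s in message for s in errors_to_include)
    return True                              # by default, all errors are retried


msg = "Long error message that contains string1"
print(should_retry(msg, errors_to_exclude=["string1", "string2"]))  # False
print(should_retry(msg, errors_to_include=["string1", "string2"]))  # True
```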

Submodules

weaviate.batch.crud_batch module

Batch class definitions.

class weaviate.batch.crud_batch.Batch(connection: Connection)

Bases: object

Batch class used to add multiple objects or object references to Weaviate at once. To add data to the Batch, use the methods add_data_object and add_reference. This object also stores two recommended batch sizes, one for objects and one for references. A recommended batch size is the number of data objects/references that the Weaviate server can receive and process within the creation_time interval (see the configure or __call__ method for how to set this value; by default it is set to 10). Its initial value is None, or batch_size if one is set, and it is updated by every batch create method. The values can be accessed with the getters recommended_num_objects and recommended_num_references. NOTE: If the UUID of one of the objects already exists, the existing object will be replaced by the new object.

This class can be used in 3 ways:

Case I:

Everything is done by the user, i.e. the user adds the objects/object references and creates them whenever they want. To create the data, use these methods of this class: create_objects, create_references and flush. In this case the Batch instance’s batch_size is set to None (see the docs for the configure or __call__ method). Can be used in a context manager, see below.

Case II:

The Batch auto-creates when full. This is achieved by setting the Batch instance’s batch_size to a positive integer, with dynamic set to False (see the docs for the configure or __call__ method). In this case batch_size corresponds to the sum of added objects and references. The user does not need to create the batches, but can still do so. To create non-full batches (e.g. the last batch) that do not meet the auto-creation requirement, use the flush method. Can be used in a context manager, see below.

Case III:

Similar to Case II but uses dynamic batching, i.e. it auto-creates objects or references when their number reaches recommended_num_objects or recommended_num_references, respectively. See the docs for the configure or __call__ method for how to enable it.

Context-manager support: Can be used with the with statement. When it exits the context manager, it calls the flush method for you. Can be combined with the configure/__call__ method in order to set the desired Case.

Examples

Here are examples for each Case described above. Here client is an instance of weaviate.Client.

>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'
>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'

For Case I:

>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(2, 1)
>>> client.batch.create_objects()
>>> client.batch.shape
(0, 1)
>>> client.batch.create_references()
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> client.batch.shape
(1, 1)
>>> client.batch.flush()
>>> client.batch.shape
(0, 0)

Or with a context manager:

>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

For Case II:

>>> client.batch(batch_size=3, dynamic=False)
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(1, 1)
>>> client.batch.add_data_object({}, 'MyClass') # sum of data_objects and references reached
>>> client.batch.shape
(0, 0)

Or with a context manager and __call__ method:

>>> with client.batch(batch_size=3, dynamic=False) as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

Or with a context manager and setter:

>>> client.batch.batch_size = 3
>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

For Case III: Same as Case II but you need to configure or enable ‘dynamic’ batching.

>>> client.batch.configure(batch_size=3, dynamic=True) # 'batch_size' must be a valid int

Or:

>>> client.batch.batch_size = 3
>>> client.batch.dynamic = True

See the documentation of configure (or __call__) and the setters for more information on what you need to configure or set, and why, in order to use a particular Case.

Initialize a Batch class instance. This defaults to manual creation configuration. See docs for the configure or __call__ method for different types of configurations.

Parameters

connection : weaviate.connect.Connection

Connection object to an active and running weaviate instance.

add_data_object(data_object: dict, class_name: str, uuid: str | UUID | None = None, vector: Sequence | None = None, tenant: str | None = None) -> str

Add one object to this batch. NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.

Parameters

data_object : dict

Object to be added as a dict datatype.

class_name : str

The name of the class this object belongs to.

uuid : Optional[UUID], optional

The UUID of the object as a uuid.UUID object or str. It can be a Weaviate beacon or Weaviate href. If it is None, a UUIDv4 will be generated, by default None

vector : Sequence or None, optional

The embedding of the object that should be validated. Can be used when:

  • A class does not have a vectorization module.

  • The given vector was generated using the identical vectorization module that is configured for the class. In this case this vector takes precedence.

Supported types are list, numpy.ndarray, torch.Tensor and tf.Tensor, by default None.

tenant : str, optional

Name of the tenant, by default None.

Returns

str

The UUID of the added object. If one was not provided, a UUIDv4 will be generated.

Raises

TypeError

If an argument passed is not of an appropriate type.

ValueError

If ‘uuid’ is not of a proper form.

add_reference(from_object_uuid: str | UUID, from_object_class_name: str, from_property_name: str, to_object_uuid: str | UUID, to_object_class_name: str | None = None, tenant: str | None = None) -> None

Add one reference to this batch.

Parameters

from_object_uuid : UUID

The UUID of the object, as a uuid.UUID object or str, that should reference another object. It can be a Weaviate beacon or Weaviate href.

from_object_class_name : str

The name of the class that should reference another object.

from_property_name : str

The name of the property that contains the reference.

to_object_uuid : UUID

The UUID of the object, as a uuid.UUID object or str, that is actually referenced. It can be a Weaviate beacon or Weaviate href.

to_object_class_name : Optional[str], optional

The class name of the referenced object (the one with UUID to_object_uuid). It was introduced in Weaviate 1.14.0, where all objects are namespaced by class name, and it is STRONGLY recommended to set it with Weaviate >= 1.14.0. It will be required in future versions of Weaviate Server and Clients. Use the None value ONLY for Weaviate < v1.14.0, by default None

tenant : str, optional

Name of the tenant, by default None.

Raises

TypeError

If arguments are not of type str.

ValueError

If ‘uuid’ is not valid or cannot be extracted.
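To make the role of to_object_class_name concrete, the sketch below builds the "from"/"to" beacon pair for one reference in the shape shown in the create_references example output. The helper name and the host default are hypothetical; the class-namespaced "to" form (weaviate://localhost/<Class>/<uuid>) is an assumption about the Weaviate >= 1.14.0 beacon format.

```python
from typing import Dict, Optional


def reference_beacons(
    from_class: str,
    from_uuid: str,
    from_property: str,
    to_uuid: str,
    to_class: Optional[str] = None,
    host: str = "localhost",
) -> Dict[str, str]:
    """Hypothetical helper: build 'from'/'to' beacons for a single reference."""
    # The source beacon names the class, the object and the reference property.
    source = f"weaviate://{host}/{from_class}/{from_uuid}/{from_property}"
    if to_class is None:
        # Pre-1.14 style: the target is addressed by UUID only.
        target = f"weaviate://{host}/{to_uuid}"
    else:
        # Assumed 1.14+ style: the target is namespaced by its class name.
        target = f"weaviate://{host}/{to_class}/{to_uuid}"
    return {"from": source, "to": target}
```

Leaving to_class as None reproduces the un-namespaced "to" beacons seen in the example output further down.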

property batch_size: int | None

Setter and Getter for batch_size.

Parameters

value: Optional[int]

Setter ONLY: The new value for the batch_size. If it is NOT None, the Batch will try to auto-create the existing data if it meets the requirements. If the previous value was None, the new value is set and the batching type changes to auto-create with dynamic set to False. See the documentation for configure or __call__ for more info. If recommended_num_objects is None, it is initialized with the new value of the batch_size (the same applies to recommended_num_references).

Returns

Optional[int]

Getter ONLY: The current value of the batch_size. It is NOT the current number of data in the Batch. See the documentation for configure or __call__ for more info.

Raises

TypeError

Setter ONLY: If the new value is not of type int.

ValueError

Setter ONLY: If the new value is not positive.
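The setter's documented error cases can be summarized in a few lines. This is a hypothetical validation function written for illustration, not the client's implementation; rejecting bool (which is a subclass of int in Python) is this sketch's own choice.

```python
from typing import Optional


def validate_batch_size(value: Optional[int]) -> Optional[int]:
    """Hypothetical check mirroring the documented batch_size setter errors."""
    if value is None:
        # None switches to manual batching (no auto-creation).
        return None
    # bool is a subclass of int; treating it as invalid is an assumption of this sketch.
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(f"batch_size must be an int or None, got {type(value).__name__}")
    if value <= 0:
        raise ValueError(f"batch_size must be positive, got {value}")
    return value
```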

configure(batch_size: int | None = 50, creation_time: ~numbers.Real | None = None, timeout_retries: int = 3, connection_error_retries: int = 3, weaviate_error_retries: ~weaviate.batch.crud_batch.WeaviateErrorRetryConf | None = None, callback: ~typing.Callable[[~typing.List[dict]], None] | None = <function check_batch_result>, dynamic: bool = True, num_workers: int = 1, consistency_level: ~weaviate.data.replication.replication.ConsistencyLevel | None = None) Batch[source]

Warnings

  • It has default values, so if you want to change only one of them, use a setter instead, or provide all the configurations, both the old and new ones.

  • This method will return None in the next major release. If you are using the returned Batch object, you should start using the client.batch object instead.

Parameters

batch_size: Optional[int], optional

The batch size to be used. This value sets the Batch functionality: if batch_size is None, no auto-creation is done (callback and dynamic are ignored). If it is a positive number, auto-creation is enabled and the value represents: 1) if dynamic is False, the number of data items in the Batch (sum of objects and references) at which to auto-create; 2) if dynamic is True, the initial value for both recommended_num_objects and recommended_num_references, by default 50

creation_time: Real, optional

How long it should take to create a Batch. Used ONLY for computing dynamic batch sizes. By default None

timeout_retries: int, optional

Number of retries to create a Batch that failed with ReadTimeout, by default 3

connection_error_retries: int, optional

Number of retries to create a Batch that failed with ConnectionError, by default 3

weaviate_error_retries: WeaviateErrorRetryConf, optional

How often batch elements with an error originating from Weaviate (for example, transformer timeouts) should be retried, and which errors should be ignored and/or included. See the documentation for WeaviateErrorRetryConf for details.

callback: Optional[Callable[[List[dict]], None]], optional

A callback function called with the results of each batch (both objects and references). By default weaviate.util.check_batch_result

dynamic: bool, optional

Whether to use dynamic batching or not, by default True

num_workers: int, optional

The maximum number of concurrent threads used to run the batch import. Only used for non-MANUAL batching, i.e. only with AUTO or DYNAMIC batching. By default (num_workers=1), multi-threading is disabled. Use with care so as not to overload your Weaviate instance.

Returns

Batch

Updated self.

Raises

TypeError

If one of the arguments is of a wrong type.

ValueError

If the value of one of the arguments is wrong.
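The way batch_size, dynamic and num_workers interact can be condensed into a small decision function. The function names and the "MANUAL"/"AUTO"/"DYNAMIC" labels below are hypothetical; they only restate the rules described in the parameters above.

```python
from typing import Optional


def batching_mode(batch_size: Optional[int], dynamic: bool) -> str:
    """Hypothetical summary of how configure() selects a batching mode."""
    if batch_size is None:
        # No auto-creation; callback and dynamic are ignored.
        return "MANUAL"
    if dynamic:
        # batch_size only seeds recommended_num_objects / recommended_num_references.
        return "DYNAMIC"
    # batch_size is the number of objects + references that triggers auto-creation.
    return "AUTO"


def uses_worker_threads(batch_size: Optional[int], dynamic: bool, num_workers: int) -> bool:
    # num_workers only matters for AUTO or DYNAMIC batching; 1 means no multi-threading.
    return batching_mode(batch_size, dynamic) != "MANUAL" and num_workers > 1
```

For instance, configure(batch_size=None) gives manual batching regardless of dynamic, while configure(batch_size=50, dynamic=False) auto-creates once 50 objects and references have been added in total.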

property connection_error_retries: int

Setter and Getter for connection_error_retries.

Parameters

value: int

Setter ONLY: The new value for connection_error_retries.

Returns

int

Getter ONLY: The connection_error_retries value.

Raises

TypeError

Setter ONLY: If the new value is not of type int.

ValueError

Setter ONLY: If the new value is not positive.

property consistency_level: str | None

create_objects() list[source]

Creates multiple Objects at once in Weaviate. This does not guarantee that each batch item is added/created on the Weaviate server: the batch creation can succeed while the creation of individual batch items fails. See the example below. NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.

Examples

Here client is an instance of the weaviate.Client.

Add objects to the object batch.

>>> client.batch.add_data_object({}, 'NonExistingClass')
>>> client.batch.add_data_object({}, 'ExistingClass')

Note that ‘NonExistingClass’ is not present in the client’s schema, while ‘ExistingClass’ is present and has no properties. ‘client.batch.add_data_object’ does not raise an exception because the objects added meet the required criteria (see the documentation of the ‘weaviate.Batch.add_data_object’ method for more information).

>>> result = client.batch.create_objects()

The batch creation succeeds even though one data object is inconsistent with the client’s schema. We can find out which objects were successfully created by analyzing the ‘result’ variable.

>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "class": "NonExistingClass",
        "creationTimeUnix": 1614852753747,
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "properties": {},
        "deprecations": null,
        "result": {
            "errors": {
                "error": [
                    {
                        "message": "class 'NonExistingClass' not present in schema, class NonExistingClass not present"
                    }
                ]
            }
        }
    },
    {
        "class": "ExistingClass",
        "creationTimeUnix": 1614852753746,
        "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
        "properties": {},
        "vector": [
            -0.05244319,
            ...
            0.076136276
        ],
        "deprecations": null,
        "result": {}
    }
]

As you can see, the first object from the batch was not added/created, but the batch was successfully created. The batch creation can be successful even if none of the objects were created. Check the status of the batch objects to find which objects failed and why. Alternatively, use ‘client.data_object.create’ for Object creation that throws an error if a data item is inconsistent or its creation/addition fails.

To check the results of batch creation when using the auto-creation Batch, use a ‘callback’ (see the docs for the configure or __call__ method for more information).

Returns

list

A list with the status of every object that was created.

Raises

requests.ConnectionError

If the network connection to weaviate fails.

weaviate.UnexpectedStatusCodeException

If weaviate reports a non-OK status.
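A callback passed to configure receives a list of result dicts like the ‘result’ shown above, where failed objects carry messages under result → errors → error. The sketch below assumes exactly that shape; the function names are hypothetical and it is not a copy of weaviate.util.check_batch_result.

```python
from typing import List, Optional


def collect_batch_errors(results: Optional[List[dict]]) -> List[str]:
    """Return '<id>: <message>' for every item whose result carries errors."""
    messages = []
    for item in results or []:
        # Successful items have an empty "result" dict, so .get() falls through.
        errors = (item.get("result") or {}).get("errors") or {}
        for error in errors.get("error", []):
            messages.append(f"{item.get('id')}: {error.get('message')}")
    return messages


def print_batch_errors(results: Optional[List[dict]]) -> None:
    # Matches the documented callback type Callable[[List[dict]], None].
    for line in collect_batch_errors(results):
        print(line)
```

With a client this could be registered as client.batch.configure(batch_size=100, callback=print_batch_errors), so every auto-created batch reports its failed objects.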

create_references() list[source]

Creates multiple References at once in Weaviate. Adding References in batch is faster, but it skips validations such as class name and property name checks, which can result in the SUCCESSFUL creation of references involving nonexistent object types and/or nonexistent properties. If consistency of the References is required, use ‘client.data_object.reference.add’, which validates against the Weaviate schema. See Examples below.

Examples

Here client is an instance of the weaviate.Client.

Object that does not exist in weaviate.

>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'

Objects that exist in weaviate.

>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'
>>> client.batch.add_reference(object_1, 'NonExistingClass', 'existsWith', object_2)
>>> client.batch.add_reference(object_3, 'ExistingClass', 'existsWith', object_4)

Both references were added to the batch request without error because they meet the required criteria (See the documentation of the ‘weaviate.Batch.add_reference’ method for more information).

>>> result = client.batch.create_references()

As you can see, the reference batch creation is successful (no error is raised). Now we can inspect the ‘result’.

>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "from": "weaviate://localhost/NonExistingClass/154cbccd-89f4-4b29-9c1b-001a3339d89d/existsWith",
        "to": "weaviate://localhost/154cbccd-89f4-4b29-9c1b-001a3339d89c",
        "result": {
            "status": "SUCCESS"
        }
    },
    {
        "from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {
            "status": "SUCCESS"
        }
    }
]

Both references were added successfully, but one of them is corrupt: its source object belongs to a nonexistent class and does not exist yet. To make use of the validation, create each reference individually (see the client.data_object.reference.add method).

Returns

list

A list with the status of every reference added.

Raises

requests.ConnectionError

If the network connection to weaviate fails.

weaviate.UnexpectedStatusCodeException

If weaviate reports a non-OK status.
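Because batch references are reported as SUCCESS even when the source class or property does not exist, a caller may want to re-check the ‘from’ beacons against the classes it knows about. The sketch below assumes the beacon shape from the example output (weaviate://<host>/<Class>/<uuid>/<property>) and a hypothetical schema dict mapping class names to their reference property names.

```python
from typing import Dict, List, Set


def invalid_reference_sources(results: List[dict], schema: Dict[str, Set[str]]) -> List[str]:
    """Return 'from' beacons whose class or property is missing from `schema`."""
    prefix = "weaviate://"
    invalid = []
    for item in results:
        beacon = item["from"]
        # Expected shape: weaviate://<host>/<Class>/<uuid>/<property>
        parts = beacon[len(prefix):].split("/") if beacon.startswith(prefix) else []
        if len(parts) != 4:
            invalid.append(beacon)  # unexpected beacon shape
            continue
        _host, class_name, _uuid, prop = parts
        if prop not in schema.get(class_name, set()):
            invalid.append(beacon)
    return invalid
```

Applied to the example above with schema {"ExistingClass": {"existsWith"}}, only the NonExistingClass reference is flagged.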

property creation_time: Real

Setter and Getter for creation_time.

Parameters

value: Real

Setter ONLY: Set a new value for creation_time. The recommended_num_objects/references values are updated according to this new value. If the batch_size is not None, it will auto-create the batch if the requirements are met.

Returns

Real

Getter ONLY: The creation_time value.

Raises

TypeError

Setter ONLY: If the new value is not of type Real.

ValueError

Setter ONLY: If the new value is not positive.

delete_objects(class_name: str, where: dict, output: str = 'minimal', dry_run: bool = False, tenant: str | None = None) dict[source]

Delete, in batch, all objects of the given class that match the ‘where’ filter (the ‘match’ of the request).

Parameters

class_name: str

The class name for which to delete objects.

where: dict

The content of the where filter used to match objects that should be deleted.

output: str, optional

The verbosity of the output, by default “minimal”. Possible values:

  • “minimal”: The result only includes counts. Information about objects is omitted if the deletes were successful; an object is described only if an error occurred.

  • “verbose”: The result lists all affected objects with their ID and deletion status, including both successful and unsuccessful deletes.

dry_run: bool, optional

If True, objects will not be deleted yet, but merely listed, by default False

tenant: str, optional

Name of the tenant.

Examples

If we want to delete all the data objects that contain the word ‘weather’, we can do it like this:

>>> result = client.batch.delete_objects(
...     class_name='Dataset',
...     output='verbose',
...     dry_run=False,
...     where={
...         'operator': 'Equal',
...         'path': ['description'],
...         'valueText': 'weather'
...     }
... )
>>> import json
>>> print(json.dumps(result, indent=4))
{
    "dryRun": false,
    "match": {
        "class": "Dataset",
        "where": {
            "operands": null,
            "operator": "Equal",
            "path": [
                "description"
            ],
            "valueText": "weather"
        }
    },
    "output": "verbose",
    "results": {
        "failed": 0,
        "limit": 10000,
        "matches": 2,
        "objects": [
            {
                "id": "1eb28f69-c66e-5411-bad4-4e14412b65cd",
                "status": "SUCCESS"
            },
            {
                "id": "da217bdd-4c7c-5568-9576-ebefe17688ba",
                "status": "SUCCESS"
            }
        ],
        "successful": 2
    }
}

Returns

dict

The result/status of the batch delete.
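Since a batch delete cannot be undone, dry_run lends itself to a preview-then-delete pattern. The sketch below takes the delete function as a parameter (for example client.batch.delete_objects) so it can be exercised without a server; the helper name and the max_matches guard are hypothetical, and it assumes the dry-run result reports "matches" under "results" as in the example above.

```python
from typing import Any, Callable, Dict


def delete_with_preview(
    delete_objects: Callable[..., Dict[str, Any]],
    class_name: str,
    where: dict,
    max_matches: int,
) -> Dict[str, Any]:
    """Run a dry run first and only delete if the match count is acceptable."""
    # Step 1: list what would be deleted without deleting anything.
    preview = delete_objects(class_name=class_name, where=where, output="minimal", dry_run=True)
    matches = preview["results"]["matches"]
    if matches > max_matches:
        raise RuntimeError(f"refusing to delete {matches} objects (limit {max_matches})")
    # Step 2: perform the real delete with the same filter.
    return delete_objects(class_name=class_name, where=where, output="minimal", dry_run=False)
```

Usage would look like delete_with_preview(client.batch.delete_objects, "Dataset", where, max_matches=100).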

property dynamic: bool

Setter and Getter for dynamic.

Parameters

value: bool

Setter ONLY: Enable or disable dynamic batching. If batch_size is None, the value is not set; otherwise dynamic is set to the new value and the batch is auto-created if it meets the requirements.

Returns

bool

Getter ONLY: Whether dynamic batching is enabled.

Raises

TypeError

Setter ONLY: If the new value is not of type bool.

empty_objects() None[source]

Remove all the objects from the batch.

empty_references() None[source]

Remove all the references from the batch.

flush() None[source]

Flush both objects and references to the Weaviate server and call the callback function if one is provided. (See the docs for configure or __call__ for how to set one.)

is_empty_objects() bool[source]

Check whether the batch contains no objects.

Returns

bool

Whether the Batch object list is empty.

is_empty_references() bool[source]

Check whether the batch contains no references.

Returns

bool

Whether the Batch reference list is empty.

num_objects() int[source]

Get current number of objects in the batch.

Returns

int

The number of objects in the batch.

num_references() int[source]

Get current number of references in the batch.

Returns

int

The number of references in the batch.

pop_object(index: int = -1) dict[source]

Remove and return the object at index (default last).

Parameters

index: int, optional

The index of the object to pop, by default -1 (last item).

Returns

dict

The popped object.

Raises

IndexError

If batch is empty or index is out of range.

pop_reference(index: int = -1) dict[source]

Remove and return the reference at index (default last).

Parameters

index: int, optional

The index of the reference to pop, by default -1 (last item).

Returns

dict

The popped reference.

Raises

IndexError

If batch is empty or index is out of range.

property recommended_num_objects: int | None

The recommended number of objects per batch. If None then it could not be computed.

Returns

Optional[int]

The recommended number of objects per batch. If None then it could not be computed.

property recommended_num_references: int | None

The recommended number of references per batch. If None then it could not be computed.

Returns

Optional[int]

The recommended number of references per batch. If None then it could not be computed.
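The recommended sizes are described as the number of objects/references the server can process within creation_time. A simplified throughput estimate of that idea looks like this; it is an assumption-laden sketch, not the client's actual update formula, and the function name is hypothetical.

```python
import math
from typing import Optional


def recommend_batch_size(num_items: int, elapsed_seconds: float, creation_time: float) -> Optional[int]:
    """Simplified estimate: items processed per second, scaled to creation_time."""
    if num_items <= 0 or elapsed_seconds <= 0:
        # Nothing to learn from this batch; the real properties can also be None.
        return None
    throughput = num_items / elapsed_seconds  # items per second
    # Never recommend fewer than one item per batch.
    return max(1, math.floor(throughput * creation_time))
```

For example, if 200 objects took 4 seconds and creation_time is 10 seconds, the estimate is 500 objects per batch.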

property shape: Tuple[int, int]

Get current number of objects and references in the batch.

Returns

Tuple[int, int]

The number of objects and references, respectively, in the batch as a tuple, i.e. returns (number of objects, number of references).

shutdown() None[source]

Shut down the BatchExecutor.

start() Batch[source]

Start the BatchExecutor if it was closed.

Returns

Batch

Updated self.

property timeout_retries: int

Setter and Getter for timeout_retries.

Parameters

value: int

Setter ONLY: The new value for timeout_retries.

Returns

int

Getter ONLY: The timeout_retries value.

Raises

TypeError

Setter ONLY: If the new value is not of type int.

ValueError

Setter ONLY: If the new value is not positive.

wait_for_vector_indexing(shards: List[Shard] | None = None, how_many_failures: int = 5) None[source]

Wait for all the vectors of the batch-imported objects to be indexed.

Upon network error, it will retry to get the shards’ status for how_many_failures times with exponential backoff (2**n seconds with n=0,1,2,…,how_many_failures).

Parameters

shards: Optional[List[Shard]], optional

The shards to check the status of. If None, it will check the status of all the shards of the imported objects in the batch.

how_many_failures: int, optional

How many times to try to get the shards’ status before raising an exception, by default 5.
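The retry behaviour described above (exponential backoff of 2**n seconds between attempts) can be sketched generically. This is not the client's internal code: fetch stands in for the shard-status request, the built-in ConnectionError stands in for the network error, and sleep is injectable so the schedule can be observed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    how_many_failures: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, sleeping 2**n seconds after the n-th failure (n = 0, 1, ...)."""
    for attempt in range(how_many_failures):
        try:
            return fetch()
        except ConnectionError:
            if attempt == how_many_failures - 1:
                raise  # give up after the last allowed failure
            sleep(2 ** attempt)
    raise ValueError("how_many_failures must be >= 1")
```

A call that fails three times and then succeeds sleeps for 1, 2 and 4 seconds before returning.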

class weaviate.batch.crud_batch.BatchExecutor(max_workers=None, thread_name_prefix='', initializer=None, initargs=())[source]

Bases: ThreadPoolExecutor

Weaviate Batch Executor to run batch requests in a separate thread. This class implements an additional method, is_shutdown, that is used by the context manager.

Initializes a new ThreadPoolExecutor instance.

Args:
    max_workers: The maximum number of threads that can be used to execute the given calls.
    thread_name_prefix: An optional name prefix to give our threads.
    initializer: A callable used to initialize worker threads.
    initargs: A tuple of arguments to pass to the initializer.

is_shutdown() bool[source]

Check whether the executor has been shut down.

Returns

bool

Whether the BatchExecutor is shut down.
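A ThreadPoolExecutor subclass with an is_shutdown method can be sketched in a few lines. The class name is hypothetical, and reading the private `_shutdown` attribute relies on a CPython implementation detail of concurrent.futures.ThreadPoolExecutor, not on a public API.

```python
from concurrent.futures import ThreadPoolExecutor


class ShutdownAwareExecutor(ThreadPoolExecutor):
    """Hypothetical stand-in for BatchExecutor: a ThreadPoolExecutor that reports shutdown."""

    def is_shutdown(self) -> bool:
        # `_shutdown` is a private CPython attribute set to True by shutdown().
        return self._shutdown
```

Checking is_shutdown() before submitting work avoids the RuntimeError that submit() raises on a shut-down executor.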

class weaviate.batch.crud_batch.Shard(class_name: str, tenant: str | None = None)[source]

Bases: object

class_name: str
tenant: str | None = None

class weaviate.batch.crud_batch.WeaviateErrorRetryConf(number_retries: int = 3, errors_to_exclude: List[str] | None = None, errors_to_include: List[str] | None = None)[source]

Bases: object

Configures how often objects should be retried when Weaviate returns an error and which errors should be included or excluded. By default, all errors are retried.

Parameters

number_retries: int

How often a batch that includes objects with errors should be retried. Must be >=1.

errors_to_exclude: Optional[List[str]]

Which errors should NOT be retried. All other errors will be retried. An object is skipped when the given string is part of the Weaviate error message.

Example: errors_to_exclude=[“string1”, “string2”] will match the error with message “Long error message that contains string1”.

errors_to_include: Optional[List[str]]

Which errors should be retried. All other errors will NOT be retried. An object is included when the given string is part of the Weaviate error message.

Example: errors_to_include=[“string1”, “string2”] will match the error with message “Long error message that contains string1”.

errors_to_exclude: List[str] | None = None
errors_to_include: List[str] | None = None
number_retries: int = 3
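The substring matching described for errors_to_exclude and errors_to_include can be expressed as a single predicate. This is a hypothetical sketch, not the client's code; it assumes at most one of the two lists is set and that, with neither set, every error is retried.

```python
from typing import List, Optional


def should_retry(
    message: str,
    errors_to_exclude: Optional[List[str]] = None,
    errors_to_include: Optional[List[str]] = None,
) -> bool:
    """Decide whether an object whose error has `message` is retried (substring matching)."""
    if errors_to_exclude is not None:
        # Skip the object if any excluded string appears in the message.
        return not any(s in message for s in errors_to_exclude)
    if errors_to_include is not None:
        # Retry only if an included string appears in the message.
        return any(s in message for s in errors_to_include)
    return True  # default: all errors are retried
```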

weaviate.batch.requests module

BatchRequest class definitions.

class weaviate.batch.requests.BatchRequest[source]

Bases: ABC

BatchRequest abstract class used as an interface for batch requests.

abstract add(*args, **kwargs)[source]

Add objects to BatchRequest.

abstract add_failed_objects_from_response(response_item: List[Dict[str, Any]], errors_to_exclude: List[str] | None, errors_to_include: List[str] | None) List[Dict[str, Any]][source]

Add failed items from a weaviate response.

Parameters

response_item: BatchResponse

Weaviate response that contains the status for all objects.

errors_to_exclude: Optional[List[str]]

Which errors should NOT be retried.

errors_to_include: Optional[List[str]]

Which errors should be retried.

Returns

BatchResponse: Contains the responses for all successful objects, i.e. those that have not been added back to this batch.
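The contract of add_failed_objects_from_response (failed items go back into the batch, successful responses are returned) can be illustrated by a simple split. This hypothetical sketch assumes an item failed when its "result" holds an "errors" entry, as in the create_objects example, and omits the errors_to_exclude/errors_to_include filtering for brevity.

```python
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


def split_batch_response(response: List[Item]) -> Tuple[List[Item], List[Item]]:
    """Split a batch response into (items to retry, successful items)."""
    to_retry: List[Item] = []
    successful: List[Item] = []
    for item in response:
        errors = (item.get("result") or {}).get("errors")
        # Items with errors would be re-added to the batch; the rest are returned.
        (to_retry if errors else successful).append(item)
    return to_retry, successful
```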

empty() None[source]

Remove all the items from the BatchRequest.

abstract get_request_body() List[Dict[str, Any]] | Dict[str, Any][source]

Return the request body to be digested by weaviate that contains all batch items.

is_empty() bool[source]

Check if BatchRequest is empty.

Returns

bool

Whether the BatchRequest is empty.

pop(index: int = -1) dict[source]

Remove and return item at index (default last).

Parameters

index: int, optional

The index of the item to pop, by default -1 (last item).

Returns

dict

The popped item.

Raises

IndexError

If batch is empty or index is out of range.

class weaviate.batch.requests.ObjectsBatchRequest[source]

Bases: BatchRequest

Collect objects for one batch request to Weaviate. Caution: this batch will not be validated against the Weaviate schema.

add(data_object: dict, class_name: str, uuid: str | UUID | None = None, vector: Sequence | None = None, tenant: str | None = None) str[source]

Add one object to this batch. Does NOT validate the consistency of the object against the client’s schema. Checks the arguments’ type and UUIDs’ format.

Parameters

class_name: str

The name of the class this object belongs to.

data_object: dict

Object to be added as a dict datatype.

uuid: str or None, optional

UUID of the object as a string, by default None

vector: Sequence or None, optional

The embedding of the object. Can be used when:

  • a class does not have a vectorization module.

  • the given vector was generated using the _identical_ vectorization module that is configured for the class. In this case this vector takes precedence.

Supported types are list, `numpy.ndarray`, torch.Tensor and tf.Tensor, by default None.

tenant: str, optional

Tenant of the object.

Returns

str

The UUID of the added object. If one was not provided, a UUIDv4 will be generated.

Raises

TypeError

If an argument passed is not of an appropriate type.

ValueError

If ‘uuid’ is not of a proper form.

add_failed_objects_from_response(response: List[Dict[str, Any]], errors_to_exclude: List[str] | None, errors_to_include: List[str] | None) List[Dict[str, Any]][source]

Add failed items from a weaviate response.

Parameters

response: BatchResponse

Weaviate response that contains the status for all objects.

errors_to_exclude: Optional[List[str]]

Which errors should NOT be retried.

errors_to_include: Optional[List[str]]

Which errors should be retried.

Returns

BatchResponse: Contains the responses for all successful objects, i.e. those that have not been added back to this batch.

get_request_body() Dict[str, Any][source]

Get the request body as it is needed for the Weaviate server.

Returns

dict

The request body as a dict.

class weaviate.batch.requests.ReferenceBatchRequest[source]

Bases: BatchRequest

Collect Weaviate-object references to add them in one request to Weaviate. Caution: this request skips some validations in order to be faster.

add(from_object_class_name: str, from_object_uuid: str | UUID, from_property_name: str, to_object_uuid: str | UUID, to_object_class_name: str | None = None, tenant: str | None = None) None[source]

Add one Weaviate-object reference to this batch. Does NOT validate the consistency of the reference against the class schema. Checks the arguments’ type and UUIDs’ format.

Parameters

from_object_class_name: str

The name of the class that should reference another object.

from_object_uuid: str

The UUID or URL of the object that should reference another object.

from_property_name: str

The name of the property that contains the reference.

to_object_uuid: str

The UUID or URL of the object that is actually referenced.

to_object_class_name: Optional[str], optional

The class name of the referenced object (the one with UUID to_object_uuid). Weaviate 1.14.0 introduced namespacing of all objects by class name, so setting this is STRONGLY recommended with Weaviate >= 1.14.0; it will be required in future versions of the Weaviate server and clients. Use None ONLY for Weaviate < v1.14.0, by default None

tenant: str, optional

Name of the tenant.

Raises

TypeError

If arguments are not of type str.

ValueError

If ‘uuid’ is not valid or cannot be extracted.

add_failed_objects_from_response(response: List[Dict[str, Any]], errors_to_exclude: List[str] | None, errors_to_include: List[str] | None) List[Dict[str, Any]][source]

Add failed items from a weaviate response.

Parameters

response: BatchResponse

Weaviate response that contains the status for all objects.

errors_to_exclude: Optional[List[str]]

Which errors should NOT be retried.

errors_to_include: Optional[List[str]]

Which errors should be retried.

Returns

BatchResponse: Contains the responses for all successful objects, i.e. those that have not been added back to this batch.

get_request_body() List[Dict[str, Any]][source]

Get request body as a list of dictionaries, where each dictionary is a Weaviate-object reference.

Returns

List[dict]

A list of Weaviate-objects references as dictionaries.