weaviate.batch

Module for uploading objects and references to Weaviate in batches.

class weaviate.batch.Batch(connection: weaviate.connect.connection.Connection)

Bases: object

Batch class used to add multiple objects or object references to Weaviate at once. To add data to the Batch, use the add_data_object and add_reference methods. The object also stores two recommended batch sizes, one for objects and one for references. Each recommended batch size is updated with every batch creation and is the number of data objects/references that the Weaviate server can process within the creation_time interval (see the configure or __call__ method for how to set this value; by default it is 10). The initial value is None or batch_size, and it is updated by every batch create method. The values can be read with the getters recommended_num_objects and recommended_num_references. NOTE: If the UUID of one of the objects already exists, the existing object is replaced by the new object.

This class can be used in 3 ways:

Case I:

Everything is done by the user, i.e. the user adds the objects/object references and creates them whenever they want. To create the data, use the create_objects, create_references and flush methods. In this case the Batch instance’s batch_size is None (see the docs for the configure or __call__ method). Can be used in a context manager, see below.

Case II:

The Batch auto-creates when full. This is achieved by setting the Batch instance’s batch_size to a positive integer (see the docs for the configure or __call__ method). The batch_size in this case corresponds to the sum of added objects and references. This case does not require the user to create the batches, but it can still be done manually. To create non-full batches (the last ones) that do not meet the requirement for auto-creation, use the flush method. Can be used in a context manager, see below.

Case III:

Similar to Case II but uses dynamic batching, i.e. it auto-creates either objects or references when one of them reaches recommended_num_objects or recommended_num_references, respectively. See the docs for the configure or __call__ method for how to enable it.

Context-manager support: Can be used with the with statement. When it exits the context manager, it calls the flush method for you. It can be combined with the configure/__call__ method in order to set it to the desired Case.
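To make the auto-create rule and the context-manager behaviour concrete, here is a small toy model in plain Python. It is not the Weaviate client's implementation: ToyBatch, its add method and its sent list are hypothetical names, and the model only mimics the documented rules that a batch is created once the number of buffered items reaches batch_size and that leaving the with block calls flush.

```python
class ToyBatch:
    """Hypothetical stand-in for Batch; it never talks to a server."""

    def __init__(self, batch_size=None):
        self.batch_size = batch_size  # None means manual creation (Case I)
        self.items = []               # buffered objects and references
        self.sent = []                # each flush appends one list here

    def add(self, item):
        self.items.append(item)
        # Case II: auto-create as soon as the buffer is full.
        if self.batch_size is not None and len(self.items) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.items:
            self.sent.append(self.items)
            self.items = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with block flushes whatever is still buffered.
        self.flush()


with ToyBatch(batch_size=3) as batch:
    for i in range(4):
        batch.add(i)

print(batch.sent)  # [[0, 1, 2], [3]]
```

The first three items trigger an automatic creation; the fourth is only sent because the with block flushes on exit.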

Examples

Here are examples for each Case described above. Here client is an instance of weaviate.Client.

>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'
>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'

For Case I:

>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(2, 1)
>>> client.batch.create_objects()
>>> client.batch.shape
(0, 1)
>>> client.batch.create_references()
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> client.batch.shape
(1, 1)
>>> client.batch.flush()
>>> client.batch.shape
(0, 0)

Or with a context manager:

>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

For Case II:

>>> client.batch(batch_size=3)
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(1, 1)
>>> client.batch.add_data_object({}, 'MyClass') # sum of data_objects and references reached
>>> client.batch.shape
(0, 0)

Or with a context manager and __call__ method:

>>> with client.batch(batch_size=3) as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

Or with a context manager and setter:

>>> client.batch.batch_size = 3
>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

For Case III: Same as Case II but you need to configure or enable ‘dynamic’ batching.

>>> client.batch.configure(batch_size=3, dynamic=True) # 'batch_size' must be a valid int

Or:

>>> client.batch.batch_size = 3
>>> client.batch.dynamic = True

See the documentation of configure (or __call__) and the setters for more information on how and why to configure or set values in order to use a particular Case.

Initialize a Batch class instance. This defaults to manual creation configuration. See docs for the configure or __call__ method for different types of configurations.

Parameters

connection (weaviate.connect.Connection) – Connection object to an active and running weaviate instance.

__call__(batch_size: typing.Optional[int] = None, creation_time: numbers.Real = 10, timeout_retries: int = 0, callback: typing.Optional[typing.Callable[[dict], None]] = <function check_batch_result>, dynamic: bool = False) weaviate.batch.crud_batch.Batch

Configure the instance to your needs. (The __call__ and configure methods are the same.) NOTE: Every argument has a default value. If you want to change only one setting, use its setter instead, or provide all the configurations.

Parameters
  • batch_size (Optional[int], optional) – The batch size to be used. This value sets the Batch functionality: if batch_size is None, no auto-creation is done (callback and dynamic are ignored). If it is a positive number, auto-creation is enabled and the value represents: 1) if dynamic is False, the number of items in the Batch (sum of objects and references) at which to auto-create; 2) if dynamic is True, the initial value for both recommended_num_objects and recommended_num_references. By default None

  • creation_time (Real, optional) – The time interval it should take the Batch to be created, used ONLY for computing recommended_num_objects and recommended_num_references, by default 10

  • timeout_retries (int, optional) – Number of times to retry to create a Batch that failed with TimeOut error, by default 0

  • callback (Optional[Callable[[dict], None]], optional) – A callback function called on the results of each batch type (objects and references). By default weaviate.util.check_batch_result

  • dynamic (bool, optional) – Whether to use dynamic batching or not, by default False

Returns

Updated self.

Return type

Batch

Raises
  • TypeError – If one of the arguments is of a wrong type.

  • ValueError – If the value of one of the arguments is wrong.

add_data_object(data_object: dict, class_name: str, uuid: Optional[str] = None, vector: Optional[Sequence] = None) None

Add one object to this batch. NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.

Parameters
  • data_object (dict) – Object to be added as a dict datatype.

  • class_name (str) – The name of the class this object belongs to.

  • uuid (str, optional) – UUID of the object as a string, by default None

  • vector (Sequence, optional) – The embedding of the object that should be created. Used only for class objects that do not have a vectorization module. Supported types are list, numpy.ndarray, torch.Tensor and tf.Tensor, by default None.

Raises
  • TypeError – If an argument passed is not of an appropriate type.

  • ValueError – If ‘uuid’ is not of a proper form.
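Because an object whose UUID already exists is replaced, passing a deterministic uuid makes a repeated import overwrite objects instead of duplicating them. The sketch below derives such a UUID with the standard library's uuid.uuid5. The stable_uuid helper and the choice of uuid.NAMESPACE_DNS are assumptions made for illustration, not part of the Weaviate client, and the client call is shown only as a comment.

```python
import json
import uuid


def stable_uuid(class_name: str, data_object: dict) -> str:
    """Derive the same UUID string for the same class name and properties."""
    # sort_keys makes the serialisation independent of dict insertion order.
    payload = class_name + ":" + json.dumps(data_object, sort_keys=True)
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, payload))


obj = {"title": "Weather report", "year": 2021}
object_id = stable_uuid("Dataset", obj)

# With a real client (not executed here) the id would be passed as:
# client.batch.add_data_object(obj, "Dataset", uuid=object_id)
```

Hashing the properties means that any change to them yields a new UUID; hashing a natural key instead would keep the identity stable across updates.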

add_reference(from_object_uuid: str, from_object_class_name: str, from_property_name: str, to_object_uuid: str, to_object_class_name: Optional[str] = None) None

Add one reference to this batch.

Parameters
  • from_object_uuid (str) – The UUID or URL of the object that should reference another object.

  • from_object_class_name (str) – The name of the class that should reference another object.

  • from_property_name (str) – The name of the property that contains the reference.

  • to_object_uuid (str) – The UUID or URL of the object that is actually referenced.

  • to_object_class_name (Optional[str], optional) – The referenced object class name to which to add the reference (with UUID to_object_uuid), it is included in Weaviate 1.14.0, where all objects are namespaced by class name. STRONGLY recommended to set it with Weaviate >= 1.14.0. It will be required in future versions of Weaviate Server and Clients. Use None value ONLY for Weaviate < v1.14.0, by default None

Raises
  • TypeError – If arguments are not of type str.

  • ValueError – If ‘uuid’ is not valid or cannot be extracted.
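As a rough illustration of how these arguments relate to the from/to strings that appear in the create_references example output, the following sketch builds such a pair. The reference_payload helper is hypothetical, the localhost host part is copied from that example output, and the shape used when to_object_class_name is given is an assumption rather than something this page shows.

```python
from typing import Optional


def reference_payload(from_uuid: str, from_class: str, from_prop: str,
                      to_uuid: str, to_class: Optional[str] = None) -> dict:
    """Build an illustrative from/to pair for one batch reference."""
    source = f"weaviate://localhost/{from_class}/{from_uuid}/{from_prop}"
    if to_class is None:
        # Class-less target, as used with Weaviate < 1.14.0.
        target = f"weaviate://localhost/{to_uuid}"
    else:
        # Assumed shape when the target class name is supplied.
        target = f"weaviate://localhost/{to_class}/{to_uuid}"
    return {"from": source, "to": target}


pair = reference_payload(
    "254cbccd-89f4-4b29-9c1b-001a3339d89a", "ExistingClass", "existsWith",
    "254cbccd-89f4-4b29-9c1b-001a3339d89b",
)
print(pair["to"])  # weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b
```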

property batch_size: Optional[int]

Setter and Getter for batch_size.

Parameters

value (Optional[int]) – Setter ONLY: The new value for the batch_size. If NOT None it will try to auto-create the existing data if it meets the requirements. If previous value was None then it will be set to new value and will change the batching type to auto-create with dynamic set to False. See the documentation for configure or __call__ for more info. If recommended_num_objects is None then it is initialized with the new value of the batch_size (same for references).

Returns

Getter ONLY: The current value of the batch_size. It is NOT the current number of data in the Batch. See the documentation for configure or __call__ for more info.

Return type

Optional[int]

Raises
  • TypeError – Setter ONLY: If the new value is not of type int.

  • ValueError – Setter ONLY: If the new value is not positive.

configure(batch_size: typing.Optional[int] = None, creation_time: numbers.Real = 10, timeout_retries: int = 0, callback: typing.Optional[typing.Callable[[dict], None]] = <function check_batch_result>, dynamic: bool = False) weaviate.batch.crud_batch.Batch

Configure the instance to your needs. (The __call__ and configure methods are the same.) NOTE: Every argument has a default value. If you want to change only one setting, use its setter instead, or provide all the configurations.

Parameters
  • batch_size (Optional[int], optional) – The batch size to be used. This value sets the Batch functionality: if batch_size is None, no auto-creation is done (callback and dynamic are ignored). If it is a positive number, auto-creation is enabled and the value represents: 1) if dynamic is False, the number of items in the Batch (sum of objects and references) at which to auto-create; 2) if dynamic is True, the initial value for both recommended_num_objects and recommended_num_references. By default None

  • creation_time (Real, optional) – The time interval it should take the Batch to be created, used ONLY for computing recommended_num_objects and recommended_num_references, by default 10

  • timeout_retries (int, optional) – Number of times to retry to create a Batch that failed with TimeOut error, by default 0

  • callback (Optional[Callable[[dict], None]], optional) – A callback function called on the results of each batch type (objects and references). By default weaviate.util.check_batch_result.

  • dynamic (bool, optional) – Whether to use dynamic batching or not, by default False

Returns

Updated self.

Return type

Batch

Raises
  • TypeError – If one of the arguments is of a wrong type.

  • ValueError – If the value of one of the arguments is wrong.

create_objects() list

Creates multiple Objects at once in Weaviate. Creating a batch does not guarantee that each batch item is created on the Weaviate server: batch creation can succeed while the creation of individual items fails. See the example below. NOTE: If the UUID of one of the objects already exists, the existing object is replaced by the new object.

Examples

Here client is an instance of the weaviate.Client.

Add objects to the object batch.

>>> client.batch.add_data_object({}, 'NonExistingClass')
>>> client.batch.add_data_object({}, 'ExistingClass')

Note that ‘NonExistingClass’ is not present in the client’s schema, while ‘ExistingClass’ is present and has no properties. ‘client.batch.add_data_object’ does not raise an exception because the objects added meet the required criteria (see the documentation of the ‘weaviate.Batch.add_data_object’ method for more information).

>>> result = client.batch.create_objects()

Successful batch creation even if one data object is inconsistent with the client’s schema. We can find out more about what objects were successfully created by analyzing the ‘result’ variable.

>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "class": "NonExistingClass",
        "creationTimeUnix": 1614852753747,
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "properties": {},
        "deprecations": null,
        "result": {
            "errors": {
                "error": [
                    {
                        "message": "class 'NonExistingClass' not present in schema, class NonExistingClass not present"
                    }
                ]
            }
        }
    },
    {
        "class": "ExistingClass",
        "creationTimeUnix": 1614852753746,
        "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
        "properties": {},
        "vector": [
            -0.05244319,
            ...
            0.076136276
        ],
        "deprecations": null,
        "result": {}
    }
]

As can be seen, the first object from the batch was not created, but the batch itself was created successfully. Batch creation can succeed even if none of the objects were created. Check the status of the batch objects to find out which objects failed and why. Alternatively, use ‘client.data_object.create’ for object creation, which raises an error if the data item is inconsistent or creation fails.

To check the results of batch creation when using the auto-creating Batch, use a ‘callback’ (see the docs for the configure or __call__ method for more information).
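Checking the per-item status can be sketched as a small function over a result list shaped like the one printed above. The collect_object_errors name is hypothetical, and the sample data is an abbreviated copy of the example output with a shortened error message; a function like this could be called on the return value of create_objects or from inside a callback.

```python
def collect_object_errors(results: list) -> list:
    """Return (id, message) pairs for batch items that report errors."""
    failures = []
    for item in results:
        # A successfully created item has an empty "result" dict.
        errors = (item.get("result") or {}).get("errors")
        if not errors:
            continue
        for err in errors.get("error", []):
            failures.append((item.get("id"), err.get("message")))
    return failures


# Abbreviated sample in the shape of the example output above.
sample = [
    {"class": "NonExistingClass", "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
     "result": {"errors": {"error": [{"message": "class not present in schema"}]}}},
    {"class": "ExistingClass", "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
     "result": {}},
]
for object_id, message in collect_object_errors(sample):
    print(object_id, message)
```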

Returns

A list with the status of every object that was created.

Return type

list

Raises
create_references() list

Creates multiple References at once in Weaviate. Adding References in batch is faster, but it skips validations such as class name and property name, which can result in a SUCCESSFUL creation of references to nonexistent object types and/or nonexistent properties. If consistency of the References is required, use ‘client.data_object.reference.add’ to get additional validation against the Weaviate schema. See the Examples below.

Examples

Here client is an instance of the weaviate.Client.

Object that does not exist in weaviate.

>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'

Objects that exist in weaviate.

>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'
>>> client.batch.add_reference(object_1, 'NonExistingClass', 'existsWith', object_2)
>>> client.batch.add_reference(object_3, 'ExistingClass', 'existsWith', object_4)

Both references were added to the batch request without error because they meet the required criteria (See the documentation of the ‘weaviate.Batch.add_reference’ method for more information).

>>> result = client.batch.create_references()

As can be seen, the reference batch creation is successful (no error is thrown). Now we can inspect the ‘result’.

>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "from": "weaviate://localhost/NonExistingClass/154cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/154cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {
            "status": "SUCCESS"
        }
    },
    {
        "from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {
            "status": "SUCCESS"
        }
    }
]

Both references were added successfully, but one of them is corrupted (it links two objects of a nonexistent class, and one of the objects has not been created yet). To make use of validation, create each reference individually (see the client.data_object.reference.add method).
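The status field can still be scanned for entries that the server itself reports as unsuccessful. The sketch below does this for a result list shaped like the one above. The unsuccessful_references name is hypothetical, and the "FAILED" status in the sample is an assumed value used only for illustration. A SUCCESS status here does not prove that the reference is consistent with the schema.

```python
def unsuccessful_references(results: list) -> list:
    """Return the batch reference entries whose status is not SUCCESS."""
    return [
        item for item in results
        if (item.get("result") or {}).get("status") != "SUCCESS"
    ]


sample = [
    {"from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
     "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
     "result": {"status": "SUCCESS"}},
    # Hypothetical failing entry; the "FAILED" value is assumed for illustration.
    {"from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/missingProp",
     "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
     "result": {"status": "FAILED"}},
]
failed = unsuccessful_references(sample)
print(len(failed))  # 1
```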

Returns

A list with the status of every reference added.

Return type

list

Raises
property creation_time: numbers.Real

Setter and Getter for creation_time.

Parameters

value (Real) – Setter ONLY: The new value for creation_time. The recommended_num_objects/references values are updated according to this new value. If the batch_size is not None, it will auto-create the batch if the requirements are met.

Returns

Getter ONLY: The creation_time value.

Return type

Real

Raises
  • TypeError – Setter ONLY: If the new value is not of type Real.

  • ValueError – Setter ONLY: If the new value is not positive.
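The relation between creation_time and the recommended batch sizes can be illustrated with simple arithmetic: if a batch of n items took t seconds to create, roughly n / t * creation_time items fit into the target interval. The function below is a sketch under that assumption. The recommended_size name, its rounding and its lower bound of one item are illustrative choices, not a description of the client's internals.

```python
def recommended_size(items_sent: int, elapsed_seconds: float,
                     creation_time: float = 10) -> int:
    """Estimate how many items fit into creation_time at the observed rate."""
    if items_sent <= 0 or elapsed_seconds <= 0:
        raise ValueError("items_sent and elapsed_seconds must be positive")
    items_per_second = items_sent / elapsed_seconds
    # Never recommend fewer than one item per batch.
    return max(round(items_per_second * creation_time), 1)


print(recommended_size(100, 2.0))  # 500
print(recommended_size(3, 60.0))   # 1
```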

delete_objects(class_name: str, where: dict, output: str = 'minimal', dry_run: bool = False) dict

Delete, in batch, the objects that match the given ‘where’ filter.

Parameters
  • class_name (str) – The class name for which to delete objects.

  • where (dict) – The content of the where filter used to match objects that should be deleted.

  • output (str, optional) – Controls the verbosity of the output. Possible values: “minimal”: the result only includes counts; information about objects is omitted if the deletes were successful, and an object is described only if an error occurred. “verbose”: the result lists all affected objects with their ID and deletion status, including both successful and unsuccessful deletes. By default “minimal”

  • dry_run (bool, optional) – If True, objects will not be deleted yet, but merely listed, by default False

Examples

If we want to delete all the data objects that contain the word ‘weather’ we can do it like this:

>>> import json
>>> result = client.batch.delete_objects(
...     class_name='Dataset',
...     output='verbose',
...     dry_run=False,
...     where={
...         'operator': 'Equal',
...         'path': ['description'],
...         'valueText': 'weather'
...     }
... )
>>> print(json.dumps(result, indent=4))
{
    "dryRun": false,
    "match": {
        "class": "Dataset",
        "where": {
            "operands": null,
            "operator": "Equal",
            "path": [
                "description"
            ],
            "valueText": "weather"
        }
    },
    "output": "verbose",
    "results": {
        "failed": 0,
        "limit": 10000,
        "matches": 2,
        "objects": [
            {
                "id": "1eb28f69-c66e-5411-bad4-4e14412b65cd",
                "status": "SUCCESS"
            },
            {
                "id": "da217bdd-4c7c-5568-9576-ebefe17688ba",
                "status": "SUCCESS"
            }
        ],
        "successful": 2
    }
}

Returns

The result/status of the batch delete.

Return type

dict
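A response shaped like the example output above can be condensed into a few counts and a list of problem ids. The summarize_delete helper below is a hypothetical sketch; it treats any status other than "SUCCESS" as not successful, which is an assumption, and the sample dict is a trimmed copy of the example output.

```python
def summarize_delete(result: dict) -> dict:
    """Condense a delete_objects response into counts and problem ids."""
    results = result.get("results", {})
    # "objects" may be absent or null, e.g. with minimal output and no errors.
    objects = results.get("objects") or []
    return {
        "dry_run": result.get("dryRun", False),
        "matches": results.get("matches", 0),
        "successful": results.get("successful", 0),
        "failed": results.get("failed", 0),
        "not_successful_ids": [
            obj["id"] for obj in objects if obj.get("status") != "SUCCESS"
        ],
    }


# Trimmed copy of the example output above.
sample = {
    "dryRun": False,
    "output": "verbose",
    "results": {
        "failed": 0,
        "matches": 2,
        "objects": [
            {"id": "1eb28f69-c66e-5411-bad4-4e14412b65cd", "status": "SUCCESS"},
            {"id": "da217bdd-4c7c-5568-9576-ebefe17688ba", "status": "SUCCESS"},
        ],
        "successful": 2,
    },
}
summary = summarize_delete(sample)
print(summary["matches"], summary["successful"], summary["failed"])  # 2 2 0
```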

property dynamic: bool

Setter and Getter for dynamic.

Parameters

value (bool) – Setter ONLY: Enable or disable dynamic batching. If batch_size is None the value is not set; otherwise dynamic is set to the new value and the batch is auto-created if it meets the requirements.

Returns

Getter ONLY: Whether dynamic batching is enabled.

Return type

bool

Raises

TypeError – Setter ONLY: If the new value is not of type bool.

empty_objects() None

Remove all the objects from the batch.

empty_references() None

Remove all the references from the batch.

flush() None

Flush both objects and references to the Weaviate server and call the callback function if one is provided. (See the docs for configure or __call__ for how to set one.)

is_empty_objects() bool

Check if batch contains any objects.

Returns

Whether the Batch object list is empty.

Return type

bool

is_empty_references() bool

Check if batch contains any references.

Returns

Whether the Batch reference list is empty.

Return type

bool

num_objects() int

Get current number of objects in the batch.

Returns

The number of objects in the batch.

Return type

int

num_references() int

Get current number of references in the batch.

Returns

The number of references in the batch.

Return type

int

pop_object(index: int = -1) dict

Remove and return the object at index (default last).

Parameters

index (int, optional) – The index of the object to pop, by default -1 (last item).

Returns

The popped object.

Return type

dict

Raises

IndexError – If batch is empty or index is out of range.

pop_reference(index: int = -1) dict

Remove and return the reference at index (default last).

Parameters

index (int, optional) – The index of the reference to pop, by default -1 (last item).

Returns

The popped reference.

Return type

dict

Raises

IndexError – If batch is empty or index is out of range.

property recommended_num_objects: Optional[int]

The recommended number of objects per batch. If None then it could not be computed.

Returns

The recommended number of objects per batch. If None then it could not be computed.

Return type

Optional[int]

property recommended_num_references: Optional[int]

The recommended number of references per batch. If None then it could not be computed.

Returns

The recommended number of references per batch. If None then it could not be computed.

Return type

Optional[int]

property shape: Tuple[int, int]

Get current number of objects and references in the batch.

Returns

The number of objects and references, respectively, in the batch as a tuple, i.e. returns (number of objects, number of references).

Return type

Tuple[int, int]

property timeout_retries: int

Setter and Getter for timeout_retries.

Parameters

value (int) – Setter ONLY: The new value for timeout_retries.

Returns

Getter ONLY: The timeout_retries value.

Return type

int

Raises
  • TypeError – Setter ONLY: If the new value is not of type int.

  • ValueError – Setter ONLY: If the new value is negative.
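The idea behind timeout_retries can be sketched with a generic retry loop: the operation is attempted once and, on a timeout, repeated up to timeout_retries more times. This is an illustration only. The call_with_retries function, its pause parameter and the use of Python's built-in TimeoutError are assumptions, and the client may detect and handle timeouts differently.

```python
import time


def call_with_retries(send, timeout_retries: int = 0, pause: float = 0.0):
    """Call send(); on TimeoutError retry up to timeout_retries more times."""
    if timeout_retries < 0:
        raise ValueError("timeout_retries must not be negative")
    attempt = 0
    while True:
        try:
            return send()
        except TimeoutError:
            if attempt >= timeout_retries:
                raise  # retries exhausted, surface the timeout
            attempt += 1
            time.sleep(pause)


calls = {"n": 0}


def flaky():
    """Simulated batch creation that times out on its first two calls."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "created"


print(call_with_retries(flaky, timeout_retries=2))  # created
```

With timeout_retries=0, the default, the first timeout is raised immediately.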