weaviate.batch

Module for uploading objects and references to Weaviate in batches.

class weaviate.batch.Batch(connection: weaviate.connect.connection.Connection)

Bases: object

Batch class used to add multiple objects or object references to Weaviate at once. To add data to the Batch, use the add_data_object and add_reference methods. This object also stores two recommended batch sizes, one for objects and one for references. Each recommended batch size is updated with every batch creation and is the number of data objects/references that the Weaviate server can receive and process within the creation_time interval (see the configure or __call__ method for how to set this value; by default it is 10). The initial value is None/batch_size, and it is updated by every batch creation method. The values can be read through the getters recommended_num_objects and recommended_num_references.

This class can be used in 3 ways:

Case I:

Everything is done by the user, i.e. the user adds the objects/object references and creates them whenever they want. To create the queued data, use the create_objects, create_references and flush methods. In this case the Batch instance’s batch_size is None (see docs for the configure or __call__ method). Can be used in a context manager, see below.

Case II:

Batch auto-creates when full. This is achieved by setting the Batch instance’s batch_size to a positive integer (see docs for the configure or __call__ method). The batch_size in this case corresponds to the sum of added objects and references. This case does not require the user to create the batches, although they still can. To create non-full batches (the last batch or batches) that do not meet the auto-create requirement, use the flush method. Can be used in a context manager, see below.
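The auto-create rule of Case II can be modelled without a server. The sketch below is a toy stand-in, not the client's implementation: ToyBatch and its sent list are hypothetical names, and "creating" a batch only records the shape of what was queued.

```python
from typing import List, Optional, Tuple


class ToyBatch:
    """Toy model of Case II; it never contacts a Weaviate server."""

    def __init__(self, batch_size: Optional[int] = None) -> None:
        self.batch_size = batch_size  # None models Case I (manual creation)
        self._objects: List[dict] = []
        self._references: List[tuple] = []
        self.sent: List[Tuple[int, int]] = []  # shape of every flushed batch

    @property
    def shape(self) -> Tuple[int, int]:
        return len(self._objects), len(self._references)

    def add_data_object(self, data_object: dict, class_name: str) -> None:
        self._objects.append({"class": class_name, "properties": data_object})
        self._auto_create()

    def add_reference(self, from_uuid: str, class_name: str,
                      prop: str, to_uuid: str) -> None:
        self._references.append((from_uuid, class_name, prop, to_uuid))
        self._auto_create()

    def flush(self) -> None:
        # Record what would have been sent, then empty both queues.
        if self.shape != (0, 0):
            self.sent.append(self.shape)
        self._objects.clear()
        self._references.clear()

    def _auto_create(self) -> None:
        # Case II: the threshold applies to objects + references together.
        if self.batch_size is not None and sum(self.shape) >= self.batch_size:
            self.flush()


batch = ToyBatch(batch_size=3)
batch.add_data_object({}, "MyClass")
batch.add_reference("uuid-1", "MyClass", "myProp", "uuid-2")
shape_before = batch.shape  # two items queued, still below the threshold
batch.add_data_object({}, "MyClass")
shape_after = batch.shape   # the third item triggered the auto-create
```

With batch_size left at None the same toy behaves like Case I: nothing is sent until flush is called explicitly.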

Case III:

Similar to Case II but uses dynamic batching, i.e. it auto-creates either objects or references when one of them reaches recommended_num_objects or recommended_num_references, respectively. See docs for the configure or __call__ method for how to enable it.
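The difference from Case II is that objects and references are checked against their own thresholds. A minimal sketch of that decision, assuming a "greater than or equal" comparison and treating a threshold of None as "not yet known" (due_for_creation is a hypothetical helper, not part of the client):

```python
from typing import Optional, Tuple


def due_for_creation(
    shape: Tuple[int, int],
    recommended_num_objects: Optional[int],
    recommended_num_references: Optional[int],
) -> Tuple[bool, bool]:
    """Return (create_objects_now, create_references_now) for a toy dynamic batch."""
    num_objects, num_references = shape
    # Each queue is compared with its own recommended size, independently.
    objects_due = (
        recommended_num_objects is not None
        and num_objects >= recommended_num_objects
    )
    references_due = (
        recommended_num_references is not None
        and num_references >= recommended_num_references
    )
    return objects_due, references_due


# Five queued objects reach a recommended size of 5; one reference does not reach 10.
decision = due_for_creation((5, 1), 5, 10)
```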

Context-manager support: Can be used with the with statement. When it exits the context manager it calls the flush method for you. Can be combined with the configure/__call__ method in order to set it to the desired Case.
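The flush-on-exit behaviour follows the standard Python context-manager protocol. The sketch below shows only that mechanism; FlushOnExit, add, queued and created are hypothetical names, and no Weaviate server is involved.

```python
class FlushOnExit:
    """Toy object that flushes its queue when the with block ends."""

    def __init__(self) -> None:
        self.queued: list = []
        self.created: list = []

    def add(self, item: str) -> None:
        self.queued.append(item)

    def flush(self) -> None:
        # Stand-in for sending the queued items to the server.
        self.created.extend(self.queued)
        self.queued.clear()

    def __enter__(self) -> "FlushOnExit":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Called when the with block is left; returning None lets
        # any exception raised inside the block propagate.
        self.flush()


toy = FlushOnExit()
with toy as batch:
    batch.add("object-1")
    batch.add("reference-1")
    queued_inside = len(batch.queued)  # still queued inside the block
```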

Examples

Here are examples for each Case described above. Here client is an instance of weaviate.Client.

>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'
>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'

For Case I:

>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(2, 1)
>>> client.batch.create_objects()
>>> client.batch.shape
(0, 1)
>>> client.batch.create_references()
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> client.batch.shape
(1, 1)
>>> client.batch.flush()
>>> client.batch.shape
(0, 0)

Or with a context manager:

>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

For Case II:

>>> client.batch(batch_size=3)
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(1, 1)
>>> client.batch.add_data_object({}, 'MyClass') # sum of data_objects and references reached
>>> client.batch.shape
(0, 0)

Or with a context manager and __call__ method:

>>> with client.batch(batch_size=3) as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

Or with a context manager and setter:

>>> client.batch.batch_size = 3
>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)

For Case III: Same as Case II but you need to configure or enable ‘dynamic’ batching.

>>> client.batch.configure(batch_size=3, dynamic=True) # 'batch_size' must be a valid int

Or:

>>> client.batch.batch_size = 3
>>> client.batch.dynamic = True

See the documentation of configure (or __call__) and the setters for more information on how, why and what you need to configure/set in order to use a particular Case.

Initialize a Batch class instance. This defaults to manual creation configuration. See docs for the configure or __call__ method for different types of configurations.

Parameters

connection (weaviate.connect.Connection) – Connection object to an active and running weaviate instance.

add_data_object(data_object: dict, class_name: str, uuid: Optional[str] = None, vector: Optional[Sequence] = None) → None

Add one object to this batch.

Parameters
  • data_object (dict) – Object to be added as a dict datatype.

  • class_name (str) – The name of the class this object belongs to.

  • uuid (str, optional) – UUID of the object as a string, by default None

  • vector (Sequence, optional) – The embedding of the object that should be created. Used only for class objects that do not have a vectorization module. Supported types are list, numpy.ndarray, torch.Tensor and tf.Tensor, by default None.

Raises
  • TypeError – If an argument passed is not of an appropriate type.

  • ValueError – If ‘uuid’ is not of a proper form.
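The vector types listed above can all be reduced to a plain Python list before the object is queued. The helper below is a duck-typed sketch of such a conversion and is an assumption, not the client's own code: numpy.ndarray and torch.Tensor expose tolist(), and an eager tf.Tensor exposes numpy(). vector_to_list is a hypothetical name.

```python
from typing import Any, List


def vector_to_list(vector: Any) -> List[float]:
    """Convert a supported vector type into a plain list (illustrative only)."""
    if isinstance(vector, (list, tuple)):
        return list(vector)
    # Objects that only offer numpy() (e.g. an eager tf.Tensor) are
    # first turned into an array-like that has tolist().
    if hasattr(vector, "numpy") and not hasattr(vector, "tolist"):
        vector = vector.numpy()
    if hasattr(vector, "tolist"):
        return vector.tolist()
    raise TypeError(f"Unsupported vector type: {type(vector).__name__}")


plain = vector_to_list((0.1, 0.2, 0.3))
```

Converting up front keeps the queued payload JSON-serializable regardless of which numerical library produced the embedding.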

add_reference(from_object_uuid: str, from_object_class_name: str, from_property_name: str, to_object_uuid: str) → None

Add one reference to this batch.

Parameters
  • from_object_uuid (str) – The UUID or URL of the object that should reference another object.

  • from_object_class_name (str) – The name of the class that should reference another object.

  • from_property_name (str) – The name of the property that contains the reference.

  • to_object_uuid (str) – The UUID or URL of the object that is actually referenced.

Raises
  • TypeError – If arguments are not of type str.

  • ValueError – If ‘uuid’ is not valid or cannot be extracted.
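Since both UUID arguments may be given as a URL, the UUID has to be extracted and validated at some point. The sketch below illustrates one way to do that with the standard library; it is not the client's implementation, extract_uuid is a hypothetical name, and the example URL is an assumed shape.

```python
import uuid


def extract_uuid(uuid_or_url: str) -> str:
    """Return the canonical UUID string found at the end of a UUID or URL."""
    if not isinstance(uuid_or_url, str):
        raise TypeError("'uuid_or_url' must be of type str")
    # For a URL, the UUID is assumed to be the last path segment.
    candidate = uuid_or_url.rstrip("/").split("/")[-1]
    try:
        return str(uuid.UUID(candidate))
    except ValueError as error:
        raise ValueError(f"No valid UUID in {uuid_or_url!r}") from error


from_plain = extract_uuid("154cbccd-89f4-4b29-9c1b-001a3339d89d")
# Hypothetical URL, used only to exercise the helper.
from_url = extract_uuid(
    "http://localhost:8080/v1/objects/154cbccd-89f4-4b29-9c1b-001a3339d89d"
)
```

Note that uuid.UUID also accepts a few alternative spellings (for example without hyphens), so this check may be more lenient than the one the client applies.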

property batch_size

Setter and Getter for batch_size.

Parameters

value (Optional[int]) – Setter ONLY: The new value for the batch_size. If NOT None it will try to auto-create the existing data if it meets the requirements. If previous value was None then it will be set to new value and will change the batching type to auto-create with dynamic set to False. See the documentation for configure or __call__ for more info. If recommended_num_objects is None then it is initialized with the new value of the batch_size (same for references).

Returns

Getter ONLY: The current value of the batch_size. It is NOT the current number of data in the Batch. See the documentation for configure or __call__ for more info.

Return type

Optional[int]

Raises
  • TypeError – Setter ONLY: If the new value is not of type int.

  • ValueError – Setter ONLY: If the new value is not positive.

create_objects() → list

Creates multiple Objects at once in Weaviate. This does not guarantee that every batch item is created on the Weaviate server: the batch creation as a whole can succeed while individual items fail. See the example below.

Examples

Here client is an instance of the weaviate.Client.

Add objects to the object batch.

>>> client.batch.add_data_object({}, 'NonExistingClass')
>>> client.batch.add_data_object({}, 'ExistingClass')

Note that ‘NonExistingClass’ is not present in the client’s schema and ‘ExistingClass’ is present and has no properties. ‘client.batch.add_data_object’ does not raise an exception because the objects added meet the required criteria (see the documentation of the ‘weaviate.Batch.add_data_object’ method for more information).

>>> result = client.batch.create_objects()

Successful batch creation even if one data object is inconsistent with the client’s schema. We can find out more about what objects were successfully created by analyzing the ‘result’ variable.

>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "class": "NonExistingClass",
        "creationTimeUnix": 1614852753747,
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "properties": {},
        "deprecations": null,
        "result": {
            "errors": {
                "error": [
                    {
                        "message": "class 'NonExistingClass' not present in schema, class NonExistingClass not present"
                    }
                ]
            }
        }
    },
    {
        "class": "ExistingClass",
        "creationTimeUnix": 1614852753746,
        "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
        "properties": {},
        "vector": [
            -0.05244319,
            ...
            0.076136276
        ],
        "deprecations": null,
        "result": {}
    }
]

As can be seen, the first object in the batch was not created, but the batch creation itself succeeded. The batch creation can be successful even if none of the objects were created. Check the status of the batch objects to find out which object failed and why. Alternatively, use ‘client.data_object.create’ for object creation that throws an error if the data item is inconsistent or the creation fails.

To check the results of batch creation when using the auto-creation Batch, use a ‘callback’ (see the docs for the configure or __call__ method for more information).
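Checking the status of each item amounts to walking the returned list and looking for a non-empty "errors" entry. The helper below is a sketch that relies only on the result layout shown in the example above; failed_objects is a hypothetical name and the sample is a trimmed copy of that output.

```python
from typing import Dict, List


def failed_objects(results: List[dict]) -> Dict[str, List[str]]:
    """Map the id of every failed batch item to its error messages."""
    failures: Dict[str, List[str]] = {}
    for item in results:
        errors = (item.get("result") or {}).get("errors")
        if errors:
            failures[item.get("id", "<no id>")] = [
                entry.get("message", "") for entry in errors.get("error", [])
            ]
    return failures


# Trimmed copy of the result printed in the example above.
sample = [
    {
        "class": "NonExistingClass",
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "result": {
            "errors": {
                "error": [
                    {
                        "message": "class 'NonExistingClass' not present in schema, "
                                   "class NonExistingClass not present"
                    }
                ]
            }
        },
    },
    {
        "class": "ExistingClass",
        "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
        "result": {},
    },
]

failures = failed_objects(sample)
```

Assuming the callback is handed the same list that create_objects returns, a function like this could also be used as the callback for the auto-creation modes.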

Returns

A list with the status of every object that was created.

Return type

list

Raises

create_references() → list

Creates multiple References at once in Weaviate. Adding References in batch is faster, but it skips validations such as class name and property name, so a reference to nonexistent object types and/or nonexistent properties can still be created SUCCESSFULLY. If consistency of the References is required, use ‘client.data_object.reference.add’, which validates against the Weaviate schema. See Examples below.

Examples

Here client is an instance of the weaviate.Client.

Object that does not exist in weaviate.

>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'

Objects that exist in weaviate.

>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'
>>> client.batch.add_reference(object_1, 'NonExistingClass', 'existsWith', object_2)
>>> client.batch.add_reference(object_3, 'ExistingClass', 'existsWith', object_4)

Both references were added to the batch request without error because they meet the required criteria (see the documentation of the ‘weaviate.Batch.add_reference’ method for more information).

>>> result = client.batch.create_references()

As can be seen, the reference batch creation is successful (no error thrown). Now we can inspect the ‘result’.

>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "from": "weaviate://localhost/NonExistingClass/154cbccd-89f4-4b29-9c1b-001a3339d89d/existsWith",
        "to": "weaviate://localhost/154cbccd-89f4-4b29-9c1b-001a3339d89c",
        "result": {
            "status": "SUCCESS"
        }
    },
    {
        "from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {
            "status": "SUCCESS"
        }
    }
]

Both references were added successfully, but one of them is corrupted (it links two objects of a nonexistent class, and one of the objects has not been created yet). To make use of the validation, create each reference individually (see the client.data_object.reference.add method).
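Because the batch endpoint reports SUCCESS even for such references, a client-side check before calling add_reference can catch them early. The sketch below assumes a schema dictionary of the shape {"classes": [{"class": ..., "properties": [{"name": ...}]}]}; both that shape and the helper names are assumptions made for illustration.

```python
from typing import Dict, Set


def known_properties(schema: dict) -> Dict[str, Set[str]]:
    """Map each class name in the schema to the names of its properties."""
    return {
        cls["class"]: {prop["name"] for prop in cls.get("properties", [])}
        for cls in schema.get("classes", [])
    }


def reference_is_known(schema: dict, class_name: str, property_name: str) -> bool:
    """True only if the class exists and has a property with that name."""
    return property_name in known_properties(schema).get(class_name, set())


# Hypothetical schema containing just the class used in the example above.
schema = {
    "classes": [
        {"class": "ExistingClass", "properties": [{"name": "existsWith"}]}
    ]
}

ok = reference_is_known(schema, "ExistingClass", "existsWith")
bad = reference_is_known(schema, "NonExistingClass", "existsWith")
```

This only checks names against the schema; it does not confirm that the referenced objects themselves exist.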

Returns

A list with the status of every reference added.

Return type

list

Raises

property creation_time

Setter and Getter for creation_time.

Parameters

value (Real) – Setter ONLY: Set new value to creation_time. The recommended_num_objects/references values are updated to this new value. If the batch_size is not None it will auto-create the batch if the requirements are met.

Returns

Getter ONLY: The creation_time value.

Return type

Real

Raises
  • TypeError – Setter ONLY: If the new value is not of type Real.

  • ValueError – Setter ONLY: If the new value is not positive.
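The class description defines the recommended batch size as the number of items the server can process within creation_time. One way to turn a measured batch into such a number is sketched below; the exact formula used by the client is not given here, so the throughput-based estimate and the name recommended_batch_size are assumptions.

```python
def recommended_batch_size(
    items_sent: int, elapsed_seconds: float, creation_time: float
) -> int:
    """Estimate how many items fit into creation_time at the measured throughput."""
    if items_sent <= 0 or elapsed_seconds <= 0 or creation_time <= 0:
        raise ValueError("all arguments must be positive")
    throughput = items_sent / elapsed_seconds  # items per second
    # Never recommend fewer than one item per batch.
    return max(1, round(throughput * creation_time))


# 100 items took 2 seconds, so a 10 second window fits about 500 items.
estimate = recommended_batch_size(100, 2.0, 10)
```

A larger creation_time therefore yields larger recommended batches, and a slower server yields smaller ones.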

property dynamic

Setter and Getter for dynamic.

Parameters

value (bool) – Setter ONLY: Enable or disable dynamic batching. If batch_size is None the value is not set; otherwise dynamic is set to the new value and the batch is auto-created if it meets the requirements.

Returns

Getter ONLY: Whether dynamic batching is enabled.

Return type

bool

Raises

TypeError – Setter ONLY: If the new value is not of type bool.

flush() → None

Flush both objects and references to the Weaviate server and call the callback function if one is provided. (See the docs for configure or __call__ for how to set one.)

num_objects() → int

Get current number of objects in the batch.

Returns

The number of objects in the batch.

Return type

int

num_references() → int

Get current number of references in the batch.

Returns

The number of references in the batch.

Return type

int

property recommended_num_objects

The recommended number of objects per batch. If None then it could not be computed.

Returns

The recommended number of objects per batch. If None then it could not be computed.

Return type

Optional[int]

property recommended_num_references

The recommended number of references per batch. If None then it could not be computed.

Returns

The recommended number of references per batch. If None then it could not be computed.

Return type

Optional[int]

property shape

Get current number of objects and references in the batch.

Returns

The number of objects and references, respectively, in the batch as a tuple, i.e. returns (number of objects, number of references).

Return type

Tuple[int, int]

property timeout_retries

Setter and Getter for timeout_retries.

Parameters

value (int) – Setter ONLY: The new value for timeout_retries.

Returns

Getter ONLY: The timeout_retries value.

Return type

int

Raises
  • TypeError – Setter ONLY: If the new value is not of type int.

  • ValueError – Setter ONLY: If the new value is not positive.