weaviate.batch package
Module for uploading objects and references to Weaviate in batches.
- class weaviate.batch.Batch(connection: Connection)
Bases:
object
Batch class used to add multiple objects or object references to Weaviate at once. To add data to the Batch, use the add_data_object and add_reference methods of this class.
This object also stores two recommended batch sizes, one for objects and one for references. The recommended batch size is the number of data objects/references that the Weaviate server can process within the creation_time interval (see the configure or __call__ method for how to set this value; by default it is set to 10). Its initial value is None/batch_size and it is updated by every batch create method. The values can be read with the getters recommended_num_objects and recommended_num_references.
NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.
This class can be used in 3 ways:
- Case I:
Everything is done by the user, i.e. the user adds the objects/object references and creates them whenever desired. To create the data, use these methods of this class: create_objects, create_references and flush. In this case the Batch instance’s batch_size is set to None (see docs for the configure or __call__ method). Can be used in a context manager, see below.
- Case II:
Batch auto-creates when full. This is achieved by setting the Batch instance’s batch_size to a positive integer (see docs for the configure or __call__ method). The batch_size in this case corresponds to the sum of added objects and references. This case does not require the user to create the batches, but it can still be done manually. To create the last, non-full batches that do not meet the auto-creation requirement, use the flush method. Can be used in a context manager, see below.
- Case III:
Similar to Case II but uses dynamic batching, i.e. auto-creates either objects or references when one of them reaches recommended_num_objects or recommended_num_references respectively. See docs for the configure or __call__ method for how to enable it.
- Context-manager support:
Can be used with the with statement. When it exits the context manager it calls the flush method for you. Can be combined with the configure/__call__ method in order to set it to the desired Case.
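The recommended batch size used by dynamic batching can be illustrated with a small calculation. The following is a minimal sketch of the idea only, not the client's actual implementation: the helper name estimate_recommended_size, the rounding and the lower bound of 1 are all assumptions made for illustration.

```python
def estimate_recommended_size(items_sent: int, elapsed_seconds: float, creation_time: float) -> int:
    """Estimate how many items fit into `creation_time` seconds.

    Hypothetical helper (not part of weaviate): scales the throughput
    observed for the last batch to the target interval and never
    returns less than 1.
    """
    if items_sent <= 0 or elapsed_seconds <= 0 or creation_time <= 0:
        raise ValueError("all arguments must be positive")
    throughput = items_sent / elapsed_seconds  # items per second
    return max(1, round(throughput * creation_time))


# A batch of 100 objects that took 4 seconds, with a 10 second target.
print(estimate_recommended_size(100, 4.0, 10.0))  # prints 250
```

Under these assumptions a slow server shrinks the next batch and a fast server grows it, which is the behaviour Case III relies on.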
Examples
Here are examples for each CASE described above. Here client is an instance of the weaviate.Client.
>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'
>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'
For Case I:
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(2, 1)
>>> client.batch.create_objects()
>>> client.batch.shape
(0, 1)
>>> client.batch.create_references()
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> client.batch.shape
(1, 1)
>>> client.batch.flush()
>>> client.batch.shape
(0, 0)
Or with a context manager:
>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)
For Case II:
>>> client.batch(batch_size=3)
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(1, 1)
>>> client.batch.add_data_object({}, 'MyClass') # sum of data_objects and references reached
>>> client.batch.shape
(0, 0)
Or with a context manager and __call__ method:
>>> with client.batch(batch_size=3) as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)
Or with a context manager and setter:
>>> client.batch.batch_size = 3
>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)
For Case III: Same as Case II but you need to configure or enable ‘dynamic’ batching.
>>> client.batch.configure(batch_size=3, dynamic=True) # 'batch_size' must be a valid int
Or:
>>> client.batch.batch_size = 3
>>> client.batch.dynamic = True
See the documentation of the configure (or __call__) method and of the setters for more information on how, why and what you need to configure/set in order to use a particular Case.
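The difference between manual creation (Case I) and auto-creation (Case II) can also be explored without a running server. The sketch below is a toy, in-memory model written for this purpose; the class ToyBatch and its created attribute are hypothetical and only imitate the shape, flush and context-manager behaviour described above. It is not how the client is implemented.

```python
from typing import List, Optional, Tuple


class ToyBatch:
    """In-memory stand-in that mimics the Case I / Case II behaviour.

    Hypothetical class: nothing is sent to a server; "creating" a batch
    just moves the queued items into `self.created`.
    """

    def __init__(self, batch_size: Optional[int] = None) -> None:
        self.batch_size = batch_size  # None means manual creation (Case I)
        self._objects: List[dict] = []
        self._references: List[dict] = []
        self.created: List[dict] = []

    @property
    def shape(self) -> Tuple[int, int]:
        return len(self._objects), len(self._references)

    def add_data_object(self, data_object: dict, class_name: str) -> None:
        self._objects.append({"class": class_name, "properties": data_object})
        self._auto_create()

    def add_reference(self, from_uuid: str, class_name: str, prop: str, to_uuid: str) -> None:
        self._references.append({"from": (class_name, from_uuid, prop), "to": to_uuid})
        self._auto_create()

    def _auto_create(self) -> None:
        # Case II: create once objects + references reach batch_size.
        if self.batch_size is not None and sum(self.shape) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        self.created.extend(self._objects + self._references)
        self._objects.clear()
        self._references.clear()

    def __enter__(self) -> "ToyBatch":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.flush()  # leaving the `with` block flushes what is left


manual = ToyBatch()
manual.add_data_object({}, "MyClass")
manual.add_reference("a", "MyClass", "myProp", "b")
print(manual.shape)  # (1, 1): nothing is created until flush()

with ToyBatch(batch_size=3) as auto:
    for _ in range(4):
        auto.add_data_object({}, "MyClass")
    print(auto.shape)  # (1, 0): the first three were auto-created
print(auto.shape, len(auto.created))  # (0, 0) 4
```

The fourth object is only created when the with block exits, which mirrors why flush is needed for the last, non-full batch.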
Initialize a Batch class instance. This defaults to manual creation configuration. See docs for the configure or __call__ method for different types of configurations.
Parameters
- connection: weaviate.connect.Connection
Connection object to an active and running Weaviate instance.
- add_data_object(data_object: dict, class_name: str, uuid: str | UUID | None = None, vector: Sequence | None = None, tenant: str | None = None) -> str
Add one object to this batch. NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.
Parameters
- data_object: dict
Object to be added as a dict datatype.
- class_name: str
The name of the class this object belongs to.
- uuid: Optional[UUID], optional
The UUID of the object as a uuid.UUID object or str. It can be a Weaviate beacon or Weaviate href. If it is None a UUIDv4 will be generated, by default None
- vector: Sequence or None, optional
The embedding of the object that should be validated. Can be used when: 1) the class does not have a vectorization module; 2) the given vector was generated using the _identical_ vectorization module that is configured for the class, in which case this vector takes precedence. Supported types are list, numpy.ndarray, torch.Tensor and tf.Tensor, by default None.
Returns
- str
The UUID of the added object. If one was not provided a UUIDv4 will be generated.
Raises
- TypeError
If an argument passed is not of an appropriate type.
- ValueError
If ‘uuid’ is not of a proper form.
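To make the uuid handling above concrete, here is a minimal sketch that uses only the standard uuid module. The helper name normalize_uuid is hypothetical, and treating the last path segment of a beacon or href as the UUID is an assumption of this sketch, not a statement about how the client parses them.

```python
import uuid
from typing import Optional, Union


def normalize_uuid(value: Optional[Union[str, uuid.UUID]] = None) -> str:
    """Return a canonical UUID string, generating a UUIDv4 when value is None.

    Hypothetical helper. For beacon- or href-like strings it assumes the
    UUID is the last path segment (an assumption of this sketch).
    """
    if value is None:
        return str(uuid.uuid4())
    if isinstance(value, uuid.UUID):
        return str(value)
    if not isinstance(value, str):
        raise TypeError(f"'uuid' must be str or uuid.UUID, got {type(value).__name__}")
    candidate = value.rstrip("/").rsplit("/", 1)[-1]
    # uuid.UUID raises ValueError for malformed input.
    return str(uuid.UUID(candidate))


print(normalize_uuid("weaviate://localhost/MyClass/154cbccd-89f4-4b29-9c1b-001a3339d89d"))
```

The two failure modes map onto the exceptions listed above: a wrong argument type raises TypeError and a malformed UUID raises ValueError.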
- add_reference(from_object_uuid: str | UUID, from_object_class_name: str, from_property_name: str, to_object_uuid: str | UUID, to_object_class_name: str | None = None, tenant: str | None = None) -> None
Add one reference to this batch.
Parameters
- from_object_uuid: UUID
The UUID of the object, as a uuid.UUID object or str, that should reference another object. It can be a Weaviate beacon or Weaviate href.
- from_object_class_name: str
The name of the class that should reference another object.
- from_property_name: str
The name of the property that contains the reference.
- to_object_uuid: UUID
The UUID of the object, as a uuid.UUID object or str, that is actually referenced. It can be a Weaviate beacon or Weaviate href.
- to_object_class_name: Optional[str], optional
The class name of the referenced object (the one with UUID to_object_uuid). It is supported as of Weaviate 1.14.0, where all objects are namespaced by class name. STRONGLY recommended to set it with Weaviate >= 1.14.0. It will be required in future versions of Weaviate Server and Clients. Use None ONLY for Weaviate < v1.14.0, by default None
- tenant: str, optional
Name of the tenant.
Raises
- TypeError
If arguments are not of type str.
- ValueError
If ‘uuid’ is not valid or cannot be extracted.
- property batch_size: int | None
Setter and Getter for batch_size.
Parameters
- value: Optional[int]
Setter ONLY: The new value for the batch_size. If it is NOT None, the setter tries to auto-create the existing data if it meets the requirements. If the previous value was None, batch_size is set to the new value and the batching type changes to auto-create with dynamic set to False. See the documentation for configure or __call__ for more info. If recommended_num_objects is None then it is initialized with the new value of the batch_size (same for references).
Returns
- Optional[int]
Getter ONLY: The current value of the batch_size. It is NOT the current number of data in the Batch. See the documentation for configure or __call__ for more info.
Raises
- TypeError
Setter ONLY: If the new value is not of type int.
- ValueError
Setter ONLY: If the new value is not positive.
- configure(batch_size: int | None = 50, creation_time: Real | None = None, timeout_retries: int = 3, connection_error_retries: int = 3, weaviate_error_retries: WeaviateErrorRetryConf | None = None, callback: Callable[[List[dict]], None] | None = check_batch_result, dynamic: bool = True, num_workers: int = 1, consistency_level: ConsistencyLevel | None = None) -> Batch
Warnings
It has default values, so if you want to change only one setting, use a setter instead or provide all the configurations, both the old and new ones.
This method will return None in the next major release. If you are using the returned Batch object then you should start using the client.batch object instead.
Parameters
- batch_size: Optional[int], optional
The batch size to use. This value sets the Batch functionality. If batch_size is None then no auto-creation is done (callback and dynamic are ignored). If it is a positive number, auto-creation is enabled and the value represents: 1) if dynamic is False, the number of items in the Batch (sum of objects and references) at which to auto-create; 2) if dynamic is True, the initial value for both recommended_num_objects and recommended_num_references. By default 50
- creation_time: Real, optional
How long it should take to create a Batch. Used ONLY for computing dynamic batch sizes. By default None
- timeout_retries: int, optional
Number of retries to create a Batch that failed with ReadTimeout, by default 3
- connection_error_retries: int, optional
Number of retries to create a Batch that failed with ConnectionError, by default 3
- weaviate_error_retries: WeaviateErrorRetryConf, optional
How often batch elements with an error originating from Weaviate (for example transformer timeouts) should be retried, and which errors should be ignored and/or included. See the documentation for WeaviateErrorRetryConf for details.
- callback: Optional[Callable[[List[dict]], None]], optional
A callback function called with the results of each batch type (objects and references). By default weaviate.util.check_batch_result
- dynamic: bool, optional
Whether to use dynamic batching or not, by default True
- num_workers: int, optional
The maximum number of concurrent threads used to run the batch import. Used only with non-manual batching, i.e. with AUTO or DYNAMIC batching. By default multi-threading is disabled. Use with care so as not to overload your Weaviate instance.
Returns
- Batch
Updated self.
Raises
- TypeError
If one of the arguments is of a wrong type.
- ValueError
If the value of one of the arguments is wrong.
- property connection_error_retries: int
Setter and Getter for connection_error_retries.
Properties
- value: int
Setter ONLY: The new value for connection_error_retries.
Returns
- int
Getter ONLY: The connection_error_retries value.
Raises
- TypeError
Setter ONLY: If the new value is not of type int.
- ValueError
Setter ONLY: If the new value is not positive.
- property consistency_level: str | None
- create_objects() -> list
Creates multiple Objects at once in Weaviate. This does not guarantee that each batch item is added/created on the Weaviate server: batch creation can succeed while the creation of individual batch items fails. See the example below. NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.
Examples
Here client is an instance of the weaviate.Client.
Add objects to the object batch.
>>> client.batch.add_data_object({}, 'NonExistingClass')
>>> client.batch.add_data_object({}, 'ExistingClass')
Note that ‘NonExistingClass’ is not present in the client’s schema and ‘ExistingClass’ is present and has no properties. ‘client.batch.add_data_object’ does not raise an exception because the objects added meet the required criteria (see the documentation of the ‘weaviate.Batch.add_data_object’ method for more information).
>>> result = client.batch.create_objects()
Successful batch creation even if one data object is inconsistent with the client’s schema. We can find out more about what objects were successfully created by analyzing the ‘result’ variable.
>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "class": "NonExistingClass",
        "creationTimeUnix": 1614852753747,
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "properties": {},
        "deprecations": null,
        "result": {
            "errors": {
                "error": [
                    {
                        "message": "class 'NonExistingClass' not present in schema, class NonExistingClass not present"
                    }
                ]
            }
        }
    },
    {
        "class": "ExistingClass",
        "creationTimeUnix": 1614852753746,
        "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
        "properties": {},
        "vector": [
            -0.05244319,
            ...
            0.076136276
        ],
        "deprecations": null,
        "result": {}
    }
]
As shown, the first object from the batch was not added/created, but the batch itself was created successfully. The batch creation can be successful even if none of the objects were created. Check the status of the batch objects to find which objects failed and why. Alternatively, use ‘client.data_object.create’ for object creation, which raises an error if the data item is inconsistent or its creation/addition failed.
To check the results of batch creation when using the auto-creation Batch, use a ‘callback’ (see the docs for the configure or __call__ method for more information).
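Because a successful batch call can still contain failed items, it is worth scanning the returned list. The following is a minimal sketch based only on the response layout shown in the example above ("result" → "errors" → "error" → "message"); the helper name failed_items is hypothetical and the sample data is a trimmed copy of that example.

```python
from typing import Dict, List


def failed_items(results: List[dict]) -> Dict[str, List[str]]:
    """Map the id of each failed batch item to its error messages."""
    failures: Dict[str, List[str]] = {}
    for item in results:
        # Successful items carry an empty "result" dict (no "errors" key).
        errors = (item.get("result") or {}).get("errors") or {}
        messages = [e.get("message", "") for e in errors.get("error", [])]
        if messages:
            failures[item.get("id", "<unknown>")] = messages
    return failures


# Trimmed-down version of the response shown in the example above.
sample_results = [
    {
        "class": "NonExistingClass",
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "result": {"errors": {"error": [{"message": "class 'NonExistingClass' not present in schema"}]}},
    },
    {"class": "ExistingClass", "id": "b7b1cfbe-20da-496c-b932-008d35805f26", "result": {}},
]
print(failed_items(sample_results))
```

The same loop could be placed inside a callback for auto-created batches, provided the callback itself returns None as the configure signature requires.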
Returns
- list
A list with the status of every object that was created.
Raises
- requests.ConnectionError
If the network connection to weaviate fails.
- weaviate.UnexpectedStatusCodeException
If Weaviate reports a non-OK status.
- create_references() -> list
Creates multiple References at once in Weaviate. Adding References in batch is faster but it skips validations such as class name and property name, resulting in a SUCCESSFUL reference creation even for nonexistent object types and/or nonexistent properties. If consistency of the References is required, use ‘client.data_object.reference.add’ to get additional validation against the Weaviate schema. See Examples below.
Examples
Here client is an instance of the weaviate.Client.
Object that does not exist in weaviate.
>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'
Objects that exist in weaviate.
>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'
>>> client.batch.add_reference(object_1, 'NonExistingClass', 'existsWith', object_2)
>>> client.batch.add_reference(object_3, 'ExistingClass', 'existsWith', object_4)
Both references were added to the batch request without error because they meet the required criteria (See the documentation of the ‘weaviate.Batch.add_reference’ method for more information).
>>> result = client.batch.create_references()
As shown, the reference batch creation is successful (no error thrown). Now we can inspect the ‘result’.
>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "from": "weaviate://localhost/NonExistingClass/154cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/154cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {
            "status": "SUCCESS"
        }
    },
    {
        "from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {
            "status": "SUCCESS"
        }
    }
]
Both references were added successfully but one of them is corrupted (it links two objects of a nonexistent class, and one of the objects has not been created yet). To make use of the validation, create each reference individually (see the client.data_object.reference.add method).
Returns
- list
A list with the status of every reference added.
Raises
- requests.ConnectionError
If the network connection to weaviate fails.
- weaviate.UnexpectedStatusCodeException
If Weaviate reports a non-OK status.
- property creation_time: Real
Setter and Getter for creation_time.
Parameters
- value: Real
Setter ONLY: Set new value to creation_time. The recommended_num_objects/references values are updated to this new value. If the batch_size is not None it will auto-create the batch if the requirements are met.
Returns
- Real
Getter ONLY: The creation_time value.
Raises
- TypeError
Setter ONLY: If the new value is not of type Real.
- ValueError
Setter ONLY: If the new value is not positive.
- delete_objects(class_name: str, where: dict, output: str = 'minimal', dry_run: bool = False, tenant: str | None = None) -> dict
Delete, in batch, the objects that match the given class name and where filter.
Parameters
- class_name: str
The class name for which to delete objects.
- where: dict
The content of the where filter used to match objects that should be deleted.
- output: str, optional
Controls the verbosity of the output. Possible values: 1) “minimal”: The result only includes counts. Information about objects is omitted if the deletes were successful. Only if an error occurred will the object be described. 2) “verbose”: The result lists all affected objects with their ID and deletion status, including both successful and unsuccessful deletes. By default “minimal”
- dry_run: bool, optional
If True, objects will not be deleted yet, but merely listed, by default False
Examples
If we want to delete all the data objects that contain the word ‘weather’ we can do it like this:
>>> import json
>>> result = client.batch.delete_objects(
...     class_name='Dataset',
...     output='verbose',
...     dry_run=False,
...     where={
...         'operator': 'Equal',
...         'path': ['description'],
...         'valueText': 'weather'
...     }
... )
>>> print(json.dumps(result, indent=4))
{
    "dryRun": false,
    "match": {
        "class": "Dataset",
        "where": {
            "operands": null,
            "operator": "Equal",
            "path": [
                "description"
            ],
            "valueText": "weather"
        }
    },
    "output": "verbose",
    "results": {
        "failed": 0,
        "limit": 10000,
        "matches": 2,
        "objects": [
            {
                "id": "1eb28f69-c66e-5411-bad4-4e14412b65cd",
                "status": "SUCCESS"
            },
            {
                "id": "da217bdd-4c7c-5568-9576-ebefe17688ba",
                "status": "SUCCESS"
            }
        ],
        "successful": 2
    }
}
Returns
- dict
The result/status of the batch delete.
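The returned dict can be inspected to find deletions that did not succeed. This is a minimal sketch that relies only on the "results" → "objects" → "status" layout shown in the example above; the helper name failed_delete_ids is hypothetical, and the "FAILED" status in the sample is made up purely to exercise the code.

```python
from typing import List


def failed_delete_ids(result: dict) -> List[str]:
    """Return the ids whose deletion status is not "SUCCESS".

    Only meaningful for responses that list objects (e.g. output="verbose").
    """
    objects = (result.get("results") or {}).get("objects") or []
    return [obj["id"] for obj in objects if obj.get("status") != "SUCCESS"]


# Illustrative response; the second status is invented for this sketch.
sample_delete = {
    "dryRun": False,
    "output": "verbose",
    "results": {
        "failed": 1,
        "matches": 2,
        "successful": 1,
        "objects": [
            {"id": "1eb28f69-c66e-5411-bad4-4e14412b65cd", "status": "SUCCESS"},
            {"id": "da217bdd-4c7c-5568-9576-ebefe17688ba", "status": "FAILED"},
        ],
    },
}
print(failed_delete_ids(sample_delete))
```

With output set to “minimal” the objects list may be absent for successful deletes, in which case the helper simply returns an empty list.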
- property dynamic: bool
Setter and Getter for dynamic.
Parameters
- value: bool
Setter ONLY: Enable or disable dynamic batching. If batch_size is None the value is not set; otherwise dynamic is set to the new value and the batch is auto-created if it meets the requirements.
Returns
- bool
Getter ONLY: Whether dynamic batching is enabled.
Raises
- TypeError
Setter ONLY: If the new value is not of type bool.
- flush() -> None
Flush both objects and references to the Weaviate server and call the callback function if one is provided. (See the docs for configure or __call__ for how to set one.)
- is_empty_objects() -> bool
Check if batch contains any objects.
Returns
- bool
Whether the Batch object list is empty.
- is_empty_references() -> bool
Check if batch contains any references.
Returns
- bool
Whether the Batch reference list is empty.
- num_objects() -> int
Get current number of objects in the batch.
Returns
- int
The number of objects in the batch.
- num_references() -> int
Get current number of references in the batch.
Returns
- int
The number of references in the batch.
- pop_object(index: int = -1) -> dict
Remove and return the object at index (default last).
Parameters
- index: int, optional
The index of the object to pop, by default -1 (last item).
Returns
- dict
The popped object.
Raises
- IndexError
If batch is empty or index is out of range.
- pop_reference(index: int = -1) -> dict
Remove and return the reference at index (default last).
Parameters
- index: int, optional
The index of the reference to pop, by default -1 (last item).
Returns
- dict
The popped reference.
Raises
- IndexError
If batch is empty or index is out of range.
- property recommended_num_objects: int | None
The recommended number of objects per batch. If None then it could not be computed.
Returns
- Optional[int]
The recommended number of objects per batch. If None then it could not be computed.
- property recommended_num_references: int | None
The recommended number of references per batch. If None then it could not be computed.
Returns
- Optional[int]
The recommended number of references per batch. If None then it could not be computed.
- property shape: Tuple[int, int]
Get current number of objects and references in the batch.
Returns
- Tuple[int, int]
The number of objects and references, respectively, in the batch as a tuple, i.e. returns (number of objects, number of references).
- property timeout_retries: int
Setter and Getter for timeout_retries.
Properties
- value: int
Setter ONLY: The new value for timeout_retries.
Returns
- int
Getter ONLY: The timeout_retries value.
Raises
- TypeError
Setter ONLY: If the new value is not of type int.
- ValueError
Setter ONLY: If the new value is not positive.
- wait_for_vector_indexing(shards: List[Shard] | None = None, how_many_failures: int = 5) -> None
Wait for all the vectors of the batch-imported objects to be indexed.
Upon network error, it will retry getting the shards’ status up to how_many_failures times with exponential backoff (2**n seconds with n=0,1,2,…,how_many_failures).
Parameters
- shards: Optional[List[Shard]], optional
The shards to check the status of. If None it will check the status of all the shards of the imported objects in the batch.
- how_many_failures: int, optional
How many times to try to get the shards’ status before raising an exception, by default 5
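The backoff schedule described above is easy to tabulate. The sketch below reads the stated range (n = 0 to how_many_failures, inclusive) literally; the helper name backoff_delays is hypothetical, and the client may stop one step earlier or handle the last attempt differently.

```python
from typing import List


def backoff_delays(how_many_failures: int = 5) -> List[int]:
    """Delays in seconds, 2**n for n = 0..how_many_failures (inclusive)."""
    if how_many_failures < 0:
        raise ValueError("how_many_failures must be non-negative")
    return [2 ** n for n in range(how_many_failures + 1)]


delays = backoff_delays(5)
print(delays, sum(delays))  # [1, 2, 4, 8, 16, 32] 63
```

Under this reading, the default of 5 means waiting roughly a minute in total before an exception is raised.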
- class weaviate.batch.Shard(class_name: str, tenant: str | None = None)
Bases:
object
- class_name: str
- tenant: str | None = None
- class weaviate.batch.WeaviateErrorRetryConf(number_retries: int = 3, errors_to_exclude: List[str] | None = None, errors_to_include: List[str] | None = None)
Bases:
object
Configures how often objects should be retried when Weaviate returns an error and which errors should be included or excluded. By default, all errors are retried.
Parameters
- number_retries: int
How often a batch that includes objects with errors should be retried. Must be >=1.
- errors_to_exclude: Optional[List[str]]
Which errors should NOT be retried. All other errors will be retried. An object will be skipped, when the given string is part of the weaviate error message.
Example: errors_to_exclude=["string1", "string2"] will match the error with message “Long error message that contains string1”.
- errors_to_include: Optional[List[str]]
Which errors should be retried. All other errors will NOT be retried. An object will be included, when the given string is part of the weaviate error message.
Example: errors_to_include=["string1", "string2"] will match the error with message “Long error message that contains string1”.
- errors_to_exclude: List[str] | None = None
- errors_to_include: List[str] | None = None
- number_retries: int = 3
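The include/exclude rules above amount to substring matching on the error message. The sketch below illustrates that rule under stated assumptions: the function should_retry is hypothetical, matching is done case-sensitively, and passing both lists at once is rejected. None of these details are confirmed by this page.

```python
from typing import List, Optional


def should_retry(message: str,
                 errors_to_include: Optional[List[str]] = None,
                 errors_to_exclude: Optional[List[str]] = None) -> bool:
    """Decide whether an item with this error message would be retried."""
    if errors_to_include is not None and errors_to_exclude is not None:
        # Assumption of this sketch: the two lists are mutually exclusive.
        raise ValueError("set only one of errors_to_include / errors_to_exclude")
    if errors_to_include is not None:
        return any(s in message for s in errors_to_include)
    if errors_to_exclude is not None:
        return not any(s in message for s in errors_to_exclude)
    return True  # by default, all errors are retried


msg = "Long error message that contains string1"
print(should_retry(msg, errors_to_include=["string1", "string2"]))  # True
print(should_retry(msg, errors_to_exclude=["string1", "string2"]))  # False
```

With neither list set the function returns True for every message, matching the statement that all errors are retried by default.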
Submodules
weaviate.batch.crud_batch module
Batch class definitions.
- class weaviate.batch.crud_batch.Batch(connection: Connection)
Bases:
object
Batch class used to add multiple objects or object references to Weaviate at once. To add data to the Batch, use the add_data_object and add_reference methods of this class.
This object also stores two recommended batch sizes, one for objects and one for references. The recommended batch size is the number of data objects/references that the Weaviate server can process within the creation_time interval (see the configure or __call__ method for how to set this value; by default it is set to 10). Its initial value is None/batch_size and it is updated by every batch create method. The values can be read with the getters recommended_num_objects and recommended_num_references.
NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.
This class can be used in 3 ways:
- Case I:
Everything is done by the user, i.e. the user adds the objects/object references and creates them whenever desired. To create the data, use these methods of this class: create_objects, create_references and flush. In this case the Batch instance’s batch_size is set to None (see docs for the configure or __call__ method). Can be used in a context manager, see below.
- Case II:
Batch auto-creates when full. This is achieved by setting the Batch instance’s batch_size to a positive integer (see docs for the configure or __call__ method). The batch_size in this case corresponds to the sum of added objects and references. This case does not require the user to create the batches, but it can still be done manually. To create the last, non-full batches that do not meet the auto-creation requirement, use the flush method. Can be used in a context manager, see below.
- Case III:
Similar to Case II but uses dynamic batching, i.e. auto-creates either objects or references when one of them reaches recommended_num_objects or recommended_num_references respectively. See docs for the configure or __call__ method for how to enable it.
- Context-manager support:
Can be used with the with statement. When it exits the context manager it calls the flush method for you. Can be combined with the configure/__call__ method in order to set it to the desired Case.
Examples
Here are examples for each CASE described above. Here client is an instance of the weaviate.Client.
>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'
>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'
For Case I:
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(2, 1)
>>> client.batch.create_objects()
>>> client.batch.shape
(0, 1)
>>> client.batch.create_references()
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> client.batch.shape
(1, 1)
>>> client.batch.flush()
>>> client.batch.shape
(0, 0)
Or with a context manager:
>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)
For Case II:
>>> client.batch(batch_size=3)
>>> client.batch.shape
(0, 0)
>>> client.batch.add_data_object({}, 'MyClass')
>>> client.batch.add_reference(object_1, 'MyClass', 'myProp', object_2)
>>> client.batch.shape
(1, 1)
>>> client.batch.add_data_object({}, 'MyClass') # sum of data_objects and references reached
>>> client.batch.shape
(0, 0)
Or with a context manager and __call__ method:
>>> with client.batch(batch_size=3) as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)
Or with a context manager and setter:
>>> client.batch.batch_size = 3
>>> with client.batch as batch:
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_3, 'MyClass', 'myProp', object_4)
...     batch.add_data_object({}, 'MyClass')
...     batch.add_reference(object_1, 'MyClass', 'myProp', object_4)
>>> # flush was called
>>> client.batch.shape
(0, 0)
For Case III: Same as Case II but you need to configure or enable ‘dynamic’ batching.
>>> client.batch.configure(batch_size=3, dynamic=True) # 'batch_size' must be a valid int
Or:
>>> client.batch.batch_size = 3
>>> client.batch.dynamic = True
See the documentation of the configure (or __call__) method and of the setters for more information on how, why and what you need to configure/set in order to use a particular Case.
Initialize a Batch class instance. This defaults to manual creation configuration. See docs for the configure or __call__ method for different types of configurations.
Parameters
- connection: weaviate.connect.Connection
Connection object to an active and running Weaviate instance.
- add_data_object(data_object: dict, class_name: str, uuid: str | UUID | None = None, vector: Sequence | None = None, tenant: str | None = None) -> str
Add one object to this batch. NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.
Parameters
- data_object: dict
Object to be added as a dict datatype.
- class_name: str
The name of the class this object belongs to.
- uuid: Optional[UUID], optional
The UUID of the object as a uuid.UUID object or str. It can be a Weaviate beacon or Weaviate href. If it is None a UUIDv4 will be generated, by default None
- vector: Sequence or None, optional
The embedding of the object that should be validated. Can be used when: 1) the class does not have a vectorization module; 2) the given vector was generated using the _identical_ vectorization module that is configured for the class, in which case this vector takes precedence. Supported types are list, numpy.ndarray, torch.Tensor and tf.Tensor, by default None.
Returns
- str
The UUID of the added object. If one was not provided a UUIDv4 will be generated.
Raises
- TypeError
If an argument passed is not of an appropriate type.
- ValueError
If ‘uuid’ is not of a proper form.
- add_reference(from_object_uuid: str | UUID, from_object_class_name: str, from_property_name: str, to_object_uuid: str | UUID, to_object_class_name: str | None = None, tenant: str | None = None) -> None
Add one reference to this batch.
Parameters
- from_object_uuid: UUID
The UUID of the object, as a uuid.UUID object or str, that should reference another object. It can be a Weaviate beacon or Weaviate href.
- from_object_class_name: str
The name of the class that should reference another object.
- from_property_name: str
The name of the property that contains the reference.
- to_object_uuid: UUID
The UUID of the object, as a uuid.UUID object or str, that is actually referenced. It can be a Weaviate beacon or Weaviate href.
- to_object_class_name: Optional[str], optional
The class name of the referenced object (the one with UUID to_object_uuid). It is supported as of Weaviate 1.14.0, where all objects are namespaced by class name. STRONGLY recommended to set it with Weaviate >= 1.14.0. It will be required in future versions of Weaviate Server and Clients. Use None ONLY for Weaviate < v1.14.0, by default None
- tenant: str, optional
Name of the tenant.
Raises
- TypeError
If arguments are not of type str.
- ValueError
If ‘uuid’ is not valid or cannot be extracted.
- property batch_size: int | None
Setter and Getter for batch_size.
Parameters
- value: Optional[int]
Setter ONLY: The new value for the batch_size. If it is NOT None, the setter tries to auto-create the existing data if it meets the requirements. If the previous value was None, batch_size is set to the new value and the batching type changes to auto-create with dynamic set to False. See the documentation for configure or __call__ for more info. If recommended_num_objects is None then it is initialized with the new value of the batch_size (same for references).
Returns
- Optional[int]
Getter ONLY: The current value of the batch_size. It is NOT the current number of data in the Batch. See the documentation for configure or __call__ for more info.
Raises
- TypeError
Setter ONLY: If the new value is not of type int.
- ValueError
Setter ONLY: If the new value has a non positive value.
- configure(batch_size: int | None = 50, creation_time: ~numbers.Real | None = None, timeout_retries: int = 3, connection_error_retries: int = 3, weaviate_error_retries: ~weaviate.batch.crud_batch.WeaviateErrorRetryConf | None = None, callback: ~typing.Callable[[~typing.List[dict]], None] | None = <function check_batch_result>, dynamic: bool = True, num_workers: int = 1, consistency_level: ~weaviate.data.replication.replication.ConsistencyLevel | None = None) Batch [source]
Warnings
- It has default values, and if you want to change only one, use a setter instead or provide all the configurations, both the old and new ones.
- This method will return None in the next major release. If you are using the returned Batch object then you should start using the client.batch object instead.
Parameters
- batch_sizeOptional[int], optional
The batch size to be used. This value sets the Batch functionality: if batch_size is None then no auto-creation is done (callback and dynamic are ignored). If it is a positive number, auto-creation is enabled and the value represents: 1) if dynamic is False, the number of data items in the Batch (sum of objects and references) at which to auto-create; 2) if dynamic is True, the initial value for both recommended_num_objects and recommended_num_references. By default 50
- creation_timeReal, optional
How long it should take to create a Batch. Used ONLY for computing dynamic batch sizes. By default None
- timeout_retriesint, optional
Number of retries to create a Batch that failed with ReadTimeout, by default 3
- connection_error_retriesint, optional
Number of retries to create a Batch that failed with ConnectionError, by default 3
- weaviate_error_retries: WeaviateErrorRetryConf, Optional
How often batch-elements with an error originating from weaviate (for example transformer timeouts) should be retried and which errors should be ignored and/or included. See documentation for WeaviateErrorRetryConf for details.
- callbackOptional[Callable[[dict], None]], optional
A callback function on the results of each (objects and references) batch types. By default weaviate.util.check_batch_result
- dynamicbool, optional
Whether to use dynamic batching or not, by default True
- num_workersint, optional
The maximal number of concurrent threads to run batch import. Only used for non-MANUAL batching, i.e. only with AUTO or DYNAMIC batching. By default, multi-threading is disabled. Use with care to not overload your weaviate instance.
Returns
- Batch
Updated self.
Raises
- TypeError
If one of the arguments is of a wrong type.
- ValueError
If the value of one of the arguments is wrong.
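The interaction between batch_size and dynamic described above can be summarised in a small sketch. The helper name batching_mode is hypothetical (it is not part of the client); the mode names MANUAL, AUTO and DYNAMIC are the ones used in the num_workers description, and the type/value checks are an assumption modelled on the documented TypeError and ValueError.

```python
from typing import Optional


def batching_mode(batch_size: Optional[int], dynamic: bool) -> str:
    """Hypothetical helper: name the batching mode implied by the settings."""
    if batch_size is None:
        # No auto-creation; `dynamic` (and the callback) are ignored.
        return "MANUAL"
    if isinstance(batch_size, bool) or not isinstance(batch_size, int):
        raise TypeError("batch_size must be an int or None")
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    # Positive batch_size: auto-creation is enabled.
    return "DYNAMIC" if dynamic else "AUTO"


print(batching_mode(None, True))   # MANUAL
print(batching_mode(100, False))   # AUTO
print(batching_mode(50, True))     # DYNAMIC
```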
- property connection_error_retries: int
Setter and Getter for connection_error_retries.
Properties
- valueint
Setter ONLY: The new value for connection_error_retries.
Returns
- int
Getter ONLY: The connection_error_retries value.
Raises
- TypeError
Setter ONLY: If the new value is not of type int.
- ValueError
Setter ONLY: If the new value has a non positive value.
- property consistency_level: str | None
- create_objects() list [source]
Creates multiple Objects at once in Weaviate. This does not guarantee that each batch item is added/created to the Weaviate server. This can lead to a successful batch creation but unsuccessful per batch item creation. See the example below. NOTE: If the UUID of one of the objects already exists then the existing object will be replaced by the new object.
Examples
Here client is an instance of the weaviate.Client.
Add objects to the object batch.
>>> client.batch.add_data_object({}, 'NonExistingClass')
>>> client.batch.add_data_object({}, 'ExistingClass')
Note that ‘NonExistingClass’ is not present in the client’s schema and ‘ExistingClass’ is present and has no properties. ‘client.batch.add_data_object’ does not raise an exception because the objects added meet the required criteria (See the documentation of the ‘weaviate.Batch.add_data_object’ method for more information).
>>> result = client.batch.create_objects()
Successful batch creation even if one data object is inconsistent with the client’s schema. We can find out more about what objects were successfully created by analyzing the ‘result’ variable.
>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "class": "NonExistingClass",
        "creationTimeUnix": 1614852753747,
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "properties": {},
        "deprecations": null,
        "result": {
            "errors": {
                "error": [
                    {
                        "message": "class 'NonExistingClass' not present in schema, class NonExistingClass not present"
                    }
                ]
            }
        }
    },
    {
        "class": "ExistingClass",
        "creationTimeUnix": 1614852753746,
        "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
        "properties": {},
        "vector": [
            -0.05244319,
            ...
            0.076136276
        ],
        "deprecations": null,
        "result": {}
    }
]
As can be seen, the first object from the batch was not added/created, but the batch was successfully created. The batch creation can be successful even if all the objects were NOT created. Check the status of the batch objects to find which object failed and why. Alternatively use ‘client.data_object.create’ for Object creation, which throws an error if the data item is inconsistent or creation/addition failed.
To check the results of batch creation when using the auto-creation Batch, use a ‘callback’ (see the docs for the configure or __call__ method for more information).
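Because per-item failures are only visible in the returned list, it can help to extract them explicitly. The following is a minimal sketch that assumes the result items have the shape shown in the example output above; collect_object_errors is a hypothetical helper name, not part of the client.

```python
from typing import Any, Dict, List


def collect_object_errors(results: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map object id -> error messages for every item that reports errors."""
    failed: Dict[str, List[str]] = {}
    for item in results:
        # "result" may be missing or empty for successfully created items.
        errors = (item.get("result") or {}).get("errors")
        if errors:
            failed[item["id"]] = [e["message"] for e in errors.get("error", [])]
    return failed


# Sample data trimmed from the example output above.
sample = [
    {
        "class": "NonExistingClass",
        "id": "154cbccd-89f4-4b29-9c1b-001a3339d89a",
        "result": {"errors": {"error": [{"message": "class NonExistingClass not present"}]}},
    },
    {
        "class": "ExistingClass",
        "id": "b7b1cfbe-20da-496c-b932-008d35805f26",
        "result": {},
    },
]

for object_id, messages in collect_object_errors(sample).items():
    print(object_id, messages)
```

A function like this could be wrapped in a callback (which receives the result list and returns None) so that failures are logged whenever a batch is auto-created.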
Returns
- list
A list with the status of every object that was created.
Raises
- requests.ConnectionError
If the network connection to weaviate fails.
- weaviate.UnexpectedStatusCodeException
If weaviate reports a non-OK status.
- create_references() list [source]
Creates multiple References at once in Weaviate. Adding References in batch is faster but it ignores validations like class name and property name, resulting in a SUCCESSFUL reference creation for nonexistent object types and/or nonexistent properties. If consistency of the References is wanted, use ‘client.data_object.reference.add’ to have additional validation against the weaviate schema. See Examples below.
Examples
Here client is an instance of the weaviate.Client.
Object that does not exist in weaviate.
>>> object_1 = '154cbccd-89f4-4b29-9c1b-001a3339d89d'
Objects that exist in weaviate.
>>> object_2 = '154cbccd-89f4-4b29-9c1b-001a3339d89c'
>>> object_3 = '254cbccd-89f4-4b29-9c1b-001a3339d89a'
>>> object_4 = '254cbccd-89f4-4b29-9c1b-001a3339d89b'
>>> client.batch.add_reference(object_1, 'NonExistingClass', 'existsWith', object_2)
>>> client.batch.add_reference(object_3, 'ExistingClass', 'existsWith', object_4)
Both references were added to the batch request without error because they meet the required criteria (See the documentation of the ‘weaviate.Batch.add_reference’ method for more information).
>>> result = client.batch.create_references()
As can be seen, the reference batch creation is successful (no error thrown). Now we can inspect the ‘result’.
>>> import json
>>> print(json.dumps(result, indent=4))
[
    {
        "from": "weaviate://localhost/NonExistingClass/154cbccd-89f4-4b29-9c1b-001a3339d89d/existsWith",
        "to": "weaviate://localhost/154cbccd-89f4-4b29-9c1b-001a3339d89c",
        "result": {
            "status": "SUCCESS"
        }
    },
    {
        "from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {
            "status": "SUCCESS"
        }
    }
]
Both references were added successfully but one of them is corrupted (it links two objects of a nonexistent class, and one of the objects is not yet created). To make use of the validation, create each reference individually (see the client.data_object.reference.add method).
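Since a SUCCESS status does not prove that a reference is valid, one option is a client-side check of the returned beacons against class names you already know exist. The sketch below assumes the "from" beacon layout shown in the example output (weaviate://localhost/<class>/<uuid>/<property>); parse_from_beacon, suspicious_references and known_classes are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List, Set, Tuple

BEACON_PREFIX = "weaviate://localhost/"


def parse_from_beacon(beacon: str) -> Tuple[str, str, str]:
    """Split 'weaviate://localhost/<class>/<uuid>/<property>' into its parts."""
    if not beacon.startswith(BEACON_PREFIX):
        raise ValueError(f"unexpected beacon: {beacon!r}")
    parts = beacon[len(BEACON_PREFIX):].split("/")
    if len(parts) != 3:
        raise ValueError(f"unexpected beacon: {beacon!r}")
    class_name, object_uuid, property_name = parts
    return class_name, object_uuid, property_name


def suspicious_references(
    results: List[Dict[str, Any]], known_classes: Set[str]
) -> List[Dict[str, Any]]:
    """Return result items whose source class is not in `known_classes`."""
    return [
        item
        for item in results
        if parse_from_beacon(item["from"])[0] not in known_classes
    ]


sample = [
    {
        "from": "weaviate://localhost/NonExistingClass/154cbccd-89f4-4b29-9c1b-001a3339d89d/existsWith",
        "to": "weaviate://localhost/154cbccd-89f4-4b29-9c1b-001a3339d89c",
        "result": {"status": "SUCCESS"},
    },
    {
        "from": "weaviate://localhost/ExistingClass/254cbccd-89f4-4b29-9c1b-001a3339d89a/existsWith",
        "to": "weaviate://localhost/254cbccd-89f4-4b29-9c1b-001a3339d89b",
        "result": {"status": "SUCCESS"},
    },
]

for item in suspicious_references(sample, known_classes={"ExistingClass"}):
    print("check this reference:", item["from"])
```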
Returns
- list
A list with the status of every reference added.
Raises
- requests.ConnectionError
If the network connection to weaviate fails.
- weaviate.UnexpectedStatusCodeException
If weaviate reports a non-OK status.
- property creation_time: Real
Setter and Getter for creation_time.
Parameters
- valueReal
Setter ONLY: Set new value to creation_time. The recommended_num_objects/references values are updated to this new value. If the batch_size is not None it will auto-create the batch if the requirements are met.
Returns
- Real
Getter ONLY: The creation_time value.
Raises
- TypeError
Setter ONLY: If the new value is not of type Real.
- ValueError
Setter ONLY: If the new value has a non positive value.
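To make the role of creation_time in dynamic batching more concrete: the recommended batch size is described as the number of items the server can process within creation_time. A rough sketch of that idea follows. The formula is an assumption for illustration only and is not necessarily what the client computes; estimate_recommended_size is a hypothetical helper.

```python
def estimate_recommended_size(
    items_sent: int, elapsed_seconds: float, creation_time: float
) -> int:
    """Estimate how many items fit into `creation_time`, given one observed batch.

    Assumption: throughput is constant, so size scales linearly with time.
    """
    if items_sent <= 0 or elapsed_seconds <= 0 or creation_time <= 0:
        raise ValueError("all arguments must be positive")
    items_per_second = items_sent / elapsed_seconds
    # Never recommend fewer than one item per batch.
    return max(1, round(items_per_second * creation_time))


# 100 objects took 5 seconds; with creation_time=10 the estimate is 200.
print(estimate_recommended_size(100, 5.0, 10))
```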
- delete_objects(class_name: str, where: dict, output: str = 'minimal', dry_run: bool = False, tenant: str | None = None) dict [source]
Delete, in batch, the objects that match the given ‘where’ filter.
Parameters
- class_namestr
The class name for which to delete objects.
- wheredict
The content of the where filter used to match objects that should be deleted.
- outputstr, optional
The control of the verbosity of the output, by default “minimal”. Possible values:
- “minimal”: The result only includes counts. Information about objects is omitted if the deletes were successful. Only if an error occurred will the object be described.
- “verbose”: The result lists all affected objects with their ID and deletion status, including both successful and unsuccessful deletes.
- dry_runbool, optional
If True, objects will not be deleted yet, but merely listed, by default False
Examples
If we want to delete all the data objects that contain the word ‘weather’ we can do it like this:
>>> import json
>>> result = client.batch.delete_objects(
...     class_name='Dataset',
...     output='verbose',
...     dry_run=False,
...     where={
...         'operator': 'Equal',
...         'path': ['description'],
...         'valueText': 'weather'
...     }
... )
>>> print(json.dumps(result, indent=4))
{
    "dryRun": false,
    "match": {
        "class": "Dataset",
        "where": {
            "operands": null,
            "operator": "Equal",
            "path": [
                "description"
            ],
            "valueText": "weather"
        }
    },
    "output": "verbose",
    "results": {
        "failed": 0,
        "limit": 10000,
        "matches": 2,
        "objects": [
            {
                "id": "1eb28f69-c66e-5411-bad4-4e14412b65cd",
                "status": "SUCCESS"
            },
            {
                "id": "da217bdd-4c7c-5568-9576-ebefe17688ba",
                "status": "SUCCESS"
            }
        ],
        "successful": 2
    }
}
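The where filter and the returned dictionary are plain Python data, so they can be built and summarised with small helpers. This is a sketch under the assumption that the result has the shape shown in the example above; equal_text_filter and summarize_delete are hypothetical names, not client functions.

```python
from typing import Any, Dict, List


def equal_text_filter(path: List[str], value: str) -> Dict[str, Any]:
    """Build a where filter like the one used in the example above."""
    return {"operator": "Equal", "path": list(path), "valueText": value}


def summarize_delete(result: Dict[str, Any]) -> str:
    """Condense the counts reported by a batch delete into one line."""
    counts = result["results"]
    return (
        f"matches={counts['matches']} "
        f"successful={counts['successful']} "
        f"failed={counts['failed']}"
    )


where = equal_text_filter(["description"], "weather")

# Trimmed copy of the example result shown above.
example_result = {
    "dryRun": False,
    "output": "verbose",
    "results": {"failed": 0, "limit": 10000, "matches": 2, "successful": 2},
}
print(where)
print(summarize_delete(example_result))
```

Running the delete first with dry_run=True and inspecting the summary is one way to confirm the filter matches what you expect before anything is removed.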
Returns
- dict
The result/status of the batch delete.
- property dynamic: bool
Setter and Getter for dynamic.
Parameters
- valuebool
Setter ONLY: En/dis-able the dynamic batching. If batch_size is None the value is not set, otherwise it will set the dynamic to new value and auto-create if it meets the requirements.
Returns
- bool
Getter ONLY: Whether the dynamic batching is enabled.
Raises
- TypeError
Setter ONLY: If the new value is not of type bool.
- flush() None [source]
Flush both objects and references to the Weaviate server and call the callback function if one is provided. (See the docs for configure or __call__ for how to set one.)
- is_empty_objects() bool [source]
Check if batch contains any objects.
Returns
- bool
Whether the Batch object list is empty.
- is_empty_references() bool [source]
Check if batch contains any references.
Returns
- bool
Whether the Batch reference list is empty.
- num_objects() int [source]
Get current number of objects in the batch.
Returns
- int
The number of objects in the batch.
- num_references() int [source]
Get current number of references in the batch.
Returns
- int
The number of references in the batch.
- pop_object(index: int = -1) dict [source]
Remove and return the object at index (default last).
Parameters
- indexint, optional
The index of the object to pop, by default -1 (last item).
Returns
- dict
The popped object.
Raises
- IndexError
If batch is empty or index is out of range.
- pop_reference(index: int = -1) dict [source]
Remove and return the reference at index (default last).
Parameters
- indexint, optional
The index of the reference to pop, by default -1 (last item).
Returns
- dict
The popped reference.
Raises
- IndexError
If batch is empty or index is out of range.
- property recommended_num_objects: int | None
The recommended number of objects per batch. If None then it could not be computed.
Returns
- Optional[int]
The recommended number of objects per batch. If None then it could not be computed.
- property recommended_num_references: int | None
The recommended number of references per batch. If None then it could not be computed.
Returns
- Optional[int]
The recommended number of references per batch. If None then it could not be computed.
- property shape: Tuple[int, int]
Get current number of objects and references in the batch.
Returns
- Tuple[int, int]
The number of objects and references, respectively, in the batch as a tuple, i.e. returns (number of objects, number of references).
- property timeout_retries: int
Setter and Getter for timeout_retries.
Properties
- valueint
Setter ONLY: The new value for timeout_retries.
Returns
- int
Getter ONLY: The timeout_retries value.
Raises
- TypeError
Setter ONLY: If the new value is not of type int.
- ValueError
Setter ONLY: If the new value has a non positive value.
- wait_for_vector_indexing(shards: List[Shard] | None = None, how_many_failures: int = 5) None [source]
Wait for all the vectors of the batch imported objects to be indexed.
Upon network error, it will retry to get the shards’ status for how_many_failures times with exponential backoff (2**n seconds with n=0,1,2,…,how_many_failures).
Parameters
- shards {Optional[List[Shard]]} – The shards to check the status of. If None it will check the status of all the shards of the imported objects in the batch.
- how_many_failures {int} – How many times to try to get the shards’ status before raising an exception. Default 5.
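The backoff schedule stated above (2**n seconds for n = 0, 1, 2, …, how_many_failures) can be listed explicitly. This sketch only computes the delays as documented; backoff_delays is a hypothetical helper and it does not contact a server.

```python
from typing import List


def backoff_delays(how_many_failures: int = 5) -> List[int]:
    """Delays in seconds, 2**n for n = 0..how_many_failures, as documented."""
    if how_many_failures < 0:
        raise ValueError("how_many_failures must not be negative")
    return [2 ** n for n in range(how_many_failures + 1)]


delays = backoff_delays()
print(delays)       # [1, 2, 4, 8, 16, 32]
print(sum(delays))  # 63
```

With the default of 5, the worst case is therefore roughly a minute of cumulative waiting before an exception is raised.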
- class weaviate.batch.crud_batch.BatchExecutor(max_workers=None, thread_name_prefix='', initializer=None, initargs=())[source]
Bases:
ThreadPoolExecutor
Weaviate Batch Executor to run batch requests in a separate thread. This class implements an additional method is_shutdown that is used by the context manager.
Initializes a new ThreadPoolExecutor instance.
- Args:
- max_workers: The maximum number of threads that can be used to execute the given calls.
- thread_name_prefix: An optional name prefix to give our threads.
- initializer: A callable used to initialize worker threads.
- initargs: A tuple of arguments to pass to the initializer.
- class weaviate.batch.crud_batch.Shard(class_name: str, tenant: str | None = None)[source]
Bases:
object
- class_name: str
- tenant: str | None = None
- class weaviate.batch.crud_batch.WeaviateErrorRetryConf(number_retries: int = 3, errors_to_exclude: List[str] | None = None, errors_to_include: List[str] | None = None)[source]
Bases:
object
Configures how often objects should be retried when Weaviate returns an error and which errors should be included or excluded. By default, all errors are retried.
Parameters
- number_retries: int
How often a batch that includes objects with errors should be retried. Must be >=1.
- errors_to_exclude: Optional[List[str]]
Which errors should NOT be retried. All other errors will be retried. An object will be skipped, when the given string is part of the weaviate error message.
Example: errors_to_exclude = ["string1", "string2"] will match the error with message "Long error message that contains string1".
- errors_to_include: Optional[List[str]]
Which errors should be retried. All other errors will NOT be retried. An object will be included, when the given string is part of the weaviate error message.
Example: errors_to_include = ["string1", "string2"] will match the error with message "Long error message that contains string1".
- errors_to_exclude: List[str] | None = None
- errors_to_include: List[str] | None = None
- number_retries: int = 3
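The include/exclude rules above amount to a substring test on the error message. A minimal sketch of that decision follows. should_retry is a hypothetical helper; rejecting the case where both lists are given is an assumption, and the client may differ in details such as case handling.

```python
from typing import List, Optional


def should_retry(
    message: str,
    errors_to_exclude: Optional[List[str]] = None,
    errors_to_include: Optional[List[str]] = None,
) -> bool:
    """Decide whether an item with this error message should be retried."""
    if errors_to_exclude is not None and errors_to_include is not None:
        # Assumption: the two lists are treated as mutually exclusive.
        raise ValueError("set only one of errors_to_exclude / errors_to_include")
    if errors_to_exclude is not None:
        # Retry everything except messages containing an excluded string.
        return not any(text in message for text in errors_to_exclude)
    if errors_to_include is not None:
        # Retry only messages containing an included string.
        return any(text in message for text in errors_to_include)
    # Default: all errors are retried.
    return True


msg = "Long error message that contains string1"
print(should_retry(msg))                                              # True
print(should_retry(msg, errors_to_exclude=["string1", "string2"]))    # False
print(should_retry(msg, errors_to_include=["string1", "string2"]))    # True
```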
weaviate.batch.requests module
BatchRequest class definitions.
- class weaviate.batch.requests.BatchRequest[source]
Bases:
ABC
BatchRequest abstract class used as an interface for batch requests.
- abstract add_failed_objects_from_response(response_item: List[Dict[str, Any]], errors_to_exclude: List[str] | None, errors_to_include: List[str] | None) List[Dict[str, Any]] [source]
Add failed items from a weaviate response.
Parameters
- response_itemBatchResponse
Weaviate response that contains the status for all objects.
- errors_to_excludeOptional[List[str]]
Which errors should NOT be retried.
- errors_to_includeOptional[List[str]]
Which errors should be retried.
Returns
BatchResponse: Contains responses from all successful objects, i.e. those that have not been added to this batch.
- abstract get_request_body() List[Dict[str, Any]] | Dict[str, Any] [source]
Return the request body to be digested by weaviate that contains all batch items.
- class weaviate.batch.requests.ObjectsBatchRequest[source]
Bases:
BatchRequest
Collect objects for one batch request to weaviate. Caution: this batch will not be validated through weaviate.
- add(data_object: dict, class_name: str, uuid: str | UUID | None = None, vector: Sequence | None = None, tenant: str | None = None) str [source]
Add one object to this batch. Does NOT validate the consistency of the object against the client’s schema. Checks the arguments’ type and UUIDs’ format.
Parameters
- class_namestr
The name of the class this object belongs to.
- data_objectdict
Object to be added as a dict datatype.
- uuidstr or None, optional
UUID of the object as a string, by default None
- vector: Sequence or None, optional
The embedding of the object that should be validated. Can be used when:
- A class does not have a vectorization module.
- The given vector was generated using the _identical_ vectorization module that is configured for the class. In this case this vector takes precedence.
Supported types are list, numpy.ndarray, torch.Tensor and tf.Tensor, by default None.
- tenant: str, optional
Tenant of the object
Returns
- str
The UUID of the added object. If one was not provided a UUIDv4 will be generated.
Raises
- TypeError
If an argument passed is not of an appropriate type.
- ValueError
If ‘uuid’ is not of a proper form.
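The UUID handling described here (accept a str or uuid.UUID, otherwise generate one, and raise on bad input) can be approximated with the standard library. This is a sketch, not the client's code: resolve_uuid is a hypothetical helper, and generating a random uuid4 when no UUID is given is an assumption in line with the Batch.add_data_object description.

```python
import uuid as uuid_lib
from typing import Union


def resolve_uuid(value: Union[str, uuid_lib.UUID, None] = None) -> str:
    """Return a canonical UUID string, generating a random one if none is given."""
    if value is None:
        # Assumption: a random (version 4) UUID is generated.
        return str(uuid_lib.uuid4())
    if isinstance(value, uuid_lib.UUID):
        return str(value)
    if not isinstance(value, str):
        raise TypeError(f"uuid must be str, UUID or None, got {type(value).__name__}")
    # uuid.UUID raises ValueError for malformed strings.
    return str(uuid_lib.UUID(value))


print(resolve_uuid("154cbccd-89f4-4b29-9c1b-001a3339d89a"))
print(resolve_uuid())  # a freshly generated UUID, different on every call
```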
- add_failed_objects_from_response(response: List[Dict[str, Any]], errors_to_exclude: List[str] | None, errors_to_include: List[str] | None) List[Dict[str, Any]] [source]
Add failed items from a weaviate response.
Parameters
- response_itemBatchResponse
Weaviate response that contains the status for all objects.
- errors_to_excludeOptional[List[str]]
Which errors should NOT be retried.
- errors_to_includeOptional[List[str]]
Which errors should be retried.
Returns
BatchResponse: Contains responses from all successful objects, i.e. those that have not been added to this batch.
- class weaviate.batch.requests.ReferenceBatchRequest[source]
Bases:
BatchRequest
Collect Weaviate-object references to add them in one request to Weaviate. Caution: this request skips some validations in order to be faster.
- add(from_object_class_name: str, from_object_uuid: str | UUID, from_property_name: str, to_object_uuid: str | UUID, to_object_class_name: str | None = None, tenant: str | None = None) None [source]
Add one Weaviate-object reference to this batch. Does NOT validate the consistency of the reference against the class schema. Checks the arguments’ type and UUIDs’ format.
Parameters
- from_object_class_namestr
The name of the class that should reference another object.
- from_object_uuidstr
The UUID or URL of the object that should reference another object.
- from_property_namestr
The name of the property that contains the reference.
- to_object_uuidstr
The UUID or URL of the object that is actually referenced.
- to_object_class_nameOptional[str], optional
The referenced object class name to which to add the reference (with UUID to_object_uuid), it is included in Weaviate 1.14.0, where all objects are namespaced by class name. STRONGLY recommended to set it with Weaviate >= 1.14.0. It will be required in future versions of Weaviate Server and Clients. Use None value ONLY for Weaviate < v1.14.0, by default None
Raises
- TypeError
If arguments are not of type str.
- ValueError
If ‘uuid’ is not valid or cannot be extracted.
- add_failed_objects_from_response(response: List[Dict[str, Any]], errors_to_exclude: List[str] | None, errors_to_include: List[str] | None) List[Dict[str, Any]] [source]
Add failed items from a weaviate response.
Parameters
- response_itemBatchResponse
Weaviate response that contains the status for all objects.
- errors_to_excludeOptional[List[str]]
Which errors should NOT be retried.
- errors_to_includeOptional[List[str]]
Which errors should be retried.
Returns
BatchResponse: Contains responses from all successful objects, i.e. those that have not been added to this batch.