RUFAS.data_validator module#

class RUFAS.data_validator.ElementState(*values)#

Bases: Enum

An enumeration of the states a data element can be in during validation. An element cannot be in more than one state at a time.

Attributes#

VALID: str: The element is valid.
INVALID: str: The element is invalid and cannot be fixed.
FIXED: str: The element was initially invalid but has been fixed.

VALID = 'valid'#

INVALID = 'invalid'#

FIXED = 'fixed'#

class RUFAS.data_validator.ElementsCounter#

Bases: object

A class to keep track of the number of elements in each state during validation.

Attributes#

valid_elements: int: The number of valid elements.
invalid_elements: int: The number of invalid elements.
fixed_elements: int: The number of fixed elements.

__init__() → None#

update(state: ElementState, value: int) → None#

Updates the count of elements in a given state.

Parameters#

state: ElementState: The state of the element.
value: int: The value by which the count should be updated.

Raises#

ValueError: If the state is not one of the valid states.

increment(state: ElementState) → None#

Increments the count of elements in a given state by one.

Parameters#

state: ElementState: The state of the element.

reset() → None#: Resets the counts of all element states to zero.

total_elements() → int#: Returns the total number of elements by adding the counts of valid, invalid, and fixed elements.
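The counter behaviour described above can be sketched as follows. This is a minimal illustration, not the RUFAS implementation: `CounterSketch` is a hypothetical stand-in for `ElementsCounter`, and the way it stores its counts is an assumption based only on the documented attributes and methods.

```python
from enum import Enum


class ElementState(Enum):
    VALID = "valid"
    INVALID = "invalid"
    FIXED = "fixed"


class CounterSketch:
    """Hypothetical stand-in for ElementsCounter (illustration only)."""

    def __init__(self) -> None:
        self.reset()

    def update(self, state: ElementState, value: int) -> None:
        # Add `value` to the count that matches `state`.
        if state is ElementState.VALID:
            self.valid_elements += value
        elif state is ElementState.INVALID:
            self.invalid_elements += value
        elif state is ElementState.FIXED:
            self.fixed_elements += value
        else:
            raise ValueError(f"Unknown element state: {state!r}")

    def increment(self, state: ElementState) -> None:
        # Incrementing is an update by exactly one.
        self.update(state, 1)

    def reset(self) -> None:
        self.valid_elements = 0
        self.invalid_elements = 0
        self.fixed_elements = 0

    def total_elements(self) -> int:
        return self.valid_elements + self.invalid_elements + self.fixed_elements


counter = CounterSketch()
counter.increment(ElementState.VALID)
counter.update(ElementState.FIXED, 3)
total = counter.total_elements()
```

Routing `increment` through `update` keeps the state check in one place, so an unrecognized state fails the same way from either method.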

class RUFAS.data_validator.Modifiability(*values)#

Bases: Enum

Enum class representing the modifiability status of a variable.

This Enum defines various levels of modifiability for a variable, indicating whether a variable is required at initialization and if it can be modified during runtime.

Attributes#

REQUIRED_LOCKED: str: Indicates the variable must be initialized with a value and cannot be modified thereafter.
REQUIRED_UNLOCKED: str: Indicates the variable must be initialized with a value but can be modified during runtime.
UNREQUIRED_UNLOCKED: str: Indicates the variable does not need to be initialized with a value and can be modified during runtime.

REQUIRED_LOCKED: str = 'required locked'#

REQUIRED_UNLOCKED: str = 'required unlocked'#

UNREQUIRED_UNLOCKED: str = 'unrequired unlocked'#

classmethod values() → list[str]#

Provides a list of the string values of the enum members.

Returns#

List[str]: A list containing the string values of the enum members.

classmethod get_required_during_initialization() → list[Modifiability]#

classmethod get_modifiable_at_runtime() → list[Modifiability]#
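The two undocumented class methods above can plausibly be read from their names and the attribute descriptions. The sketch below is an assumption on that basis, not the RUFAS source: `ModifiabilitySketch` is a hypothetical copy of the enum, and which members each method returns is inferred rather than confirmed.

```python
from enum import Enum


class ModifiabilitySketch(Enum):
    """Hypothetical copy of Modifiability (illustration only)."""

    REQUIRED_LOCKED = "required locked"
    REQUIRED_UNLOCKED = "required unlocked"
    UNREQUIRED_UNLOCKED = "unrequired unlocked"

    @classmethod
    def values(cls) -> list:
        # String values of all members, in definition order.
        return [member.value for member in cls]

    @classmethod
    def get_required_during_initialization(cls) -> list:
        # Assumption: the members whose description says a value must be supplied.
        return [cls.REQUIRED_LOCKED, cls.REQUIRED_UNLOCKED]

    @classmethod
    def get_modifiable_at_runtime(cls) -> list:
        # Assumption: the members whose description allows runtime modification.
        return [cls.REQUIRED_UNLOCKED, cls.UNREQUIRED_UNLOCKED]


all_values = ModifiabilitySketch.values()
```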

class RUFAS.data_validator.DataValidator#

Bases: object

This class is used to validate all types of data across the RuFas codebase.

__init__() → None#

validate_properties(metadata: dict[str, Any], metadata_depth_limit: int) → tuple[bool, str]#

Iteratively traverses the metadata properties to check the max depth and routes properties to be validated by type.

Returns#

Tuple[bool, str]: A boolean indicating the validation status, and an error message describing any error that the caller should raise.
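An iterative traversal with a depth limit could look roughly like the sketch below. It only illustrates the idea of walking nested properties with an explicit stack instead of recursion. The function name `check_depth`, the assumption that nesting is expressed as dictionaries inside dictionaries, and the error message format are all hypothetical.

```python
from typing import Any


def check_depth(metadata: dict, depth_limit: int) -> tuple:
    """Return (True, "") if no nested dict exceeds depth_limit, else (False, message)."""
    # Each stack entry holds the key path so far, the node, and the node's depth.
    stack: list = [([key], value, 1) for key, value in metadata.items()]
    while stack:
        path, node, depth = stack.pop()
        if depth > depth_limit:
            return False, f"Depth limit {depth_limit} exceeded at {'.'.join(path)}"
        if isinstance(node, dict):
            # Push children one level deeper instead of recursing.
            stack.extend((path + [key], value, depth + 1) for key, value in node.items())
    return True, ""


shallow_result = check_depth({"a": {"b": 1}}, 2)
deep_result = check_depth({"a": {"b": {"c": 1}}}, 2)
```

Returning a status and message, rather than raising, mirrors the documented return type and leaves the decision to raise with the caller.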

_validate_metadata_properties_keys(required_properties_keys: set[str], optional_properties_keys: set[str], properties: dict[str, Any], path: list[str]) → tuple[bool, str]#: Validates the keys in a metadata properties section against the required and optional key sets.

_metadata_number_validator(key_path: list[str], value: dict[str, Any]) → tuple[bool, str]#: Validates number type properties in metadata.

_metadata_string_validator(key_path: list[str], value: dict[str, Any]) → tuple[bool, str]#: Validates string type properties in metadata.

_metadata_bool_validator(key_path: list[str], value: dict[str, Any]) → tuple[bool, str]#: Validates bool type properties in metadata.

_metadata_array_validator(key_path: list[str], value: dict[str, Any]) → tuple[bool, str]#: Validates array type properties in metadata.

_metadata_object_validator(key_path: list[str], value: dict[str, Any]) → tuple[bool, str]#: Validates object type properties in metadata.

validate_metadata(metadata: dict[str, Any], valid_data_types: set[str], address_to_data: str) → tuple[bool, str]#: Checks that top-level metadata has valid and required keys and values.

validate_data_by_type(variable_properties: dict[str, Any], variable_path: list[str | int], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) → bool#

Validates the data based on its specified type.

Parameters#

variable_properties: Dict[str, Any]: A dictionary containing properties relevant to the validation.
variable_path: List[str | int]: The path to the variable being validated.
data: Dict[str, Any]: The data to be validated.
eager_termination: bool: If True, the process terminates as soon as invalid data is found and cannot be fixed.
properties_blob_key: str: The metadata properties section keyword for the data file being checked.
elements_counter: ElementsCounter: A counter to keep track of the number of valid, invalid, and fixed elements.
called_during_initialization: bool: Indicates whether the function is being called during initialization.
fixable_data_types: set[str]: The data types that the caller will attempt to fix while validating data.

Returns#

bool: True if the data is valid, False otherwise.

Raises#

KeyError: If the variable’s properties do not specify a “type”.

Notes#

Fixing invalid data will only be attempted if the data is a “simple” type (i.e. a string, bool or number).
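Routing by type can be pictured as a lookup from the "type" property to a check function. The sketch below is illustrative only: `check_by_type` and `TYPE_CHECKS` are hypothetical names, only the three "simple" types are covered, and the real method additionally handles arrays, objects, fixing, and counting.

```python
from typing import Any, Callable


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical dispatch table covering only the "simple" types.
TYPE_CHECKS: dict = {
    "number": _is_number,
    "string": lambda value: isinstance(value, str),
    "bool": lambda value: isinstance(value, bool),
}


def check_by_type(variable_properties: dict, value: Any) -> bool:
    # A missing "type" key surfaces as KeyError, as in the Raises section above.
    data_type = variable_properties["type"]
    check: Callable[[Any], bool] = TYPE_CHECKS[data_type]
    return check(value)


number_ok = check_by_type({"type": "number"}, 3.5)
```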

_validate_array_container_properties(variable_path: list[str | int], variable_properties: dict[str, Any], data: Any, properties_blob_key: str) → bool#

Validates the container properties of an array data element.

Parameters#

variable_path: List[str | int]: The path to the variable being validated.
variable_properties: Dict[str, Any]: The metadata properties for the variable being validated.
data: Any: The data to be validated.
properties_blob_key: str: The metadata properties section keyword for the data file being checked.

Returns#

bool: True if the array container properties are valid, False otherwise.

_array_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) → bool#

Validates a data element of type array.

Parameters#

variable_path: List[str | int]: The path to the variable being validated.
variable_properties: Dict[str, Any]: The metadata properties for the variable being validated.
data: Dict[str, Any]: The data to be validated.
eager_termination: bool: If True, the process will be terminated upon finding invalid data.
properties_blob_key: str: The metadata properties section keyword for the data file being checked.
elements_counter: ElementsCounter: A counter to keep track of the number of valid, invalid, and fixed elements.
called_during_initialization: bool: Indicates whether the function is being called during initialization.
fixable_data_types: set[str]: Set of data types that are fixable.

Returns#

bool: True if the data element is valid or fixable, False otherwise.

_object_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) → bool#

Validates a data element of type object.

Parameters#

variable_path: List[str | int]: The path to the variable being validated.
variable_properties: Dict[str, Any]: The metadata properties for the variable being validated.
data: Dict[str, Any]: The data to be validated.
eager_termination: bool: If True, the process will be terminated upon finding invalid data.
properties_blob_key: str: The metadata properties section keyword for the data file being checked.
elements_counter: ElementsCounter: A counter to keep track of the number of valid, invalid, and fixed elements.
called_during_initialization: bool: Indicates whether the function is being called during initialization.
fixable_data_types: set[str]: Set of data types that are fixable.

Returns#

bool: True if the data element is valid or fixable, False otherwise.

Notes#

This method will look for and delete any keys in the data that do not have properties specified for them in the metadata properties.
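The key-pruning behaviour in the note above can be illustrated with a short sketch. `prune_undeclared_keys` is a hypothetical helper, and the assumption that declared keys are the keys of a properties dictionary is made only for this example.

```python
from typing import Any


def prune_undeclared_keys(data: dict, declared_properties: dict) -> list:
    """Delete keys from data that have no declared properties; return the removed keys."""
    # Collect first, then delete, so the dict is not mutated while iterating over it.
    extra_keys = [key for key in data if key not in declared_properties]
    for key in extra_keys:
        del data[key]
    return extra_keys


record: dict = {"bedding_type": "straw", "unexpected": 1}
removed = prune_undeclared_keys(record, {"bedding_type": {"type": "string"}})
```

Note that the data is modified in place, so callers that need the original input should pass a copy.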

_number_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) → bool#: Validates a data number element.

_string_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) → bool#: Validates a data string element.

_bool_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) → bool#: Validates a data bool element.

_fix_data(variable_properties: dict[str, Any], element_hierarchy: list[str | int], data: dict[str, Any], properties_blob_key: str) → bool#

Attempt to fix the invalid data.

Parameters#

variable_properties: dict[str, Any]: The properties for the variable of interest.
element_hierarchy: list: A list indicating the path to reach the variable of interest in self.__metadata and self.__pool.
data: dict[str, Any]: A buffer dictionary that holds the data for validation and fixing.
properties_blob_key: str: The metadata properties section keyword for the data file being checked.

Returns#

bool: True if the data is fixed, False otherwise.

_extract_data_by_key_list(data: list[Any] | dict[str, Any], variable_path: Sequence[str | int], variable_properties: dict[str, Any], called_during_initialization: bool) → Any#

Extracts a value from the data based on a specified path and handles missing data by calling DataValidator._log_missing_data().

Parameters#

data: List[Any] | Dict[str, Any]: The data containing the value to be extracted.
variable_path: List[str | int]: A list of keys to be used to extract the value from the data.
variable_properties: Dict[str, Any]: The metadata properties for the variable being validated.
called_during_initialization: bool: Indicates whether the function is being called during initialization.

Returns#

Any: The value extracted from the data if found. None if not found.

Notes#

This function navigates through the given data (which can be a list or a dictionary) following the path specified in variable_path. If the path leads to a value, it is returned. If a KeyError occurs during this process (i.e., a key or index is missing in the path), the function extracts the variable name by finding the last string element in the variable_path array and handles this missing data by calling DataValidator._log_missing_data().
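Finding the variable name as "the last string element in the variable_path" can be sketched as below. `last_string_key` is a hypothetical helper written for illustration, and returning None for a path with no string keys is an assumption.

```python
from typing import Optional, Sequence, Union


def last_string_key(variable_path: Sequence[Union[str, int]]) -> Optional[str]:
    """Return the last str entry in the path, skipping integer list indices."""
    for key in reversed(variable_path):
        if isinstance(key, str):
            return key
    return None  # the path held only integer indices (or was empty)


name = last_string_key(["manure_management_scenarios", 0, "bedding_type", 1])
```

Skipping trailing integers means a missing list index is reported under the name of the list that contains it.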

_log_missing_data(variable_properties: dict[str, Any], var_name: str, called_during_initialization: bool) → None#

Handles logging for missing data for a variable, logging errors or warnings based on the context of initialization or runtime updates.

Parameters#

variable_properties: Dict[str, Any]: Properties of the variable, potentially including its modifiability status.
var_name: str: The name of the variable with missing data.
called_during_initialization: bool: Indicates whether the function is being called during initialization.

Raises#

KeyError: Raised if the missing data is deemed necessary, either during initialization or for a runtime update.

Notes#

This function determines if it’s being called during the initialization phase and checks if the missing variable data is required at this stage using ‘_is_data_required_upon_initialization’. If required, it logs an error and raises a KeyError. If not, it logs a warning.

_is_data_required_upon_initialization(variable_name: str, variable_properties: dict[str, Any]) → bool#

Determines whether a variable requires a data value upon initialization based on its modifiability status.

This function utilizes the ‘_get_variable_modifiability’ method to ascertain the modifiability status of the variable identified by ‘variable_name’ and described by ‘variable_properties’. It then checks if the modifiability status is either ‘REQUIRED_LOCKED’ or ‘REQUIRED_UNLOCKED’, indicating that the variable must be initialized with a value.

Parameters#

variable_name: str: The name of the variable being evaluated for its initialization requirements.
variable_properties: Dict[str, Any]: A dictionary containing the properties of the variable, which should include its modifiability status among others.

Returns#

bool: True if the variable’s modifiability status necessitates a data value upon initialization, False otherwise.

_get_variable_modifiability(variable_name: str, variable_properties: dict[str, Any]) → Modifiability#

Determines the modifiability status of a variable based on its properties and returns the corresponding enum value.

Notes#

This function looks for a ‘modifiability’ key within variable_properties. If present and its value is not empty, the function attempts to map this value to an enum member in Modifiability. If the value does not correspond to any enum member, a KeyError is raised after logging the error. If ‘modifiability’ is absent or its value is empty, the function defaults to Modifiability.UNREQUIRED_UNLOCKED.

Parameters#

variable_name: str: The name of the variable for which the modifiability status is being determined. Used for error logging.
variable_properties: Dict[str, Any]: A dictionary containing the properties of the variable, containing the desired ‘modifiability’ property.

Returns#

Modifiability: An enum member representing the variable’s modifiability status.

Raises#

KeyError: If ‘modifiability’ in variable_properties does not match any enum member in Modifiability. The error message includes the invalid modifiability value and suggests valid values.
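The lookup described in the notes above might be sketched like this. It is an illustration under assumptions: the function name `get_modifiability` is hypothetical, the ‘modifiability’ property is assumed to hold an enum string value such as "required locked", and logging is omitted.

```python
from enum import Enum
from typing import Any


class Modifiability(Enum):
    REQUIRED_LOCKED = "required locked"
    REQUIRED_UNLOCKED = "required unlocked"
    UNREQUIRED_UNLOCKED = "unrequired unlocked"


def get_modifiability(variable_name: str, variable_properties: dict) -> Modifiability:
    raw: Any = variable_properties.get("modifiability")
    if not raw:
        # Absent or empty: fall back to the least restrictive status.
        return Modifiability.UNREQUIRED_UNLOCKED
    try:
        return Modifiability(raw)  # look up the member by its string value
    except ValueError:
        valid = [member.value for member in Modifiability]
        raise KeyError(
            f"Invalid modifiability {raw!r} for {variable_name!r}; valid values: {valid}"
        ) from None


default_status = get_modifiability("calf_num", {})
locked_status = get_modifiability("calf_num", {"modifiability": "required locked"})
```

Under the same assumptions, the initialization check reduces to testing whether the returned member is one of the two REQUIRED members.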

convert_variable_path_to_str(variable_path: list[str | int]) → str#

Converts a list of keys (int or str) into a string representation of the path to a variable.

Parameters#

variable_path: List[str | int]: A list of keys to be used to extract the value from the data.

Returns#

str: A string representation of the path to a variable.

Examples#

>>> from RUFAS.data_validator import DataValidator
>>> var_path = ["animal", "herd_information", "calf_num"]
>>> DataValidator.convert_variable_path_to_str(var_path)
'animal.herd_information.calf_num'

>>> var_path = ["manure_management_scenarios", 0, "bedding_type"]
>>> DataValidator.convert_variable_path_to_str(var_path)
'manure_management_scenarios.[0].bedding_type'
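A function that reproduces the two outputs shown above could be as small as the following. `path_to_str` is a hypothetical name and this is only one way to obtain the documented format, in which integer keys are wrapped in square brackets and all parts are joined with dots.

```python
from typing import Sequence, Union


def path_to_str(variable_path: Sequence[Union[str, int]]) -> str:
    # Integer keys (list indices) are rendered as "[n]"; string keys are kept as-is.
    parts = [f"[{key}]" if isinstance(key, int) else key for key in variable_path]
    return ".".join(parts)


dotted = path_to_str(["manure_management_scenarios", 0, "bedding_type"])
```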

extract_value_by_key_list(data: list[Any] | dict[str, Any], variable_path: Sequence[str | int]) → Any#

Extracts a value from a nested list or dictionary using a list of keys (int or str).

Parameters#

data: List[Any] | Dict[str, Any]: The data containing the value to be extracted.
variable_path: List[str | int]: A list of keys to be used to extract the value from the data.

Returns#

Any: The value extracted from the data.

Raises#

KeyError: If the value cannot be extracted from the data using the provided variable path.

Examples#

>>> from RUFAS.data_validator import DataValidator
>>> example_data = {
...     "animal": {
...         "herd_information": {
...             "calf_num": 8,
...             "heiferI_num": 44,
...             "heiferII_num": 38,
...             "heiferIII_num_springers": 12
...         }
...     }
... }
>>> var_path = ["animal", "herd_information", "calf_num"]
>>> DataValidator.extract_value_by_key_list(example_data, var_path)
8

>>> data_validator = DataValidator()
>>> example_data = {
...     "manure_management_scenarios": [
...         {
...             "bedding_type": "straw",
...             "manure_handler": "manual scraping"
...         },
...         {
...             "bedding_type": "sawdust",
...             "manure_handler": "flush system"
...         }
...     ]
... }
>>> var_path = ["manure_management_scenarios", 0, "bedding_type"]
>>> DataValidator.extract_value_by_key_list(example_data, var_path)
'straw'
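For readers who want to see the mechanics, the extraction can be approximated by walking the path one key at a time. This is a sketch, not the RUFAS implementation: `extract_by_path` is a hypothetical name, and converting IndexError and TypeError into KeyError is an assumption made so that every failed lookup surfaces as the documented exception.

```python
from typing import Any, Sequence, Union


def extract_by_path(data: Union[list, dict], variable_path: Sequence[Union[str, int]]) -> Any:
    current: Any = data
    for key in variable_path:
        try:
            # Works for both dict keys and list indices.
            current = current[key]
        except (KeyError, IndexError, TypeError) as error:
            raise KeyError(
                f"Cannot resolve {key!r} in path {list(variable_path)!r}"
            ) from error
    return current


example_data = {
    "manure_management_scenarios": [
        {"bedding_type": "straw", "manure_handler": "manual scraping"},
        {"bedding_type": "sawdust", "manure_handler": "flush system"},
    ]
}
bedding = extract_by_path(example_data, ["manure_management_scenarios", 1, "bedding_type"])
```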