RUFAS.data_validator module#

class RUFAS.data_validator.ElementState(*values)#

Bases: Enum

An enumeration of the states a data element can be in during validation. An element cannot be in more than one state at a time.

Attributes#

VALID: str

The element is valid.

INVALID: str

The element is invalid and cannot be fixed.

FIXED: str

The element was initially invalid but has been fixed.

VALID = 'valid'#
INVALID = 'invalid'#
FIXED = 'fixed'#
class RUFAS.data_validator.ElementsCounter#

Bases: object

A class to keep track of the number of elements in each state during validation.

Attributes#

valid_elements: int

The number of valid elements.

invalid_elements: int

The number of invalid elements.

fixed_elements: int

The number of fixed elements.

__init__() None#
update(state: ElementState, value: int) None#

Updates the count of elements in a given state.

Parameters#

state: ElementState

The state of the element.

value: int

The value by which the count should be updated.

Raises#

ValueError

If the state is not one of the valid states.

increment(state: ElementState) None#

Increments the count of elements in a given state by one.

Parameters#

state: ElementState

The state of the element.

reset() None#

Resets the counts of all element states to zero.

total_elements() int#

Returns the total number of elements by adding the counts of valid, invalid, and fixed elements.
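
The counting behaviour described above can be sketched as follows. This is a minimal stand-in, not the RUFAS source: ElementsCounterSketch and its internal state-to-attribute mapping are hypothetical, and only the method names, attribute names, and enum values are taken from this page.

```python
from enum import Enum


class ElementState(Enum):
    """Stand-in that mirrors the documented member values."""

    VALID = "valid"
    INVALID = "invalid"
    FIXED = "fixed"


class ElementsCounterSketch:
    """Hypothetical stand-in for ElementsCounter (not the RUFAS implementation)."""

    def __init__(self) -> None:
        self.reset()

    def update(self, state: ElementState, value: int) -> None:
        # Map each state to the attribute that stores its count.
        attribute_by_state = {
            ElementState.VALID: "valid_elements",
            ElementState.INVALID: "invalid_elements",
            ElementState.FIXED: "fixed_elements",
        }
        if state not in attribute_by_state:
            raise ValueError(f"Unknown element state: {state!r}")
        name = attribute_by_state[state]
        setattr(self, name, getattr(self, name) + value)

    def increment(self, state: ElementState) -> None:
        # Incrementing is an update by exactly one.
        self.update(state, 1)

    def reset(self) -> None:
        self.valid_elements = 0
        self.invalid_elements = 0
        self.fixed_elements = 0

    def total_elements(self) -> int:
        return self.valid_elements + self.invalid_elements + self.fixed_elements


counter = ElementsCounterSketch()
counter.increment(ElementState.VALID)
counter.increment(ElementState.FIXED)
counter.update(ElementState.INVALID, 3)
```

After these calls the sketch holds one valid, one fixed, and three invalid elements, so total_elements() reports five.
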

class RUFAS.data_validator.Modifiability(*values)#

Bases: Enum

Enum class representing the modifiability status of a variable.

This Enum defines various levels of modifiability for a variable, indicating whether a variable is required at initialization and if it can be modified during runtime.

Attributes#

REQUIRED_LOCKED: str

Indicates the variable must be initialized with a value and cannot be modified thereafter.

REQUIRED_UNLOCKED: str

Indicates the variable must be initialized with a value but can be modified during runtime.

UNREQUIRED_UNLOCKED: str

Indicates the variable does not need to be initialized with a value and can be modified during runtime.

REQUIRED_LOCKED: str = 'required locked'#
REQUIRED_UNLOCKED: str = 'required unlocked'#
UNREQUIRED_UNLOCKED: str = 'unrequired unlocked'#
classmethod values() list[str]#

Provides a list of the string values of the enum members.

Returns#

List[str]

A list containing the string values of the enum members.

classmethod get_required_during_initialization() list[Modifiability]#
classmethod get_modifiable_at_runtime() list[Modifiability]#
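
The two undocumented class methods can plausibly be read from the attribute descriptions above. The sketch below is an assumption, not the RUFAS source: ModifiabilitySketch is a hypothetical stand-in, and the members returned by each helper are inferred from the words "required" and "unlocked" in the member descriptions.

```python
from enum import Enum


class ModifiabilitySketch(Enum):
    """Hypothetical stand-in using the member values documented above."""

    REQUIRED_LOCKED = "required locked"
    REQUIRED_UNLOCKED = "required unlocked"
    UNREQUIRED_UNLOCKED = "unrequired unlocked"

    @classmethod
    def values(cls) -> list[str]:
        # Iterating an Enum class yields its members in definition order.
        return [member.value for member in cls]

    @classmethod
    def get_required_during_initialization(cls) -> list["ModifiabilitySketch"]:
        # Assumption: every member described as "must be initialized".
        return [cls.REQUIRED_LOCKED, cls.REQUIRED_UNLOCKED]

    @classmethod
    def get_modifiable_at_runtime(cls) -> list["ModifiabilitySketch"]:
        # Assumption: every member described as modifiable during runtime.
        return [cls.REQUIRED_UNLOCKED, cls.UNREQUIRED_UNLOCKED]
```
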
class RUFAS.data_validator.DataValidator#

Bases: object

This class is used to validate all types of data across the RuFaS codebase.

__init__() None#
validate_properties(metadata: dict[str, Any], metadata_depth_limit: int) tuple[bool, str]#

Iteratively traverses the metadata properties to check the max depth and routes properties to be validated by type.

Returns#

Tuple[bool, str]

A boolean indicating whether validation succeeded and, if it failed, an error message that the caller should raise.
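
An iterative, depth-limited traversal of this kind can be sketched with an explicit stack. The sketch covers only the depth check, not the routing by type. The function name check_depth and the assumption that nesting is expressed as plain nested dictionaries are illustrative, not taken from the RUFAS source.

```python
from typing import Any


def check_depth(properties: dict[str, Any], depth_limit: int) -> tuple[bool, str]:
    """Walk nested dicts iteratively; fail when nesting exceeds depth_limit."""
    # Each stack entry holds the path to a node, the node itself, and its depth.
    stack: list[tuple[list[str], dict[str, Any], int]] = [([], properties, 1)]
    while stack:
        path, node, depth = stack.pop()
        if depth > depth_limit:
            return False, f"Depth limit {depth_limit} exceeded at {'.'.join(path)}"
        for key, child in node.items():
            # Only nested dictionaries add a level of depth.
            if isinstance(child, dict):
                stack.append((path + [key], child, depth + 1))
    return True, ""
```

Returning a (status, message) pair instead of raising matches the documented return type and leaves the decision to raise with the caller.
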

_validate_metadata_properties_keys(required_properties_keys: set[str], optional_properties_keys: set[str], properties: dict[str, Any], path: list[str]) tuple[bool, str]#

Validates the keys in a metadata properties section against the required and optional key sets.

_metadata_number_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates number type properties in metadata.

_metadata_string_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates string type properties in metadata.

_metadata_bool_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates bool type properties in metadata.

_metadata_array_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates array type properties in metadata.

_metadata_object_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates object type properties in metadata.

validate_metadata(metadata: dict[str, Any], valid_data_types: set[str], address_to_data: str) tuple[bool, str]#

Checks that top-level metadata has valid and required keys and values.

validate_data_by_type(variable_properties: dict[str, Any], variable_path: list[str | int], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates the data based on its specified type.

Parameters#

variable_properties: Dict[str, Any]

A dictionary containing properties relevant to the validation.

variable_path: List[str | int]

The path to the variable being validated.

data: Dict[str, Any]

The data to be validated.

eager_termination: bool

If True, validation terminates as soon as invalid data is found and cannot be fixed.

properties_blob_key: str

The metadata properties for the data file being checked.

elements_counter: ElementsCounter

A counter to keep track of the number of valid, invalid, and fixed elements.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

fixable_data_types: set[str]

Set enumerating the data types that the caller will attempt to fix while validating data.

Returns#

bool

True if the data is valid, False otherwise.

Raises#

KeyError

If the variable’s properties do not specify a “type”.

Notes#

Fixing invalid data will only be attempted if the data is a “simple” type (i.e. a string, bool or number).
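
Routing by type can be sketched as a lookup from the “type” property to a checking function. The type names "number", "string", and "bool", the helper names, and the choice to return False for an unknown type are assumptions for illustration; the sketch omits fixing, counting, and the array and object cases.

```python
from typing import Any, Callable


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_string(value: Any) -> bool:
    return isinstance(value, str)


def is_bool(value: Any) -> bool:
    return isinstance(value, bool)


# Hypothetical mapping from a "type" property to its check.
TYPE_CHECKS: dict[str, Callable[[Any], bool]] = {
    "number": is_number,
    "string": is_string,
    "bool": is_bool,
}


def check_by_type(variable_properties: dict[str, Any], value: Any) -> bool:
    # A missing "type" key raises KeyError, as documented above.
    data_type = variable_properties["type"]
    check = TYPE_CHECKS.get(data_type)
    if check is None:
        # Assumption: an unrecognized type counts as invalid.
        return False
    return check(value)
```
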

_validate_array_container_properties(variable_path: list[str | int], variable_properties: dict[str, Any], data: Any, properties_blob_key: str) bool#

Validates the container properties of an array data element.

Parameters#

variable_path: List[str | int]

The path to the variable being validated.

variable_properties: Dict[str, Any]

The metadata properties for the variable being validated.

data: Any

The data to be validated.

properties_blob_key: str

The metadata properties for the data file being checked.

Returns#

bool

True if the array container properties are valid, False otherwise.

_array_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data element of type array.

Parameters#

variable_path: List[str | int]

The path to the variable being validated.

variable_properties: Dict[str, Any]

The metadata properties for the variable being validated.

data: Dict[str, Any]

The data to be validated.

eager_termination: bool

If True, the process will be terminated upon finding invalid data.

properties_blob_key: str

The metadata properties for the data file being checked.

elements_counter: ElementsCounter

A counter to keep track of the number of valid, invalid, and fixed elements.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

fixable_data_types: set[str]

Set of data types that are fixable.

Returns#

bool

True if the data element is valid or fixable, False otherwise.

_object_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data element of type object.

Parameters#

variable_path: List[str | int]

The path to the variable being validated.

variable_properties: Dict[str, Any]

The metadata properties for the variable being validated.

data: Dict[str, Any]

The data to be validated.

eager_termination: bool

If True, the process will be terminated upon finding invalid data.

properties_blob_key: str

The metadata properties for the data file being checked.

elements_counter: ElementsCounter

A counter to keep track of the number of valid, invalid, and fixed elements.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

Returns#

bool

True if the data element is valid or fixable, False otherwise.

Notes#

This method will look for and delete any keys in the data that do not have properties specified for them in the metadata properties.
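
The key-deletion behaviour in the note above can be sketched as follows. The function name drop_unspecified_keys and the flat properties dictionary are hypothetical; the sketch only shows removal of keys that have no matching properties entry.

```python
from typing import Any


def drop_unspecified_keys(data: dict[str, Any], properties: dict[str, Any]) -> list[str]:
    """Delete keys from data that have no entry in properties; return them."""
    # Collect the keys first so the dictionary is not mutated while iterating.
    extra_keys = [key for key in data if key not in properties]
    for key in extra_keys:
        del data[key]
    return extra_keys


record = {"bedding_type": "straw", "comment": "unused"}
removed = drop_unspecified_keys(record, {"bedding_type": {"type": "string"}})
```

Because the deletion happens in place, callers that still need the original data should pass a copy.
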

_number_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data number element.

_string_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data string element.

_bool_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data bool element.

_fix_data(variable_properties: dict[str, Any], element_hierarchy: list[str | int], data: dict[str, Any], properties_blob_key: str) bool#

Attempt to fix the invalid data.

Parameters#

variable_properties: dict[str, Any]

The properties for the variable of interest.

element_hierarchy: list

A list indicating the path to reach the variable of interest in self.__metadata and self.__pool.

data: dict[str, Any]

A buffer dictionary that holds the data for validation and fixing.

properties_blob_key: str

The metadata properties section keyword for the data file being checked.

Returns#

bool

True if the data is fixed, False otherwise.

_extract_data_by_key_list(data: list[Any] | dict[str, Any], variable_path: Sequence[str | int], variable_properties: dict[str, Any], called_during_initialization: bool) Any#

Extracts a value from the data based on a specified path and handles missing data by calling DataValidator._log_missing_data().

Parameters#

data: List[Any] | Dict[str, Any]

The data containing the value to be extracted.

variable_path: List[str | int]

A list of keys to be used to extract the value from the data.

variable_properties: Dict[str, Any]

The metadata properties for the variable being validated.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

Returns#

Any

The value extracted from the data if found. None if not found.

Notes#

This function navigates through the given data (which can be a list or a dictionary) following the path specified in variable_path. If the path leads to a value, it is returned. If a KeyError occurs during this process (i.e., a key or index is missing in the path), the function extracts the variable name by finding the last string element in the variable_path array and handles this missing data by calling DataValidator._log_missing_data().
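
The lookup-and-report flow described in the note can be sketched as follows. The function extract_or_report and its on_missing callback are hypothetical stand-ins for the private method and for DataValidator._log_missing_data(). Catching IndexError alongside KeyError is an assumption made so that a missing list index is reported the same way.

```python
from typing import Any, Callable, Sequence, Union


def extract_or_report(
    data: Union[list, dict],
    variable_path: Sequence[Union[str, int]],
    on_missing: Callable[[str], None],
) -> Any:
    """Follow variable_path through data; report and return None if it breaks."""
    current: Any = data
    try:
        for key in variable_path:
            current = current[key]
    except (KeyError, IndexError):
        # The variable name is the last string element in the path.
        var_name = next(
            (key for key in reversed(variable_path) if isinstance(key, str)), ""
        )
        on_missing(var_name)
        return None
    return current


missing: list[str] = []
example = {"manure_management_scenarios": [{"bedding_type": "straw"}]}
found = extract_or_report(example, ["manure_management_scenarios", 0, "bedding_type"], missing.append)
absent = extract_or_report(example, ["manure_management_scenarios", 1], missing.append)
```
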

_log_missing_data(variable_properties: dict[str, Any], var_name: str, called_during_initialization: bool) None#

Handles logging for missing data for a variable, logging errors or warnings based on the context of initialization or runtime updates.

Parameters#

variable_properties: Dict[str, Any]

Properties of the variable, potentially including its modifiability status.

var_name: str

The name of the variable with missing data.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

Raises#

KeyError

Raised if the missing data is deemed necessary, either during initialization or for a runtime update.

Notes#

This function determines if it’s being called during the initialization phase and checks if the missing variable data is required at this stage using ‘_is_data_required_upon_initialization’. If required, it logs an error and raises a KeyError. If not, it logs a warning.

_is_data_required_upon_initialization(variable_name: str, variable_properties: dict[str, Any]) bool#

Determines whether a variable requires a data value upon initialization based on its modifiability status.

This function uses the ‘_get_variable_modifiability’ method to determine the modifiability status of the variable identified by ‘variable_name’ and described by ‘variable_properties’. It then checks whether that status is ‘REQUIRED_LOCKED’ or ‘REQUIRED_UNLOCKED’, either of which means the variable must be initialized with a value.

Parameters#

variable_name: str

The name of the variable being evaluated for its initialization requirements.

variable_properties: Dict[str, Any]

A dictionary containing the properties of the variable, which should include its modifiability status among others.

Returns#

bool

True if the variable’s modifiability status necessitates a data value upon initialization, False otherwise.

_get_variable_modifiability(variable_name: str, variable_properties: dict[str, Any]) Modifiability#

Determines the modifiability status of a variable based on its properties and returns the corresponding enum value.

Notes#

This function looks for a ‘modifiability’ key within variable_properties. If the key is present and its value is not empty, the function attempts to map this value to an enum member in Modifiability. If the value does not correspond to any enum member, the error is logged and a KeyError is raised. If ‘modifiability’ is absent or its value is empty, the function defaults to Modifiability.UNREQUIRED_UNLOCKED.

Parameters#

variable_name: str

The name of the variable for which the modifiability status is being determined. Used for error logging.

variable_properties: Dict[str, Any]

A dictionary containing the properties of the variable, containing the desired ‘modifiability’ property.

Returns#

Modifiability

An enum member representing the variable’s modifiability status.

Raises#

KeyError

If ‘modifiability’ in variable_properties does not match any enum member in Modifiability. The error message includes the invalid modifiability value and suggests valid values.
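
The lookup, default, and error behaviour described above can be sketched as follows. ModifiabilitySketch and both functions are hypothetical stand-ins. Mapping by enum value (rather than by member name) and logging through the standard logging module are assumptions, since this page does not show how RUFAS does either.

```python
import logging
from enum import Enum
from typing import Any

logger = logging.getLogger(__name__)


class ModifiabilitySketch(Enum):
    REQUIRED_LOCKED = "required locked"
    REQUIRED_UNLOCKED = "required unlocked"
    UNREQUIRED_UNLOCKED = "unrequired unlocked"


def get_modifiability(
    variable_name: str, variable_properties: dict[str, Any]
) -> ModifiabilitySketch:
    raw_value = variable_properties.get("modifiability")
    if not raw_value:
        # Absent or empty: fall back to the documented default.
        return ModifiabilitySketch.UNREQUIRED_UNLOCKED
    try:
        # Assumption: the property holds an enum value such as "required locked".
        return ModifiabilitySketch(raw_value)
    except ValueError:
        valid = [member.value for member in ModifiabilitySketch]
        message = (
            f"Invalid modifiability {raw_value!r} for {variable_name!r}; "
            f"valid values: {valid}"
        )
        logger.error(message)
        raise KeyError(message) from None


def is_required_upon_initialization(
    variable_name: str, variable_properties: dict[str, Any]
) -> bool:
    status = get_modifiability(variable_name, variable_properties)
    return status in (
        ModifiabilitySketch.REQUIRED_LOCKED,
        ModifiabilitySketch.REQUIRED_UNLOCKED,
    )
```
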

convert_variable_path_to_str(variable_path: list[str | int]) str#

Converts a list of keys (int or str) into a string representation of the path to a variable.

Parameters#

variable_path: List[str | int]

A list of keys to be used to extract the value from the data.

Returns#

str

A string representation of the path to a variable.

Examples#

>>> var_path = ["animal", "herd_information", "calf_num"]
>>> DataValidator.convert_variable_path_to_str(var_path)
'animal.herd_information.calf_num'
>>> var_path = ["manure_management_scenarios", 0, "bedding_type"]
>>> DataValidator.convert_variable_path_to_str(var_path)
'manure_management_scenarios.[0].bedding_type'
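
A conversion that reproduces the outputs in the examples above can be sketched as follows. The function path_to_str is a hypothetical stand-in, and the rule it applies (wrap integer keys in square brackets, then join with dots) is inferred from those two examples only.

```python
from typing import Sequence, Union


def path_to_str(variable_path: Sequence[Union[str, int]]) -> str:
    """Join path keys with dots, wrapping integer indices in square brackets."""
    parts = [f"[{key}]" if isinstance(key, int) else key for key in variable_path]
    return ".".join(parts)


dict_path = path_to_str(["animal", "herd_information", "calf_num"])
list_path = path_to_str(["manure_management_scenarios", 0, "bedding_type"])
```
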
extract_value_by_key_list(data: list[Any] | dict[str, Any], variable_path: Sequence[str | int]) Any#

Extracts a value from a nested list or dictionary using a list of keys (int or str).

Parameters#

data: List[Any] | Dict[str, Any]

The data containing the value to be extracted.

variable_path: List[str | int]

A list of keys to be used to extract the value from the data.

Returns#

Any

The value extracted from the data.

Raises#

KeyError

If the value cannot be extracted from the data using the provided variable path.

Examples#

>>> data_validator = DataValidator()
>>> example_data = {
...     "animal": {
...         "herd_information": {
...             "calf_num": 8,
...             "heiferI_num": 44,
...             "heiferII_num": 38,
...             "heiferIII_num_springers": 12
...         }
...     }
... }
>>> var_path = ["animal", "herd_information", "calf_num"]
>>> DataValidator.extract_value_by_key_list(example_data, var_path)
8
>>> data_validator = DataValidator()
>>> example_data = {
...     "manure_management_scenarios": [
...         {
...             "bedding_type": "straw",
...             "manure_handler": "manual scraping"
...         },
...         {
...             "bedding_type": "sawdust",
...             "manure_handler": "flush system"
...         }
...     ]
... }
>>> var_path = ["manure_management_scenarios", 0, "bedding_type"]
>>> DataValidator.extract_value_by_key_list(example_data, var_path)
'straw'
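
The extraction shown in the examples above can be sketched without RUFAS as follows. The function extract_value is a hypothetical stand-in. Converting IndexError and TypeError into KeyError is an assumption made so that every failed lookup surfaces as the single documented exception.

```python
from typing import Any, Sequence, Union


def extract_value(
    data: Union[list, dict], variable_path: Sequence[Union[str, int]]
) -> Any:
    """Follow variable_path through nested lists and dicts."""
    current: Any = data
    for key in variable_path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError) as error:
            # Report every kind of failed lookup as a KeyError.
            raise KeyError(
                f"Cannot resolve {key!r} in path {list(variable_path)!r}"
            ) from error
    return current


scenarios = {
    "manure_management_scenarios": [
        {"bedding_type": "straw", "manure_handler": "manual scraping"},
        {"bedding_type": "sawdust", "manure_handler": "flush system"},
    ]
}
second_bedding = extract_value(scenarios, ["manure_management_scenarios", 1, "bedding_type"])
```
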