RUFAS.data_validator module#

class RUFAS.data_validator.ElementState(*values)#

Bases: Enum

An enumeration of the states a data element can be in during validation. An element cannot be in more than one state at a time.

Attributes#

VALIDstr

The element is valid.

INVALIDstr

The element is invalid and cannot be fixed.

FIXEDstr

The element was initially invalid but has been fixed.

VALID = 'valid'#
INVALID = 'invalid'#
FIXED = 'fixed'#
class RUFAS.data_validator.ElementsCounter#

Bases: object

A class to keep track of the number of elements in each state during validation.

Attributes#

valid_elementsint

The number of valid elements.

invalid_elementsint

The number of invalid elements.

fixed_elementsint

The number of fixed elements.

__init__() None#
update(state: ElementState, value: int) None#

Updates the count of elements in a given state.

Parameters#

stateElementState

The state of the element.

valueint

The value by which the count should be updated.

Raises#

ValueError

If the state is not one of the valid states.

increment(state: ElementState) None#

Increments the count of elements in a given state by one.

Parameters#

stateElementState

The state of the element.

reset() None#

Resets the counts of all element states to zero.

total_elements() int#

Returns the total number of elements by adding the counts of valid, invalid, and fixed elements.
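
To illustrate the counting behavior described above, here is a minimal sketch. It is not the RUFAS implementation: the class, attribute, and method names come from this page, but the method bodies are assumptions.

```python
from enum import Enum


class ElementState(Enum):
    VALID = "valid"
    INVALID = "invalid"
    FIXED = "fixed"


class ElementsCounter:
    """Illustrative stand-in; the method bodies are assumed, not copied from RUFAS."""

    def __init__(self) -> None:
        self.valid_elements = 0
        self.invalid_elements = 0
        self.fixed_elements = 0

    def update(self, state: ElementState, value: int) -> None:
        # Add `value` to the counter that matches `state`.
        if state is ElementState.VALID:
            self.valid_elements += value
        elif state is ElementState.INVALID:
            self.invalid_elements += value
        elif state is ElementState.FIXED:
            self.fixed_elements += value
        else:
            raise ValueError(f"Unknown element state: {state!r}")

    def increment(self, state: ElementState) -> None:
        self.update(state, 1)

    def reset(self) -> None:
        self.valid_elements = 0
        self.invalid_elements = 0
        self.fixed_elements = 0

    def total_elements(self) -> int:
        return self.valid_elements + self.invalid_elements + self.fixed_elements
```

In this sketch, passing anything other than an ElementState member to update() raises ValueError, which matches the Raises section above.
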

class RUFAS.data_validator.Modifiability(*values)#

Bases: Enum

Enum class representing the modifiability status of a variable.

This Enum defines various levels of modifiability for a variable, indicating whether a variable is required at initialization and if it can be modified during runtime.

Attributes#

REQUIRED_LOCKEDstr

Indicates the variable must be initialized with a value and cannot be modified thereafter.

REQUIRED_UNLOCKEDstr

Indicates the variable must be initialized with a value but can be modified during runtime.

UNREQUIRED_UNLOCKEDstr

Indicates the variable does not need to be initialized with a value and can be modified during runtime.

REQUIRED_LOCKED = 'required locked'#
REQUIRED_UNLOCKED = 'required unlocked'#
UNREQUIRED_UNLOCKED = 'unrequired unlocked'#
classmethod values() list[str]#

Provides a list of the string values of the enum members.

Returns#

List[str]

A list containing the string values of the enum members.

classmethod get_required_during_initialization() list[Modifiability]#
classmethod get_modifiable_at_runtime() list[Modifiability]#
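
The two classmethods above are undocumented on this page. The sketch below shows one plausible reading, inferred only from the attribute descriptions. The member names and values are taken from this page; the return values of get_required_during_initialization() and get_modifiable_at_runtime() are assumptions.

```python
from __future__ import annotations

from enum import Enum


class Modifiability(Enum):
    REQUIRED_LOCKED = "required locked"
    REQUIRED_UNLOCKED = "required unlocked"
    UNREQUIRED_UNLOCKED = "unrequired unlocked"

    @classmethod
    def values(cls) -> list[str]:
        # String values of all members, in definition order.
        return [member.value for member in cls]

    @classmethod
    def get_required_during_initialization(cls) -> list[Modifiability]:
        # Assumed: the members whose description says "must be initialized".
        return [cls.REQUIRED_LOCKED, cls.REQUIRED_UNLOCKED]

    @classmethod
    def get_modifiable_at_runtime(cls) -> list[Modifiability]:
        # Assumed: the members whose description says "can be modified during runtime".
        return [cls.REQUIRED_UNLOCKED, cls.UNREQUIRED_UNLOCKED]
```
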
class RUFAS.data_validator.DataValidator#

Bases: object

This class validates all types of data across the RuFaS codebase.

__init__() None#
validate_properties(metadata: dict[str, Any], metadata_depth_limit: int) tuple[bool, str]#

Iteratively traverses the metadata properties to check the max depth and routes properties to be validated by type.

Returns#

Tuple[bool, str]

A boolean indicating whether validation passed, and an error message that the caller should raise if validation failed.
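
The depth check can be pictured with an explicit stack instead of recursion. The sketch below is only an illustration: the function name check_depth and the assumption that nested properties are plain nested dictionaries are hypothetical, and the per-type routing done by the real method is omitted.

```python
from typing import Any


def check_depth(properties: dict[str, Any], depth_limit: int) -> tuple[bool, str]:
    """Hypothetical helper: iteratively walk nested dicts and enforce a depth limit."""
    # Each stack entry holds (path so far, value at that path, depth of that value).
    stack = [([key], value, 1) for key, value in properties.items()]
    while stack:
        path, value, depth = stack.pop()
        if depth > depth_limit:
            return False, f"Depth limit {depth_limit} exceeded at {'.'.join(path)}"
        if isinstance(value, dict):
            for child_key, child_value in value.items():
                stack.append((path + [child_key], child_value, depth + 1))
    return True, ""
```

Like validate_properties, the helper returns a status flag and a message rather than raising, so the caller decides how to report the failure.
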

_validate_metadata_properties_keys(required_properties_keys: set[str], optional_properties_keys: set[str], properties: dict[str, Any], path: list[str]) tuple[bool, str]#

Validates the keys in a metadata properties section against the required and optional key sets.

_metadata_number_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates number type properties in metadata.

_metadata_string_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates string type properties in metadata.

_metadata_bool_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates bool type properties in metadata.

_metadata_array_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates array type properties in metadata.

_metadata_object_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str]#

Validates object type properties in metadata.

validate_metadata(metadata: dict[str, Any], valid_data_types: set[str], address_to_data: str) tuple[bool, str]#

Checks that top-level metadata has valid and required keys and values.

validate_data_by_type(variable_properties: dict[str, Any], variable_path: list[str | int], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates the data based on its specified type.

Parameters#

variable_propertiesDict[str, Any]

A dictionary containing properties relevant to the validation.

variable_pathList[str | int]

The path to the variable being validated.

dataDict[str, Any]

The data to be validated.

eager_terminationbool

If True, the process terminates as soon as invalid data is found and cannot be fixed.

properties_blob_keystr

The metadata properties for the data file being checked.

elements_counterElementsCounter

A counter to keep track of the number of valid, invalid, and fixed elements.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

fixable_data_types: set[str]

Set enumerating the data types that the caller will attempt to fix while validating data.

Returns#

bool

True if the data is valid, False otherwise.

Raises#

KeyError

If the variable’s properties do not specify a “type”.

Notes#

Fixing invalid data will only be attempted if the data is a “simple” type (i.e. a string, bool or number).
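
Routing by the “type” property can be sketched with a lookup table. The names TYPE_CHECKS and validate_by_type are hypothetical, only the three simple types are covered, and the real method takes more arguments (path, counter, fixing options) than shown here.

```python
from typing import Any, Callable


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical routing table from the "type" property to a check function.
TYPE_CHECKS: dict[str, Callable[[Any], bool]] = {
    "number": _is_number,
    "string": lambda value: isinstance(value, str),
    "bool": lambda value: isinstance(value, bool),
}


def validate_by_type(variable_properties: dict[str, Any], value: Any) -> bool:
    data_type = variable_properties["type"]  # raises KeyError when "type" is absent
    check = TYPE_CHECKS.get(data_type)
    if check is None:
        return False  # assumption: unknown types are treated as invalid
    return check(value)
```
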

_validate_array_container_properties(variable_path: list[str | int], variable_properties: dict[str, Any], data: Any, properties_blob_key: str) bool#

Validates the container properties of an array data element.

Parameters#

variable_pathList[str | int]

The path to the variable being validated.

variable_propertiesDict[str, Any]

The metadata properties for the variable being validated.

dataAny

The data to be validated.

properties_blob_keystr

The metadata properties for the data file being checked.

Returns#

bool

True if the array container properties are valid, False otherwise.

_array_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data element of type array.

Parameters#

variable_pathList[str | int]

The path to the variable being validated.

variable_propertiesDict[str, Any]

The metadata properties for the variable being validated.

dataDict[str, Any]

The data to be validated.

eager_terminationbool

If True, the process will be terminated upon finding invalid data.

properties_blob_keystr

The metadata properties for the data file being checked.

elements_counterElementsCounter

A counter to keep track of the number of valid, invalid, and fixed elements.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

fixable_data_types: set[str]

Set of data types that are fixable.

Returns#

bool

True if the data element is valid or fixable, False otherwise.

_object_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data element of type object.

Parameters#

variable_pathList[str | int]

The path to the variable being validated.

variable_propertiesDict[str, Any]

The metadata properties for the variable being validated.

dataDict[str, Any]

The data to be validated.

eager_terminationbool

If True, the process will be terminated upon finding invalid data.

properties_blob_keystr

The metadata properties for the data file being checked.

elements_counterElementsCounter

A counter to keep track of the number of valid, invalid, and fixed elements.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

Returns#

bool

True if the data element is valid or fixable, False otherwise.

Notes#

This method will look for and delete any keys in the data that do not have properties specified for them in the metadata properties.
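
The key-pruning behavior in the note above can be sketched as follows. The function name prune_unknown_keys is hypothetical, and the sketch assumes the allowed keys are simply the keys of the properties dictionary.

```python
from typing import Any


def prune_unknown_keys(data: dict[str, Any], properties: dict[str, Any]) -> list[str]:
    """Hypothetical helper: delete keys of `data` that have no entry in `properties`."""
    # Collect first, then delete, so the dict is not mutated while iterating over it.
    unknown = [key for key in data if key not in properties]
    for key in unknown:
        del data[key]
    return unknown
```

Because the data dictionary is modified in place, callers that need the original input should pass a copy.
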

_number_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data number element.

_string_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data string element.

_bool_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool#

Validates a data bool element.

_fix_data(variable_properties: dict[str, Any], element_hierarchy: list[str | int], data: dict[str, Any], properties_blob_key: str) bool#

Attempt to fix the invalid data.

Parameters#

variable_propertiesdict[str, Any]

The properties for the variable of interest.

element_hierarchy: list

A list indicating the path to reach the variable of interest in self.__metadata and self.__pool.

data: dict[str, Any]

A buffer dictionary that holds the data for validation and fixing.

properties_blob_keystr

The metadata properties section keyword for the data file being checked.

Returns#

bool

True if the data is fixed, False otherwise.

_extract_data_by_key_list(data: list[Any] | dict[str, Any], variable_path: Sequence[str | int], variable_properties: dict[str, Any], called_during_initialization: bool) Any#

Extracts a value from the data based on a specified path and handles missing data by calling DataValidator._log_missing_data().

Parameters#

dataList[Any] | Dict[str, Any]

The data containing the value to be extracted.

variable_pathList[str | int]

A list of keys to be used to extract the value from the data.

variable_propertiesDict[str, Any]

The metadata properties for the variable being validated.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

Returns#

Any

The value extracted from the data if found. None if not found.

Notes#

This function navigates through the given data (which can be a list or a dictionary) following the path specified in variable_path. If the path leads to a value, it is returned. If a KeyError occurs during this process (i.e., a key or index is missing in the path), the function extracts the variable name by finding the last string element in the variable_path array and handles this missing data by calling DataValidator._log_missing_data().
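
A minimal sketch of the lookup described in the note, under stated assumptions: extract_or_none and the on_missing callback are hypothetical stand-ins for the private method and for DataValidator._log_missing_data(), and the sketch also catches IndexError and TypeError, which the note does not mention.

```python
from typing import Any, Callable, Sequence, Union


def extract_or_none(
    data: Union[list, dict],
    variable_path: Sequence[Union[str, int]],
    on_missing: Callable[[str], None],
) -> Any:
    current: Any = data
    try:
        for key in variable_path:
            current = current[key]
    except (KeyError, IndexError, TypeError):
        # The variable name is the last string element in the path.
        var_name = next((k for k in reversed(variable_path) if isinstance(k, str)), "")
        on_missing(var_name)
        return None
    return current
```
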

_log_missing_data(variable_properties: dict[str, Any], var_name: str, called_during_initialization: bool) None#

Handles logging for missing data for a variable, logging errors or warnings based on the context of initialization or runtime updates.

Parameters#

variable_propertiesDict[str, Any]

Properties of the variable, potentially including its modifiability status.

var_namestr

The name of the variable with missing data.

called_during_initialization: bool

Boolean variable indicating whether the function is being called during initialization.

Raises#

KeyError

Raised if the missing data is deemed necessary, either during initialization or for a runtime update.

Notes#

This function determines if it’s being called during the initialization phase and checks if the missing variable data is required at this stage using ‘_is_data_required_upon_initialization’. If required, it logs an error and raises a KeyError. If not, it logs a warning.

_is_data_required_upon_initialization(variable_name: str, variable_properties: dict[str, Any]) bool#

Determines whether a variable requires a data value upon initialization based on its modifiability status.

This function uses the ‘_get_variable_modifiability’ method to determine the modifiability status of the variable identified by ‘variable_name’ and described by ‘variable_properties’. It then checks whether that status is either ‘REQUIRED_LOCKED’ or ‘REQUIRED_UNLOCKED’, indicating that the variable must be initialized with a value.

Parameters#

variable_namestr

The name of the variable being evaluated for its initialization requirements.

variable_propertiesDict[str, Any]

A dictionary containing the properties of the variable, which should include its modifiability status among others.

Returns#

bool

True if the variable’s modifiability status necessitates a data value upon initialization, False otherwise.

_get_variable_modifiability(variable_name: str, variable_properties: dict[str, Any]) Modifiability#

Determines the modifiability status of a variable based on its properties and returns the corresponding enum value.

Notes#

This function looks for a ‘modifiability’ key within variable_properties. If the key is present and its value is not empty, the function attempts to map this value to an enum member in Modifiability. If the value does not correspond to any enum member, the error is logged and a KeyError is raised. If ‘modifiability’ is absent or its value is empty, the function defaults to Modifiability.UNREQUIRED_UNLOCKED.

Parameters#

variable_namestr

The name of the variable for which the modifiability status is being determined. Used for error logging.

variable_propertiesDict[str, Any]

A dictionary containing the properties of the variable, containing the desired ‘modifiability’ property.

Returns#

Modifiability

An enum member representing the variable’s modifiability status.

Raises#

KeyError

If ‘modifiability’ in variable_properties does not match any enum member in Modifiability. The error message includes the invalid modifiability value and suggests valid values.
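
The lookup and its default can be sketched as below. This is an illustration, not the library code: it assumes the ‘modifiability’ property holds an enum value string such as "required locked", and it omits the logging step. The function names drop the leading underscore to mark them as stand-ins.

```python
from enum import Enum
from typing import Any


class Modifiability(Enum):
    REQUIRED_LOCKED = "required locked"
    REQUIRED_UNLOCKED = "required unlocked"
    UNREQUIRED_UNLOCKED = "unrequired unlocked"


def get_variable_modifiability(variable_name: str, variable_properties: dict[str, Any]) -> Modifiability:
    raw = variable_properties.get("modifiability")
    if not raw:
        # Absent or empty: fall back to the least restrictive status.
        return Modifiability.UNREQUIRED_UNLOCKED
    try:
        return Modifiability(raw)  # assumption: lookup by enum value
    except ValueError:
        valid = [member.value for member in Modifiability]
        raise KeyError(
            f"Invalid modifiability {raw!r} for {variable_name!r}; valid values: {valid}"
        ) from None


def is_data_required_upon_initialization(variable_name: str, variable_properties: dict[str, Any]) -> bool:
    status = get_variable_modifiability(variable_name, variable_properties)
    return status in (Modifiability.REQUIRED_LOCKED, Modifiability.REQUIRED_UNLOCKED)
```
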

convert_variable_path_to_str(variable_path: list[str | int]) str#

Converts a list of keys (int or str) into a string representation of the path to a variable.

Parameters#

variable_pathList[str | int]

A list of keys to be used to extract the value from the data.

Returns#

str

A string representation of the path to a variable.

Examples#

>>> var_path = ["animal", "herd_information", "calf_num"]
>>> DataValidator.convert_variable_path_to_str(var_path)
'animal.herd_information.calf_num'
>>> var_path = ["manure_management_scenarios", 0, "bedding_type"]
>>> DataValidator.convert_variable_path_to_str(var_path)
'manure_management_scenarios.[0].bedding_type'
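
The formatting rule implied by the examples above (string keys joined by dots, integer keys wrapped in square brackets) can be reproduced with a one-line sketch. The function name path_to_str is hypothetical and the body is inferred from the two example outputs only.

```python
from typing import Sequence, Union


def path_to_str(variable_path: Sequence[Union[str, int]]) -> str:
    # Integer keys (list indices) are rendered as "[n]"; string keys are kept as-is.
    return ".".join(f"[{key}]" if isinstance(key, int) else key for key in variable_path)
```
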
extract_value_by_key_list(data: list[Any] | dict[str, Any], variable_path: Sequence[str | int]) Any#

Extracts a value from a nested list or dictionary using a list of keys (int or str).

Parameters#

dataList[Any] | Dict[str, Any]

The data containing the value to be extracted.

variable_pathList[str | int]

A list of keys to be used to extract the value from the data.

Returns#

Any

The value extracted from the data.

Raises#

KeyError

If the value cannot be extracted from the data using the provided variable path.

Examples#

>>> example_data = {
...     "animal": {
...         "herd_information": {
...             "calf_num": 8,
...             "heiferI_num": 44,
...             "heiferII_num": 38,
...             "heiferIII_num_springers": 12
...         }
...     }
... }
>>> var_path = ["animal", "herd_information", "calf_num"]
>>> DataValidator.extract_value_by_key_list(example_data, var_path)
8
>>> example_data = {
...     "manure_management_scenarios": [
...         {
...             "bedding_type": "straw",
...             "manure_handler": "manual scraping"
...         },
...         {
...             "bedding_type": "sawdust",
...             "manure_handler": "flush system"
...         }
...     ]
... }
>>> var_path = ["manure_management_scenarios", 0, "bedding_type"]
>>> DataValidator.extract_value_by_key_list(example_data, var_path)
'straw'
class RUFAS.data_validator.CrossValidator#

Bases: object

This class performs cross-validation of input data.

Attributes#

_alias_pooldict[str, Any]

Alias pool storing data for cross validation.

_event_logslist[dict[str, str | dict[str, str]]]

Logs for the events that will be handled by output manager.

relation_mappingdict[str, Any]

A mapping for all the supported relationship evaluation functions.

__init__() None#
cross_validate_data(im_variable_pool: dict[str, Any], cross_validation_rules: list[dict[str, Any]]) bool#

Performs cross-validation on the provided data using the provided cross validation rules.

Parameters#

im_variable_pooldict[str, Any]

A dictionary containing the InputManager variable pool to be validated.

cross_validation_ruleslist[dict[str, Any]]

A list of dictionaries containing the cross-validation rules to be applied.

Returns#

bool

A boolean indicating whether the data passed cross-validation.

_save_to_alias_pool(alias_name: str, value: Any) None#

Saves a value to the alias pool with the specified alias name.

Parameters#

alias_namestr

The name of the alias to be saved.

valueAny

The value to be saved.

_get_alias_value(alias_name: str, eager_termination: bool) Any#

Retrieves the value associated with the specified alias name from the alias pool.

Parameters#

alias_namestr

The alias of the value to retrieve.

eager_terminationbool

Whether to raise an error if the expression is not successfully evaluated.

Returns#

Any

The value associated with the specified alias name from the alias pool.

Raises#

KeyError

If the provided alias name has no value in the alias pool.
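
The alias pool can be thought of as a dictionary with a guarded read. The sketch below is an assumption-laden illustration: the class name AliasPool is hypothetical, and returning None after logging when eager_termination is False is a guess, since this page documents only the KeyError case.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


class AliasPool:
    """Hypothetical stand-in for the _alias_pool handling in CrossValidator."""

    def __init__(self) -> None:
        self._alias_pool: dict[str, Any] = {}

    def save(self, alias_name: str, value: Any) -> None:
        self._alias_pool[alias_name] = value

    def get(self, alias_name: str, eager_termination: bool) -> Any:
        if alias_name in self._alias_pool:
            return self._alias_pool[alias_name]
        message = f"Alias {alias_name!r} has no value in the alias pool."
        if eager_termination:
            raise KeyError(message)
        logger.error(message)  # assumption: log and continue when not eager
        return None
```
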

_target_and_save(target_and_save_result: dict[str, Any]) None#

This function handles the “target and save block” in the cross-validation rule. It retrieves the value of the target variable from the InputManager variable pool and saves it to the alias pool with the specified alias name. It also saves the constants defined in the “constants” block to the alias pool with the specified alias.

Parameters#

target_and_save_resultdict[str, dict[str, Any]]

A dictionary containing the “target and save block” of the cross-validation rule.

check_target_and_save_block(target_and_save_block: dict[str, dict[str, Any]], eager_termination: bool) None#

Check if the target and save block is valid.

_evaluate_expression(expression_block: dict[str, Any], eager_termination: bool) tuple[Any, bool]#

Evaluates an expression based on the provided expression block. This function also optionally adds to the alias pool if the save_as key is present in the expression block.

Parameters#

expression_blockdict[str, Any]

A dictionary containing the expression block to be evaluated.

eager_terminationbool

Whether to raise an error if the expression is not successfully evaluated.

Returns#

tuple[Any, bool]

The result of the expression evaluation and a boolean indicating whether the expression was successfully evaluated.

Notes#

Expression block:

>>> {
...     "operation": "sum | difference | average | product | no_op",  # optional, defaults to "no_op"
...     "apply_to": "individual | group",  # optional
...     "ordered_variables": ["alias_0", "alias_1"],
...     "save_as": "alias_2"  # optional
... }
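
To make the expression block concrete, here is a sketch of how such a block might be evaluated against an alias pool. Several details are assumptions not stated on this page: “difference” subtracts the remaining values from the first, “no_op” returns the first value unchanged, and “apply_to” is ignored. The names OPERATIONS and evaluate_expression are hypothetical.

```python
from math import prod
from typing import Any

# Hypothetical operation table; the semantics of "difference" and "no_op" are assumed.
OPERATIONS = {
    "sum": sum,
    "difference": lambda values: values[0] - sum(values[1:]),
    "average": lambda values: sum(values) / len(values),
    "product": prod,
    "no_op": lambda values: values[0],
}


def evaluate_expression(expression_block: dict[str, Any], alias_pool: dict[str, Any]) -> Any:
    operation = expression_block.get("operation", "no_op")  # optional key
    values = [alias_pool[alias] for alias in expression_block["ordered_variables"]]
    result = OPERATIONS[operation](values)
    if "save_as" in expression_block:
        # Optionally store the result under a new alias.
        alias_pool[expression_block["save_as"]] = result
    return result
```
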

_validate_expression_block_with_complex_variable_values(expression_block: dict[str, Any], ordered_values: list[Any], eager_termination: bool) bool#

Validates an expression block when it contains complex variables.

This method checks the validity of an expression block if it includes complex variables (such as lists or dictionaries) and ensures it adheres to predefined rules. Validation errors are logged, and eager termination behavior is enforced if specified.

Parameters#

expression_blockdict[str, Any]

A dictionary representing the expression block to be validated.

ordered_valueslist[Any]

A list of variables involved in the evaluation. Only one list or dictionary variable is permitted for cross-validation in a single block.

eager_terminationbool

Specifies whether to immediately terminate the process when a validation error is encountered.

Returns#

bool

True if the expression block is valid; False if it is invalid and eager termination is disabled.

_evaluate_condition(condition_clause: dict[str, Any], eager_termination: bool) bool#

Evaluates if a single condition is satisfied based on the provided condition clause.

Parameters#

condition_clausedict[str, Any]

The condition clause to be evaluated.

eager_terminationbool

Specifies whether to immediately terminate the process when a validation error is encountered.

Returns#

bool

A boolean indicating whether the condition is satisfied.

_validate_condition_clause(condition_clause: dict[str, Any], eager_termination: bool) bool#

Validate the whole condition block.

_log_missing_condition_clause_field(missing_field: str) None#

Helper method that logs a missing essential field in a condition clause.

_validate_relationship(relationship: Any, eager_termination: bool) bool#

Validates that the given relationship is a supported relationship check.

_evaluate_equal_condition(left_hand_value: Any, right_hand_value: Any) Any#

Evaluates equal condition.

_evaluate_greater_condition(left_hand_value: Any, right_hand_value: Any) Any#

Evaluates greater than condition.

_evaluate_is_null(left_hand_value: Any) bool#

Evaluates is null condition.

_evaluate_is_type(left_hand_value: Any, data_type: Any, eager_termination: bool) bool#

Evaluates the is_type condition.

_evaluate_regex(left_hand_value: Any, right_hand_value: Any) bool#

Check if a value matches a given regex pattern.

Parameters#

left_hand_valuestr

The string to check.

right_hand_valuestr

The regex pattern to match.

Returns#

bool

True if the value fully matches the regex pattern, otherwise False.
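
A full match differs from a search: the whole string must match the pattern, not just a substring. In Python that corresponds to re.fullmatch, as in this minimal sketch (the function name evaluate_regex is a stand-in, and the real method may handle non-string input differently).

```python
import re


def evaluate_regex(left_hand_value: str, right_hand_value: str) -> bool:
    # re.fullmatch returns a match object only if the entire string matches the pattern.
    return re.fullmatch(right_hand_value, left_hand_value) is not None
```
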

_evaluate_condition_clause_array(condition_clause_array: list[dict[str, Any]], eager_termination: bool) bool#

Evaluates if all conditions in the provided condition clause array are satisfied.

Parameters#

condition_clause_arraylist[dict[str, Any]]

An array of condition clauses to be evaluated.

eager_terminationbool

Specifies whether to immediately terminate the process when a validation error is encountered.

Returns#

bool

A boolean indicating whether all conditions in the array are satisfied.
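
The all-conditions-must-hold rule can be sketched with a relation table and all(). Everything about the clause layout here is hypothetical: this page does not list the field names of a condition clause, so the keys "left_hand", "relationship", and "right_hand" and the relation names "equal" and "greater" are placeholders.

```python
import operator
from typing import Any

# Hypothetical relation table, loosely modeled on the relation_mapping attribute.
RELATIONS = {
    "equal": operator.eq,
    "greater": operator.gt,
}


def evaluate_condition(condition_clause: dict[str, Any]) -> bool:
    relation = RELATIONS[condition_clause["relationship"]]
    return bool(relation(condition_clause["left_hand"], condition_clause["right_hand"]))


def evaluate_condition_clause_array(condition_clause_array: list[dict[str, Any]]) -> bool:
    # Every clause must be satisfied; an empty array is treated as satisfied.
    return all(evaluate_condition(clause) for clause in condition_clause_array)
```
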