RUFAS.data_validator module#
- class RUFAS.data_validator.ElementState(*values)#
Bases:
Enum
An enumeration of the states a data element can be in during validation. An element cannot be in more than one state at a time.
Attributes#
- VALID: str
The element is valid.
- INVALID: str
The element is invalid and cannot be fixed.
- FIXED: str
The element was initially invalid but has been fixed.
- VALID = 'valid'#
- INVALID = 'invalid'#
- FIXED = 'fixed'#
- class RUFAS.data_validator.ElementsCounter#
Bases:
object
A class to keep track of the number of elements in each state during validation.
Attributes#
- valid_elements: int
The number of valid elements.
- invalid_elements: int
The number of invalid elements.
- fixed_elements: int
The number of fixed elements.
- __init__() None #
- update(state: ElementState, value: int) None #
Updates the count of elements in a given state.
Parameters#
- state: ElementState
The state of the element.
- value: int
The value by which the count should be updated.
Raises#
- ValueError
If the state is not one of the valid states.
- increment(state: ElementState) None #
Increments the count of elements in a given state by one.
Parameters#
- state: ElementState
The state of the element.
- reset() None #
Resets the counts of all element states to zero.
- total_elements() int #
Returns the total number of elements by adding the counts of valid, invalid, and fixed elements.
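The counter behaviour described above can be sketched in a few lines. This is a minimal stand-in written from the documented signatures only, not the RUFAS source; the class name `ElementsCounterSketch` is hypothetical, and the enum is redefined locally so the block runs without RUFAS installed.

```python
from enum import Enum


class ElementState(Enum):
    """Local copy of the documented enum so the sketch is self-contained."""

    VALID = "valid"
    INVALID = "invalid"
    FIXED = "fixed"


class ElementsCounterSketch:
    """Hypothetical stand-in that mirrors the documented ElementsCounter API."""

    def __init__(self) -> None:
        self.valid_elements = 0
        self.invalid_elements = 0
        self.fixed_elements = 0

    def update(self, state: ElementState, value: int) -> None:
        # Route the update to the counter that matches the state.
        if state is ElementState.VALID:
            self.valid_elements += value
        elif state is ElementState.INVALID:
            self.invalid_elements += value
        elif state is ElementState.FIXED:
            self.fixed_elements += value
        else:
            raise ValueError(f"Unknown element state: {state!r}")

    def increment(self, state: ElementState) -> None:
        # Incrementing is an update by exactly one.
        self.update(state, 1)

    def reset(self) -> None:
        self.valid_elements = 0
        self.invalid_elements = 0
        self.fixed_elements = 0

    def total_elements(self) -> int:
        return self.valid_elements + self.invalid_elements + self.fixed_elements


counter = ElementsCounterSketch()
counter.increment(ElementState.VALID)
counter.update(ElementState.FIXED, 3)
print(counter.total_elements())
```

Implementing `increment` on top of `update` keeps the state check in one place, which is one plausible reason the documented API exposes both.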
- class RUFAS.data_validator.Modifiability(*values)#
Bases:
Enum
Enum class representing the modifiability status of a variable.
This Enum defines various levels of modifiability for a variable, indicating whether a variable is required at initialization and if it can be modified during runtime.
Attributes#
- REQUIRED_LOCKED: str
Indicates the variable must be initialized with a value and cannot be modified thereafter.
- REQUIRED_UNLOCKED: str
Indicates the variable must be initialized with a value but can be modified during runtime.
- UNREQUIRED_UNLOCKED: str
Indicates the variable does not need to be initialized with a value and can be modified during runtime.
- REQUIRED_LOCKED: str = 'required locked'#
- REQUIRED_UNLOCKED: str = 'required unlocked'#
- UNREQUIRED_UNLOCKED: str = 'unrequired unlocked'#
- classmethod values() list[str] #
Provides a list of the string values of the enum members.
Returns#
- List[str]
A list containing the string values of the enum members.
- classmethod get_required_during_initialization() list[Modifiability] #
- classmethod get_modifiable_at_runtime() list[Modifiability] #
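The two undocumented class methods above can be read off the member descriptions. The sketch below is an assumption about what they return, inferred only from the member names and the attribute text; the class name `ModifiabilitySketch` is hypothetical and the grouping of members is not confirmed by the RUFAS source.

```python
from enum import Enum


class ModifiabilitySketch(Enum):
    """Hypothetical stand-in for Modifiability with the documented members."""

    REQUIRED_LOCKED = "required locked"
    REQUIRED_UNLOCKED = "required unlocked"
    UNREQUIRED_UNLOCKED = "unrequired unlocked"

    @classmethod
    def values(cls) -> list[str]:
        # String values of all members, in definition order.
        return [member.value for member in cls]

    @classmethod
    def get_required_during_initialization(cls) -> list["ModifiabilitySketch"]:
        # Assumption: both REQUIRED_* members need a value at initialization.
        return [cls.REQUIRED_LOCKED, cls.REQUIRED_UNLOCKED]

    @classmethod
    def get_modifiable_at_runtime(cls) -> list["ModifiabilitySketch"]:
        # Assumption: both *_UNLOCKED members may change during runtime.
        return [cls.REQUIRED_UNLOCKED, cls.UNREQUIRED_UNLOCKED]


print(ModifiabilitySketch.values())
```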
- class RUFAS.data_validator.DataValidator#
Bases:
object
This class is used to validate all types of data across the RuFaS codebase.
- __init__() None #
- validate_properties(metadata: dict[str, Any], metadata_depth_limit: int) tuple[bool, str] #
Iteratively traverses the metadata properties, checking that their nesting does not exceed the maximum depth and routing each property to the validator for its type.
Returns#
- Tuple[bool, str]
A boolean indicating whether validation passed, and an error message (if validation failed) that the caller should raise.
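To illustrate the iterative depth check, here is a minimal sketch that walks nested dictionaries with an explicit stack instead of recursion and reports the first path that exceeds a limit. The function name `check_depth`, the way depth is counted (one level per nested dictionary) and the message format are all assumptions; the real metadata layout and the per-type routing are not reproduced here.

```python
from typing import Any


def check_depth(properties: dict[str, Any], depth_limit: int) -> tuple[bool, str]:
    """Return (True, "") if no dictionary is nested deeper than depth_limit."""
    # Each stack entry holds the path to a dictionary, the dictionary, and its depth.
    stack: list[tuple[list[str], dict[str, Any], int]] = [([], properties, 0)]
    while stack:
        path, node, depth = stack.pop()
        if depth > depth_limit:
            return False, f"Depth limit {depth_limit} exceeded at {'.'.join(path)}"
        for key, child in node.items():
            # Only nested dictionaries add a level; leaf values are ignored here.
            if isinstance(child, dict):
                stack.append((path + [key], child, depth + 1))
    return True, ""


nested = {"a": {"b": {"c": 1}}}
print(check_depth(nested, 2))
print(check_depth(nested, 1))
```

Returning a `(bool, str)` pair rather than raising matches the documented return type, which leaves the decision to raise with the caller.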
- _validate_metadata_properties_keys(required_properties_keys: set[str], optional_properties_keys: set[str], properties: dict[str, Any], path: list[str]) tuple[bool, str] #
Validates the keys of a metadata properties section against the sets of required and optional keys.
- _metadata_number_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str] #
Validates number type properties in metadata.
- _metadata_string_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str] #
Validates string type properties in metadata.
- _metadata_bool_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str] #
Validates bool type properties in metadata.
- _metadata_array_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str] #
Validates array type properties in metadata.
- _metadata_object_validator(key_path: list[str], value: dict[str, Any]) tuple[bool, str] #
Validates object type properties in metadata.
- validate_metadata(metadata: dict[str, Any], valid_data_types: set[str], address_to_data: str) tuple[bool, str] #
Checks that top-level metadata has valid and required keys and values.
- validate_data_by_type(variable_properties: dict[str, Any], variable_path: list[str | int], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool #
Validates the data based on its specified type.
Parameters#
- variable_properties: Dict[str, Any]
A dictionary containing properties relevant to the validation.
- variable_path: List[str | int]
The path to the variable being validated.
- data: Dict[str, Any]
The data to be validated.
- eager_termination: bool
If True, validation terminates as soon as invalid data is found and cannot be fixed.
- properties_blob_key: str
The key of the metadata properties section for the data file being checked.
- elements_counter: ElementsCounter
A counter to keep track of the number of valid, invalid, and fixed elements.
- called_during_initialization: bool
Boolean variable indicating whether the function is being called during initialization.
- fixable_data_types: set[str]
Set enumerating the data types that the caller will attempt to fix while validating data.
Returns#
- bool
True if the data is valid, False otherwise.
Raises#
- KeyError
If the variable’s properties do not specify a “type”.
Notes#
Fixing invalid data will only be attempted if the data is a “simple” type (i.e. a string, bool or number).
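The type-based routing can be pictured as a lookup table from type name to validator. The sketch below is an illustration under assumptions: the type names `"number"`, `"string"` and `"bool"` are inferred from the validator method names on this page, the checks themselves are simplified, and `validate_by_type` is a hypothetical helper that takes a single value rather than the full documented argument list.

```python
from typing import Any, Callable


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical dispatch table covering only the "simple" types.
SIMPLE_VALIDATORS: dict[str, Callable[[Any], bool]] = {
    "number": _is_number,
    "string": lambda value: isinstance(value, str),
    "bool": lambda value: isinstance(value, bool),
}


def validate_by_type(variable_properties: dict[str, Any], value: Any) -> bool:
    # A missing "type" raises KeyError, as the documented method does.
    data_type = variable_properties["type"]
    validator = SIMPLE_VALIDATORS.get(data_type)
    if validator is None:
        # Types outside the table are treated as invalid in this sketch.
        return False
    return validator(value)


print(validate_by_type({"type": "number"}, 3.5))
print(validate_by_type({"type": "string"}, 8))
```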
- _validate_array_container_properties(variable_path: list[str | int], variable_properties: dict[str, Any], data: Any, properties_blob_key: str) bool #
Validates the container properties of an array data element.
Parameters#
- variable_path: List[str | int]
The path to the variable being validated.
- variable_properties: Dict[str, Any]
The metadata properties for the variable being validated.
- data: Any
The data to be validated.
- properties_blob_key: str
The key of the metadata properties section for the data file being checked.
Returns#
- bool
True if the array container properties are valid, False otherwise.
- _array_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool #
Validates a data element of type array.
Parameters#
- variable_path: List[str | int]
The path to the variable being validated.
- variable_properties: Dict[str, Any]
The metadata properties for the variable being validated.
- data: Dict[str, Any]
The data to be validated.
- eager_termination: bool
If True, the process will be terminated upon finding invalid data.
- properties_blob_key: str
The key of the metadata properties section for the data file being checked.
- elements_counter: ElementsCounter
A counter to keep track of the number of valid, invalid, and fixed elements.
- called_during_initialization: bool
Boolean variable indicating whether the function is being called during initialization.
- fixable_data_types: set[str]
Set of data types that are fixable.
Returns#
- bool
True if the data element is valid or fixable, False otherwise.
- _object_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool #
Validates a data element of type object.
Parameters#
- variable_path: List[str | int]
The path to the variable being validated.
- variable_properties: Dict[str, Any]
The metadata properties for the variable being validated.
- data: Dict[str, Any]
The data to be validated.
- eager_termination: bool
If True, the process will be terminated upon finding invalid data.
- properties_blob_key: str
The key of the metadata properties section for the data file being checked.
- elements_counter: ElementsCounter
A counter to keep track of the number of valid, invalid, and fixed elements.
- called_during_initialization: bool
Boolean variable indicating whether the function is being called during initialization.
Returns#
- bool
True if the data element is valid or fixable, False otherwise.
Notes#
This method will look for and delete any keys in the data that do not have properties specified for them in the metadata properties.
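The key-pruning behaviour in the note above can be sketched as follows. The helper name `prune_unknown_keys` and the flat shape of the metadata properties are assumptions made for illustration; the real method works on the nested structures described by its parameters.

```python
from typing import Any


def prune_unknown_keys(data: dict[str, Any], properties: dict[str, Any]) -> list[str]:
    """Delete keys from data that have no entry in properties; return what was removed."""
    # Collect first, then delete, so the dictionary is not mutated while iterating.
    unknown = [key for key in data if key not in properties]
    for key in unknown:
        del data[key]
    return unknown


herd = {"calf_num": 8, "typo_field": 1}
removed = prune_unknown_keys(herd, {"calf_num": {"type": "number"}})
print(removed, herd)
```

Because the deletion happens in place, callers that need the original input should pass a copy.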
- _number_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool #
Validates a data number element.
- _string_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool #
Validates a data string element.
- _bool_type_validator(variable_path: list[str | int], variable_properties: dict[str, Any], data: dict[str, Any], eager_termination: bool, properties_blob_key: str, elements_counter: ElementsCounter, called_during_initialization: bool, fixable_data_types: set[str]) bool #
Validates a data bool element.
- _fix_data(variable_properties: dict[str, Any], element_hierarchy: list[str | int], data: dict[str, Any], properties_blob_key: str) bool #
Attempts to fix invalid data.
Parameters#
- variable_properties: dict[str, Any]
The properties for the variable of interest.
- element_hierarchy: list
A list indicating the path to reach the variable of interest in self.__metadata and self.__pool.
- data: dict[str, Any]
A buffer dictionary that holds the data for validation and fixing.
- properties_blob_key: str
The metadata properties section keyword for the data file being checked.
Returns#
- bool
True if the data is fixed, False otherwise.
- _extract_data_by_key_list(data: list[Any] | dict[str, Any], variable_path: Sequence[str | int], variable_properties: dict[str, Any], called_during_initialization: bool) Any #
Extracts a value from the data based on a specified path and handles missing data by calling DataValidator._log_missing_data().
Parameters#
- data: List[Any] | Dict[str, Any]
The data containing the value to be extracted.
- variable_path: List[str | int]
A list of keys to be used to extract the value from the data.
- variable_properties: Dict[str, Any]
The metadata properties for the variable being validated.
- called_during_initialization: bool
Boolean variable indicating whether the function is being called during initialization.
Returns#
- Any
The value extracted from the data if found. None if not found.
Notes#
This function navigates through the given data (which can be a list or a dictionary) following the path specified in variable_path. If the path leads to a value, it is returned. If a KeyError occurs during this process (i.e., a key or index is missing in the path), the function extracts the variable name by finding the last string element in the variable_path array and handles this missing data by calling DataValidator._log_missing_data().
- _log_missing_data(variable_properties: dict[str, Any], var_name: str, called_during_initialization: bool) None #
Handles logging for missing data for a variable, logging errors or warnings based on the context of initialization or runtime updates.
Parameters#
- variable_properties: Dict[str, Any]
Properties of the variable, potentially including its modifiability status.
- var_name: str
The name of the variable with missing data.
- called_during_initialization: bool
Boolean variable indicating whether the function is being called during initialization.
Raises#
- KeyError
Raised if the missing data is deemed necessary, either during initialization or for a runtime update.
Notes#
This function determines if it’s being called during the initialization phase and checks if the missing variable data is required at this stage using ‘_is_data_required_upon_initialization’. If required, it logs an error and raises a KeyError. If not, it logs a warning.
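The error-versus-warning split can be sketched with the standard `logging` module. This is a simplified illustration: `report_missing` is a hypothetical helper that receives an already-computed `required` flag, whereas the documented method derives that decision from the variable's properties and the initialization context, and RUFAS may well use its own logging facility.

```python
import logging

logger = logging.getLogger("data_validator_sketch")


def report_missing(var_name: str, required: bool) -> None:
    """Log missing data; raise KeyError only when the variable is required."""
    if required:
        logger.error("Required variable %r has no data.", var_name)
        raise KeyError(var_name)
    # Optional data is only worth a warning; validation carries on.
    logger.warning("Optional variable %r has no data.", var_name)


report_missing("bedding_type", required=False)
```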
- _is_data_required_upon_initialization(variable_name: str, variable_properties: dict[str, Any]) bool #
Determines whether a variable requires a data value upon initialization based on its modifiability status.
This function uses the ‘_get_variable_modifiability’ method to determine the modifiability status of the variable identified by ‘variable_name’ and described by ‘variable_properties’. It then checks whether the modifiability status is either ‘REQUIRED_LOCKED’ or ‘REQUIRED_UNLOCKED’, indicating that the variable must be initialized with a value.
Parameters#
- variable_name: str
The name of the variable being evaluated for its initialization requirements.
- variable_properties: Dict[str, Any]
A dictionary containing the properties of the variable, which should include its modifiability status among others.
Returns#
- bool
True if the variable’s modifiability status necessitates a data value upon initialization, False otherwise.
- _get_variable_modifiability(variable_name: str, variable_properties: dict[str, Any]) Modifiability #
Determines the modifiability status of a variable based on its properties and returns the corresponding enum value.
Notes#
This function looks for a ‘modifiability’ key within variable_properties. If present and its value is not empty, the function attempts to map this value to an enum member in Modifiability. If the value does not correspond to any enum member, a KeyError is raised after logging the error. If ‘modifiability’ is absent or its value is empty, the function defaults to Modifiability.UNREQUIRED_UNLOCKED.
Parameters#
- variable_name: str
The name of the variable for which the modifiability status is being determined. Used for error logging.
- variable_properties: Dict[str, Any]
A dictionary containing the properties of the variable, containing the desired ‘modifiability’ property.
Returns#
- Modifiability
An enum member representing the variable’s modifiability status.
Raises#
- KeyError
If ‘modifiability’ in variable_properties does not match any enum member in Modifiability. The error message includes the invalid modifiability value and suggests valid values.
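The lookup, default, and error path described above can be sketched as follows, together with the initialization check that builds on it. The function names are hypothetical, the enum is redefined locally, and the sketch assumes that the ‘modifiability’ property stores the enum's string value (for example `"required locked"`); logging is omitted.

```python
from enum import Enum
from typing import Any


class Modifiability(Enum):
    """Local copy of the documented enum so the sketch is self-contained."""

    REQUIRED_LOCKED = "required locked"
    REQUIRED_UNLOCKED = "required unlocked"
    UNREQUIRED_UNLOCKED = "unrequired unlocked"


def get_variable_modifiability(
    variable_name: str, variable_properties: dict[str, Any]
) -> Modifiability:
    raw = variable_properties.get("modifiability")
    if not raw:
        # Absent or empty: fall back to the most permissive status.
        return Modifiability.UNREQUIRED_UNLOCKED
    try:
        return Modifiability(raw)
    except ValueError:
        valid = [member.value for member in Modifiability]
        raise KeyError(
            f"Invalid modifiability {raw!r} for {variable_name!r}; expected one of {valid}"
        ) from None


def is_required_upon_initialization(
    variable_name: str, variable_properties: dict[str, Any]
) -> bool:
    status = get_variable_modifiability(variable_name, variable_properties)
    return status in (Modifiability.REQUIRED_LOCKED, Modifiability.REQUIRED_UNLOCKED)


print(get_variable_modifiability("calf_num", {"modifiability": "required locked"}))
print(is_required_upon_initialization("bedding_type", {}))
```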
- convert_variable_path_to_str(variable_path: list[str | int]) str #
Converts a list of keys (int or str) into a string representation of the path to a variable.
Parameters#
- variable_path: List[str | int]
The list of keys that make up the path to the variable.
Returns#
- str
A string representation of the path to a variable.
Examples#
>>> input_manager = InputManager()
>>> var_path = ["animal", "herd_information", "calf_num"]
>>> DataValidator.convert_variable_path_to_str(var_path)
'animal.herd_information.calf_num'

>>> input_manager = InputManager()
>>> var_path = ["manure_management_scenarios", 0, "bedding_type"]
>>> DataValidator.convert_variable_path_to_str(var_path)
'manure_management_scenarios.[0].bedding_type'
- extract_value_by_key_list(data: list[Any] | dict[str, Any], variable_path: Sequence[str | int]) Any #
Extracts a value from a nested list or dictionary using a list of keys (int or str).
Parameters#
- data: List[Any] | Dict[str, Any]
The data containing the value to be extracted.
- variable_path: List[str | int]
A list of keys to be used to extract the value from the data.
Returns#
- Any
The value extracted from the data.
Raises#
- KeyError
If the value cannot be extracted from the data using the provided variable path.
Examples#
>>> data_validator = DataValidator()
>>> example_data = {
...     "animal": {
...         "herd_information": {
...             "calf_num": 8,
...             "heiferI_num": 44,
...             "heiferII_num": 38,
...             "heiferIII_num_springers": 12
...         }
...     }
... }
>>> var_path = ["animal", "herd_information", "calf_num"]
>>> DataValidator.extract_value_by_key_list(example_data, var_path)
8

>>> data_validator = DataValidator()
>>> example_data = {
...     "manure_management_scenarios": [
...         {
...             "bedding_type": "straw",
...             "manure_handler": "manual scraping"
...         },
...         {
...             "bedding_type": "sawdust",
...             "manure_handler": "flush system"
...         }
...     ]
... }
>>> var_path = ["manure_management_scenarios", 0, "bedding_type"]
>>> DataValidator.extract_value_by_key_list(example_data, var_path)
'straw'