opencf_core package
Submodules
Base Converter Module
This module serves as a foundation for file conversion utilities. It provides abstract base classes for building file converters, manages file types, and handles input and output files. The module is designed to be extensible, supporting various file formats and conversion strategies.
Classes:
BaseConverter: An abstract base class for creating specific file format converters, enforcing the implementation of file conversion logic.
Exceptions:
ValueError: Raised when file paths or types are incompatible or unsupported.
AssertionError: Raised by internal consistency checks, for example when file types do not match expected values.
- class opencf_core.base_converter.BaseConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
ABC, Generic[T]
Abstract base class for file conversion, defining the template for input-to-output file conversion.
- __init__(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Sets up the converter with specified input and output files, ensuring compatibility.
- Parameters:
input_files (Union[ResolvedInputFile, List[ResolvedInputFile]]) – Either a single input file or a list of input files with resolved types.
output_file (ResolvedInputFile) – The output file where the converted data will be saved.
- _check_file_types()
Validates that the provided files have acceptable and supported file types for conversion.
- abstract _convert(input_contents: List, args: T) → Any
Abstract method to be implemented by subclasses to perform the actual file conversion process.
- abstract classmethod _get_supported_input_types() → FileType | Iterable[FileType]
Abstract method that defines the input file types supported by the converter.
- Returns:
The supported input file type or types.
- Return type:
FileType | Iterable[FileType]
- abstract classmethod _get_supported_output_types() → FileType | Iterable[FileType]
Abstract method that defines the output file types supported by the converter.
- Returns:
The supported output file type or types.
- Return type:
FileType | Iterable[FileType]
- check_io_handlers()
Ensures that valid I/O handlers (file reader and writer) are set for the conversion.
- abstract convert_files(output_path: Path)
Abstract method to be implemented by subclasses to handle the file conversion process.
- abstract custom_io_handlers_check()
Custom I/O handler check. Subclasses should implement this method to ensure that the proper I/O handlers are set.
- folder_as_output: bool | None = None
- classmethod get_supported_input_types() → Tuple[FileType, ...]
Defines the supported input file types for this converter.
- Returns:
The file types supported for input.
- Return type:
Tuple[FileType, ...]
- classmethod get_supported_output_types() → Tuple[FileType, ...]
Defines the supported output file types for this converter.
- Returns:
The file types supported for output.
- Return type:
Tuple[FileType, ...]
- run_conversion()
Orchestrates the file conversion process, including reading, converting, and writing the file.
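The template-method layout described above can be illustrated with a simplified, self-contained sketch. It does not import opencf_core: SketchConverter, UpperCaseConverter, the helper _as_tuple and the two-member FileType are hypothetical stand-ins, and run_conversion here takes in-memory contents instead of reading and writing files through I/O handlers. What it does mirror is the split between the abstract _get_supported_*_types() hooks, which may return a single FileType or an iterable, and the public get_supported_*_types() classmethods, which always return a tuple.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Generic, Iterable, List, Tuple, TypeVar, Union

T = TypeVar("T")  # type of the extra arguments passed to _convert


class FileType(Enum):
    # Hypothetical stand-in for opencf_core.filetypes.FileType
    TEXT = "txt"
    MD = "md"


def _as_tuple(types: Union[FileType, Iterable[FileType]]) -> Tuple[FileType, ...]:
    # Normalize "one type or many" into a tuple
    return (types,) if isinstance(types, FileType) else tuple(types)


class SketchConverter(ABC, Generic[T]):
    @classmethod
    @abstractmethod
    def _get_supported_input_types(cls) -> Union[FileType, Iterable[FileType]]: ...

    @classmethod
    @abstractmethod
    def _get_supported_output_types(cls) -> Union[FileType, Iterable[FileType]]: ...

    @classmethod
    def get_supported_input_types(cls) -> Tuple[FileType, ...]:
        return _as_tuple(cls._get_supported_input_types())

    @classmethod
    def get_supported_output_types(cls) -> Tuple[FileType, ...]:
        return _as_tuple(cls._get_supported_output_types())

    @abstractmethod
    def _convert(self, input_contents: List[Any], args: T) -> Any: ...

    def run_conversion(self, input_contents: List[Any], args: T) -> Any:
        # Simplified: the documented method also reads inputs and writes the result
        return self._convert(input_contents, args)


class UpperCaseConverter(SketchConverter[None]):
    @classmethod
    def _get_supported_input_types(cls):
        return FileType.TEXT  # a single type is accepted...

    @classmethod
    def _get_supported_output_types(cls):
        return [FileType.TEXT, FileType.MD]  # ...and so is an iterable

    def _convert(self, input_contents, args):
        return "\n".join(text.upper() for text in input_contents)


result = UpperCaseConverter().run_conversion(["hello", "world"], None)
```

Subclasses only fill in the hooks; the base class owns the orchestration and the normalization of supported types.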
- class opencf_core.base_converter.FileAsOutputConversionArgs(output_file: pathlib.Path)
Bases:
object
- output_file: Path
- class opencf_core.base_converter.FileAsOutputConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
BaseConverter[FileAsOutputConversionArgs]
- abstract _convert(input_contents: List, args: FileAsOutputConversionArgs) → Any
Abstract method to be implemented by subclasses to perform the actual file conversion process.
- Parameters:
input_contents – List of input contents to be converted.
args – Arguments required for the file-based conversion process.
- Returns:
The converted content.
- convert_files(output_path: Path)
Convert input files to output content and save the output to the specified file path.
- Parameters:
output_path – The path where the converted output file will be saved.
- Returns:
The path where the output file was saved.
- custom_io_handlers_check()
Check if the file writer and folder output settings are valid.
- output_content: Any
- class opencf_core.base_converter.FolderAsOutputConversionArgs(output_folder: pathlib.Path)
Bases:
object
- output_folder: Path
- class opencf_core.base_converter.FolderAsOutputConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
BaseConverter[FolderAsOutputConversionArgs]
- abstract _convert(input_contents: List, args: FolderAsOutputConversionArgs) → Any
Abstract method to be implemented by subclasses to perform the actual file conversion process.
- Parameters:
input_contents – List of input contents to be converted.
args – Arguments required for the folder-based conversion process.
- Returns:
The converted content.
- convert_files(output_path: Path)
Convert input files to output content and save the output to the specified folder path.
- Parameters:
output_path – The path where the converted output folder will be saved.
- Returns:
The path where the output folder was saved.
- custom_io_handlers_check()
Check if the file writer and folder output settings are valid.
- output_content: Any
- exception opencf_core.base_converter.InvalidOutputFormatError(solving_tips)
Bases:
Exception
Exception raised when the output content format check fails after conversion.
- class opencf_core.base_converter.WriterBasedConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases:
BaseConverter[None]
- __get_bad_output_content_solving_tips__() → str
Provide tips to solve issues with bad output content.
- Returns:
Tips to solve issues with bad output content.
- _convert(input_contents: List, args: None) → Any
Abstract method to be implemented by subclasses to perform the actual file conversion process.
- Parameters:
input_contents – List of input contents to be converted.
args – Arguments required for the conversion process, specific to the type of conversion.
- Returns:
The converted content.
- convert_files(output_path: Path)
Convert input files to output content and save the output to the specified path.
- Parameters:
output_path – The path where the converted output file will be saved.
- Returns:
The path where the output file was saved.
- custom_io_handlers_check()
Check if the file writer is valid.
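FileAsOutputConverter and FolderAsOutputConverter differ mainly in the args object their _convert receives: a single output_file path versus an output_folder path. The sketch below assumes the two args classes are plain dataclasses (their signatures suggest so) and uses two hypothetical conversion functions, concat_to_file and split_to_folder, to show how a file-output conversion produces one file while a folder-output conversion can fan out into several.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class FileAsOutputConversionArgs:
    output_file: Path


@dataclass
class FolderAsOutputConversionArgs:
    output_folder: Path


def concat_to_file(input_contents: List[str], args: FileAsOutputConversionArgs) -> Path:
    # Hypothetical file-output conversion: all inputs end up in one file
    args.output_file.write_text("".join(input_contents), encoding="utf-8")
    return args.output_file


def split_to_folder(input_contents: List[str], args: FolderAsOutputConversionArgs) -> Path:
    # Hypothetical folder-output conversion: one file per input, inside the folder
    args.output_folder.mkdir(parents=True, exist_ok=True)
    for index, content in enumerate(input_contents):
        (args.output_folder / f"part_{index}.txt").write_text(content, encoding="utf-8")
    return args.output_folder


workdir = Path(tempfile.mkdtemp())
single = concat_to_file(["a\n", "b\n"], FileAsOutputConversionArgs(workdir / "out.txt"))
folder = split_to_folder(["a", "b"], FolderAsOutputConversionArgs(workdir / "parts"))
```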
Main Module
This module contains the main application logic.
- class opencf_core.converter_app.BaseConverterApp(input_paths: List[str], input_file_type: str | None = None, output_file_path: str | None = None, output_file_type: str | None = None)
Bases:
object
Main application class responsible for managing file conversions.
- __init__(input_paths: List[str], input_file_type: str | None = None, output_file_path: str | None = None, output_file_type: str | None = None)
Initializes the BaseConverterApp instance.
- Parameters:
input_paths (List[str]) – List of paths to the input files.
input_file_type (str, optional) – The type of the input file. Defaults to None.
output_file_path (str, optional) – The path to the output file. Defaults to None.
output_file_type (str, optional) – The type of the output file. Defaults to None.
- add_converter_pair(converter_class) → None
Adds a converter pair to the application.
- Parameters:
converter_class (Type[BaseConverter]) – The converter class to add.
- Raises:
ValueError – If the converter class is invalid.
- converters: List[Type[BaseConverter]] = []
- get_converters_for_conversion(input_type: FileType, output_type: FileType) → List[Type[BaseConverter]]
Returns a list of converter classes for a given input-output type pair.
- Parameters:
input_type (FileType) – The input type.
output_type (FileType) – The output type.
- Returns:
List of converter classes if found, else an empty list.
- Return type:
List[Type[BaseConverter]]
- get_supported_conversions() → Tuple[Tuple[FileType, FileType], ...]
Retrieves the supported conversions as (input type, output type) pairs.
- run() → None
Runs the conversion process.
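The lookup performed by get_converters_for_conversion and get_supported_conversions can be approximated as a filter over the registered converter classes. In this sketch, ConverterRegistry, the Converter base with its inputs/outputs tuples, and the CsvToJson and SheetToJson classes are hypothetical; the real application presumably consults each BaseConverter's get_supported_input_types() and get_supported_output_types() instead of plain class attributes.

```python
from enum import Enum
from typing import List, Tuple, Type


class FileType(Enum):
    CSV = "csv"
    XLSX = "xlsx"
    JSON = "json"


class Converter:
    # Hypothetical minimal converter: only the supported types matter here
    inputs: Tuple[FileType, ...] = ()
    outputs: Tuple[FileType, ...] = ()


class CsvToJson(Converter):
    inputs = (FileType.CSV,)
    outputs = (FileType.JSON,)


class SheetToJson(Converter):
    inputs = (FileType.CSV, FileType.XLSX)
    outputs = (FileType.JSON,)


class ConverterRegistry:
    def __init__(self) -> None:
        # Kept per instance in this sketch
        self.converters: List[Type[Converter]] = []

    def add_converter_pair(self, converter_class: Type[Converter]) -> None:
        if not (isinstance(converter_class, type) and issubclass(converter_class, Converter)):
            raise ValueError(f"invalid converter class: {converter_class!r}")
        self.converters.append(converter_class)

    def get_converters_for_conversion(
        self, input_type: FileType, output_type: FileType
    ) -> List[Type[Converter]]:
        # Every registered class that accepts the input and produces the output
        return [
            c for c in self.converters if input_type in c.inputs and output_type in c.outputs
        ]

    def get_supported_conversions(self) -> Tuple[Tuple[FileType, FileType], ...]:
        # Distinct (input, output) pairs, in registration order
        pairs: List[Tuple[FileType, FileType]] = []
        for c in self.converters:
            for i in c.inputs:
                for o in c.outputs:
                    if (i, o) not in pairs:
                        pairs.append((i, o))
        return tuple(pairs)


registry = ConverterRegistry()
registry.add_converter_pair(CsvToJson)
registry.add_converter_pair(SheetToJson)
```

An empty list from get_converters_for_conversion signals that no registered converter handles the requested pair.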
Dependencies:
aenum.Enum: For creating the FileType enumeration.
- opencf_core.enum.extend_enum_with_methods(inherited_enum: Type[Enum], added_enum: Type[Enum], filter_func: Callable[[Enum], bool]) → None
Extends an Enum class with members and methods from another Enum class based on a filter function.
The function adds to inherited_enum every member of added_enum for which filter_func returns True, and copies the methods (including class methods) of both inherited_enum and added_enum onto the extended inherited_enum class.
- Parameters:
inherited_enum (Type[Enum]) – The Enum class to be extended with new members and methods.
added_enum (Type[Enum]) – The Enum class whose members and methods will be added to inherited_enum.
filter_func (Callable[[Enum], bool]) – A function that filters which members to add from added_enum to inherited_enum.
- Returns:
None
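Standard-library Enum classes cannot gain members after they are created, which is presumably why this module depends on aenum. As a swapped-in technique, the sketch below builds a new enum from the inherited members plus the added members that pass filter_func, and shares methods through a member-less base enum instead of copying them. merge_enums, Describable, Inherited and Added are hypothetical names, not part of opencf_core.

```python
from enum import Enum
from typing import Callable, Type


class Describable(Enum):
    # Member-less enum that only carries shared methods; such enums may be subclassed
    def describe(self) -> str:
        return f"{self.name} (.{self.value})"


class Inherited(Describable):
    TEXT = "txt"
    JSON = "json"


class Added(Describable):
    XML = "xml"
    TMP = "tmp"


def merge_enums(
    name: str,
    method_base: Type[Enum],
    inherited: Type[Enum],
    added: Type[Enum],
    filter_func: Callable[[Enum], bool],
) -> Type[Enum]:
    # Collect every inherited member, then the added members that pass the filter
    members = {member.name: member.value for member in inherited}
    members.update({member.name: member.value for member in added if filter_func(member)})
    # Functional API on a member-less enum: creates a new subclass that keeps its methods
    return method_base(name, members)


Merged = merge_enums("Merged", Describable, Inherited, Added, lambda m: m.name != "TMP")
```

Unlike the in-place extension described above, this produces a separate class, so existing references to Inherited are unaffected.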
Classes:
UnsupportedFileTypeError: Custom exception for handling unsupported file types.
EmptySuffixError: Specialized exception for cases where a file’s suffix does not provide enough information to determine its type.
MismatchedException: Exception for handling cases where there’s a mismatch between expected and actual file attributes.
- exception opencf_core.exceptions.EmptySuffixError
Bases:
UnsupportedFileTypeError
Exception raised when a file’s suffix does not provide enough information to determine its type.
- exception opencf_core.exceptions.MismatchedException(label, claimed_val, expected_vals)
Bases:
Exception
Exception raised for mismatches between expected and actual file attributes.
- exception opencf_core.exceptions.UnsupportedFileTypeError(message)
Bases:
Exception
Exception raised for handling cases of unsupported file types.
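Because EmptySuffixError subclasses UnsupportedFileTypeError, a single except clause for the parent also catches empty-suffix failures. The sketch below redeclares the three exceptions locally so it runs without opencf_core; the MismatchedException message format and the check_suffix and classify helpers are hypothetical.

```python
from typing import Tuple


class UnsupportedFileTypeError(Exception):
    """Local stand-in for opencf_core.exceptions.UnsupportedFileTypeError."""


class EmptySuffixError(UnsupportedFileTypeError):
    """Raised when a suffix is empty and cannot identify a file type."""


class MismatchedException(Exception):
    def __init__(self, label: str, claimed_val: str, expected_vals: Tuple[str, ...]):
        # Hypothetical message format; the library's wording may differ
        super().__init__(f"{label} {claimed_val!r} is not one of {expected_vals!r}")
        self.label = label
        self.claimed_val = claimed_val
        self.expected_vals = expected_vals


def check_suffix(suffix: str, expected: Tuple[str, ...]) -> str:
    if not suffix:
        raise EmptySuffixError("empty suffix")
    if suffix not in expected:
        raise MismatchedException("suffix", suffix, expected)
    return suffix


def classify(suffix: str) -> str:
    try:
        check_suffix(suffix, ("csv", "json"))
    except UnsupportedFileTypeError:  # also catches EmptySuffixError
        return "unsupported"
    except MismatchedException as err:
        return f"mismatch on {err.label}"
    return "ok"
```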
Resolved Input File Module
This module provides the ResolvedInputFile class, which manages file paths and types, resolving them as needed. It supports resolving file types based on paths, optional content reading, and handling both files and directories.
Classes:
ResolvedInputFile: Manages file paths and types, resolving them as needed.
Exceptions:
ValueError: Raised when file paths or types are incompatible or unsupported.
- class opencf_core.file_handler.ResolvedInputFile(path: str | Path, is_dir: bool | None = None, should_exist: bool = True, file_type: str | None = None, add_suffix: bool = False, read_content: bool = False, filetype_class: Type[FileType] | None = FileType)
Bases:
object
Handles resolving the file type of a given file or folder, managing path adjustments and optional content reading.
- __init__(path: str | Path, is_dir: bool | None = None, should_exist: bool = True, file_type: str | None = None, add_suffix: bool = False, read_content: bool = False, filetype_class: Type[FileType] | None = FileType)
Initializes an instance of ResolvedInputFile with options for type resolution and path modification.
- Parameters:
path (Union[str, Path]) – The path to the file or folder.
is_dir (bool, optional) – Specifies if the path is a directory. If None, inferred using pathlib. Defaults to None.
should_exist (bool, optional) – Specifies if the existence of the path is required. Defaults to True.
file_type (str, optional) – The explicit type of the file. If None, attempts to resolve to a filetype object based on the path or content.
add_suffix (bool, optional) – Whether to append the resolved file type’s suffix to the file path. Defaults to False.
read_content (bool, optional) – Whether to read the file’s content to assist in type resolution. Defaults to False.
- __repr__()
Returns the absolute file path as a string.
- Returns:
The resolved file path.
- Return type:
str
- __resolve_filetype__(file_type: str | None, file_path: Path, read_content: bool) → FileType
Determines the file type, utilizing the provided type, file path, or content as needed.
- Parameters:
file_type (FileType or str, optional) – An explicit file type or extension.
file_path (Path) – The path to the file, used if file_type is not provided.
read_content (bool) – Indicates if file content should be used to help resolve the file type.
- Returns:
The resolved file type.
- Return type:
FileType
- __str__()
Returns the absolute file path as a string.
- Returns:
The resolved file path.
- Return type:
str
- _resolve_directory_type(file_type: str)
Handles the case when the specified path is a directory.
- _resolve_file_type(file_type: str | None, read_content: bool, add_suffix: bool)
Resolves the file type based on given parameters.
- Parameters:
file_type (FileType or str, optional) – An explicit file type or extension.
read_content (bool) – Indicates if file content should be used to help resolve the file type.
add_suffix (bool) – Whether to append the resolved file type’s suffix to the file path.
- _resolve_path_type(file_type: str | None = None) → bool
Determines if the provided path refers to a directory or a file, based on its existence, suffix, and file_type.
- Parameters:
file_type (str, optional) – The type of file expected at the path. Influences directory creation and type resolution.
- Returns:
True if the path is determined to be a directory, False if it is a file.
- Return type:
bool
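One plausible reading of the directory-versus-file decision made by _resolve_path_type is sketched below. The precedence (an existing path first, then an explicit file_type, then an empty suffix meaning "folder") is an assumption, and looks_like_directory is a hypothetical helper rather than the library's code.

```python
import tempfile
from pathlib import Path
from typing import Optional


def looks_like_directory(path: Path, file_type: Optional[str] = None) -> bool:
    # Hypothetical precedence; the real _resolve_path_type may differ
    if path.exists():
        return path.is_dir()  # existing paths: trust the filesystem
    if file_type is not None:
        return False  # an explicit file type suggests a file is expected
    return path.suffix == ""  # no suffix: treat as a folder to be created


workdir = Path(tempfile.mkdtemp())
existing_file = workdir / "notes.txt"
existing_file.write_text("hello", encoding="utf-8")
```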
File Type Definitions Module
This module provides a comprehensive framework for handling various file types within a file conversion context. It defines classes and enumerations for identifying, validating, and working with different file types, based on file extensions, MIME types, and optionally, file content. It also includes custom exceptions for handling common errors related to file type processing.
Classes:
- FileType: Enum class that encapsulates various file types supported by the system, providing methods for type determination from file attributes.
Dependencies:
collections.namedtuple: For defining simple classes for storing MIME type information.
pathlib.Path: For file path manipulations and checks.
opencf_core.mimes.guess_mime_type_from_file: Utility function to guess MIME type from a file path.
Usage Examples:
```python
from pathlib import Path

from opencf_core.exceptions import EmptySuffixError, UnsupportedFileTypeError
from opencf_core.filetypes import FileType

# Example: Determine file type from suffix
try:
    file_type, _ = FileType.from_suffix(".txt", raise_err=True, return_matches=True)
    print(f"File type: {file_type.name}")
except (EmptySuffixError, UnsupportedFileTypeError) as e:
    print(f"Error: {e}")

# Example: Determine file type from MIME type
try:
    file_path = Path("/path/to/file.txt")
    file_type, _ = FileType.from_mimetype(file_path, raise_err=True, return_matches=True)
    print(f"File type: {file_type.name}")
except (FileNotFoundError, UnsupportedFileTypeError) as e:
    print(f"Error: {e}")

# Example: Validate file type by path and content
file_path = Path("/path/to/file.txt")
is_valid = FileType.TEXT.is_valid_path(file_path, read_content=True)
print(f"Is valid: {is_valid}")
```
- class opencf_core.filetypes.FileType(value=<no_arg>, names=None, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Base enumeration for file types, providing methods for type determination and validation.
- APNG = MimeType(extensions=('apng',), mime_types=('image/apng',), upper_mime_types=(), children_mime_types=())
- AVI = MimeType(extensions=('avi',), mime_types=('video/x-msvideo',), upper_mime_types=(), children_mime_types=())
- BIN = MimeType(extensions=('bin',), mime_types=('application/octet-stream',), upper_mime_types=(), children_mime_types=())
- CSV = MimeType(extensions=('csv',), mime_types=('text/csv',), upper_mime_types=('text/plain',), children_mime_types=())
- DOC = MimeType(extensions=('doc',), mime_types=('application/msword',), upper_mime_types=(), children_mime_types=())
- DOCX = MimeType(extensions=('docx',), mime_types=('application/vnd.openxmlformats-officedocument.wordprocessingml.document',), upper_mime_types=(), children_mime_types=())
- DOC_PRESENTATION = MimeType(extensions=('pptx', 'odp', 'ppt', 'pdf'), mime_types=('application/vnd.openxmlformats-officedocument.presentationml.presentation', 'application/vnd.oasis.opendocument.presentation', 'application/vnd.ms-powerpoint', 'application/pdf'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('pptx',), mime_types=('application/vnd.openxmlformats-officedocument.presentationml.presentation',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('odp',), mime_types=('application/vnd.oasis.opendocument.presentation',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('ppt',), mime_types=('application/vnd.ms-powerpoint',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('pdf',), mime_types=('application/pdf',), upper_mime_types=(), children_mime_types=())))
- DOC_SPREADSHEET = MimeType(extensions=('xlsx', 'ods', 'csv', 'xls'), mime_types=('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 'application/vnd.oasis.opendocument.spreadsheet', 'text/csv', 'application/vnd.ms-excel'), upper_mime_types=('text/plain',), children_mime_types=(MimeType(extensions=('xlsx',), mime_types=('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('ods',), mime_types=('application/vnd.oasis.opendocument.spreadsheet',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('csv',), mime_types=('text/csv',), upper_mime_types=('text/plain',), children_mime_types=()), MimeType(extensions=('xls',), mime_types=('application/vnd.ms-excel',), upper_mime_types=(), children_mime_types=())))
- DOC_TEXT = MimeType(extensions=('docx', 'odt', 'doc', 'md'), mime_types=('application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'application/vnd.oasis.opendocument.text', 'application/msword', 'text/markdown'), upper_mime_types=('text/plain',), children_mime_types=(MimeType(extensions=('docx',), mime_types=('application/vnd.openxmlformats-officedocument.wordprocessingml.document',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('odt',), mime_types=('application/vnd.oasis.opendocument.text',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('doc',), mime_types=('application/msword',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('md',), mime_types=('text/markdown',), upper_mime_types=('text/plain',), children_mime_types=())))
- DPX = MimeType(extensions=('dpx',), mime_types=('image/dpx',), upper_mime_types=(), children_mime_types=())
- EPS = MimeType(extensions=('eps',), mime_types=('application/postscript',), upper_mime_types=(), children_mime_types=())
- EXR = MimeType(extensions=('exr',), mime_types=('image/aces-exr',), upper_mime_types=(), children_mime_types=())
- GIF = MimeType(extensions=('gif',), mime_types=('image/gif',), upper_mime_types=(), children_mime_types=())
- HTML = MimeType(extensions=('html', 'htm'), mime_types=('text/html',), upper_mime_types=(), children_mime_types=())
- IMG_ANIM = MimeType(extensions=('gif', 'apng', 'webp'), mime_types=('image/gif', 'image/apng', 'image/webp'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('gif',), mime_types=('image/gif',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('apng',), mime_types=('image/apng',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('webp',), mime_types=('image/webp',), upper_mime_types=(), children_mime_types=())))
- IMG_RASTER = MimeType(extensions=('png', 'jpeg', 'jpg', 'tiff'), mime_types=('image/png', 'image/jpeg', 'image/tiff'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('png',), mime_types=('image/png',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('jpeg', 'jpg'), mime_types=('image/jpeg',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('tiff',), mime_types=('image/tiff',), upper_mime_types=(), children_mime_types=())))
- IMG_SEQ = MimeType(extensions=('exr', 'dpx', 'tiff'), mime_types=('image/aces-exr', 'image/dpx', 'image/tiff'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('exr',), mime_types=('image/aces-exr',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('dpx',), mime_types=('image/dpx',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('tiff',), mime_types=('image/tiff',), upper_mime_types=(), children_mime_types=())))
- IMG_VEC = MimeType(extensions=('svg', 'eps'), mime_types=('image/svg+xml', 'application/postscript'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('svg',), mime_types=('image/svg+xml',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('eps',), mime_types=('application/postscript',), upper_mime_types=(), children_mime_types=())))
- JPEG = MimeType(extensions=('jpeg', 'jpg'), mime_types=('image/jpeg',), upper_mime_types=(), children_mime_types=())
- JSON = MimeType(extensions=('json',), mime_types=('application/json',), upper_mime_types=(), children_mime_types=())
- MD = MimeType(extensions=('md',), mime_types=('text/markdown',), upper_mime_types=('text/plain',), children_mime_types=())
- MOV = MimeType(extensions=('mov',), mime_types=('video/quicktime', 'video/wbm'), upper_mime_types=(), children_mime_types=())
- MP4 = MimeType(extensions=('mp4',), mime_types=('video/mp4',), upper_mime_types=(), children_mime_types=())
- NOTYPE: MimeType = MimeType(extensions=(), mime_types=(), upper_mime_types=(), children_mime_types=())
- ODP = MimeType(extensions=('odp',), mime_types=('application/vnd.oasis.opendocument.presentation',), upper_mime_types=(), children_mime_types=())
- ODS = MimeType(extensions=('ods',), mime_types=('application/vnd.oasis.opendocument.spreadsheet',), upper_mime_types=(), children_mime_types=())
- ODT = MimeType(extensions=('odt',), mime_types=('application/vnd.oasis.opendocument.text',), upper_mime_types=(), children_mime_types=())
- PDF = MimeType(extensions=('pdf',), mime_types=('application/pdf',), upper_mime_types=(), children_mime_types=())
- PNG = MimeType(extensions=('png',), mime_types=('image/png',), upper_mime_types=(), children_mime_types=())
- PPT = MimeType(extensions=('ppt',), mime_types=('application/vnd.ms-powerpoint',), upper_mime_types=(), children_mime_types=())
- PPTX = MimeType(extensions=('pptx',), mime_types=('application/vnd.openxmlformats-officedocument.presentationml.presentation',), upper_mime_types=(), children_mime_types=())
- SVG = MimeType(extensions=('svg',), mime_types=('image/svg+xml',), upper_mime_types=(), children_mime_types=())
- TEXT = MimeType(extensions=('txt',), mime_types=('text/plain',), upper_mime_types=(), children_mime_types=())
- TIFF = MimeType(extensions=('tiff',), mime_types=('image/tiff',), upper_mime_types=(), children_mime_types=())
- UNHANDLED: MimeType = MimeType(extensions=(), mime_types=(), upper_mime_types=(), children_mime_types=())
- VIDEO = MimeType(extensions=('mp4', 'mov', 'avi', 'wmv'), mime_types=('video/mp4', 'video/quicktime', 'video/wbm', 'video/x-msvideo', 'video/x-ms-wmv'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('mp4',), mime_types=('video/mp4',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('mov',), mime_types=('video/quicktime', 'video/wbm'), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('avi',), mime_types=('video/x-msvideo',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('wmv',), mime_types=('video/x-ms-wmv',), upper_mime_types=(), children_mime_types=())))
- WEBP = MimeType(extensions=('webp',), mime_types=('image/webp',), upper_mime_types=(), children_mime_types=())
- WMV = MimeType(extensions=('wmv',), mime_types=('video/x-ms-wmv',), upper_mime_types=(), children_mime_types=())
- XLS = MimeType(extensions=('xls',), mime_types=('application/vnd.ms-excel',), upper_mime_types=(), children_mime_types=())
- XLSX = MimeType(extensions=('xlsx',), mime_types=('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',), upper_mime_types=(), children_mime_types=())
- XML = MimeType(extensions=('xml',), mime_types=('application/xml', 'text/xml'), upper_mime_types=(), children_mime_types=())
- classmethod clean_suffix(suffix: str) → str
- classmethod from_mimetype(file_path: str | Path, raise_err: bool = False, return_matches: bool = False) → Tuple[FileType, Tuple[FileType, ...]]
Determines a filetype from a file’s MIME type.
- Parameters:
file_path (Union[str, Path]) – The path to the file.
raise_err (bool, optional) – Whether to raise an exception if the type is unhandled. Defaults to False.
return_matches (bool, optional) – Whether to return a tuple with the first matching filetype and a tuple of all matches. Defaults to False.
- Returns:
The determined filetype enumeration member, or, if return_matches is True, a tuple with the first matching filetype and a tuple of all matches.
- Return type:
FileType or Tuple[FileType, Tuple[FileType, ...]]
FileNotFoundError – If the file does not exist.
UnsupportedFileTypeError – If the file type is unhandled and raise_err is True.
- classmethod from_path(path: str | Path, read_content=False, raise_err=False, return_matches=False) → Tuple[FileType, Tuple[FileType, ...]]
Determines the filetype of a file based on its path. Optionally reads the file’s content to verify its type.
- Parameters:
path (Union[str, Path]) – The path to the file.
read_content (bool, optional) – If True, the method also checks the file’s content to determine its type. Defaults to False.
raise_err (bool, optional) – If True, raises exceptions for unsupported types or when the file does not exist. Defaults to False.
return_matches (bool, optional) – Whether to return a tuple with the first matching filetype and a tuple of all matches. Defaults to False.
- Returns:
The filetype enumeration member determined from the file’s suffix and/or content, or, if return_matches is True, a tuple with the first matching filetype and a tuple of all matches.
- Return type:
FileType or Tuple[FileType, Tuple[FileType, ...]]
- Raises:
FileNotFoundError – If the file does not exist when attempting to read its content.
UnsupportedFileTypeError – If the file type is unsupported and raise_err is True.
AssertionError – If there is a mismatch between the file type determined from the file’s suffix and its content.
- classmethod from_suffix(suffix: str, raise_err: bool = False, return_matches: bool = False) → Tuple[FileType, Tuple[FileType, ...]]
Determines a filetype from a file’s suffix.
- Parameters:
suffix (str) – The file suffix (extension).
raise_err (bool, optional) – Whether to raise an exception if the type is unhandled. Defaults to False.
return_matches (bool, optional) – Whether to return a tuple with the first matching filetype and a tuple of all matches. Defaults to False.
- Returns:
The determined filetype enumeration member, or, if return_matches is True, a tuple with the first matching filetype and a tuple of all matches.
- Return type:
FileType or Tuple[FileType, Tuple[FileType, ...]]
- Raises:
EmptySuffixError – If the suffix is empty and raise_err is True.
UnsupportedFileTypeError – If the file type is unhandled and raise_err is True.
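The suffix-based lookup with raise_err and return_matches can be sketched with a small FileType enum whose values are MimeType records, mirroring the member listing above. Assumptions: MimeType is modelled as a frozen dataclass, clean_suffix trims, lowercases and drops leading periods, and the "first match" is the first member in definition order. Only a few members are reproduced, and NOTYPE is omitted because two members with equal empty values would become aliases under the standard-library Enum.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


@dataclass(frozen=True)
class MimeType:
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()
    upper_mime_types: Tuple[str, ...] = ()
    children_mime_types: Tuple["MimeType", ...] = ()


class EmptySuffixError(Exception):
    pass


class UnsupportedFileTypeError(Exception):
    pass


class FileType(Enum):
    UNHANDLED = MimeType()
    TEXT = MimeType(("txt",), ("text/plain",))
    JPEG = MimeType(("jpeg", "jpg"), ("image/jpeg",))
    IMG_RASTER = MimeType(("png", "jpeg", "jpg"), ("image/png", "image/jpeg"))

    @classmethod
    def clean_suffix(cls, suffix: str) -> str:
        # Assumed normalization: trim, lowercase, drop leading periods
        return suffix.strip().lower().lstrip(".")

    @classmethod
    def from_suffix(cls, suffix: str, raise_err: bool = False, return_matches: bool = False):
        cleaned = cls.clean_suffix(suffix)
        if not cleaned and raise_err:
            raise EmptySuffixError("the suffix is empty")
        # Every member whose extensions include the suffix, in definition order
        matches = tuple(m for m in cls if cleaned and cleaned in m.value.extensions)
        if not matches and raise_err:
            raise UnsupportedFileTypeError(f"unsupported suffix: {suffix!r}")
        first = matches[0] if matches else cls.UNHANDLED
        return (first, matches) if return_matches else first

    def is_valid_suffix(self, suffix: str) -> bool:
        return self.clean_suffix(suffix) in self.value.extensions
```

A suffix such as ".jpg" matches both a concrete type (JPEG) and a grouping type (IMG_RASTER), which is why return_matches exposes all candidates.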
- classmethod get_filetypes()
Yields all valid file types in the enumeration.
- get_one_mimetype() → str
Retrieves the primary MIME type associated with the filetype.
- Returns:
The primary MIME type for the filetype. Returns an empty string if the filetype does not have an associated MIME type.
- Return type:
str
- get_one_suffix() → str
Retrieves the primary file extension associated with the filetype.
- Returns:
The primary file extension for the filetype, prefixed with a period. Returns an empty string if the filetype does not have an associated extension.
- Return type:
str
- get_value() → MimeType
Returns the MimeType associated with the enumeration member.
- Returns:
The MIME type information.
- Return type:
MimeType
- is_true_filetype() → bool
Determines if the filetype instance represents a supported file type based on the presence of defined extensions.
- Returns:
True if the filetype has at least one associated file extension, False otherwise.
- Return type:
bool
- is_valid_mime_type(file_path: Path, raise_err=False) → bool
Validates whether the MIME type of the file at the specified path aligns with the filetype’s expected MIME types.
This method first determines the filetype from the file’s actual MIME type (detected by reading the file’s content) and then checks whether that filetype matches the instance calling this method. FileType.TEXT receives special treatment: because text MIME types are generic, a broader compatibility check is performed.
- Parameters:
file_path (Path) – The path to the file whose MIME type is to be validated.
raise_err (bool, optional) – If True, a MismatchedException is raised if the file’s MIME type does not match the expected MIME types of the filetype instance. Defaults to False.
- Returns:
True if the file’s MIME type matches the expected MIME types for this filetype instance, or if special compatibility conditions are met (e.g., for FileType.TEXT with “text/plain”). Otherwise, False.
- Return type:
bool
- Raises:
MismatchedException – If raise_err is True and the file’s MIME type does not match the expected MIME types for this filetype instance, including detailed information about the mismatch.
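The role of upper_mime_types in MIME-type validation can be shown with a simplified check that takes an already-detected MIME type. Here a CSV file that a content sniffer reports as text/plain is still accepted, because CSV lists text/plain among its upper MIME types. Detecting the MIME type itself is out of scope, is_compatible_mime is a hypothetical helper, and the special handling of FileType.TEXT described above is not reproduced.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MimeType:
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()
    upper_mime_types: Tuple[str, ...] = ()


class MismatchedException(Exception):
    def __init__(self, label: str, claimed_val: str, expected_vals: Tuple[str, ...]):
        super().__init__(f"{label} {claimed_val!r} is not one of {expected_vals!r}")


# Values copied from the CSV and PNG members listed above
CSV = MimeType(("csv",), ("text/csv",), ("text/plain",))
PNG = MimeType(("png",), ("image/png",))


def is_compatible_mime(detected: str, expected: MimeType, raise_err: bool = False) -> bool:
    # Accept the type's own MIME types plus its broader "upper" MIME types
    allowed = expected.mime_types + expected.upper_mime_types
    if detected in allowed:
        return True
    if raise_err:
        raise MismatchedException("MIME type", detected, allowed)
    return False
```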
- is_valid_path(file_path: str | Path, read_content=False, raise_err=False) → bool
Validates the filetype of a given file path. Optionally reads the file’s content to verify its type.
- Parameters:
file_path (Union[str, Path]) – The file path to validate.
read_content (bool, optional) – If True, the method also checks the file’s content to validate its type. Defaults to False.
raise_err (bool, optional) – If True, raises exceptions for mismatched or unsupported types. Defaults to False.
- Returns:
True if the file path’s type matches the filetype, False otherwise.
- Return type:
bool
- Raises:
AssertionError – If there is a mismatch between the file type determined from the file’s suffix and its content.
MismatchedException – If the file type determined from the file’s suffix or content does not match the filetype.
- is_valid_suffix(suffix: str, raise_err=False) → bool
Validates whether a given file extension matches the filetype’s expected extensions.
- Parameters:
suffix (str) – The file extension to validate, including the leading period (e.g., “.txt”).
raise_err (bool, optional) – If True, raises a MismatchedException for invalid extensions. Defaults to False.
- Returns:
True if the suffix matches one of the filetype’s extensions, False otherwise.
- Return type:
bool
- Raises:
MismatchedException – If the suffix does not match the filetype’s extensions and raise_err is True.
- class opencf_core.filetypes.FileTypeExamples(value=<no_arg>, names=None, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Enumeration of supported file types with methods for type determination and validation.
- XML = MimeType(extensions=('xml',), mime_types=('application/xml', 'text/xml'), upper_mime_types=(), children_mime_types=())
- class opencf_core.filetypes.MimeType(extensions: Tuple[str, ...] = (), mime_types: Tuple[str, ...] = (), upper_mime_types: Tuple[str, ...] = (), children_mime_types: Tuple[MimeType, ...] = ())
Bases:
object
Class representing MIME type information.
- extensions
Tuple of file extensions associated with the MIME type.
- Type:
Tuple[str, …]
- mime_types
Tuple of MIME types.
- Type:
Tuple[str, …]
- upper_mime_types
Tuple of additional MIME types that can be considered equivalent.
- Type:
Tuple[str, …]
- extensions: Tuple[str, ...] = ()
- mime_types: Tuple[str, ...] = ()
- upper_mime_types: Tuple[str, ...] = ()
- opencf_core.filetypes.extend_filetype_enum(added_enum: Type[Enum]) → None
Extends the BaseFileType enumeration with members from another enumeration.
- Parameters:
added_enum (Type[Enum]) – The enum class to extend BaseFileType with.
- opencf_core.filetypes.extract_enum_members(enum_cls: Type) → Dict[str, MimeType]
Extracts MimeType instances from an enum class.
- Parameters:
enum_cls (Type) – The enum class.
- Returns:
Dictionary of MimeType instances keyed by enum member names.
- Return type:
Dict[str, MimeType]
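A minimal sketch of the extraction, assuming members whose value is not a `MimeType` should be skipped. The `Examples` enum and the simplified two-field `MimeType` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple, Type


@dataclass(frozen=True)
class MimeType:
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()


class Examples(Enum):
    XML = MimeType(extensions=("xml",), mime_types=("application/xml", "text/xml"))
    UNKNOWN = "not a MimeType"  # skipped by the filter below


def extract_enum_members(enum_cls: Type[Enum]) -> Dict[str, MimeType]:
    # __members__ maps names to members (aliases included); keep only MimeType values.
    return {
        name: member.value
        for name, member in enum_cls.__members__.items()
        if isinstance(member.value, MimeType)
    }


print(list(extract_enum_members(Examples)))  # ['XML']
```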
- opencf_core.filetypes.get_equivalent_file_types(mime_types: Set[MimeType], raise_error: bool = True) Set[FileType]
Get the equivalent FileTypes for a given set of MimeTypes.
- Parameters:
mime_types (Set[MimeType]) – The set of MIME types to find the equivalent FileTypes for.
raise_error (bool, optional) – Controls whether to raise an error if no equivalent FileType is found. Defaults to True.
- Returns:
A set of equivalent FileTypes if any are found, otherwise None.
- Return type:
Set[FileType]
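A rough sketch of the lookup. It assumes "equivalent" means the enum member's value equals one of the given `MimeType`s, and it uses `ValueError` as the error type; the library may use a different rule and a different exception. `FileTypeSketch` is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class MimeType:
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()


class FileTypeSketch(Enum):
    PNG = MimeType(("png",), ("image/png",))
    JPEG = MimeType(("jpeg", "jpg"), ("image/jpeg",))


def get_equivalent_file_types(
    mime_types: Set[MimeType], raise_error: bool = True
) -> Optional[Set[FileTypeSketch]]:
    # A member counts as equivalent when its value equals one of the given MimeTypes.
    found = {ft for ft in FileTypeSketch if ft.value in mime_types}
    if not found:
        if raise_error:
            raise ValueError(f"no FileType matches {mime_types}")
        return None
    return found


print(get_equivalent_file_types({MimeType(("png",), ("image/png",))}) == {FileTypeSketch.PNG})  # True
```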
- opencf_core.filetypes.get_file_type_children(file_type: FileType, include_head: bool = False) Set[FileType]
Recursively get the FileTypes equivalent to the MIME types in the subtree of the given FileType.
- Parameters:
file_type (FileType) – The FileType to get the subtree for.
include_head (bool, optional) – Controls whether to include the head node in the result. Defaults to False.
- Returns:
A set of all equivalent FileTypes in the subtree.
- Return type:
Set[FileType]
Example
>>> all_image_children = get_file_type_children(FileType.IMG_RASTER)
>>> print(all_image_children)
{FileType.PNG, FileType.JPEG, FileType.TIFF}
- opencf_core.filetypes.get_file_types_clidren(file_types: Iterable[FileType], include_head: bool = False) Set[FileType]
Recursively get the FileTypes equivalent to the MIME types in the subtrees of the given FileType instances.
- Parameters:
file_types (Iterable[FileType]) – The FileType instances to get the subtrees for.
include_head (bool, optional) – Controls whether to include the head node in the result. Defaults to False.
- Returns:
A set of all equivalent FileTypes in the subtree.
- Return type:
Set[FileType]
Example
>>> all_image_children = get_file_types_clidren([FileType.IMG_RASTER])
>>> print(all_image_children)
{FileType.PNG, FileType.JPEG, FileType.TIFF}
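A sketch of how the FileType-level children lookup could work. It assumes each member's value is a `MimeType` with nested `children_mime_types`. The subtree of each input is collected, and the collected MIME types are mapped back to the enum members that carry them. `FileTypeSketch`, `mime_subtree`, and `file_types_children` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class MimeType:
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()
    children_mime_types: Tuple["MimeType", ...] = ()


_PNG = MimeType(("png",), ("image/png",))
_JPEG = MimeType(("jpeg", "jpg"), ("image/jpeg",))
_TIFF = MimeType(("tiff",), ("image/tiff",))


class FileTypeSketch(Enum):
    IMG_RASTER = MimeType(children_mime_types=(_PNG, _JPEG, _TIFF))
    PNG = _PNG
    JPEG = _JPEG
    TIFF = _TIFF


def mime_subtree(mime: MimeType, include_head: bool = False) -> Set[MimeType]:
    # Recursive depth-first walk over children_mime_types.
    result = {mime} if include_head else set()
    for child in mime.children_mime_types:
        result |= mime_subtree(child, include_head=True)
    return result


def file_types_children(
    file_types: Iterable[FileTypeSketch], include_head: bool = False
) -> Set[FileTypeSketch]:
    subtree: Set[MimeType] = set()
    for ft in file_types:
        subtree |= mime_subtree(ft.value, include_head)
    # Map the collected MIME types back to the enum members that carry them.
    return {ft for ft in FileTypeSketch if ft.value in subtree}


print(sorted(ft.name for ft in file_types_children([FileTypeSketch.IMG_RASTER])))
# ['JPEG', 'PNG', 'TIFF']
```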
- opencf_core.filetypes.get_mime_type_children(mime_type: MimeType, include_head: bool = False) Set[MimeType]
Recursively get all children MIME types in the subtree of the given MIME type.
- Parameters:
mime_type (MimeType) – The MIME type to get the subtree for.
include_head (bool, optional) – Controls whether to include the head node in the result. Defaults to False.
- Returns:
A set of all MIME types in the subtree.
- Return type:
Set[MimeType]
Example
>>> all_image_children = get_mime_type_children(FileType.IMG_RASTER.value)
>>> print(all_image_children)
{MimeType(extensions=('png',), mime_types=('image/png',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('jpeg', 'jpg'), mime_types=('image/jpeg',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('tiff',), mime_types=('image/tiff',), upper_mime_types=(), children_mime_types=())}
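The recursive collection can also be written iteratively with an explicit stack. That avoids deep recursion, and a `seen` set keeps shared subtrees from being visited twice. This is a sketch under the assumption that `MimeType` is an immutable dataclass. The `raster` and `image` parents are hypothetical.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class MimeType:
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()
    children_mime_types: Tuple["MimeType", ...] = ()


def get_mime_type_children(mime_type: MimeType, include_head: bool = False) -> Set[MimeType]:
    # Iterative depth-first traversal; `seen` keeps shared subtrees from being revisited.
    seen: Set[MimeType] = set()
    stack = list(mime_type.children_mime_types)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(node.children_mime_types)
    if include_head:
        seen.add(mime_type)
    return seen


png = MimeType(("png",), ("image/png",))
jpeg = MimeType(("jpeg", "jpg"), ("image/jpeg",))
raster = MimeType(children_mime_types=(png, jpeg))
image = MimeType(children_mime_types=(raster,))
print(len(get_mime_type_children(image)))  # 3 (raster, png, jpeg)
```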
- opencf_core.filetypes.merge_mimetype(*mimetypes: MimeType) MimeType
Merge multiple MimeType objects into one.
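The docstring does not say how fields are combined. One plausible reading is sketched here, and it is an assumption rather than the library's confirmed behavior: concatenate each tuple field across the inputs, dropping duplicates while keeping first-seen order.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class MimeType:
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()
    upper_mime_types: Tuple[str, ...] = ()
    children_mime_types: Tuple["MimeType", ...] = ()


def merge_mimetype(*mimetypes: MimeType) -> MimeType:
    merged = {}
    for f in fields(MimeType):
        # dict.fromkeys removes duplicates while keeping first-seen order.
        values = (v for m in mimetypes for v in getattr(m, f.name))
        merged[f.name] = tuple(dict.fromkeys(values))
    return MimeType(**merged)


a = MimeType(extensions=("jpeg",), mime_types=("image/jpeg",))
b = MimeType(extensions=("jpg", "jpeg"), mime_types=("image/jpeg",))
print(merge_mimetype(a, b).extensions)  # ('jpeg', 'jpg')
```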
Input/Output Handler Module
This module provides a structured approach to file input and output across formats such as plain text, CSV, JSON, and XML. It defines abstract base classes and concrete implementations for reading from and writing to files, and it checks content formats through method signatures and runtime checks.
- class opencf_core.io_handler.Converter
Bases: ABC
Abstract base class for data converters.
- abstract _check_input_format(content: Any) bool
Checks if the provided content matches the expected input format.
- Parameters:
content (Any) – The content to be checked.
- Returns:
True if the content matches the expected input format, False otherwise.
- Return type:
bool
- abstract _check_output_format(content: Any) bool
Checks if the provided content matches the expected output format.
- Parameters:
content (Any) – The content to be checked.
- Returns:
True if the content matches the expected output format, False otherwise.
- Return type:
bool
- abstract _convert(content: Any) Any
Converts the provided content from the input format to the output format.
- Parameters:
content (Any) – The content to be converted.
- Returns:
The converted content in the output format.
- Return type:
Any
- check_input_format(content: Any) bool
- check_output_format(content: Any) bool
- convert(content: Any) Any
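The pairing of public methods with abstract `_`-prefixed hooks suggests a template-method design. Here is a sketch of that idea. The flow inside `convert` (validate input, convert, validate output) is an assumption, and `ConverterSketch` and `RowsToTextConverter` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Any


class ConverterSketch(ABC):
    """Hypothetical mirror of the Converter interface above."""

    @abstractmethod
    def _check_input_format(self, content: Any) -> bool: ...

    @abstractmethod
    def _check_output_format(self, content: Any) -> bool: ...

    @abstractmethod
    def _convert(self, content: Any) -> Any: ...

    def check_input_format(self, content: Any) -> bool:
        return self._check_input_format(content)

    def check_output_format(self, content: Any) -> bool:
        return self._check_output_format(content)

    def convert(self, content: Any) -> Any:
        # Assumed flow: validate input, convert, then validate output.
        if not self.check_input_format(content):
            raise ValueError("content does not match the expected input format")
        result = self._convert(content)
        if not self.check_output_format(result):
            raise ValueError("converted content does not match the expected output format")
        return result


class RowsToTextConverter(ConverterSketch):
    """Hypothetical converter: list of dicts -> one 'key=value' line per row."""

    def _check_input_format(self, content: Any) -> bool:
        return isinstance(content, list) and all(isinstance(r, dict) for r in content)

    def _check_output_format(self, content: Any) -> bool:
        return isinstance(content, str)

    def _convert(self, content: Any) -> Any:
        return "\n".join(",".join(f"{k}={v}" for k, v in row.items()) for row in content)


print(RowsToTextConverter().convert([{"name": "John", "age": "30"}]))  # name=John,age=30
```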
- class opencf_core.io_handler.CsvToDictReader
Bases: Reader
Reads content from a CSV file and returns it as a list of dictionaries.
Example
>>> reader = CsvToDictReader()
>>> content = reader.read_content(Path('input.csv'))
>>> print(content)
[{'name': 'John', 'age': '30'}, {'name': 'Jane', 'age': '25'}]
- _check_input_format(content: List[Dict[str, Any]]) bool
Validates the input content to ensure it is a list of dictionaries.
- Parameters:
content (List[Dict[str, Any]]) – The content to validate.
- Returns:
True if the content is a list of dictionaries, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) List[Dict[str, Any]]
Reads and parses the content from the CSV file at the given path.
- Parameters:
input_path (Path) – The path to the CSV file.
- Returns:
The parsed content as a list of dictionaries.
- Return type:
List[Dict[str, Any]]
- input_format
alias of
List[Dict[str,Any]]
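The CSV parsing step can be sketched with the standard library's `csv.DictReader`, which yields one dictionary per row keyed by the header. Whether `_read_content` uses exactly this call is an assumption, and so is the UTF-8 encoding.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def read_csv_as_dicts(input_path: Path) -> List[Dict[str, Any]]:
    # newline="" is what the csv module documentation recommends for file objects.
    with input_path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "input.csv"
    path.write_text("name,age\nJohn,30\nJane,25\n", encoding="utf-8")
    rows = read_csv_as_dicts(path)

print(rows)  # [{'name': 'John', 'age': '30'}, {'name': 'Jane', 'age': '25'}]
```

All values come back as strings, which matches the `'30'` and `'25'` in the example above.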
- class opencf_core.io_handler.DictToCsvWriter
Bases: Writer
Writes a list of dictionaries to a CSV file.
- _check_output_format(content: List[Dict[str, Any]]) bool
Validates the output content to ensure it is a list of dictionaries.
- Parameters:
content (List[Dict[str, Any]]) – The content to validate.
- Returns:
True if the content is a list of dictionaries, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: List[Dict[str, Any]]) None
Writes the list of dictionaries content to a CSV file at the given path.
- Parameters:
output_path (Path) – The path to the CSV file.
output_content (List[Dict[str, Any]]) – The list of dictionaries to write.
- output_format
alias of
List[Dict[str,Any]]
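The writing step can be sketched with `csv.DictWriter`. In this sketch the column order is assumed to come from the keys of the first row. The real writer may choose its header differently.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_dicts_as_csv(output_path: Path, output_content: List[Dict[str, Any]]) -> None:
    # Assumption: the header comes from the keys of the first row.
    fieldnames = list(output_content[0].keys()) if output_content else []
    with output_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(output_content)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.csv"
    write_dicts_as_csv(path, [{"name": "John", "age": 30}, {"name": "Jane", "age": 25}])
    text = path.read_text(encoding="utf-8")

print(text)
```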
- class opencf_core.io_handler.DictToJsonWriter
Bases: Writer
Writes content from a dictionary to a JSON file.
- output_format
alias of
Dict[str,Any]
- class opencf_core.io_handler.JsonToDictReader
Bases: Reader
Reads content from a JSON file and returns it as a dictionary.
- input_format
alias of
Dict[str,Any]
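A round trip through `DictToJsonWriter` and `JsonToDictReader` can be sketched with the standard `json` module. Two choices here are the sketch's own: the `indent=2` formatting and the rejection of non-object top-level JSON. The documentation above specifies neither.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_dict_as_json(output_path: Path, output_content: Dict[str, Any]) -> None:
    # indent=2 is a readability choice for this sketch, not a documented default.
    output_path.write_text(json.dumps(output_content, indent=2), encoding="utf-8")


def read_json_as_dict(input_path: Path) -> Dict[str, Any]:
    content = json.loads(input_path.read_text(encoding="utf-8"))
    if not isinstance(content, dict):
        # A JSON file may hold a list or scalar at the top level; reject those here.
        raise ValueError("expected a JSON object at the top level")
    return content


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json"
    write_dict_as_json(path, {"name": "John", "age": 30})
    restored = read_json_as_dict(path)

print(restored)  # {'name': 'John', 'age': 30}
```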
- class opencf_core.io_handler.Reader
Bases: ABC
Abstract base class for file readers.
- abstract _check_input_format(content: Any) bool
Checks if the provided content matches the expected input format.
- Parameters:
content (Any) – The content to be checked.
- Returns:
True if the content matches the expected input format, False otherwise.
- Return type:
bool
- abstract _read_content(input_path: Path) Any
Reads and returns the content from the given input path.
- Parameters:
input_path (Path) – The path to the input file.
- Returns:
The content read from the input file.
- Return type:
Any
- check_input_format(content: Any) bool
- read_content(input_path: Path) Any
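Here is a sketch of how a concrete reader could plug into this interface. It assumes `read_content` calls the `_read_content` hook and then validates the result with `check_input_format`. `ReaderSketch` and `TxtReaderSketch` are hypothetical.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class ReaderSketch(ABC):
    """Hypothetical mirror of the Reader interface above."""

    @abstractmethod
    def _check_input_format(self, content: Any) -> bool: ...

    @abstractmethod
    def _read_content(self, input_path: Path) -> Any: ...

    def check_input_format(self, content: Any) -> bool:
        return self._check_input_format(content)

    def read_content(self, input_path: Path) -> Any:
        # Assumed flow: read with the subclass hook, then validate the result.
        content = self._read_content(input_path)
        if not self.check_input_format(content):
            raise ValueError(f"unexpected content read from {input_path}")
        return content


class TxtReaderSketch(ReaderSketch):
    def _check_input_format(self, content: Any) -> bool:
        return isinstance(content, str)

    def _read_content(self, input_path: Path) -> str:
        return input_path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "note.txt"
    path.write_text("hello", encoding="utf-8")
    text = TxtReaderSketch().read_content(path)

print(text)  # hello
```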
- class opencf_core.io_handler.SamePathReader
Bases: Reader
A Reader that returns the input path itself, useful for operations where the file path is the desired output.
- input_format
alias of
Path
- class opencf_core.io_handler.StrToTxtWriter
Bases: Writer
Writes a string to a text file.
- output_format
alias of
str
- class opencf_core.io_handler.StrToXmlWriter
Bases: Writer
Writes content as a string to an XML file.
- output_format
alias of
str
- class opencf_core.io_handler.TreeToXmlWriter
Bases: Writer
Writes an ElementTree element to an XML file.
- _check_output_format(content: Element) bool
Validates the output content to ensure it is an ElementTree element.
- Parameters:
content (ET.Element) – The content to validate.
- Returns:
True if the content is a valid ElementTree element, False otherwise.
- Return type:
bool
- _write_content(output_path: Path, output_content: Element) None
Writes the ElementTree element content to an XML file at the given path.
- Parameters:
output_path (Path) – The path to the XML file.
output_content (ET.Element) – The ElementTree element to write.
- output_format
alias of
Element
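Writing an element can be sketched with the standard library's `xml.etree.ElementTree`. The type check mirrors `_check_output_format`. The UTF-8 encoding and the XML declaration are assumptions of this sketch.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def write_tree_as_xml(output_path: Path, output_content: ET.Element) -> None:
    if not isinstance(output_content, ET.Element):
        raise TypeError("expected an xml.etree.ElementTree.Element")
    # ElementTree.write serializes the element and its whole subtree to the file.
    ET.ElementTree(output_content).write(output_path, encoding="utf-8", xml_declaration=True)


root = ET.Element("people")
ET.SubElement(root, "person", name="John")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "out.xml"
    write_tree_as_xml(path, root)
    text = path.read_text(encoding="utf-8")

print(text)
```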
- class opencf_core.io_handler.TxtToStrReader
Bases: Reader
Reads content from a text file and returns it as a string.
- input_format
alias of
str
- class opencf_core.io_handler.Writer
Bases: ABC
Abstract base class for file writers.
- abstract _check_output_format(content: Any) bool
Checks if the provided content matches the expected output format.
- Parameters:
content (Any) – The content to be checked.
- Returns:
True if the content matches the expected output format, False otherwise.
- Return type:
bool
- abstract _write_content(output_path: Path, output_content: Any)
Writes the provided content to the given output path.
- Parameters:
output_path (Path) – The path to the output file.
output_content (Any) – The content to be written to the output file.
- check_output_format(content: Any) bool
- write_content(output_path: Path, output_content: Any)
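The writer side can be sketched the same way. This sketch assumes `write_content` validates with `check_output_format` before calling `_write_content`, so nothing is written for bad content. `WriterSketch` and `TxtWriterSketch` are hypothetical.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class WriterSketch(ABC):
    """Hypothetical mirror of the Writer interface above."""

    @abstractmethod
    def _check_output_format(self, content: Any) -> bool: ...

    @abstractmethod
    def _write_content(self, output_path: Path, output_content: Any) -> None: ...

    def check_output_format(self, content: Any) -> bool:
        return self._check_output_format(content)

    def write_content(self, output_path: Path, output_content: Any) -> None:
        # Assumed flow: validate first so nothing is written for bad content.
        if not self.check_output_format(output_content):
            raise ValueError(f"refusing to write unexpected content to {output_path}")
        self._write_content(output_path, output_content)


class TxtWriterSketch(WriterSketch):
    def _check_output_format(self, content: Any) -> bool:
        return isinstance(content, str)

    def _write_content(self, output_path: Path, output_content: str) -> None:
        output_path.write_text(output_content, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out.txt"
    TxtWriterSketch().write_content(target, "hello")
    written = target.read_text(encoding="utf-8")
    try:
        TxtWriterSketch().write_content(Path(tmp) / "bad.txt", 42)
    except ValueError:
        bad_created = (Path(tmp) / "bad.txt").exists()

print(written, bad_created)  # hello False
```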
- class opencf_core.io_handler.XmlToStrReader
Bases: Reader
Reads content from an XML file and returns it as a string.
- input_format
alias of
str
- class opencf_core.io_handler.XmlToTreeReader
Bases: Reader
Reads content from an XML file and returns it as an ElementTree element.
- _check_input_format(content: Element) bool
Validates the input content to ensure it is an ElementTree element.
- Parameters:
content (ET.Element) – The content to validate.
- Returns:
True if the content is a valid ElementTree element, False otherwise.
- Return type:
bool
- _read_content(input_path: Path) Element
Reads and parses the content from the XML file at the given path.
- Parameters:
input_path (Path) – The path to the XML file.
- Returns:
The root element of the parsed XML tree.
- Return type:
ET.Element
- input_format
alias of
Element
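The parsing step can be sketched with `xml.etree.ElementTree.parse`, whose `getroot()` returns the root element described above. Whether the library reads the file this way is an assumption.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def read_xml_as_tree(input_path: Path) -> ET.Element:
    # ET.parse builds the whole document tree; getroot() returns its root element.
    return ET.parse(input_path).getroot()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "in.xml"
    path.write_text('<people><person name="Jane"/></people>', encoding="utf-8")
    root = read_xml_as_tree(path)

print(root.tag, [p.get("name") for p in root.iter("person")])  # people ['Jane']
```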
- class opencf_core.logging_config.ColoredFormatter(fmt=None, datefmt=None, style='%', validate=True)
Bases: Formatter
A logging formatter that colors each record according to its level. Original code from [Sergey Pleshakov, stackoverflow](https://stackoverflow.com/a/56944256/16668046)
- FORMATS = {10: '\x1b[38;20m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m', 20: '\x1b[38;20m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m', 30: '\x1b[33;20m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m', 40: '\x1b[31;20m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m', 50: '\x1b[31;1m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m'}
- bold_red = '\x1b[31;1m'
- format(record)
Format the specified record as text.
The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
- grey = '\x1b[38;20m'
- log_format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)'
- red = '\x1b[31;20m'
- reset = '\x1b[0m'
- yellow = '\x1b[33;20m'
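The attributes listed above fit together as sketched here: one ANSI-wrapped format string per level, with a per-record `logging.Formatter` built from the matching entry. The color codes and format string are copied from the attributes above. Building a new formatter on each call follows the linked Stack Overflow pattern, but the library's exact `format` body is an assumption.

```python
import logging


class LevelColorFormatter(logging.Formatter):
    """Sketch of a formatter that wraps each record in an ANSI color chosen by level."""

    grey = "\x1b[38;20m"
    yellow = "\x1b[33;20m"
    red = "\x1b[31;20m"
    bold_red = "\x1b[31;1m"
    reset = "\x1b[0m"
    log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)"

    FORMATS = {
        logging.DEBUG: grey + log_format + reset,
        logging.INFO: grey + log_format + reset,
        logging.WARNING: yellow + log_format + reset,
        logging.ERROR: red + log_format + reset,
        logging.CRITICAL: bold_red + log_format + reset,
    }

    def format(self, record: logging.LogRecord) -> str:
        # Build a Formatter for the record's level and delegate the actual formatting.
        formatter = logging.Formatter(self.FORMATS.get(record.levelno, self.log_format))
        return formatter.format(record)


handler = logging.StreamHandler()
handler.setFormatter(LevelColorFormatter())
logger = logging.getLogger("demo")
logger.addHandler(handler)
logger.warning("disk almost full")  # shown in yellow on ANSI-capable terminals
```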
- class opencf_core.logging_config.LoggerConfig
Bases: object
- set_log_file(log_file: str) None
Set log file.
- Parameters:
log_file (str) – Path to the log file.
- set_log_level(level: int) None
Set log level.
- Parameters:
level (int) – Logging level.
- set_log_level_str(level: str) None
Set the log level from a level name.
- Parameters:
level (str) – Name of the logging level.
- setup_logger(name: str, log_file: str | None = None, level: int = 20) None
Set up logger.
- Parameters:
name (str) – Name of the logger.
log_file (str, optional) – Path to the log file. Defaults to None.
level (int, optional) – Logging level. Defaults to logging.INFO.
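A minimal sketch of what `setup_logger` and a name-based level setter could do with the standard `logging` module. The handler choices are assumptions, and so is returning the logger, which the documented method does not do. `logging.getLevelName` returns the numeric level for a registered name such as `"DEBUG"`, and a string for unknown names.

```python
import logging
from typing import Optional


def setup_logger(name: str, log_file: Optional[str] = None, level: int = logging.INFO) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.addHandler(logging.StreamHandler())  # console output
    if log_file is not None:
        logger.addHandler(logging.FileHandler(log_file, encoding="utf-8"))
    return logger


def level_from_name(level: str) -> int:
    # getLevelName maps a registered name such as "DEBUG" to its number (10).
    value = logging.getLevelName(level.upper())
    if not isinstance(value, int):
        raise ValueError(f"unknown log level: {level!r}")
    return value


log = setup_logger("opencf_demo", level=level_from_name("debug"))
print(log.level)  # 10
```

The default `level=logging.INFO` is the value 20 shown in the signature above.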
MIME Type Guesser Module
This module provides a singleton class for guessing MIME types from file paths using the python-magic library.
- opencf_core.mimes.guess_mime_type_from_file(file_path: str | Path) str
Guesses the MIME type from the file path.
- Parameters:
file_path (str | Path) – The path to the file.
- Returns:
The guessed MIME type.
- Return type:
str
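A sketch of content-based MIME detection with python-magic's `magic.from_file(path, mime=True)`. If python-magic or libmagic is unavailable, the sketch falls back to the standard library's `mimetypes.guess_type`. That fallback is this sketch's own addition, and it guesses from the extension only, not from the file contents.

```python
import mimetypes
import tempfile
from pathlib import Path
from typing import Union


def guess_mime_type_from_file(file_path: Union[str, Path]) -> str:
    try:
        import magic  # python-magic: inspects file contents via libmagic
    except ImportError:
        magic = None
    if magic is not None:
        return magic.from_file(str(file_path), mime=True)
    # Fallback: guess from the file extension only.
    guessed, _encoding = mimetypes.guess_type(str(file_path))
    return guessed or "application/octet-stream"


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.json"
    sample.write_text('{"a": 1}', encoding="utf-8")
    guessed_type = guess_mime_type_from_file(sample)

print(guessed_type)
```

The two paths can disagree: libmagic may report a small JSON file as `text/plain`, while the extension-based fallback reports `application/json`.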
- opencf_core.utils.ensure_iterable(obj, raise_err=True, return_single=False)
- opencf_core.utils.get_filepaths_from_inputs(args: List[str]) List[str]
Generate a list of file paths from a list of command-line arguments.
- Parameters:
args (list of str) – List of command-line arguments including file paths, directory paths, and glob patterns.
- Returns:
List of file paths that match the input criteria.
- Return type:
list of str
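A sketch of how command-line inputs could be expanded into file paths. Three behaviors here are assumptions, not documented behavior of the library: a directory contributes only the regular files directly inside it (non-recursive), existing files pass through unchanged, and any other argument is treated as a glob pattern.

```python
import glob
import os
import tempfile
from pathlib import Path
from typing import List


def get_filepaths_from_inputs(args: List[str]) -> List[str]:
    paths: List[str] = []
    for arg in args:
        if os.path.isdir(arg):
            # Directory: take the regular files directly inside it (non-recursive).
            paths.extend(sorted(
                os.path.join(arg, name) for name in os.listdir(arg)
                if os.path.isfile(os.path.join(arg, name))
            ))
        elif os.path.isfile(arg):
            paths.append(arg)
        else:
            # Anything else is treated as a glob pattern; only matching files are kept.
            paths.extend(sorted(p for p in glob.glob(arg) if os.path.isfile(p)))
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.csv"):
        Path(tmp, name).write_text("x")
    os.mkdir(os.path.join(tmp, "sub"))
    names_dir = [os.path.basename(p) for p in get_filepaths_from_inputs([tmp])]
    names_glob = [os.path.basename(p) for p in get_filepaths_from_inputs([os.path.join(tmp, "*.csv")])]

print(names_dir, names_glob)  # ['a.txt', 'b.csv'] ['b.csv']
```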
- opencf_core.utils.is_iterable(obj)
- opencf_core.utils.test()