opencf_core package

Submodules

Base Converter Module

This module serves as a foundation for building file conversion utilities. It provides abstract base classes for writing converters, manages file types, and handles input and output files. The module is designed to be extensible, supporting various file formats and conversion strategies.

Classes:

  • BaseConverter: An abstract base class for creating specific file format converters, enforcing the implementation of file conversion logic.

Exceptions:

  • ValueError: Raised when file paths or types are incompatible or unsupported.

  • AssertionError: Raised by internal consistency checks when file types do not match expected values.

class opencf_core.base_converter.BaseConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: ABC, Generic[T]

Abstract base class for file conversion, defining the template for input to output file conversion.

__init__(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Sets up the converter with specified input and output files, ensuring compatibility.

Parameters:
_check_file_types()

Validates that the provided files have acceptable and supported file types for conversion.

abstract _convert(input_contents: List, args: T) Any

Abstract method to be implemented by subclasses to perform the actual file conversion process.

abstract classmethod _get_supported_input_types() FileType | Iterable[FileType]

Abstract method to define the supported input file types by the converter.

Returns:

The supported input file type or types.

Return type:

FileType | Iterable[FileType]

abstract classmethod _get_supported_output_types() FileType | Iterable[FileType]

Abstract method to define the supported output file types by the converter.

Returns:

The supported output file type or types.

Return type:

FileType | Iterable[FileType]

check_io_handlers()

Ensures that valid I/O handlers (file reader and writer) are set for the conversion.

abstract convert_files(output_path: Path)

Abstract method to be implemented by subclasses to handle the file conversion process.

abstract custom_io_handlers_check()

Custom IO handlers check method. Subclasses should implement this method to ensure proper IO handlers are set.

file_reader: Reader | None = None
file_writer: Writer | None = None
folder_as_output: bool | None = None
classmethod get_input_types(extend=False) Tuple[FileType, ...]
classmethod get_output_types(extend=False) Tuple[FileType, ...]
classmethod get_supported_input_types() Tuple[FileType, ...]

Defines the supported input file types for this converter.

Returns:

The file types supported for input.

Return type:

Tuple[FileType]

classmethod get_supported_output_types() Tuple[FileType, ...]

Defines the supported output file types for this converter.

Returns:

The file types supported for output.

Return type:

Tuple[FileType]

run_conversion()

Orchestrates the file conversion process, including reading, converting, and writing the file.
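The read, convert, write sequence described above follows the template-method pattern. A minimal, self-contained sketch of that pattern is shown below. It does not import opencf_core: SketchConverter and UpperCaseConverter are hypothetical stand-ins, and the in-memory "read" step is an assumption made to keep the example short.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Generic, List, TypeVar

T = TypeVar("T")


class SketchConverter(ABC, Generic[T]):
    """Hypothetical stand-in for a BaseConverter-style class (not the library's code)."""

    def __init__(self, input_texts: List[str], output_path: Path) -> None:
        self.input_texts = input_texts
        self.output_path = output_path

    @abstractmethod
    def _convert(self, input_contents: List[str], args: T) -> Any:
        """Subclasses put the format-specific logic here."""

    def run_conversion(self) -> Path:
        # Fixed orchestration order: read, then convert, then write.
        contents = list(self.input_texts)           # "read" (already in memory here)
        result = self._convert(contents, None)      # "convert" (delegated to the subclass)
        self.output_path.write_text(str(result))    # "write"
        return self.output_path


class UpperCaseConverter(SketchConverter[None]):
    """Toy subclass: joins the inputs and upper-cases them."""

    def _convert(self, input_contents: List[str], args: None) -> str:
        return "\n".join(line.upper() for line in input_contents)
```

Because _convert is abstract, SketchConverter itself cannot be instantiated; only subclasses that implement it can be run.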

class opencf_core.base_converter.FileAsOutputConversionArgs(output_file: pathlib.Path)

Bases: object

output_file: Path
class opencf_core.base_converter.FileAsOutputConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter[FileAsOutputConversionArgs]

abstract _convert(input_contents: List, args: FileAsOutputConversionArgs) Any

Abstract method to be implemented by subclasses to perform the actual file conversion process.

Parameters:
  • input_contents – List of input contents to be converted.

  • args – Arguments required for the file-based conversion process.

Returns:

The converted content.

convert_files(output_path: Path)

Convert input files to output content and save the output to the specified file path.

Parameters:

output_path – The path where the converted output file will be saved.

Returns:

The path where the output file was saved.

custom_io_handlers_check()

Check if the file writer and folder output settings are valid.

output_content: Any
class opencf_core.base_converter.FolderAsOutputConversionArgs(output_folder: pathlib.Path)

Bases: object

output_folder: Path
class opencf_core.base_converter.FolderAsOutputConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter[FolderAsOutputConversionArgs]

abstract _convert(input_contents: List, args: FolderAsOutputConversionArgs) Any

Abstract method to be implemented by subclasses to perform the actual file conversion process.

Parameters:
  • input_contents – List of input contents to be converted.

  • args – Arguments required for the folder-based conversion process.

Returns:

The converted content.

convert_files(output_path: Path)

Convert input files to output content and save the output to the specified folder path.

Parameters:

output_path – The path where the converted output folder will be saved.

Returns:

The path where the output folder was saved.

custom_io_handlers_check()

Check if the file writer and folder output settings are valid.

output_content: Any
exception opencf_core.base_converter.InvalidOutputFormatError(solving_tips)

Bases: Exception

Exception raised when the output content format check fails after conversion.

class opencf_core.base_converter.WriterBasedConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: BaseConverter[None]

__get_bad_output_content_solving_tips__() str

Provide tips to solve issues with bad output content.

Returns:

Tips to solve issues with bad output content.

_convert(input_contents: List, args: None) Any

Abstract method to be implemented by subclasses to perform the actual file conversion process.

Parameters:
  • input_contents – List of input contents to be converted.

  • args – Arguments required for the conversion process, specific to the type of conversion.

Returns:

The converted content.

convert_files(output_path: Path)

Convert input files to output content and save the output to the specified path.

Parameters:

output_path – The path where the converted output file will be saved.

Returns:

The path where the output file was saved.

converters: List[Converter] = []
custom_io_handlers_check()

Check if the file writer is valid.

Main Module

This module contains the main application logic.

class opencf_core.converter_app.BaseConverterApp(input_paths: List[str], input_file_type: str | None = None, output_file_path: str | None = None, output_file_type: str | None = None)

Bases: object

Main application class responsible for managing file conversions.

__init__(input_paths: List[str], input_file_type: str | None = None, output_file_path: str | None = None, output_file_type: str | None = None)

Initializes the BaseConverterApp instance.

Parameters:
  • input_paths (List[str]) – List of paths to the input files.

  • input_file_type (str, optional) – The type of the input file. Defaults to None.

  • output_file_path (str, optional) – The path to the output file. Defaults to None.

  • output_file_type (str, optional) – The type of the output file. Defaults to None.

add_converter_pair(converter_class) None

Adds a converter pair to the application.

Parameters:

converter_class (Type[BaseConverter]) – The converter class to add.

Raises:

ValueError – If the converter class is invalid.

converters: List[Type[BaseConverter]] = []
filetype_class

alias of FileType

get_converters_for_conversion(input_type: FileType, output_type: FileType) List[Type[BaseConverter]]

Returns a list of converter classes for a given input-output type pair.

Parameters:
  • input_type (FileType) – The input type.

  • output_type (FileType) – The output type.

Returns:

List of converter classes if found, else an empty list.

Return type:

List[Type[BaseConverter]]
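One way to picture the lookup by input-output pair is a registry keyed on (input type, output type). The sketch below is an illustration only and does not reflect how BaseConverterApp stores its converters: SketchApp, CsvToJson, CsvToJsonFast, and the input_type / output_type class attributes are all hypothetical names, and plain strings stand in for FileType members.

```python
from typing import Dict, List, Tuple, Type


class CsvToJson:
    """Hypothetical converter class used only for this illustration."""
    input_type, output_type = "csv", "json"


class CsvToJsonFast:
    """A second hypothetical converter for the same pair."""
    input_type, output_type = "csv", "json"


class SketchApp:
    """Stand-in registry; not the library's BaseConverterApp."""

    def __init__(self) -> None:
        self._registry: Dict[Tuple[str, str], List[Type]] = {}

    def add_converter_pair(self, converter_class: Type) -> None:
        # Reject classes that do not declare both ends of the conversion.
        if not (hasattr(converter_class, "input_type") and hasattr(converter_class, "output_type")):
            raise ValueError(f"Invalid converter class: {converter_class!r}")
        key = (converter_class.input_type, converter_class.output_type)
        self._registry.setdefault(key, []).append(converter_class)

    def get_converters_for_conversion(self, input_type: str, output_type: str) -> List[Type]:
        # An unknown pair yields an empty list rather than an error.
        return list(self._registry.get((input_type, output_type), []))

    def get_supported_conversions(self) -> Tuple[Tuple[str, str], ...]:
        return tuple(self._registry)
```

Several converters may be registered for the same pair, which is why the lookup returns a list.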

get_supported_conversions() Tuple[Tuple[FileType, FileType], ...]

Retrieves the supported conversions.

Returns:

A tuple of tuples representing supported conversions.

Return type:

Tuple[Tuple[FileType, FileType]]

run() None

Runs the conversion process.

Dependencies:

  • aenum.Enum: For creating the FileType enumeration.

opencf_core.enum.extend_enum_with_methods(inherited_enum: Type[Enum], added_enum: Type[Enum], filter_func: Callable[[Enum], bool]) None

Extends an Enum class with members and methods from another Enum class based on a filter function.

This function takes three arguments: inherited_enum, added_enum, and filter_func. It adds all the members from added_enum to inherited_enum that pass the filter function provided. It also copies all the methods (including class methods) from both inherited_enum and added_enum to the extended inherited_enum class.

Parameters:
  • inherited_enum (Type[Enum]) – The Enum class to be extended with new members and methods.

  • added_enum (Type[Enum]) – The Enum class whose members and methods will be added to inherited_enum.

  • filter_func (Callable[[Enum], bool]) – A function that filters which members to add from added_enum to inherited_enum.

Returns:

None
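The filter-and-merge idea can be sketched with the standard library alone. Standard-library enums cannot be extended in place, so this sketch builds a new enum instead of mutating inherited_enum, and it copies members only, not methods. BaseTypes, ExtraTypes, and merged_enum are hypothetical names introduced for the illustration.

```python
from enum import Enum
from typing import Callable, Type


class BaseTypes(Enum):
    TEXT = "txt"
    CSV = "csv"


class ExtraTypes(Enum):
    YAML = "yaml"
    TMP = "tmp"


def merged_enum(
    name: str,
    base: Type[Enum],
    added: Type[Enum],
    keep: Callable[[Enum], bool],
) -> Type[Enum]:
    """Build a new enum holding all of base plus the members of added that pass keep."""
    members = {member.name: member.value for member in base}
    members.update({member.name: member.value for member in added if keep(member)})
    # The functional Enum API accepts a mapping of member names to values.
    return Enum(name, members)


# Keep every added member except TMP.
Merged = merged_enum("Merged", BaseTypes, ExtraTypes, lambda member: member.name != "TMP")
```

Here Merged contains TEXT, CSV, and YAML, while TMP is filtered out.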

Classes:

  • UnsupportedFileTypeError: Custom exception for handling unsupported file types.

  • EmptySuffixError: Specialized exception for cases where a file’s suffix does not provide enough information to determine its type.

  • MismatchedException: Exception for handling cases where there’s a mismatch between expected and actual file attributes.

exception opencf_core.exceptions.EmptySuffixError

Bases: UnsupportedFileTypeError

Exception raised when a file’s suffix does not provide enough information to determine its type.

exception opencf_core.exceptions.MismatchedException(label, claimed_val, expected_vals)

Bases: Exception

Exception raised for mismatches between expected and actual file attributes.

exception opencf_core.exceptions.UnsupportedFileTypeError(message)

Bases: Exception

Exception raised for handling cases of unsupported file types.
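Because EmptySuffixError derives from UnsupportedFileTypeError, a single except clause for the parent catches both. The sketch below re-declares stand-in classes with the same names and base classes as documented above so that it runs without opencf_core installed; the message text and the classify helper with its one-entry lookup are assumptions for the illustration.

```python
class UnsupportedFileTypeError(Exception):
    """Stand-in with the documented signature: takes a message."""

    def __init__(self, message: str) -> None:
        super().__init__(message)


class EmptySuffixError(UnsupportedFileTypeError):
    """Stand-in with the documented signature: takes no arguments."""

    def __init__(self) -> None:
        super().__init__("Suffix is empty; cannot determine the file type.")


def classify(suffix: str) -> str:
    """Hypothetical helper: map a suffix to a type name or raise."""
    if not suffix:
        raise EmptySuffixError()
    if suffix != ".txt":
        raise UnsupportedFileTypeError(f"Unsupported suffix: {suffix}")
    return "TEXT"


def safe_classify(suffix: str) -> str:
    # Catching the parent class also handles EmptySuffixError.
    try:
        return classify(suffix)
    except UnsupportedFileTypeError as err:
        return f"error: {err}"
```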

Resolved Input File Module

This module provides the ResolvedInputFile class, which manages file paths and types, resolving them as needed. It supports resolving file types based on paths, optional content reading, and handling both files and directories.

Classes:

  • ResolvedInputFile: Manages file paths and types, resolving them as needed.

Exceptions:

  • ValueError: Raised when file paths or types are incompatible or unsupported.

class opencf_core.file_handler.ResolvedInputFile(path: str | ~pathlib.Path, is_dir: bool | None = None, should_exist: bool = True, file_type: str | None = None, add_suffix: bool = False, read_content: bool = False, filetype_class: ~typing.Type[~opencf_core.filetypes.FileType] | None = <aenum 'FileType'>)

Bases: object

Handles resolving the file type of a given file or folder, managing path adjustments and optional content reading.

__init__(path: str | ~pathlib.Path, is_dir: bool | None = None, should_exist: bool = True, file_type: str | None = None, add_suffix: bool = False, read_content: bool = False, filetype_class: ~typing.Type[~opencf_core.filetypes.FileType] | None = <aenum 'FileType'>)

Initializes an instance of ResolvedInputFile with options for type resolution and path modification.

Parameters:
  • path (str or Path) – The path to the file or folder.

  • is_dir (bool, optional) – Specifies if the path is a directory. If None, inferred using pathlib. Defaults to None.

  • should_exist (bool, optional) – Specifies if the existence of the path is required. Defaults to True.

  • file_type (str, optional) – The explicit type of the file. If None, attempts to resolve to a filetype object based on the path or content.

  • add_suffix (bool, optional) – Whether to append the resolved file type’s suffix to the file path. Defaults to False.

  • read_content (bool, optional) – Whether to read the file’s content to assist in type resolution. Defaults to False.

__repr__()

Returns the absolute file path as a string.

Returns:

The resolved file path.

Return type:

str

__resolve_filetype__(file_type: str | None, file_path: Path, read_content: bool) FileType

Determines the file type, utilizing the provided type, file path, or content as needed.

Parameters:
  • file_type (FileType or str, optional) – An explicit file type or extension.

  • file_path (str) – The path to the file, used if file_type is not provided.

  • read_content (bool) – Indicates if file content should be used to help resolve the file type.

Returns:

The resolved file type.

Return type:

FileType

__str__()

Returns the absolute file path as a string.

Returns:

The resolved file path.

Return type:

str

_resolve_directory_type(file_type: str)

Handles the case when the specified path is a directory.

_resolve_file_type(file_type: str | None, read_content: bool, add_suffix: bool)

Resolves the file type based on given parameters.

Parameters:
  • file_type (FileType or str, optional) – An explicit file type or extension.

  • read_content (bool) – Indicates if file content should be used to help resolve the file type.

  • add_suffix (bool) – Whether to append the resolved file type’s suffix to the file path.

_resolve_path_type(file_type: str | None = None) bool

Determines if the provided path refers to a directory or a file, based on its existence, suffix, and file_type.

Parameters:

file_type (str, optional) – The type of file expected at the path. Influences directory creation and type resolution.

Returns:

True if the path is determined to be a directory, False if it is a file.

Return type:

bool
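The exact decision order used by _resolve_path_type is not spelled out here. The sketch below shows one plausible ordering over the three documented inputs (existence, suffix, file_type). The function name looks_like_directory and every rule in it are assumptions, not the library's logic.

```python
from pathlib import Path
from typing import Optional


def looks_like_directory(path: Path, file_type: Optional[str] = None) -> bool:
    """Guess whether path is a directory (hypothetical rules, see comments)."""
    # 1. If the path exists, the filesystem is authoritative.
    if path.exists():
        return path.is_dir()
    # 2. Assumption: an explicit file type means a file is expected.
    if file_type is not None:
        return False
    # 3. Assumption: a missing path without a suffix is treated as a folder.
    return path.suffix == ""
```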

File Type Definitions Module

This module provides a comprehensive framework for handling various file types within a file conversion context. It defines classes and enumerations for identifying, validating, and working with different file types, based on file extensions, MIME types, and optionally, file content. It also includes custom exceptions for handling common errors related to file type processing.

Classes:

  • FileType: Enum class that encapsulates various file types supported by the system, providing methods for type determination from file attributes.

Dependencies:

  • collections.namedtuple: For defining simple classes for storing MIME type information.

  • pathlib.Path: For file path manipulations and checks.

  • opencf_core.mimes.guess_mime_type_from_file: Utility function to guess MIME type from a file path.

Usage Examples:

```python
from pathlib import Path

from opencf_core.exceptions import EmptySuffixError, UnsupportedFileTypeError
from opencf_core.filetypes import FileType

# Example: Determine file type from suffix
try:
    file_type, _ = FileType.from_suffix('.txt')
    print(f'File type: {file_type.name}')
except (EmptySuffixError, UnsupportedFileTypeError) as e:
    print(f'Error: {e}')

# Example: Determine file type from MIME type
try:
    file_path = Path('/path/to/file.txt')
    file_type, _ = FileType.from_mimetype(file_path)
    print(f'File type: {file_type.name}')
except FileNotFoundError as e:
    print(f'Error: {e}')
except UnsupportedFileTypeError as e:
    print(f'Error: {e}')

# Example: Validate file type by path and content
file_path = Path('/path/to/file.txt')
is_valid = FileType.TEXT.is_valid_path(file_path, read_content=True)
print(f'Is valid: {is_valid}')
```

class opencf_core.filetypes.FileType(value=<no_arg>, names=None, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Base enumeration for file types, providing methods for type determination and validation.

NOTYPE

Represents an undefined file type (no extensions).

Type:

MimeType

TEXT

Represents a text file type (.txt).

Type:

MimeType

UNHANDLED

Represents an unhandled file type (no extensions).

Type:

MimeType

CSV

Represents a CSV file type (.csv).

Type:

MimeType

MARKDOWN

Represents a Markdown file type (.md).

Type:

MimeType

EXCEL

Represents an Excel file type (.xls, .xlsx).

Type:

MimeType

MSWORD

Represents a Microsoft Word file type (.doc, .docx).

Type:

MimeType

JSON

Represents a JSON file type (.json).

Type:

MimeType

PDF

Represents a PDF file type (.pdf).

Type:

MimeType

IMAGE

Represents an image file type (.jpg, .jpeg, .png).

Type:

MimeType

GIF

Represents a GIF file type (.gif).

Type:

MimeType

VIDEO

Represents a video file type (.mp4, .avi).

Type:

MimeType

XML

Represents an XML file type (.xml).

Type:

MimeType

APNG = MimeType(extensions=('apng',), mime_types=('image/apng',), upper_mime_types=(), children_mime_types=())
AVI = MimeType(extensions=('avi',), mime_types=('video/x-msvideo',), upper_mime_types=(), children_mime_types=())
BIN = MimeType(extensions=('bin',), mime_types=('application/octet-stream',), upper_mime_types=(), children_mime_types=())
CSV = MimeType(extensions=('csv',), mime_types=('text/csv',), upper_mime_types=('text/plain',), children_mime_types=())
DOC = MimeType(extensions=('doc',), mime_types=('application/msword',), upper_mime_types=(), children_mime_types=())
DOCX = MimeType(extensions=('docx',), mime_types=('application/vnd.openxmlformats-officedocument.wordprocessingml.document',), upper_mime_types=(), children_mime_types=())
DOC_PRESENTATION = MimeType(extensions=('pptx', 'odp', 'ppt', 'pdf'), mime_types=('application/vnd.openxmlformats-officedocument.presentationml.presentation', 'application/vnd.oasis.opendocument.presentation', 'application/vnd.ms-powerpoint', 'application/pdf'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('pptx',), mime_types=('application/vnd.openxmlformats-officedocument.presentationml.presentation',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('odp',), mime_types=('application/vnd.oasis.opendocument.presentation',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('ppt',), mime_types=('application/vnd.ms-powerpoint',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('pdf',), mime_types=('application/pdf',), upper_mime_types=(), children_mime_types=())))
DOC_SPREADSHEET = MimeType(extensions=('xlsx', 'ods', 'csv', 'xls'), mime_types=('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 'application/vnd.oasis.opendocument.spreadsheet', 'text/csv', 'application/vnd.ms-excel'), upper_mime_types=('text/plain',), children_mime_types=(MimeType(extensions=('xlsx',), mime_types=('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('ods',), mime_types=('application/vnd.oasis.opendocument.spreadsheet',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('csv',), mime_types=('text/csv',), upper_mime_types=('text/plain',), children_mime_types=()), MimeType(extensions=('xls',), mime_types=('application/vnd.ms-excel',), upper_mime_types=(), children_mime_types=())))
DOC_TEXT = MimeType(extensions=('docx', 'odt', 'doc', 'md'), mime_types=('application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'application/vnd.oasis.opendocument.text', 'application/msword', 'text/markdown'), upper_mime_types=('text/plain',), children_mime_types=(MimeType(extensions=('docx',), mime_types=('application/vnd.openxmlformats-officedocument.wordprocessingml.document',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('odt',), mime_types=('application/vnd.oasis.opendocument.text',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('doc',), mime_types=('application/msword',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('md',), mime_types=('text/markdown',), upper_mime_types=('text/plain',), children_mime_types=())))
DPX = MimeType(extensions=('dpx',), mime_types=('image/dpx',), upper_mime_types=(), children_mime_types=())
EPS = MimeType(extensions=('eps',), mime_types=('application/postscript',), upper_mime_types=(), children_mime_types=())
EXR = MimeType(extensions=('exr',), mime_types=('image/aces-exr',), upper_mime_types=(), children_mime_types=())
GIF = MimeType(extensions=('gif',), mime_types=('image/gif',), upper_mime_types=(), children_mime_types=())
HTML = MimeType(extensions=('html', 'htm'), mime_types=('text/html',), upper_mime_types=(), children_mime_types=())
IMG_ANIM = MimeType(extensions=('gif', 'apng', 'webp'), mime_types=('image/gif', 'image/apng', 'image/webp'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('gif',), mime_types=('image/gif',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('apng',), mime_types=('image/apng',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('webp',), mime_types=('image/webp',), upper_mime_types=(), children_mime_types=())))
IMG_RASTER = MimeType(extensions=('png', 'jpeg', 'jpg', 'tiff'), mime_types=('image/png', 'image/jpeg', 'image/tiff'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('png',), mime_types=('image/png',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('jpeg', 'jpg'), mime_types=('image/jpeg',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('tiff',), mime_types=('image/tiff',), upper_mime_types=(), children_mime_types=())))
IMG_SEQ = MimeType(extensions=('exr', 'dpx', 'tiff'), mime_types=('image/aces-exr', 'image/dpx', 'image/tiff'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('exr',), mime_types=('image/aces-exr',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('dpx',), mime_types=('image/dpx',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('tiff',), mime_types=('image/tiff',), upper_mime_types=(), children_mime_types=())))
IMG_VEC = MimeType(extensions=('svg', 'eps'), mime_types=('image/svg+xml', 'application/postscript'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('svg',), mime_types=('image/svg+xml',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('eps',), mime_types=('application/postscript',), upper_mime_types=(), children_mime_types=())))
JPEG = MimeType(extensions=('jpeg', 'jpg'), mime_types=('image/jpeg',), upper_mime_types=(), children_mime_types=())
JSON = MimeType(extensions=('json',), mime_types=('application/json',), upper_mime_types=(), children_mime_types=())
MD = MimeType(extensions=('md',), mime_types=('text/markdown',), upper_mime_types=('text/plain',), children_mime_types=())
MOV = MimeType(extensions=('mov',), mime_types=('video/quicktime', 'video/wbm'), upper_mime_types=(), children_mime_types=())
MP4 = MimeType(extensions=('mp4',), mime_types=('video/mp4',), upper_mime_types=(), children_mime_types=())
NOTYPE: MimeType = MimeType(extensions=(), mime_types=(), upper_mime_types=(), children_mime_types=())
ODP = MimeType(extensions=('odp',), mime_types=('application/vnd.oasis.opendocument.presentation',), upper_mime_types=(), children_mime_types=())
ODS = MimeType(extensions=('ods',), mime_types=('application/vnd.oasis.opendocument.spreadsheet',), upper_mime_types=(), children_mime_types=())
ODT = MimeType(extensions=('odt',), mime_types=('application/vnd.oasis.opendocument.text',), upper_mime_types=(), children_mime_types=())
PDF = MimeType(extensions=('pdf',), mime_types=('application/pdf',), upper_mime_types=(), children_mime_types=())
PNG = MimeType(extensions=('png',), mime_types=('image/png',), upper_mime_types=(), children_mime_types=())
PPT = MimeType(extensions=('ppt',), mime_types=('application/vnd.ms-powerpoint',), upper_mime_types=(), children_mime_types=())
PPTX = MimeType(extensions=('pptx',), mime_types=('application/vnd.openxmlformats-officedocument.presentationml.presentation',), upper_mime_types=(), children_mime_types=())
SVG = MimeType(extensions=('svg',), mime_types=('image/svg+xml',), upper_mime_types=(), children_mime_types=())
TEXT = MimeType(extensions=('txt',), mime_types=('text/plain',), upper_mime_types=(), children_mime_types=())
TIFF = MimeType(extensions=('tiff',), mime_types=('image/tiff',), upper_mime_types=(), children_mime_types=())
UNHANDLED: MimeType = MimeType(extensions=(), mime_types=(), upper_mime_types=(), children_mime_types=())
VIDEO = MimeType(extensions=('mp4', 'mov', 'avi', 'wmv'), mime_types=('video/mp4', 'video/quicktime', 'video/wbm', 'video/x-msvideo', 'video/x-ms-wmv'), upper_mime_types=(), children_mime_types=(MimeType(extensions=('mp4',), mime_types=('video/mp4',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('mov',), mime_types=('video/quicktime', 'video/wbm'), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('avi',), mime_types=('video/x-msvideo',), upper_mime_types=(), children_mime_types=()), MimeType(extensions=('wmv',), mime_types=('video/x-ms-wmv',), upper_mime_types=(), children_mime_types=())))
WEBP = MimeType(extensions=('webp',), mime_types=('image/webp',), upper_mime_types=(), children_mime_types=())
WMV = MimeType(extensions=('wmv',), mime_types=('video/x-ms-wmv',), upper_mime_types=(), children_mime_types=())
XLS = MimeType(extensions=('xls',), mime_types=('application/vnd.ms-excel',), upper_mime_types=(), children_mime_types=())
XLSX = MimeType(extensions=('xlsx',), mime_types=('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',), upper_mime_types=(), children_mime_types=())
XML = MimeType(extensions=('xml',), mime_types=('application/xml', 'text/xml'), upper_mime_types=(), children_mime_types=())
classmethod clean_suffix(suffix: str) str
classmethod from_mimetype(file_path: str | Path, raise_err: bool = False, return_matches: bool = False) Tuple[FileType, Tuple[FileType, ...]]

Determines a filetype from a file’s MIME type.

Parameters:
  • file_path (str) – The path to the file.

  • raise_err (bool, optional) – Whether to raise an exception if the type is unhandled. Defaults to False.

  • return_matches (bool, optional) – Whether to return a tuple with the first matching filetype and a list of all options. Defaults to False.

Returns:

The determined filetype enumeration member, or a tuple with the first matching filetype and a list of all options.

Return type:

FileType

Raises:
  • FileNotFoundError – If the file does not exist.

  • UnsupportedFileTypeError – If the file type is unhandled and raise_err is True.

classmethod from_path(path: str | Path, read_content=False, raise_err=False, return_matches=False) Tuple[FileType, Tuple[FileType, ...]]

Determines the filetype of a file based on its path. Optionally reads the file’s content to verify its type.

Parameters:
  • path (Path) – The path to the file.

  • read_content (bool, optional) – If True, the method also checks the file’s content to determine its type. Defaults to False.

  • raise_err (bool, optional) – If True, raises exceptions for unsupported types or when file does not exist. Defaults to False.

  • return_matches (bool, optional) – Whether to return a tuple with the first matching filetype and a list of all options. Defaults to False.

Returns:

The determined filetype enumeration member based on the file’s suffix and/or content, or a tuple with the first matching filetype and a list of all options.

Return type:

FileType

Raises:
  • FileNotFoundError – If the file does not exist when attempting to read its content.

  • UnsupportedFileTypeError – If the file type is unsupported and raise_err is True.

  • AssertionError – If there is a mismatch between the file type determined from the file’s suffix and its content.

classmethod from_suffix(suffix: str, raise_err: bool = False, return_matches: bool = False) Tuple[FileType, Tuple[FileType, ...]]

Determines a filetype from a file’s suffix.

Parameters:
  • suffix (str) – The file suffix (extension).

  • raise_err (bool, optional) – Whether to raise an exception if the type is unhandled. Defaults to False.

  • return_matches (bool, optional) – Whether to return a tuple with the first matching filetype and a list of all options. Defaults to False.

Returns:

The determined filetype enumeration member, or a tuple with the first matching filetype and a list of all options.

Return type:

FileType

Raises:
classmethod get_filetypes()

Yields all valid file types in the enumeration.

get_one_mimetype() str

Retrieves the primary mimetype associated with the filetype.

Returns:

The primary mimetype for the filetype.

Returns an empty string if the filetype does not have an associated MIME type.

Return type:

str

get_one_suffix() str

Retrieves the primary file extension associated with the filetype.

Returns:

The primary file extension for the filetype, prefixed with a period.

Returns an empty string if the filetype does not have an associated extension.

Return type:

str

get_value() MimeType

Returns the MimeType associated with the enumeration member.

Returns:

The MIME type information.

Return type:

MimeType

is_true_filetype() bool

Determines if the filetype instance represents a supported file type based on the presence of defined extensions.

Returns:

True if the filetype has at least one associated file extension, False otherwise.

Return type:

bool

is_valid_mime_type(file_path: Path, raise_err=False) bool

Validates whether the MIME type of the file at the specified path aligns with the filetype’s expected MIME types.

This method first determines the filetype from the file’s actual MIME type (obtained by reading the file’s content), then checks whether that filetype matches the instance calling this method. FileType.TEXT is handled specially: because text MIME types are generic, a broader compatibility check is performed.

Parameters:
  • file_path (Path) – The path to the file whose MIME type is to be validated.

  • raise_err (bool, optional) – If True, a MismatchedException is raised if the file’s MIME type does not match the expected MIME types of the filetype instance. Defaults to False.

Returns:

True if the file’s MIME type matches the expected MIME types for this filetype instance or if special compatibility conditions are met (e.g., for FileType.TEXT with “text/plain”). Otherwise, False.

Return type:

bool

Raises:

MismatchedException – If raise_err is True and the file’s MIME type does not match the expected MIME types for this filetype instance, including detailed information about the mismatch.
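A simplified view of a MIME compatibility check is sketched below. It assumes that a detected MIME type is accepted when it appears in either mime_types or upper_mime_types. That rule is an assumption for illustration, not a statement of how is_valid_mime_type is implemented. SketchMimeType and mime_matches are hypothetical names; the CSV and PDF values are copied from the member listing above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SketchMimeType:
    """Stand-in mirroring three of the documented MimeType fields."""
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()
    upper_mime_types: Tuple[str, ...] = ()


CSV = SketchMimeType(("csv",), ("text/csv",), ("text/plain",))
PDF = SketchMimeType(("pdf",), ("application/pdf",))


def mime_matches(expected: SketchMimeType, detected: str) -> bool:
    """Accept an exact MIME match or a broader 'upper' MIME type (assumed rule)."""
    return detected in expected.mime_types or detected in expected.upper_mime_types
```

Under this rule a CSV file detected as "text/plain" still counts as a match, while a PDF detected as "text/plain" does not.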

is_valid_path(file_path: str | Path, read_content=False, raise_err=False) bool

Validates the filetype of a given file path. Optionally reads the file’s content to verify its type.

Parameters:
  • file_path (Union[str, Path]) – The file path to validate.

  • read_content (bool, optional) – If True, the method also checks the file’s content to validate its type. Defaults to False.

  • raise_err (bool, optional) – If True, raises exceptions for mismatched or unsupported types. Defaults to False.

Returns:

True if the file path’s type matches the filetype, False otherwise.

Return type:

bool

Raises:
  • AssertionError – If there is a mismatch between the file type determined from the file’s suffix and its content.

  • MismatchedException – If the file type determined from the file’s suffix or content does not match the filetype.

is_valid_suffix(suffix: str, raise_err=False) bool

Validates whether a given file extension matches the filetype’s expected extensions.

Parameters:
  • suffix (str) – The file extension to validate, including the leading period (e.g., “.txt”).

  • raise_err (bool, optional) – If True, raises a MismatchedException for invalid extensions. Defaults to False.

Returns:

True if the suffix matches one of the filetype’s extensions, False otherwise.

Return type:

bool

Raises:

MismatchedException – If the suffix does not match the filetype’s extensions and raise_err is True.
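Suffix handling can be illustrated with a small standard-library enum. SketchFileType is a hypothetical stand-in, not the library's FileType: its values are bare extension tuples, its from_suffix returns a single member rather than a tuple, and the normalisation in clean_suffix (lower-casing and dropping the leading period) is an assumption about what such a helper might do.

```python
from enum import Enum


class SketchFileType(Enum):
    """Stand-in enum whose values are tuples of extensions without the period."""
    NOTYPE = ()
    TEXT = ("txt",)
    JPEG = ("jpeg", "jpg")

    @staticmethod
    def clean_suffix(suffix: str) -> str:
        # Assumed normalisation: lower-case and strip the leading period.
        return suffix.lower().lstrip(".")

    @classmethod
    def from_suffix(cls, suffix: str) -> "SketchFileType":
        ext = cls.clean_suffix(suffix)
        for member in cls:
            if ext in member.value:
                return member
        # Nothing matched: fall back to the "no type" member.
        return cls.NOTYPE

    def is_valid_suffix(self, suffix: str) -> bool:
        return self.clean_suffix(suffix) in self.value
```

With this sketch, SketchFileType.from_suffix(".JPG") resolves to JPEG, and SketchFileType.TEXT.is_valid_suffix(".csv") is False.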

class opencf_core.filetypes.FileTypeExamples(value=<no_arg>, names=None, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enumeration of supported file types with methods for type determination and validation.

XML = MimeType(extensions=('xml',), mime_types=('application/xml', 'text/xml'), upper_mime_types=(), children_mime_types=())
class opencf_core.filetypes.MimeType(extensions: Tuple[str, ...] = (), mime_types: Tuple[str, ...] = (), upper_mime_types: Tuple[str, ...] = (), children_mime_types: Tuple[MimeType, ...] = ())

Bases: object

Class representing MIME type information.

extensions

Tuple of file extensions associated with the MIME type.

Type:

Tuple[str, …]

mime_types

Tuple of MIME types.

Type:

Tuple[str, …]

upper_mime_types

Tuple of additional MIME types that can be considered equivalent.

Type:

Tuple[str, …]

children_mime_types: Tuple[MimeType, ...] = ()
extensions: Tuple[str, ...] = ()
mime_types: Tuple[str, ...] = ()
upper_mime_types: Tuple[str, ...] = ()
opencf_core.filetypes.extend_filetype_enum(added_enum: Type[Enum]) None

Extends the BaseFileType enumeration with members from another enumeration.

Parameters:

added_enum (Type[Enum]) – The enum class to extend BaseFileType with.

opencf_core.filetypes.extract_enum_members(enum_cls: Type) Dict[str, MimeType]

Extracts MimeType instances from an enum class.

Parameters:

enum_cls (Type) – The enum class.

Returns:

Dictionary of MimeType instances keyed by enum member names.

Return type:

Dict[str, MimeType]

opencf_core.filetypes.get_equivalent_file_types(mime_types: Set[MimeType], raise_error: bool = True) Set[FileType]

Get the equivalent FileTypes for a given set of MimeTypes.

Parameters:
  • mime_types (Set[MimeType]) – The set of MIME types to find the equivalent FileTypes for.

  • raise_error (bool, optional) – Controls whether to raise an error if no equivalent FileType is found. Defaults to True.

Returns:

A set of equivalent FileTypes if found, otherwise None.

Return type:

Set[FileType]

opencf_core.filetypes.get_file_type_children(file_type: FileType, include_head: bool = False) Set[FileType]

Recursively get the FileTypes equivalent to the MIME types in the subtree of the given FileType.

Parameters:
  • file_type (FileType) – The FileType to get the subtree for.

  • include_head (bool, optional) – Controls whether to include the head node in the result. Defaults to False.

Returns:

A set of all equivalent FileTypes in the subtree.

Return type:

Set[FileType]

Example

>>> all_image_children = get_file_type_children(FileType.IMG_RASTER)
>>> print(all_image_children)
{FileType.PNG, FileType.JPEG, FileType.TIFF}
opencf_core.filetypes.get_file_types_clidren(file_types: Iterable[FileType], include_head: bool = False) Set[FileType]

Recursively get the FileTypes equivalent to the MIME types in the subtrees of the given FileType instances.

Parameters:
  • file_types (Iterable[FileType]) – The FileType instances to get the subtrees for.

  • include_head (bool, optional) – Controls whether to include the head node in the result. Defaults to False.

Returns:

A set of all equivalent FileTypes in the subtree.

Return type:

Set[FileType]

Example

>>> all_image_children = get_file_types_clidren([FileType.IMG_RASTER])
>>> print(all_image_children)
{FileType.PNG, FileType.JPEG, FileType.TIFF}
opencf_core.filetypes.get_mime_type_children(mime_type: MimeType, include_head: bool = False) Set[MimeType]

Recursively get all children MIME types in the subtree of the given MIME type.

Parameters:
  • mime_type (MimeType) – The MIME type to get the subtree for.

  • include_head (bool, optional) – Controls whether to include the head node in the result. Defaults to False.

Returns:

A set of all MIME types in the subtree.

Return type:

Set[MimeType]

Example

>>> all_image_children = get_mime_type_children(MimeType(extensions=('png',), mime_types=('image/png',), upper_mime_types=(), children_mime_types=()))
>>> print(all_image_children)
{MimeType(extensions=('png',), mime_types=('image/png',), upper_mime_types=(), children_mime_types=()),
 MimeType(extensions=('jpeg', 'jpg'), mime_types=('image/jpeg',), upper_mime_types=(), children_mime_types=()),
 MimeType(extensions=('tiff',), mime_types=('image/tiff',), upper_mime_types=(), children_mime_types=())}
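
The recursive subtree walk with an include_head switch can be sketched as follows. Node is a hypothetical stand-in for MimeType (only a name and a tuple of children), and collect_children is an illustrative name; the traversal order and set semantics are assumptions.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for MimeType; frozen so instances are hashable."""
    name: str
    children: Tuple["Node", ...] = ()


def collect_children(node: Node, include_head: bool = False) -> Set[Node]:
    """Collect every node below `node`, optionally including `node` itself."""
    found: Set[Node] = {node} if include_head else set()
    for child in node.children:
        # A child always belongs to the result; recurse to add its descendants.
        found |= collect_children(child, include_head=True)
    return found


png = Node("image/png")
jpeg = Node("image/jpeg")
raster = Node("image/*", children=(png, jpeg))

print(sorted(n.name for n in collect_children(raster)))
# ['image/jpeg', 'image/png']
print(collect_children(png))
# set()
```

In this sketch a leaf node has an empty subtree unless include_head is True.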
opencf_core.filetypes.merge_mimetype(*mimetypes: MimeType) MimeType

Merge multiple MimeType objects into one.
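
The reference does not state how fields are combined. One plausible reading, sketched below under that assumption, is an order-preserving union of each tuple field. MimeInfo and merge_infos are hypothetical names that mirror two of the documented MimeType fields.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class MimeInfo:
    """Hypothetical stand-in mirroring two of the documented MimeType fields."""
    extensions: Tuple[str, ...] = ()
    mime_types: Tuple[str, ...] = ()


def merge_infos(*infos: MimeInfo) -> MimeInfo:
    """Concatenate fields in order, dropping duplicates (assumed merge rule)."""
    def unique(values: Iterable[str]) -> Tuple[str, ...]:
        # dict.fromkeys keeps first-seen order while removing repeats.
        return tuple(dict.fromkeys(values))

    return MimeInfo(
        extensions=unique(e for info in infos for e in info.extensions),
        mime_types=unique(m for info in infos for m in info.mime_types),
    )


merged = merge_infos(
    MimeInfo(("xml",), ("application/xml",)),
    MimeInfo(("xml",), ("text/xml",)),
)
print(merged)
# MimeInfo(extensions=('xml',), mime_types=('application/xml', 'text/xml'))
```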

Input/Output Handler Module

This module is designed to provide a structured approach to handling file input and output operations across various formats such as plain text, CSV, JSON, and potentially XML. It introduces a set of abstract base classes and concrete implementations for reading from and writing to files, ensuring type safety and format consistency through method signatures and runtime checks.

class opencf_core.io_handler.Converter

Bases: ABC

Abstract base class for data converters.

abstract _check_input_format(content: Any) bool

Checks if the provided content matches the expected input format.

Parameters:

content (Any) – The content to be checked.

Returns:

True if the content matches the expected input format, False otherwise.

Return type:

bool

abstract _check_output_format(content: Any) bool

Checks if the provided content matches the expected output format.

Parameters:

content (Any) – The content to be checked.

Returns:

True if the content matches the expected output format, False otherwise.

Return type:

bool

abstract _convert(content: Any) Any

Converts the provided content from the input format to the output format.

Parameters:

content (Any) – The content to be converted.

Returns:

The converted content in the output format.

Return type:

Any

check_input_format(content: Any) bool
check_output_format(content: Any) bool
convert(content: Any) Any
class opencf_core.io_handler.CsvToDictReader

Bases: Reader

Reads content from a CSV file and returns it as a list of dictionaries.

Example

>>> reader = CsvToDictReader()
>>> content = reader.read_content(Path('input.csv'))
>>> print(content)
[{'name': 'John', 'age': '30'}, {'name': 'Jane', 'age': '25'}]
_check_input_format(content: List[Dict[str, Any]]) bool

Validates the input content to ensure it is a list of dictionaries.

Parameters:

content (List[Dict[str, Any]]) – The content to validate.

Returns:

True if the content is a list of dictionaries, False otherwise.

Return type:

bool

_read_content(input_path: Path) List[Dict[str, Any]]

Reads and parses the content from the CSV file at the given path.

Parameters:

input_path (Path) – The path to the CSV file.

Returns:

The parsed content as a list of dictionaries.

Return type:

List[Dict[str, Any]]

input_format

alias of List[Dict[str, Any]]
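
For readers unfamiliar with the list-of-dictionaries shape, the sketch below produces it with the standard library's csv.DictReader. The function name read_csv_rows is hypothetical, and whether CsvToDictReader uses csv.DictReader internally is an assumption.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def read_csv_rows(input_path: Path) -> List[Dict[str, Any]]:
    """Parse a CSV file with a header row into one dictionary per data row."""
    with input_path.open(newline="", encoding="utf-8") as handle:
        return [dict(row) for row in csv.DictReader(handle)]


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "input.csv"
    csv_path.write_text("name,age\nJohn,30\nJane,25\n", encoding="utf-8")
    rows = read_csv_rows(csv_path)

print(rows)
# [{'name': 'John', 'age': '30'}, {'name': 'Jane', 'age': '25'}]
```

Note that every value comes back as a string; numeric conversion is left to the caller.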

class opencf_core.io_handler.DictToCsvWriter

Bases: Writer

Writes a list of dictionaries to a CSV file.

_check_output_format(content: List[Dict[str, Any]]) bool

Validates the output content to ensure it is a list of dictionaries.

Parameters:

content (List[Dict[str, Any]]) – The content to validate.

Returns:

True if the content is a list of dictionaries, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: List[Dict[str, Any]]) None

Writes the list of dictionaries content to a CSV file at the given path.

Parameters:
  • output_path (Path) – The path to the CSV file.

  • output_content (List[Dict[str, Any]]) – The list of dictionaries to write.

output_format

alias of List[Dict[str, Any]]
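
The reverse direction can be sketched with csv.DictWriter. Taking the header from the first row's keys and writing an empty file for an empty list are assumptions of this sketch, and write_csv_rows is a hypothetical name.

```python
import csv
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_csv_rows(output_path: Path, rows: List[Dict[str, Any]]) -> None:
    """Write dictionaries as CSV rows, taking the header from the first row."""
    if not rows:
        # Nothing to derive a header from, so write an empty file.
        output_path.write_text("", encoding="utf-8")
        return
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.csv"
    write_csv_rows(out_path, [{"name": "John", "age": 30}, {"name": "Jane", "age": 25}])
    written = out_path.read_text(encoding="utf-8")

print(written.splitlines())
# ['name,age', 'John,30', 'Jane,25']
```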

class opencf_core.io_handler.DictToJsonWriter

Bases: Writer

Writes content from a dictionary to a JSON file.

output_format

alias of Dict[str, Any]

class opencf_core.io_handler.JsonToDictReader

Bases: Reader

Reads content from a JSON file and returns it as a dictionary.

input_format

alias of Dict[str, Any]

class opencf_core.io_handler.Reader

Bases: ABC

Abstract base class for file readers.

abstract _check_input_format(content: Any) bool

Checks if the provided content matches the expected input format.

Parameters:

content (Any) – The content to be checked.

Returns:

True if the content matches the expected input format, False otherwise.

Return type:

bool

abstract _read_content(input_path: Path) Any

Reads and returns the content from the given input path.

Parameters:

input_path (Path) – The path to the input file.

Returns:

The content read from the input file.

Return type:

Any

check_input_format(content: Any) bool
read_content(input_path: Path) Any
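
The split between the public read_content method and the two abstract hooks suggests a template-method design. The sketch below shows one way such a base class could be wired. SketchReader and SketchTxtReader are hypothetical, and raising ValueError on a failed format check is an assumption rather than documented behavior.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class SketchReader(ABC):
    """Hypothetical reader: the public method wraps the two abstract hooks."""

    @abstractmethod
    def _check_input_format(self, content: Any) -> bool: ...

    @abstractmethod
    def _read_content(self, input_path: Path) -> Any: ...

    def read_content(self, input_path: Path) -> Any:
        content = self._read_content(input_path)
        # Validate after reading; raising ValueError here is an assumption.
        if not self._check_input_format(content):
            raise ValueError(f"unexpected content type: {type(content).__name__}")
        return content


class SketchTxtReader(SketchReader):
    """Concrete reader that returns the file's text as a str."""

    def _check_input_format(self, content: Any) -> bool:
        return isinstance(content, str)

    def _read_content(self, input_path: Path) -> str:
        return input_path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    txt_path = Path(tmp) / "note.txt"
    txt_path.write_text("hello", encoding="utf-8")
    text = SketchTxtReader().read_content(txt_path)

print(text)  # hello
```

Subclasses only supply the two hooks; the validation step stays in one place.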
class opencf_core.io_handler.SamePathReader

Bases: Reader

A Reader that returns the input path itself, useful for operations where the file path is the desired output.

input_format

alias of Path

class opencf_core.io_handler.StrToTxtWriter

Bases: Writer

Writes a string to a text file.

output_format

alias of str

class opencf_core.io_handler.StrToXmlWriter

Bases: Writer

Writes content as a string to an XML file.

output_format

alias of str

class opencf_core.io_handler.TreeToXmlWriter

Bases: Writer

Writes an ElementTree element to an XML file.

_check_output_format(content: Element) bool

Validates the output content to ensure it is an ElementTree element.

Parameters:

content (ET.Element) – The content to validate.

Returns:

True if the content is a valid ElementTree element, False otherwise.

Return type:

bool

_write_content(output_path: Path, output_content: Element) None

Writes the ElementTree element content to an XML file at the given path.

Parameters:
  • output_path (Path) – The path to the XML file.

  • output_content (ET.Element) – The ElementTree element to write.

output_format

alias of Element

class opencf_core.io_handler.TxtToStrReader

Bases: Reader

Reads content from a text file and returns it as a string.

input_format

alias of str

class opencf_core.io_handler.Writer

Bases: ABC

Abstract base class for file writers.

abstract _check_output_format(content: Any) bool

Checks if the provided content matches the expected output format.

Parameters:

content (Any) – The content to be checked.

Returns:

True if the content matches the expected output format, False otherwise.

Return type:

bool

abstract _write_content(output_path: Path, output_content: Any)

Writes the provided content to the given output path.

Parameters:
  • output_path (Path) – The path to the output file.

  • output_content (Any) – The content to be written to the output file.

check_output_format(content: Any) bool
write_content(output_path: Path, output_content: Any)
class opencf_core.io_handler.XmlToStrReader

Bases: Reader

Reads content from an XML file and returns it as a string.

input_format

alias of str

class opencf_core.io_handler.XmlToTreeReader

Bases: Reader

Reads content from an XML file and returns it as an ElementTree element.

_check_input_format(content: Element) bool

Validates the input content to ensure it is an ElementTree element.

Parameters:

content (ET.Element) – The content to validate.

Returns:

True if the content is a valid ElementTree element, False otherwise.

Return type:

bool

_read_content(input_path: Path) Element

Reads and parses the content from the XML file at the given path.

Parameters:

input_path (Path) – The path to the XML file.

Returns:

The root element of the parsed XML tree.

Return type:

ET.Element

input_format

alias of Element
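
As an illustration of the "root element of the parsed XML tree" return value, the sketch below uses xml.etree.ElementTree from the standard library. The function names are hypothetical, and the actual parsing call used by XmlToTreeReader is an assumption.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def read_xml_root(input_path: Path) -> ET.Element:
    """Parse the XML file and return the root element of its tree."""
    return ET.parse(str(input_path)).getroot()


def is_element(content: object) -> bool:
    """Format check comparable to the one described for _check_input_format."""
    return isinstance(content, ET.Element)


with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / "input.xml"
    xml_path.write_text("<people><person name='John'/></people>", encoding="utf-8")
    root = read_xml_root(xml_path)

print(root.tag, is_element(root))  # people True
print(root[0].attrib)              # {'name': 'John'}
```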

class opencf_core.logging_config.ColoredFormatter(fmt=None, datefmt=None, style='%', validate=True)

Bases: Formatter

FORMATS = {10: '\x1b[38;20m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m', 20: '\x1b[38;20m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m', 30: '\x1b[33;20m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m', 40: '\x1b[31;20m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m', 50: '\x1b[31;1m%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)\x1b[0m'}
bold_red = '\x1b[31;1m'
format(record)

Format the specified record as text.

The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.

grey = '\x1b[38;20m'
log_format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s (%(filename)s:%(lineno)d)'
red = '\x1b[31;20m'
reset = '\x1b[0m'
yellow = '\x1b[33;20m'
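
The FORMATS mapping above pairs each logging level with an ANSI-colored format string. A minimal sketch of a formatter that selects a color per record follows. The escape codes are taken from the attributes listed above; SketchColoredFormatter, the shortened format string, and the per-call Formatter construction are assumptions.

```python
import logging

RESET = "\x1b[0m"
BASE = "%(levelname)s - %(message)s"  # shortened format for the sketch
COLORS = {
    logging.DEBUG: "\x1b[38;20m",     # grey
    logging.INFO: "\x1b[38;20m",      # grey
    logging.WARNING: "\x1b[33;20m",   # yellow
    logging.ERROR: "\x1b[31;20m",     # red
    logging.CRITICAL: "\x1b[31;1m",   # bold red
}


class SketchColoredFormatter(logging.Formatter):
    """Hypothetical formatter that wraps each record in a level-specific color."""

    def format(self, record: logging.LogRecord) -> str:
        color = COLORS.get(record.levelno, "")
        # Delegate to a plain Formatter built from the colored format string.
        return logging.Formatter(color + BASE + RESET).format(record)


record = logging.LogRecord(
    "demo", logging.WARNING, "demo.py", 1, "disk almost full", None, None
)
print(repr(SketchColoredFormatter().format(record)))
# '\x1b[33;20mWARNING - disk almost full\x1b[0m'
```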
class opencf_core.logging_config.LoggerConfig

Bases: object

set_log_file(log_file: str) None

Set log file.

Parameters:

log_file (str) – Path to the log file.

set_log_level(level: int) None

Set log level.

Parameters:

level (int) – Logging level.

set_log_level_str(level: str) None

Set the log level from a string.

Parameters:

level (str) – Logging level given as a string.

setup_logger(name: str, log_file: str | None = None, level: int = 20) None

Set up logger.

Parameters:
  • name (str) – Name of the logger.

  • log_file (str, optional) – Path to the log file. Defaults to None.

  • level (int, optional) – Logging level. Defaults to logging.INFO.

MIME Type Guesser Module

This module provides a singleton class for guessing MIME types from file paths using the python-magic library.

opencf_core.mimes.guess_mime_type_from_file(file_path: str | Path) str

Guesses the MIME type from the file path.

Parameters:

file_path (Union[str, Path]) – The path to the file.

Returns:

The guessed MIME type.

Return type:

str
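
To show the shape of the call (a path in, a MIME string out) without a third-party dependency, the sketch below swaps in the standard library's mimetypes module. This is a different technique from python-magic: mimetypes looks only at the file name, while python-magic inspects file content. The function name guess_mime_from_name is hypothetical.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Union


def guess_mime_from_name(file_path: Union[str, Path]) -> Optional[str]:
    """Guess a MIME type from the file name alone; None when unknown."""
    mime_type, _encoding = mimetypes.guess_type(str(file_path))
    return mime_type


print(guess_mime_from_name("picture.png"))      # image/png
print(guess_mime_from_name(Path("notes.txt")))  # text/plain
```

Because the name-based guess never opens the file, it cannot detect a file whose extension does not match its content.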

opencf_core.utils.ensure_iterable(obj, raise_err=True, return_single=False)
opencf_core.utils.get_filepaths_from_inputs(args: List[str]) List[str]

Generate a list of file paths from a list of command-line arguments.

Parameters:

args (list of str) – List of command-line arguments including file paths, directory paths, and glob patterns.

Returns:

List of file paths that match the input criteria.

Return type:

list of str
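
Expanding a mix of file paths, directory paths, and glob patterns can be sketched as below. The helper name expand_inputs, the non-recursive directory listing, and the sorted ordering are assumptions made for illustration.

```python
import glob
import tempfile
from pathlib import Path
from typing import List


def expand_inputs(args: List[str]) -> List[str]:
    """Expand files, directories and glob patterns into a list of file paths."""
    paths: List[str] = []
    for arg in args:
        candidate = Path(arg)
        if candidate.is_file():
            paths.append(str(candidate))
        elif candidate.is_dir():
            # Non-recursive directory listing is an assumption of this sketch.
            paths.extend(str(p) for p in sorted(candidate.iterdir()) if p.is_file())
        else:
            # Treat anything else as a glob pattern; unmatched patterns add nothing.
            paths.extend(sorted(p for p in glob.glob(arg) if Path(p).is_file()))
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "c.csv"):
        (Path(tmp) / name).write_text("x", encoding="utf-8")
    by_glob = [Path(p).name for p in expand_inputs([str(Path(tmp) / "*.txt")])]
    by_dir = [Path(p).name for p in expand_inputs([tmp])]

print(by_glob)  # ['a.txt', 'b.txt']
print(by_dir)   # ['a.txt', 'b.txt', 'c.csv']
```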

opencf_core.utils.is_iterable(obj)
opencf_core.utils.test()

Module contents