opencf_core package
Submodules
Base Converter Module
This module serves as a foundation for creating file conversion utilities. It facilitates the development of file converters through abstract base classes, managing file types, and handling input and output files efficiently. The module is designed to be extendible, supporting various file formats and conversion strategies.
Classes:
- ResolvedInputFile: Manages file paths and types, resolving them as needed.
- BaseConverter: An abstract base class for creating specific file format converters, enforcing the implementation of file conversion logic.
Exceptions:
- ValueError: Raised when file paths or types are incompatible or unsupported.
- AssertionError: Raised by internal consistency checks when a file type does not match the expected value.
- class opencf_core.base_converter.BaseConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Bases: ABC
Abstract base class for file conversion, defining the template for input to output file conversion.
- __init__(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)
Sets up the converter with specified input and output files, ensuring compatibility.
- Parameters:
input_files (Union[ResolvedInputFile, List[ResolvedInputFile]]) – Either a single input file or a list of input files with resolved types.
output_file (ResolvedInputFile) – The output file where the converted data will be saved.
- _check_file_types()
Validates that the provided files have acceptable and supported file types for conversion.
- abstract _convert(input_contents: List, output_file: Path | None = None, output_folder: Path | None = None)
Abstract method to be implemented by subclasses to perform the actual file conversion process.
- abstract classmethod _get_supported_input_type() → FileType
Abstract method that defines the input file type supported by the converter.
- Returns:
The supported input file type.
- Return type:
FileType
- abstract classmethod _get_supported_output_type() → FileType
Abstract method that defines the output file type supported by the converter.
- Returns:
The supported output file type.
- Return type:
FileType
- check_io_handlers()
Ensures that valid I/O handlers (file reader and writer) are set for the conversion.
- convert()
Orchestrates the file conversion process, including reading, converting, and writing the file.
- file_reader: FileReader = None
- file_writer: FileWriter = None
- folder_as_output: bool = None
- classmethod get_input_type()
- classmethod get_output_type()
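The read, convert, write sequence described for convert() is a template-method pattern. The following is a minimal stand-alone sketch of that pattern, not the opencf_core implementation: SketchConverter, UpperCaseConverter, and the _read and _write hooks are hypothetical names introduced only for illustration (the documented class delegates reading and writing to file_reader and file_writer instead).

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, List


class SketchConverter(ABC):
    """Hypothetical stand-in for a BaseConverter-style class (not opencf_core code)."""

    def __init__(self, input_paths: List[Path], output_path: Path) -> None:
        self.input_paths = input_paths
        self.output_path = output_path

    @abstractmethod
    def _read(self, path: Path) -> Any:
        """Load one input file."""

    @abstractmethod
    def _convert(self, input_contents: List[Any]) -> Any:
        """Turn the loaded inputs into the output content."""

    @abstractmethod
    def _write(self, path: Path, content: Any) -> None:
        """Persist the converted content."""

    def convert(self) -> None:
        # Template method: the order of the steps is fixed, the steps are not.
        contents = [self._read(path) for path in self.input_paths]
        result = self._convert(contents)
        self._write(self.output_path, result)


class UpperCaseConverter(SketchConverter):
    """Toy converter that joins text inputs and upper-cases them."""

    def _read(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")

    def _convert(self, input_contents: List[str]) -> str:
        return "\n".join(input_contents).upper()

    def _write(self, path: Path, content: str) -> None:
        path.write_text(content, encoding="utf-8")


# Usage: convert one temporary text file and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "in.txt"
    target = Path(tmp) / "out.txt"
    source.write_text("hello", encoding="utf-8")
    UpperCaseConverter([source], target).convert()
    converted_text = target.read_text(encoding="utf-8")

print(converted_text)
```

Because convert() lives in the base class, a subclass only supplies the individual steps and cannot reorder them.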
- exception opencf_core.base_converter.InvalidOutputFormatError(solving_tips)
Bases: Exception
Exception raised when the output content format check fails after conversion.
- class opencf_core.base_converter.ResolvedInputFile(path: str | Path, is_dir: bool | None = None, should_exist: bool = True, file_type: str | None = None, add_suffix: bool = False, read_content: bool = False)
Bases: object
Handles resolving the file type of a given file or folder, managing path adjustments and optional content reading.
- __init__(path: str | Path, is_dir: bool | None = None, should_exist: bool = True, file_type: str | None = None, add_suffix: bool = False, read_content: bool = False)
Initializes an instance of ResolvedInputFile with options for type resolution and path modification.
- Parameters:
path (str or Path) – The path to the file or folder.
is_dir (bool, optional) – Specifies if the path is a directory. If None, inferred using pathlib. Defaults to None.
should_exist (bool, optional) – Specifies if the existence of the path is required. Defaults to True.
file_type (str, optional) – The explicit type of the file. If None, attempts to resolve to a FileType object based on the path or content.
add_suffix (bool, optional) – Whether to append the resolved file type’s suffix to the file path. Defaults to False.
read_content (bool, optional) – Whether to read the file’s content to assist in type resolution. Defaults to False.
- __repr__()
Returns the absolute file path as a string.
- Returns:
The resolved file path.
- Return type:
str
- __resolve_filetype__(file_type: str, file_path: Path, read_content: bool) → FileType
Determines the file type, utilizing the provided type, file path, or content as needed.
- Parameters:
file_type (FileType or str, optional) – An explicit file type or extension.
file_path (Path) – The path to the file, used if file_type is not provided.
read_content (bool) – Indicates if file content should be used to help resolve the file type.
- Returns:
The resolved file type.
- Return type:
FileType
- __str__()
Returns the absolute file path as a string.
- Returns:
The resolved file path.
- Return type:
str
- _resolve_directory_type(file_type: str)
Handles the case when the specified path is a directory.
- _resolve_file_type(file_type: str, read_content: bool, add_suffix: bool)
Resolves the file type based on given parameters.
- Parameters:
file_type (FileType or str, optional) – An explicit file type or extension.
read_content (bool) – Indicates if file content should be used to help resolve the file type.
add_suffix (bool) – Whether to append the resolved file type’s suffix to the file path.
- _resolve_path_type(file_type: str | None = None) → bool
Determines if the provided path refers to a directory or a file, based on its existence, suffix, and file_type.
- Parameters:
file_type (str, optional) – The type of file expected at the path. Influences directory creation and type resolution.
- Returns:
True if the path is determined to be a directory, False if it is a file.
- Return type:
bool
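The decision described for _resolve_path_type (existence, suffix, and file_type) could be ordered in several ways. The sketch below shows one plausible ordering only. looks_like_directory is a hypothetical helper, and its rules are assumptions rather than the library's actual logic.

```python
from pathlib import Path
from typing import Optional


def looks_like_directory(path: Path, file_type: Optional[str] = None) -> bool:
    """Hypothetical helper: guess whether `path` names a directory."""
    if path.exists():
        # An existing path answers the question directly.
        return path.is_dir()
    if path.suffix:
        # Assumption: a missing path with an extension is meant to be a file.
        return False
    # Assumption: a missing path without an extension is a directory
    # unless the caller named an explicit file type.
    return file_type is None


# Usage: an existing folder, then a missing path that carries a suffix.
print(looks_like_directory(Path.cwd()))
print(looks_like_directory(Path("missing_folder_for_sketch/report.csv")))
```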
Main Module
This module contains the main application logic.
- class opencf_core.converter_app.BaseConverterApp(input_file_paths: List[str], input_file_type: str | None = None, output_file_path: str | None = None, output_file_type: str | None = None)
Bases: object
Main application class responsible for managing file conversions.
- __init__(input_file_paths: List[str], input_file_type: str | None = None, output_file_path: str | None = None, output_file_type: str | None = None)
Initializes the BaseConverterApp instance.
- Parameters:
input_file_paths (List[str]) – List of paths to the input files.
input_file_type (FileType, optional) – The type of the input file. Defaults to None.
output_file_path (str, optional) – The path to the output file. Defaults to None.
output_file_type (FileType, optional) – The type of the output file. Defaults to None.
- add_converter_pair(converter_class: Type[BaseConverter])
Adds a converter pair to the application.
- Parameters:
converter_class (Type[BaseConverter]) – The converter class to add.
- Raises:
ValueError – If the converter class is invalid.
- converters: List[Type[BaseConverter]] = []
- get_supported_conversions() → Tuple[Tuple[FileType, FileType], ...]
Retrieves the supported conversions.
- run()
Runs the conversion process.
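A registry keyed by (input type, output type) is one way to picture how add_converter_pair and get_supported_conversions relate. The sketch below is stand-alone and uses plain strings instead of FileType. SketchConverterBase, CsvToJson, JsonToXml, and SketchConverterApp are hypothetical names, and the validation rule is an assumption.

```python
from typing import Dict, Tuple, Type


class SketchConverterBase:
    """Hypothetical converter base exposing its input and output type names."""

    input_type: str = ""
    output_type: str = ""


class CsvToJson(SketchConverterBase):
    input_type, output_type = "csv", "json"


class JsonToXml(SketchConverterBase):
    input_type, output_type = "json", "xml"


class SketchConverterApp:
    """Keeps converter classes in a registry keyed by (input, output) type."""

    def __init__(self) -> None:
        self._converters: Dict[Tuple[str, str], Type[SketchConverterBase]] = {}

    def add_converter_pair(self, converter_class: Type[SketchConverterBase]) -> None:
        # Assumption: "invalid" means "not a subclass of the converter base".
        is_converter = isinstance(converter_class, type) and issubclass(
            converter_class, SketchConverterBase
        )
        if not is_converter:
            raise ValueError(f"Invalid converter class: {converter_class!r}")
        key = (converter_class.input_type, converter_class.output_type)
        self._converters[key] = converter_class

    def get_supported_conversions(self) -> Tuple[Tuple[str, str], ...]:
        # Dicts preserve insertion order, so pairs come back as registered.
        return tuple(self._converters)


app = SketchConverterApp()
app.add_converter_pair(CsvToJson)
app.add_converter_pair(JsonToXml)
print(app.get_supported_conversions())
```

The sketch stores the registry per instance. A class-level list, like the converters attribute documented above, would be shared by every instance of the class.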
File Type Definitions Module
This module provides a comprehensive framework for handling various file types within a file conversion context. It defines classes and enumerations for identifying, validating, and working with different file types, based on file extensions, MIME types, and optionally, file content. It also includes custom exceptions for handling common errors related to file type processing.
Classes:
- UnsupportedFileTypeError: Custom exception for handling unsupported file types.
- EmptySuffixError: Specialized exception for cases where a file’s suffix does not provide enough information to determine its type.
- FileNotFoundError: Raised when a specified file does not exist.
- MismatchedException: Exception for handling cases where there’s a mismatch between expected and actual file attributes.
- FileType: Enum class that encapsulates various file types supported by the system, providing methods for type determination from file attributes.
Functions:
- test_file_type_parsing(): Demonstrates and validates the parsing functionality for various file types.
- test_file_type_matching(): Tests the matching and validation capabilities of the FileType class.
Dependencies:
- collections.namedtuple: For defining simple classes for storing MIME type information.
- enum.Enum: For creating the FileType enumeration.
- pathlib.Path: For file path manipulations and checks.
- opencf_core.mimes.guess_mime_type_from_file: Utility function to guess MIME type from a file path.
- exception opencf_core.filetypes.EmptySuffixError
Bases: UnsupportedFileTypeError
Exception raised when a file’s suffix does not provide enough information to determine its type.
- exception opencf_core.filetypes.FileNotFoundError(file_path)
Bases: Exception
Exception raised when the specified file cannot be found.
- class opencf_core.filetypes.FileType(value)
Bases: Enum
Enumeration of supported file types with methods for type determination and validation.
- CSV = MimeType(extensions=('csv',), mime_types=('text/csv',), upper_mime_types=())
- EXCEL = MimeType(extensions=('xls', 'xlsx'), mime_types=('application/vnd.ms-excel', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'), upper_mime_types=())
- GIF = MimeType(extensions=('gif',), mime_types=('image/gif',), upper_mime_types=())
- IMAGE = MimeType(extensions=('jpg', 'jpeg', 'png'), mime_types=('image/jpeg', 'image/png'), upper_mime_types=())
- JSON = MimeType(extensions=('json',), mime_types=('application/json',), upper_mime_types=())
- MARKDOWN = MimeType(extensions=('md',), mime_types=('text/markdown',), upper_mime_types=('text/plain',))
- MSWORD = MimeType(extensions=('docx', 'doc'), mime_types=('application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'application/msword'), upper_mime_types=())
- NOTYPE = MimeType(extensions=(), mime_types=(), upper_mime_types=())
- PDF = MimeType(extensions=('pdf',), mime_types=('application/pdf',), upper_mime_types=())
- TEXT = MimeType(extensions=('txt',), mime_types=('text/plain',), upper_mime_types=())
- UNHANDLED = MimeType(extensions=(), mime_types=(), upper_mime_types=())
- VIDEO = MimeType(extensions=('mp4', 'avi'), mime_types=('video/mp4', 'video/x-msvideo'), upper_mime_types=())
- XML = MimeType(extensions=('xml',), mime_types=('application/xml', 'text/xml'), upper_mime_types=())
- classmethod from_mimetype(file_path: str | Path, raise_err: bool = False)
Determines a FileType from a file’s MIME type.
- Parameters:
file_path (str) – The path to the file.
raise_err (bool, optional) – Whether to raise an exception if the type is unhandled. Defaults to False.
- Returns:
The determined FileType enumeration member.
- Return type:
FileType
- Raises:
FileNotFoundError – If the file does not exist.
UnsupportedFileTypeError – If the file type is unhandled and raise_err is True.
- classmethod from_path(path: Path, read_content=False, raise_err=False)
Determines the FileType of a file based on its path. Optionally reads the file’s content to verify its type.
- Parameters:
path (Path) – The path to the file.
read_content (bool, optional) – If True, the method also checks the file’s content to determine its type. Defaults to False.
raise_err (bool, optional) – If True, raises exceptions for unsupported types or when file does not exist. Defaults to False.
- Returns:
The determined FileType enumeration member based on the file’s suffix and/or content.
- Return type:
FileType
- Raises:
FileNotFoundError – If the file does not exist when attempting to read its content.
UnsupportedFileTypeError – If the file type is unsupported and raise_err is True.
AssertionError – If there is a mismatch between the file type determined from the file’s suffix and its content.
- classmethod from_suffix(suffix: str, raise_err: bool = False)
Determines a FileType from a file’s suffix.
- Parameters:
suffix (str) – The file suffix (extension).
raise_err (bool, optional) – Whether to raise an exception if the type is unhandled. Defaults to False.
- Returns:
The determined FileType enumeration member.
- Return type:
FileType
- Raises:
EmptySuffixError – If the suffix is empty and raise_err is True.
UnsupportedFileTypeError – If the file type is unhandled and raise_err is True.
- get_suffix()
Retrieves the primary file extension associated with the FileType.
- Returns:
The primary file extension for the FileType, prefixed with a period. Returns an empty string if the FileType does not have an associated extension.
- Return type:
str
- is_true_filetype()
Determines if the FileType instance represents a supported file type based on the presence of defined extensions.
- Returns:
True if the FileType has at least one associated file extension, False otherwise.
- Return type:
bool
- is_valid_mime_type(path: Path, raise_err=False)
Validates whether the MIME type of the file at the specified path aligns with the FileType’s expected MIME types.
This method first determines the FileType based on the file’s actual MIME type (determined by reading the file’s content) and then checks if this determined FileType matches the instance calling this method. Special consideration is given to FileType.TEXT, where a broader compatibility check is performed due to the generic nature of text MIME types.
- Parameters:
path (Path) – The path to the file whose MIME type is to be validated.
raise_err (bool, optional) – If True, a MismatchedException is raised if the file’s MIME type does not match the expected MIME types of the FileType instance. Defaults to False.
- Returns:
True if the file’s MIME type matches the expected MIME types for this FileType instance or if special compatibility conditions are met (e.g., for FileType.TEXT with “text/plain”). Otherwise, False.
- Return type:
bool
- Raises:
MismatchedException – If raise_err is True and the file’s MIME type does not match the expected MIME types for this FileType instance, including detailed information about the mismatch.
- is_valid_path(path: Path, raise_err=False, read_content=False)
Validates whether the file at a given path matches the FileType, optionally checking the file’s content.
- Parameters:
path (Path) – The path to the file to validate.
raise_err (bool, optional) – If True, raises a MismatchedException for a mismatching file type. Defaults to False.
read_content (bool, optional) – If True, also validates the file’s content type against the FileType. Defaults to False.
- Returns:
True if the file’s type matches the FileType, based on its path and optionally its content. False otherwise.
- Return type:
bool
- Raises:
MismatchedException – If the file’s type does not match and raise_err is True.
- is_valid_suffix(suffix: str, raise_err=False)
Validates whether a given file extension matches the FileType’s expected extensions.
- Parameters:
suffix (str) – The file extension to validate, including the leading period (e.g., “.txt”).
raise_err (bool, optional) – If True, raises a MismatchedException for invalid extensions. Defaults to False.
- Returns:
True if the suffix matches one of the FileType’s extensions, False otherwise.
- Return type:
bool
- Raises:
MismatchedException – If the suffix does not match and raise_err is True.
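The suffix-based methods of FileType can be illustrated with a stand-alone sketch. It does not import opencf_core: SketchFileType, its reduced member list, and the locally defined exception classes are assumptions that only mirror the names and signatures documented above. In particular, returning UNHANDLED for an empty suffix when raise_err is False is a guess.

```python
from collections import namedtuple
from enum import Enum

MimeType = namedtuple("MimeType", ["extensions", "mime_types", "upper_mime_types"])


class UnsupportedFileTypeError(Exception):
    """Local stand-in for the documented exception."""


class EmptySuffixError(UnsupportedFileTypeError):
    """Local stand-in, subclassing UnsupportedFileTypeError as documented."""


class SketchFileType(Enum):
    CSV = MimeType(("csv",), ("text/csv",), ())
    EXCEL = MimeType(
        ("xls", "xlsx"),
        (
            "application/vnd.ms-excel",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ),
        (),
    )
    JSON = MimeType(("json",), ("application/json",), ())
    # Only one member with an empty value is defined here: Enum would treat
    # a second member with an equal value as an alias of the first.
    UNHANDLED = MimeType((), (), ())

    @classmethod
    def from_suffix(cls, suffix: str, raise_err: bool = False) -> "SketchFileType":
        ext = suffix.lower().lstrip(".")
        if not ext:
            if raise_err:
                raise EmptySuffixError("suffix is empty")
            return cls.UNHANDLED
        for member in cls:
            if ext in member.value.extensions:
                return member
        if raise_err:
            raise UnsupportedFileTypeError(f"unsupported suffix: {suffix!r}")
        return cls.UNHANDLED

    def get_suffix(self) -> str:
        # The first listed extension is treated as the primary one.
        extensions = self.value.extensions
        return f".{extensions[0]}" if extensions else ""

    def is_true_filetype(self) -> bool:
        return bool(self.value.extensions)

    def is_valid_suffix(self, suffix: str) -> bool:
        return suffix.lower().lstrip(".") in self.value.extensions


print(SketchFileType.from_suffix(".xlsx"))
print(SketchFileType.EXCEL.get_suffix())
print(SketchFileType.CSV.is_valid_suffix(".txt"))
```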
- class opencf_core.filetypes.MimeType(extensions, mime_types, upper_mime_types)
Bases: tuple
- extensions
Alias for field number 0
- mime_types
Alias for field number 1
- upper_mime_types
Alias for field number 2
- exception opencf_core.filetypes.MismatchedException(label, claimed_val, expected_vals)
Bases: Exception
Exception raised for mismatches between expected and actual file attributes.
- exception opencf_core.filetypes.UnsupportedFileTypeError(message)
Bases: Exception
Exception raised for handling cases of unsupported file types.
- opencf_core.filetypes.test_file_type_matching()
Tests for validating the functionality of file type matching.
- opencf_core.filetypes.test_file_type_parsing()
Tests for validating the functionality of file type parsing.
Input/Output Handler Module
This module is designed to provide a structured approach to handling file input and output operations across various formats such as plain text, CSV, JSON, and potentially XML. It introduces a set of abstract base classes and concrete implementations for reading from and writing to files, ensuring type safety and format consistency through method signatures and runtime checks.
- class opencf_core.io_handler.CsvToListReader
Bases: FileReader
Reads content from a CSV file and returns it as a list of lists, where each sublist represents a row.
- input_format
alias of List[List[str]]
- class opencf_core.io_handler.DictToJsonWriter
Bases: FileWriter
Writes content from a dictionary to a JSON file.
- output_format
alias of Dict[str, Any]
- class opencf_core.io_handler.FileReader
Bases: ABC
Abstract base class for file readers.
- abstract _check_input_format(content: Any) bool
Checks if the provided content matches the expected input format.
- Parameters:
content (Any) – The content to be checked.
- Returns:
True if the content matches the expected input format, False otherwise.
- Return type:
bool
- abstract _read_content(input_path: Path) Any
Reads and returns the content from the given input path.
- Parameters:
input_path (Path) – The path to the input file.
- Returns:
The content read from the input file.
- Return type:
Any
- input_format: type = None
- class opencf_core.io_handler.FileWriter
Bases: ABC
Abstract base class for file writers.
- abstract _check_output_format(content: Any) bool
Checks if the provided content matches the expected output format.
- Parameters:
content (Any) – The content to be checked.
- Returns:
True if the content matches the expected output format, False otherwise.
- Return type:
bool
- abstract _write_content(output_path: Path, output_content: Any)
Writes the provided content to the given output path.
- Parameters:
output_path (Path) – The path to the output file.
output_content (Any) – The content to be written to the output file.
- output_format = None
- class opencf_core.io_handler.JsonToDictReader
Bases: FileReader
Reads content from a JSON file and returns it as a dictionary.
- input_format
alias of Dict[str, Any]
- class opencf_core.io_handler.ListToCsvWriter
Bases: FileWriter
Writes content as a list of lists to a CSV file, where each sublist represents a row.
- output_format
alias of List[List[str]]
- class opencf_core.io_handler.SamePathReader
Bases: FileReader
A FileReader that returns the input path itself, useful for operations where the file path is the desired output.
- input_format
alias of Path
- class opencf_core.io_handler.StrToTxtWriter
Bases: FileWriter
Writes a string to a text file.
- output_format
alias of str
- class opencf_core.io_handler.StrToXmlWriter
Bases: FileWriter
Writes content as a string to an XML file.
- output_format
alias of str
- class opencf_core.io_handler.TxtToStrReader
Bases: FileReader
Reads content from a text file and returns it as a string.
- input_format
alias of str
- class opencf_core.io_handler.XmlToStrReader
Bases: FileReader
Reads content from an XML file and returns it as a string.
- input_format
alias of str
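The reader and writer classes above pair a format check with the actual I/O. The sketch below shows that pairing with a CSV round trip using only the standard library. SketchReader, SketchWriter, CsvRowsReader, RowsCsvWriter, and the public read and write methods are hypothetical. Only the underscore-prefixed method names come from the documentation above.

```python
import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, List


def _is_rows(content: Any) -> bool:
    """True if content is a list of lists of strings."""
    return isinstance(content, list) and all(
        isinstance(row, list) and all(isinstance(cell, str) for cell in row)
        for row in content
    )


class SketchReader(ABC):
    @abstractmethod
    def _check_input_format(self, content: Any) -> bool: ...

    @abstractmethod
    def _read_content(self, input_path: Path) -> Any: ...

    def read(self, input_path: Path) -> Any:
        # Hypothetical entry point: read first, then validate the shape.
        content = self._read_content(input_path)
        if not self._check_input_format(content):
            raise TypeError("content does not match the reader's input format")
        return content


class SketchWriter(ABC):
    @abstractmethod
    def _check_output_format(self, content: Any) -> bool: ...

    @abstractmethod
    def _write_content(self, output_path: Path, output_content: Any) -> None: ...

    def write(self, output_path: Path, content: Any) -> None:
        # Hypothetical entry point: validate before touching the disk.
        if not self._check_output_format(content):
            raise TypeError("content does not match the writer's output format")
        self._write_content(output_path, content)


class CsvRowsReader(SketchReader):
    def _check_input_format(self, content: Any) -> bool:
        return _is_rows(content)

    def _read_content(self, input_path: Path) -> List[List[str]]:
        with input_path.open(newline="", encoding="utf-8") as handle:
            return list(csv.reader(handle))


class RowsCsvWriter(SketchWriter):
    def _check_output_format(self, content: Any) -> bool:
        return _is_rows(content)

    def _write_content(self, output_path: Path, output_content: List[List[str]]) -> None:
        with output_path.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(output_content)


# Usage: write rows to a temporary CSV file and read them back.
rows = [["name", "qty"], ["apple", "3"]]
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "items.csv"
    RowsCsvWriter().write(csv_path, rows)
    round_tripped = CsvRowsReader().read(csv_path)

print(round_tripped)
```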
- opencf_core.logger.setup_logger(log_file='logs/app.log')
Sets up the logger configuration.
MIME Type Guesser Module
This module provides a singleton class for guessing MIME types from file paths using the python-magic library.
- class opencf_core.mimes.MimeGuesser
Bases: object
Singleton class for guessing MIME types from file paths using the python-magic library.
- static __new__(cls)
Creates a new instance of the class if it doesn’t exist already.
- Returns:
The instance of the MimeGuesser class.
- Return type:
MimeGuesser
- get_mime_guesser()
Returns the mime_guesser instance.
- Returns:
The instance of the mime_guesser.
- Return type:
magic.Magic
- classmethod guess_mime_type_from_file(file_path)
Guesses the MIME type from the file path.
- Parameters:
file_path (str) – The path to the file.
- Returns:
The guessed MIME type.
- Return type:
str
- Raises:
ImportError – If the python-magic library is not imported.
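The singleton behaviour described for __new__ can be sketched as follows. To stay runnable without third-party packages, the sketch swaps python-magic for the standard-library mimetypes module, which guesses from the file extension rather than from file content. SketchMimeGuesser and its guess method are hypothetical names.

```python
import mimetypes
from typing import Optional


class SketchMimeGuesser:
    """Singleton sketch: every call to the constructor returns the same object."""

    _instance: Optional["SketchMimeGuesser"] = None

    def __new__(cls) -> "SketchMimeGuesser":
        if cls._instance is None:
            # First call: create the instance and its (stand-in) guesser once.
            cls._instance = super().__new__(cls)
            cls._instance._guesser = mimetypes.MimeTypes()
        return cls._instance

    def guess(self, file_path: str) -> Optional[str]:
        # mimetypes returns a (type, encoding) pair; only the type is needed.
        mime_type, _encoding = self._guesser.guess_type(file_path)
        return mime_type


first = SketchMimeGuesser()
second = SketchMimeGuesser()
print(first is second)
print(first.guess("report.pdf"))
```

Creating the guesser once and reusing it avoids repeating any setup cost on every lookup, which is presumably the motivation for the singleton here.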
- opencf_core.mimes.guess_mime_type_from_file(file_path)
Guesses the MIME type from the file path.
- Parameters:
file_path (str) – The path to the file.
- Returns:
The guessed MIME type.
- Return type:
str