opencf_core package

Submodules

Base Converter Module

This module provides a foundation for building file conversion utilities. Its abstract base classes manage file types and handle input and output files. The module is designed to be extensible, supporting various file formats and conversion strategies.

Classes:
  • ResolvedInputFile: Manages file paths and types, resolving them as needed.

  • BaseConverter: An abstract base class for creating specific file format converters, enforcing the implementation of file conversion logic.

Exceptions:
  • ValueError: Raised when file paths or types are incompatible or unsupported.

  • AssertionError: Raised by internal consistency checks when file types do not match expected values.

class opencf_core.base_converter.BaseConverter(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Bases: ABC

Abstract base class for file conversion, defining the template for input to output file conversion.

__init__(input_files: ResolvedInputFile | List[ResolvedInputFile], output_file: ResolvedInputFile)

Sets up the converter with specified input and output files, ensuring compatibility.

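Parameters:
  • input_files (ResolvedInputFile | List[ResolvedInputFile]) – The input file or list of input files to convert.

  • output_file (ResolvedInputFile) – The file the converted output is written to.

_check_file_types()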

Validates that the provided files have acceptable and supported file types for conversion.

abstract _convert(input_contents: List, output_file: Path | None = None, output_folder: Path | None = None)

Abstract method to be implemented by subclasses to perform the actual file conversion process.

abstract classmethod _get_supported_input_type() → FileType

Abstract method to define the supported input file type by the converter.

Returns:

The supported input file type.

Return type:

FileType

abstract classmethod _get_supported_output_type() → FileType

Abstract method to define the supported output file type by the converter.

Returns:

The supported output file type.

Return type:

FileType

check_io_handlers()

Ensures that valid I/O handlers (file reader and writer) are set for the conversion.

convert()

Orchestrates the file conversion process, including reading, converting, and writing the file.
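The read, convert, write sequence described for convert() follows the template-method pattern. A minimal, self-contained sketch of that pattern is given below. SketchConverter, UpperCaseConverter, and demo are hypothetical stand-ins written for illustration; they do not reproduce the actual BaseConverter implementation, which works with ResolvedInputFile objects and separate reader and writer handlers.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import List


class SketchConverter(ABC):
    """Hypothetical stand-in showing a read -> convert -> write template."""

    def __init__(self, input_paths: List[Path], output_path: Path) -> None:
        self.input_paths = input_paths
        self.output_path = output_path

    def convert(self) -> None:
        # 1. Read every input file (assumed here to be UTF-8 text).
        contents = [p.read_text(encoding="utf-8") for p in self.input_paths]
        # 2. Delegate the format-specific work to the subclass.
        result = self._convert(contents)
        # 3. Write the converted content to the output path.
        self.output_path.write_text(result, encoding="utf-8")

    @abstractmethod
    def _convert(self, input_contents: List[str]) -> str:
        """Subclasses implement the actual conversion here."""


class UpperCaseConverter(SketchConverter):
    """Toy conversion: joins the inputs and upper-cases them."""

    def _convert(self, input_contents: List[str]) -> str:
        return "\n".join(text.upper() for text in input_contents)


def demo() -> str:
    # Runs the toy converter on a temporary file and returns the output text.
    with TemporaryDirectory() as tmp:
        source = Path(tmp) / "in.txt"
        target = Path(tmp) / "out.txt"
        source.write_text("hello", encoding="utf-8")
        UpperCaseConverter([source], target).convert()
        return target.read_text(encoding="utf-8")
```

Only _convert varies between converters in this sketch; the surrounding orchestration stays in the base class.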

file_reader: FileReader = None

file_writer: FileWriter = None

folder_as_output: bool = None

classmethod get_input_type()

classmethod get_output_type()

classmethod get_supported_input_type() → FileType

Defines the supported input file type for this converter.

Returns:

The file type supported for input.

Return type:

FileType

classmethod get_supported_output_type() → FileType

Defines the supported output file type for this converter.

Returns:

The file type supported for output.

Return type:

FileType

exception opencf_core.base_converter.InvalidOutputFormatError(solving_tips)

Bases: Exception

Exception raised when the output content format check fails after conversion.

class opencf_core.base_converter.ResolvedInputFile(path: str | Path, is_dir: bool | None = None, should_exist: bool = True, file_type: str | None = None, add_suffix: bool = False, read_content: bool = False)

Bases: object

Handles resolving the file type of a given file or folder, managing path adjustments and optional content reading.

__init__(path: str | Path, is_dir: bool | None = None, should_exist: bool = True, file_type: str | None = None, add_suffix: bool = False, read_content: bool = False)

Initializes an instance of ResolvedInputFile with options for type resolution and path modification.

Parameters:
  • path (str | Path) – The path to the file or folder.

  • is_dir (bool, optional) – Specifies if the path is a directory. If None, inferred using pathlib. Defaults to None.

  • should_exist (bool, optional) – Specifies if the existence of the path is required. Defaults to True.

  • file_type (str, optional) – The explicit type of the file. If None, attempts to resolve to a FileType object based on the path or content.

  • add_suffix (bool, optional) – Whether to append the resolved file type’s suffix to the file path. Defaults to False.

  • read_content (bool, optional) – Whether to read the file’s content to assist in type resolution. Defaults to False.

__repr__()

Returns the absolute file path as a string.

Returns:

The resolved file path.

Return type:

str

__resolve_filetype__(file_type: str, file_path: Path, read_content: bool) → FileType

Determines the file type, utilizing the provided type, file path, or content as needed.

Parameters:
  • file_type (FileType or str, optional) – An explicit file type or extension.

  • file_path (Path) – The path to the file, used if file_type is not provided.

  • read_content (bool) – Indicates if file content should be used to help resolve the file type.

Returns:

The resolved file type.

Return type:

FileType

__str__()

Returns the absolute file path as a string.

Returns:

The resolved file path.

Return type:

str

_resolve_directory_type(file_type: str)

Handles the case when the specified path is a directory.

_resolve_file_type(file_type: str, read_content: bool, add_suffix: bool)

Resolves the file type based on given parameters.

Parameters:
  • file_type (FileType or str, optional) – An explicit file type or extension.

  • read_content (bool) – Indicates if file content should be used to help resolve the file type.

  • add_suffix (bool) – Whether to append the resolved file type’s suffix to the file path.

_resolve_path_type(file_type: str | None = None) → bool

Determines if the provided path refers to a directory or a file, based on its existence, suffix, and file_type.

Parameters:

file_type (str, optional) – The type of file expected at the path. Influences directory creation and type resolution.

Returns:

True if the path is determined to be a directory, False if it is a file.

Return type:

bool
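The decision described above (existence, suffix, and file_type) can be illustrated with a small helper. This is only a sketch: looks_like_directory is a hypothetical name, and the specific rules below, in particular how file_type affects a path without a suffix, are assumptions rather than the documented behaviour of _resolve_path_type.

```python
from pathlib import Path
from typing import Optional


def looks_like_directory(path: Path, file_type: Optional[str] = None) -> bool:
    """Hypothetical helper; the rules below are assumptions, not the library's."""
    if path.exists():
        # An existing path answers the question directly.
        return path.is_dir()
    if path.suffix:
        # A missing path with a suffix (e.g. "out.csv") is treated as a file.
        return False
    # No suffix and nothing on disk: assume a file only when an explicit
    # file type was supplied, otherwise assume a directory.
    return file_type is None
```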

Main Module

This module contains the main application logic.

class opencf_core.converter_app.BaseConverterApp(input_file_paths: List[str], input_file_type: str | None = None, output_file_path: str | None = None, output_file_type: str | None = None)

Bases: object

Main application class responsible for managing file conversions.

__init__(input_file_paths: List[str], input_file_type: str | None = None, output_file_path: str | None = None, output_file_type: str | None = None)

Initializes the BaseConverterApp instance.

Parameters:
  • input_file_paths (List[str]) – List of paths to the input files.

  • input_file_type (str, optional) – The type of the input file. Defaults to None.

  • output_file_path (str, optional) – The path to the output file. Defaults to None.

  • output_file_type (str, optional) – The type of the output file. Defaults to None.

add_converter_pair(converter_class: Type[BaseConverter])

Adds a converter pair to the application.

Parameters:

converter_class (Type[BaseConverter]) – The converter class to add.

Raises:

ValueError – If the converter class is invalid.

converters: List[Type[BaseConverter]] = []

get_supported_conversions() → Tuple[Tuple[FileType, FileType], ...]

Retrieves the supported conversions.

Returns:

A tuple of tuples representing supported conversions.

Return type:

Tuple[Tuple[FileType, FileType], ...]

run()

Runs the conversion process.
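As an illustration of how add_converter_pair() and get_supported_conversions() might fit together, the sketch below keeps a registry keyed by (input type, output type). SketchConverter, CsvToJson, and SketchApp are hypothetical names, plain strings stand in for FileType members, and the validation rule is an assumption; the real BaseConverterApp may store and validate converters differently.

```python
from typing import Dict, Tuple, Type


class SketchConverter:
    """Hypothetical converter base exposing its supported type pair."""

    input_type: str = ""
    output_type: str = ""


class CsvToJson(SketchConverter):
    input_type, output_type = "csv", "json"


class SketchApp:
    """Hypothetical registry of converters keyed by (input, output) type."""

    def __init__(self) -> None:
        self._converters: Dict[Tuple[str, str], Type[SketchConverter]] = {}

    def add_converter_pair(self, converter_class: Type[SketchConverter]) -> None:
        # Reject anything that is not a SketchConverter subclass.
        is_valid = isinstance(converter_class, type) and issubclass(
            converter_class, SketchConverter
        )
        if not is_valid:
            raise ValueError(f"Invalid converter class: {converter_class!r}")
        key = (converter_class.input_type, converter_class.output_type)
        self._converters[key] = converter_class

    def get_supported_conversions(self) -> Tuple[Tuple[str, str], ...]:
        # Dict keys preserve insertion order, so pairs come back as registered.
        return tuple(self._converters)
```

The sketch keeps its registry on the instance, whereas the documented converters attribute is declared at class level.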

File Type Definitions Module

This module provides a comprehensive framework for handling various file types within a file conversion context. It defines classes and enumerations for identifying, validating, and working with different file types, based on file extensions, MIME types, and optionally, file content. It also includes custom exceptions for handling common errors related to file type processing.

Classes:
  • UnsupportedFileTypeError: Custom exception for handling unsupported file types.

  • EmptySuffixError: Specialized exception for cases where a file’s suffix does not provide enough information to determine its type.

  • FileNotFoundError: Raised when a specified file does not exist.

  • MismatchedException: Exception for handling cases where there’s a mismatch between expected and actual file attributes.

  • FileType: Enum class that encapsulates various file types supported by the system, providing methods for type determination from file attributes.

Functions:
  • test_file_type_parsing(): Demonstrates and validates the parsing functionality for various file types.

  • test_file_type_matching(): Tests the matching and validation capabilities of the FileType class.

Dependencies:
  • collections.namedtuple: For defining simple classes for storing MIME type information.

  • enum.Enum: For creating the FileType enumeration.

  • pathlib.Path: For file path manipulations and checks.

  • opencf_core.mimes.guess_mime_type_from_file: Utility function to guess MIME type from a file path.

exception opencf_core.filetypes.EmptySuffixError

Bases: UnsupportedFileTypeError

Exception raised when a file’s suffix does not provide enough information to determine its type.

exception opencf_core.filetypes.FileNotFoundError(file_path)

Bases: Exception

Exception raised when the specified file cannot be found.

class opencf_core.filetypes.FileType(value)

Bases: Enum

Enumeration of supported file types with methods for type determination and validation.

CSV = MimeType(extensions=('csv',), mime_types=('text/csv',), upper_mime_types=())
EXCEL = MimeType(extensions=('xls', 'xlsx'), mime_types=('application/vnd.ms-excel', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'), upper_mime_types=())
GIF = MimeType(extensions=('gif',), mime_types=('image/gif',), upper_mime_types=())
IMAGE = MimeType(extensions=('jpg', 'jpeg', 'png'), mime_types=('image/jpeg', 'image/png'), upper_mime_types=())
JSON = MimeType(extensions=('json',), mime_types=('application/json',), upper_mime_types=())
MARKDOWN = MimeType(extensions=('md',), mime_types=('text/markdown',), upper_mime_types=('text/plain',))
MSWORD = MimeType(extensions=('docx', 'doc'), mime_types=('application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'application/msword'), upper_mime_types=())
NOTYPE = MimeType(extensions=(), mime_types=(), upper_mime_types=())
PDF = MimeType(extensions=('pdf',), mime_types=('application/pdf',), upper_mime_types=())
TEXT = MimeType(extensions=('txt',), mime_types=('text/plain',), upper_mime_types=())
UNHANDLED = MimeType(extensions=(), mime_types=(), upper_mime_types=())
VIDEO = MimeType(extensions=('mp4', 'avi'), mime_types=('video/mp4', 'video/x-msvideo'), upper_mime_types=())
XML = MimeType(extensions=('xml',), mime_types=('application/xml', 'text/xml'), upper_mime_types=())

classmethod from_mimetype(file_path: str | Path, raise_err: bool = False)

Determines a FileType from a file’s MIME type.

Parameters:
  • file_path (str) – The path to the file.

  • raise_err (bool, optional) – Whether to raise an exception if the type is unhandled. Defaults to False.

Returns:

The determined FileType enumeration member.

Return type:

FileType

classmethod from_path(path: Path, read_content=False, raise_err=False)

Determines the FileType of a file based on its path. Optionally reads the file’s content to verify its type.

Parameters:
  • path (Path) – The path to the file.

  • read_content (bool, optional) – If True, the method also checks the file’s content to determine its type. Defaults to False.

  • raise_err (bool, optional) – If True, raises exceptions for unsupported types or when file does not exist. Defaults to False.

Returns:

The determined FileType enumeration member based on the file’s suffix and/or content.

Return type:

FileType

Raises:
  • FileNotFoundError – If the file does not exist when attempting to read its content.

  • UnsupportedFileTypeError – If the file type is unsupported and raise_err is True.

  • AssertionError – If there is a mismatch between the file type determined from the file’s suffix and its content.

classmethod from_suffix(suffix: str, raise_err: bool = False)

Determines a FileType from a file’s suffix.

Parameters:
  • suffix (str) – The file suffix (extension).

  • raise_err (bool, optional) – Whether to raise an exception if the type is unhandled. Defaults to False.

Returns:

The determined FileType enumeration member.

Return type:

FileType

get_suffix()

Retrieves the primary file extension associated with the FileType.

Returns:

The primary file extension for the FileType, prefixed with a period.

Returns an empty string if the FileType does not have an associated extension.

Return type:

str
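The suffix lookup and get_suffix() behaviour described above can be sketched with a reduced enumeration. SketchFileType is a hypothetical stand-in with only a few members, and the choice of ValueError for unhandled suffixes is an assumption; the real FileType has more members and may raise a different exception.

```python
from collections import namedtuple
from enum import Enum

MimeType = namedtuple("MimeType", ["extensions", "mime_types", "upper_mime_types"])


class SketchFileType(Enum):
    """Hypothetical, reduced version of the FileType enumeration."""

    UNHANDLED = MimeType((), (), ())
    CSV = MimeType(("csv",), ("text/csv",), ())
    EXCEL = MimeType(
        ("xls", "xlsx"),
        (
            "application/vnd.ms-excel",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        ),
        (),
    )

    @classmethod
    def from_suffix(cls, suffix: str, raise_err: bool = False) -> "SketchFileType":
        # Accept "csv" as well as ".csv", case-insensitively.
        ext = suffix.lower().lstrip(".")
        for member in cls:
            if ext in member.value.extensions:
                return member
        if raise_err:
            raise ValueError(f"Unhandled suffix: {suffix!r}")
        return cls.UNHANDLED

    def get_suffix(self) -> str:
        # The first listed extension is treated as the primary one.
        extensions = self.value.extensions
        return f".{extensions[0]}" if extensions else ""
```

Note that two Enum members with equal values become aliases of each other, which is why the sketch defines only one member with empty tuples.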

is_true_filetype()

Determines if the FileType instance represents a supported file type based on the presence of defined extensions.

Returns:

True if the FileType has at least one associated file extension, False otherwise.

Return type:

bool

is_valid_mime_type(path: Path, raise_err=False)

Validates whether the MIME type of the file at the specified path aligns with the FileType’s expected MIME types.

This method first determines the FileType based on the file’s actual MIME type (determined by reading the file’s content) and then checks if this determined FileType matches the instance calling this method. Special consideration is given to FileType.TEXT, where a broader compatibility check is performed due to the generic nature of text MIME types.

Parameters:
  • path (Path) – The path to the file whose MIME type is to be validated.

  • raise_err (bool, optional) – If True, a MismatchedException is raised if the file’s MIME type does not match the expected MIME types of the FileType instance. Defaults to False.

Returns:

True if the file’s MIME type matches the expected MIME types for this FileType instance or if special compatibility conditions are met (e.g., for FileType.TEXT with “text/plain”). Otherwise, False.

Return type:

bool

Raises:

MismatchedException – If raise_err is True and the file’s MIME type does not match the expected MIME types for this FileType instance, including detailed information about the mismatch.
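A compatibility check of this kind can be sketched as a pure function over MIME strings. The sketch assumes that upper_mime_types lists broader MIME types that are also acceptable (for example, a Markdown file detected as text/plain); that interpretation, the function name mime_is_compatible, and the SketchMismatch exception are all assumptions made for illustration.

```python
from typing import Sequence


class SketchMismatch(Exception):
    """Hypothetical stand-in for MismatchedException."""


def mime_is_compatible(
    actual_mime: str,
    mime_types: Sequence[str],
    upper_mime_types: Sequence[str] = (),
    raise_err: bool = False,
) -> bool:
    # Accept an exact match, or a broader "upper" type if one is listed.
    if actual_mime in mime_types or actual_mime in upper_mime_types:
        return True
    if raise_err:
        expected = tuple(mime_types) + tuple(upper_mime_types)
        raise SketchMismatch(f"got {actual_mime!r}, expected one of {expected!r}")
    return False
```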

is_valid_path(path: Path, raise_err=False, read_content=False)

Validates whether the file at a given path matches the FileType, optionally checking the file’s content.

Parameters:
  • path (Path) – The path to the file to validate.

  • raise_err (bool, optional) – If True, raises a MismatchedException for a mismatching file type. Defaults to False.

  • read_content (bool, optional) – If True, also validates the file’s content type against the FileType. Defaults to False.

Returns:

True if the file’s type matches the FileType, based on its path and optionally its content. False otherwise.

Return type:

bool

Raises:

MismatchedException – If the file’s type does not match and raise_err is True.

is_valid_suffix(suffix: str, raise_err=False)

Validates whether a given file extension matches the FileType’s expected extensions.

Parameters:
  • suffix (str) – The file extension to validate, including the leading period (e.g., “.txt”).

  • raise_err (bool, optional) – If True, raises a MismatchedException for invalid extensions. Defaults to False.

Returns:

True if the suffix matches one of the FileType’s extensions, False otherwise.

Return type:

bool

Raises:

MismatchedException – If the suffix does not match and raise_err is True.

class opencf_core.filetypes.MimeType(extensions, mime_types, upper_mime_types)

Bases: tuple

extensions

Alias for field number 0

mime_types

Alias for field number 1

upper_mime_types

Alias for field number 2
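The "alias for field number" entries reflect that MimeType is a named tuple, so each field is reachable both by name and by position. A short sketch, reusing the MARKDOWN values listed earlier:

```python
from collections import namedtuple

# Same three fields, in the same order, as the documented MimeType.
MimeType = namedtuple("MimeType", ["extensions", "mime_types", "upper_mime_types"])

markdown = MimeType(
    extensions=("md",),
    mime_types=("text/markdown",),
    upper_mime_types=("text/plain",),
)

# Field 0 and .extensions refer to the same value, and so on.
first_field = markdown[0]
by_name = markdown.extensions
```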

exception opencf_core.filetypes.MismatchedException(label, claimed_val, expected_vals)

Bases: Exception

Exception raised for mismatches between expected and actual file attributes.

exception opencf_core.filetypes.UnsupportedFileTypeError(message)

Bases: Exception

Exception raised for handling cases of unsupported file types.

opencf_core.filetypes.test_file_type_matching()

Tests for validating the functionality of file type matching.

opencf_core.filetypes.test_file_type_parsing()

Tests for validating the functionality of file type parsing.

Input/Output Handler Module

This module is designed to provide a structured approach to handling file input and output operations across various formats such as plain text, CSV, JSON, and potentially XML. It introduces a set of abstract base classes and concrete implementations for reading from and writing to files, ensuring type safety and format consistency through method signatures and runtime checks.

class opencf_core.io_handler.CsvToListReader

Bases: FileReader

Reads content from a CSV file and returns it as a list of lists, where each sublist represents a row.

input_format

alias of List[List[str]]

class opencf_core.io_handler.DictToJsonWriter

Bases: FileWriter

Writes content from a dictionary to a JSON file.

output_format

alias of Dict[str, Any]

class opencf_core.io_handler.FileReader

Bases: ABC

Abstract base class for file readers.

abstract _check_input_format(content: Any) → bool

Checks if the provided content matches the expected input format.

Parameters:

content (Any) – The content to be checked.

Returns:

True if the content matches the expected input format, False otherwise.

Return type:

bool

abstract _read_content(input_path: Path) → Any

Reads and returns the content from the given input path.

Parameters:

input_path (Path) – The path to the input file.

Returns:

The content read from the input file.

Return type:

Any
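The two abstract methods above suggest a reader that loads content and then verifies its format. The sketch below shows one way such a pair could be combined. SketchReader, SketchTxtReader, and the public read() method are hypothetical; the documentation does not state how the real FileReader exposes or combines these steps.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Any, Optional


class SketchReader(ABC):
    """Hypothetical reader base: read the content, then check its format."""

    input_format: Optional[type] = None

    @abstractmethod
    def _check_input_format(self, content: Any) -> bool:
        """Return True if content matches the expected input format."""

    @abstractmethod
    def _read_content(self, input_path: Path) -> Any:
        """Read and return the content found at input_path."""

    def read(self, input_path: Path) -> Any:
        # Hypothetical entry point combining the two abstract steps.
        content = self._read_content(input_path)
        if not self._check_input_format(content):
            raise ValueError(f"Unexpected content format in {input_path}")
        return content


class SketchTxtReader(SketchReader):
    input_format = str

    def _check_input_format(self, content: Any) -> bool:
        return isinstance(content, self.input_format)

    def _read_content(self, input_path: Path) -> str:
        return input_path.read_text(encoding="utf-8")


def demo() -> str:
    # Writes a temporary text file and reads it back through the sketch reader.
    with TemporaryDirectory() as tmp:
        sample = Path(tmp) / "note.txt"
        sample.write_text("hello", encoding="utf-8")
        return SketchTxtReader().read(sample)
```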

input_format: type = None

class opencf_core.io_handler.FileWriter

Bases: ABC

Abstract base class for file writers.

abstract _check_output_format(content: Any) → bool

Checks if the provided content matches the expected output format.

Parameters:

content (Any) – The content to be checked.

Returns:

True if the content matches the expected output format, False otherwise.

Return type:

bool

abstract _write_content(output_path: Path, output_content: Any)

Writes the provided content to the given output path.

Parameters:
  • output_path (Path) – The path to the output file.

  • output_content (Any) – The content to be written to the output file.

output_format = None

class opencf_core.io_handler.JsonToDictReader

Bases: FileReader

Reads content from a JSON file and returns it as a dictionary.

input_format

alias of Dict[str, Any]

class opencf_core.io_handler.ListToCsvWriter

Bases: FileWriter

Writes content as a list of lists to a CSV file, where each sublist represents a row.

output_format

alias of List[List[str]]
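To make the List[List[str]] format concrete, the sketch below writes rows to a CSV file and reads them back using the standard csv module. The function names is_rows, write_rows, read_rows, and round_trip are hypothetical, and the format check is an assumption about what a runtime check for this format could look like; this is not the code of ListToCsvWriter or CsvToListReader.

```python
import csv
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Any, List

Rows = List[List[str]]


def is_rows(content: Any) -> bool:
    # True only for a list of lists whose cells are all strings.
    return isinstance(content, list) and all(
        isinstance(row, list) and all(isinstance(cell, str) for cell in row)
        for row in content
    )


def write_rows(output_path: Path, rows: Rows) -> None:
    if not is_rows(rows):
        raise ValueError("Expected a list of lists of strings")
    # newline="" lets the csv module control line endings itself.
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


def read_rows(input_path: Path) -> Rows:
    with input_path.open(newline="", encoding="utf-8") as handle:
        return [row for row in csv.reader(handle)]


def round_trip(rows: Rows) -> Rows:
    # Writes rows to a temporary CSV file and reads them back.
    with TemporaryDirectory() as tmp:
        target = Path(tmp) / "table.csv"
        write_rows(target, rows)
        return read_rows(target)
```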

class opencf_core.io_handler.SamePathReader

Bases: FileReader

A FileReader that returns the input path itself, useful for operations where the file path is the desired output.

input_format

alias of Path

class opencf_core.io_handler.StrToTxtWriter

Bases: FileWriter

Writes a string to a text file.

output_format

alias of str

class opencf_core.io_handler.StrToXmlWriter

Bases: FileWriter

Writes content as a string to an XML file.

output_format

alias of str

class opencf_core.io_handler.TxtToStrReader

Bases: FileReader

Reads content from a text file and returns it as a string.

input_format

alias of str

class opencf_core.io_handler.XmlToStrReader

Bases: FileReader

Reads content from an XML file and returns it as a string.

input_format

alias of str

opencf_core.logger.setup_logger(log_file='logs/app.log')

Sets up the logger configuration.

MIME Type Guesser Module

This module provides a singleton class for guessing MIME types from file paths using the python-magic library.

class opencf_core.mimes.MimeGuesser

Bases: object

Singleton class for guessing MIME types from file paths using the python-magic library.

static __new__(cls)

Creates a new instance of the class if it doesn’t exist already.

Returns:

The instance of the MimeGuesser class.

Return type:

MimeGuesser
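Creating the instance only when none exists yet is the usual way to build a singleton through __new__. A minimal sketch of that technique follows; SketchGuesser and its _instance attribute are hypothetical names, and the real MimeGuesser may store its instance differently.

```python
class SketchGuesser:
    """Hypothetical singleton: every call returns the same object."""

    _instance = None

    def __new__(cls):
        # Create the instance on first use, then keep returning it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
```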

get_mime_guesser()

Returns the mime_guesser instance.

Returns:

The instance of the mime_guesser.

Return type:

magic.Magic

classmethod guess_mime_type_from_file(file_path)

Guesses the MIME type from the file path.

Parameters:

file_path (str) – The path to the file.

Returns:

The guessed MIME type.

Return type:

str

Raises:

ImportError – If the python-magic library cannot be imported.

opencf_core.mimes.guess_mime_type_from_file(file_path)

Guesses the MIME type from the file path.

Parameters:

file_path (str) – The path to the file.

Returns:

The guessed MIME type.

Return type:

str
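For readers without python-magic installed, the idea of mapping a path to a MIME type can be tried with the standard library. The sketch below is a swapped-in technique, not the module's implementation: it uses mimetypes.guess_type, which looks only at the file name, whereas python-magic inspects file content. The function name guess_mime_type_from_name and the fallback value are assumptions.

```python
import mimetypes
from pathlib import Path
from typing import Union


def guess_mime_type_from_name(file_path: Union[str, Path]) -> str:
    """Hypothetical stand-in that guesses from the file name only."""
    guessed, _encoding = mimetypes.guess_type(str(file_path))
    # guess_type returns None for unknown extensions; fall back to a
    # generic binary type in that case (an assumption of this sketch).
    return guessed or "application/octet-stream"
```

Because the guess is name-based, a renamed file is misreported; content-based detection does not depend on the extension.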

Module contents