Package kodexa
Kodexa is a Python framework to enable flexible data engineering with semi-structured and unstructured documents and data.
Kodexa allows you to interact with:
- Documents
  - Content- and feature-rich containers for semi-structured and unstructured content
- Pipelines
  - Link steps together to build processing pipelines that promote reuse
- Stores
  - Store documents, with their relationships, in families
- Assistants
  - Rich platform agents that can react to changes in stores and to events outside the platform
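The relationship between documents and pipeline steps can be sketched in plain Python. This is a conceptual illustration only: `SketchNode`, `SketchDocument`, `normalize_whitespace`, `tag_numbers` and `run_steps` are hypothetical names invented for this example and are not part of the Kodexa API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class SketchNode:
    """Hypothetical content node: a piece of content plus named features."""
    content: str
    features: Dict[str, object] = field(default_factory=dict)
    children: List["SketchNode"] = field(default_factory=list)


@dataclass
class SketchDocument:
    """Hypothetical document: a tree of content nodes under a single root."""
    root: SketchNode

    def walk(self) -> Iterator[SketchNode]:
        # Depth-first, parent-before-children traversal of the node tree.
        stack = [self.root]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(node.children))


# A step is assumed here to be any callable that takes and returns a document.
Step = Callable[[SketchDocument], SketchDocument]


def normalize_whitespace(document: SketchDocument) -> SketchDocument:
    # Collapse runs of whitespace and trim both ends of each node's content.
    for node in document.walk():
        node.content = " ".join(node.content.split())
    return document


def tag_numbers(document: SketchDocument) -> SketchDocument:
    # Attach a feature to every node whose content is purely digits.
    for node in document.walk():
        if node.content.isdigit():
            node.features["tag"] = "number"
    return document


def run_steps(document: SketchDocument, steps: List[Step]) -> SketchDocument:
    # Apply each step in order, passing the result on to the next step.
    for step in steps:
        document = step(document)
    return document


doc = SketchDocument(
    SketchNode("root", children=[SketchNode("  hello   world "), SketchNode(" 42 ")])
)
result = run_steps(doc, [normalize_whitespace, tag_numbers])
contents = [node.content for node in result.walk()]
```

Because each step has the same document-in, document-out shape, steps can be reordered or reused across pipelines without changes, which is the reuse the list above refers to.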
"""
Kodexa is a Python framework to enable flexible data engineering with semi-structured and unstructured documents and
data.
.. include:: ./documentation.md
"""
from .assistant import Assistant, AssistantContext, AssistantResponse
from .connectors import FileHandleConnector, FolderConnector, UrlConnector, add_connector, get_connector, \
    get_connectors, get_source, registered_connectors
from .model import ContentEvent, ContentFeature, ContentNode, Document, DocumentActor, DocumentFamily, DocumentMetadata, \
    DocumentStore, DocumentTransition, SourceMetadata, TransitionType
from .model.objects import Taxonomy
from .pipeline import Pipeline, PipelineContext, PipelineStatistics
from .platform import KodexaPlatform, RemoteStep, RemotePipeline, RemoteSession, KodexaClient
from .steps import NodeTagCopy, NodeTagger, RollupTransformer, TagsToKeyValuePairExtractor, TextParser, \
    KodexaProcessingException
from .stores import LocalDocumentStore, LocalModelStore, RemoteDocumentStore, \
    RemoteModelStore, RemoteDataStore, TableDataStore
Sub-modules
kodexa.assistant
    Support for setting up and defining assistants that you can use in Kodexa
kodexa.cli
    The Kodexa Command-Line Interface
kodexa.connectors
    Connectors provide a way to access documents (files or otherwise) from a source, and they form the starting point for Pipelines
kodexa.mixins
    Mix-ins are an effective way to add helper functionality to Documents and ContentNodes based on the underlying features.
kodexa.model
    Model represents the core model at the heart of the Kodexa Content Model and architecture …
kodexa.pipeline
    A Pipeline brings together a Connector, a set of steps, and a sink to perform data cleansing, normalization, analysis, and more.
kodexa.platform
    Out-of-the-box integration with the Kodexa platform
kodexa.selectors
    Selectors allow you to work with a Kodexa document to find content
kodexa.steps
    Common and reusable steps
kodexa.stores
    Stores are persistence components for Documents. Typically, they can act as either a Connector or a Sink.
kodexa.testing
    Utilities to help support unit testing and test harnesses for Kodexa
kodexa.training
    Utilities for training actions using Kodexa
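
The connector, steps, and sink arrangement described for kodexa.pipeline, and the dual connector/sink role described for kodexa.stores, can be illustrated with a small standalone sketch. It deliberately does not import Kodexa: `InMemoryStore` and `run_pipeline` are hypothetical names made up for this example, and documents are reduced to plain strings to keep it short.

```python
from typing import Callable, Iterable, Iterator, List


class InMemoryStore:
    """Hypothetical store usable as a source (connector role) or a target (sink role)."""

    def __init__(self, documents: Iterable[str] = ()) -> None:
        self.documents: List[str] = list(documents)

    def __iter__(self) -> Iterator[str]:
        # Connector role: yield the stored documents one at a time.
        # Iterating over a copy keeps this safe if the same store is also the sink.
        return iter(list(self.documents))

    def put(self, document: str) -> None:
        # Sink role: persist a processed document.
        self.documents.append(document)


def run_pipeline(
    connector: Iterable[str],
    steps: List[Callable[[str], str]],
    sink: InMemoryStore,
) -> int:
    """Pull each document from the connector, apply every step, write it to the sink."""
    processed = 0
    for document in connector:
        for step in steps:
            document = step(document)
        sink.put(document)
        processed += 1
    return processed


source = InMemoryStore(["  Invoice 001 ", "receipt 002"])
target = InMemoryStore()
count = run_pipeline(source, [str.strip, str.lower], target)
```

In this sketch the same class serves at both ends of the pipeline, which mirrors the statement that a store can typically act as either a Connector or a Sink.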