Package kodexa

Kodexa is a Python framework to enable flexible data engineering with semi-structured and unstructured documents and data.

Kodexa allows you to interact with:

  • Documents
    • Content- and feature-rich containers for semi-structured and unstructured content
  • Pipelines
    • Link steps together to build reusable processing pipelines
  • Stores
    • Store documents with relationships in families
  • Assistants
    • Rich platform agents that can react to changes in stores and to events outside the platform
"""
Kodexa is a Python framework to enable flexible data engineering with semi-structured and unstructured documents and
data.


.. include:: ./documentation.md
"""
from .assistant import Assistant, AssistantContext, AssistantResponse
from .cloud import KodexaPlatform, RemoteAction, RemotePipeline, RemoteSession
from .connectors import FileHandleConnector, FolderConnector, UrlConnector, add_connector, get_connector, \
    get_connectors, get_source, registered_connectors
from .model import ContentEvent, ContentFeature, ContentNode, Document, DocumentActor, DocumentFamily, DocumentMetadata, \
    DocumentStore, DocumentTransition, RemoteStore, SourceMetadata, TransitionType
from .pipeline import Pipeline, PipelineContext, PipelineStatistics
from .sinks import FolderSink, InMemoryDocumentSink
from .steps import NodeTagCopy, NodeTagger, RollupTransformer, TagsToKeyValuePairExtractor, TextParser, \
    KodexaProcessingException
from .stores import DataStoreHelper, LocalDocumentStore, LocalModelStore, RemoteDocumentStore, \
    RemoteModelStore, RemoteTableDataStore, TableDataStore
from .taxonomy import RemoteTaxonomy, Taxon, Taxonomy

Sub-modules

kodexa.assistant

Support for setting up and defining assistants that you can use in Kodexa

kodexa.cli

The Kodexa Command-Line Interface

kodexa.cloud

Out-of-the-box integration with the Kodexa platform, giving access to the content services it provides

kodexa.connectors

Connectors provide a way to access documents (files or otherwise) from a source, and they form the starting point for Pipelines

kodexa.mixins

Mix-ins are an effective way to add helper functionality to Documents and ContentNodes based on their underlying features.

kodexa.model

Model represents the core model at the heart of the Kodexa Content Model and architecture …

kodexa.pipeline

A Pipeline brings together a Connector, a set of steps, and a Sink to perform data cleansing, normalization, analysis and more.
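
The connector, steps, sink flow described above can be sketched in plain Python. This is only an illustration of the pattern, not the kodexa API: `list_connector`, `strip_whitespace`, `lowercase` and `run_pipeline` are hypothetical names, and a "document" is assumed to be a plain dict.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-ins, not the kodexa API: a "document" here is a plain dict.
Doc = dict
Step = Callable[[Doc], Doc]


def list_connector(texts: Iterable[str]) -> Iterator[Doc]:
    """Connector role: yield one document per source item."""
    for text in texts:
        yield {"text": text}


def strip_whitespace(doc: Doc) -> Doc:
    """Step: a simple cleansing pass."""
    return {**doc, "text": doc["text"].strip()}


def lowercase(doc: Doc) -> Doc:
    """Step: a simple normalization pass."""
    return {**doc, "text": doc["text"].lower()}


def run_pipeline(source: Iterable[Doc], steps: List[Step], sink: List[Doc]) -> List[Doc]:
    """Pull each document from the connector, apply the steps in order, write to the sink."""
    for doc in source:
        for step in steps:
            doc = step(doc)
        sink.append(doc)
    return sink


results = run_pipeline(
    list_connector(["  Hello ", "WORLD"]),
    [strip_whitespace, lowercase],
    [],  # an in-memory list plays the sink role
)
print(results)
```

Because each step only takes and returns a document, the same step functions can be reused across pipelines with different sources and sinks.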

kodexa.selectors

Selectors allow you to work with a Kodexa document to find content

kodexa.sinks

Sinks are the endpoint of a Pipeline and allow its final output to be either stored or written out

kodexa.steps

Common and reusable steps

kodexa.stores

Stores are persistence components for Documents. Typically, they can act as either a Connector or a Sink
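
As a rough illustration of how one component can play both roles, the sketch below defines a small in-memory store that accepts documents (sink role) and can be iterated as a source (connector role). `InMemoryStore` and its `put` method are hypothetical and assumed for this example only; they do not reflect the actual kodexa store classes.

```python
from typing import Dict, Iterator


class InMemoryStore:
    """Hypothetical store, not the kodexa API: keeps documents in a dict keyed by path."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, path: str, doc: dict) -> None:
        """Sink role: accept a document produced by a pipeline."""
        self._docs[path] = doc

    def __iter__(self) -> Iterator[dict]:
        """Connector role: iterate over a snapshot of the stored documents."""
        return iter(list(self._docs.values()))


store = InMemoryStore()
store.put("a.txt", {"text": "first"})
store.put("b.txt", {"text": "second"})

# Reading the store back as a source for further processing.
texts = [doc["text"] for doc in store]
print(texts)
```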

kodexa.taxonomy

Support for setting up and defining a taxonomy

kodexa.testing

Utilities to help support unit testing and test harnesses for Kodexa

kodexa.training

Utilities for training actions using Kodexa