Package kodexa

Kodexa is a Python framework that enables flexible data engineering with semi-structured and unstructured documents and data.

Kodexa allows you to interact with:

  • Documents
    • Content- and feature-rich containers for semi-structured and unstructured content
  • Pipelines
    • Link steps together to build processing pipelines that promote reuse
  • Stores
    • Store documents with relationships in families
  • Assistants
    • Rich platform agents that are able to react to changes in stores and outside the platform

from .assistant import Assistant, AssistantContext, AssistantResponse
from .cloud import KodexaPlatform, RemoteAction, RemotePipeline, RemoteSession
from .connectors import FileHandleConnector, FolderConnector, UrlConnector, add_connector, get_connector, \
    get_connectors, get_source, registered_connectors
from .model import ContentEvent, ContentFeature, ContentNode, Document, DocumentActor, DocumentFamily, DocumentMetadata, \
    DocumentStore, DocumentTransition, RemoteStore, SourceMetadata, TransitionType
from .pipeline import Pipeline, PipelineContext, PipelineStatistics
from .sinks import FolderSink, InMemoryDocumentSink
from .steps import NodeTagCopy, NodeTagger, RollupTransformer, TagsToKeyValuePairExtractor, TextParser, \
from .stores import DataStoreHelper, LocalDocumentStore, LocalModelStore, RemoteDocumentStore, \
    RemoteModelStore, RemoteTableDataStore, TableDataStore
from .taxonomy import RemoteTaxonomy, Taxon, Taxonomy



Support for setting up and defining assistants that you can use in Kodexa


The Kodexa Command-Line Interface

Out-of-the-box integration with the Kodexa platform, giving access to the content services that are available


Connectors provide a way to access documents (files or otherwise) from a source, and they form the starting point for Pipelines


Mix-ins are an effective way to add helper functionality to Documents and ContentNodes based on their underlying features.


Model represents the core model at the heart of the Kodexa Content Model and architecture …


A Pipeline brings together a Connector, a set of steps, and a Sink to perform data cleansing, normalization, analysis and more.
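
The connector, steps, and sink arrangement described above can be illustrated with a minimal, library-free sketch. Everything in it (list_connector, ListSink, run_pipeline, collapse_whitespace) is a hypothetical stand-in written for this example; none of it is the kodexa API, and documents are reduced to plain strings.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stand-ins for illustration only; these are NOT the kodexa classes.
Step = Callable[[str], str]


def list_connector(items: Iterable[str]) -> Iterator[str]:
    """Connector role: yield raw documents (plain strings here) from a source."""
    yield from items


class ListSink:
    """Sink role: collect the final output of the pipeline in memory."""

    def __init__(self) -> None:
        self.documents: List[str] = []

    def accept(self, document: str) -> None:
        self.documents.append(document)


def run_pipeline(connector: Iterable[str], steps: List[Step], sink: ListSink) -> ListSink:
    """Pass each document through every step, in order, then hand it to the sink."""
    for document in connector:
        for step in steps:
            document = step(document)
        sink.accept(document)
    return sink


def collapse_whitespace(text: str) -> str:
    """A cleansing step: trim the text and squeeze runs of whitespace to one space."""
    return " ".join(text.split())


sink = run_pipeline(
    list_connector(["  Hello   World ", "KODEXA\tPipeline"]),
    steps=[collapse_whitespace, str.lower],
    sink=ListSink(),
)
print(sink.documents)  # ['hello world', 'kodexa pipeline']
```

Because each step is just a callable from document to document, the same step can be reused in any number of pipelines, which is the kind of reuse the package description refers to.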


Selectors allow you to work with a Kodexa document to find content


Sinks are the end-point of a Pipeline and allow for the final output of the pipeline to be either stored or written out


Common and reusable steps


Stores are persistence components for Documents. Typically, they can act as either a Connector or a Sink
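
As a rough illustration of how one component can play both roles, the sketch below defines a hypothetical InMemoryStore (not a kodexa class) that accepts documents like a sink and can be iterated like a connector. The method name put and the use of plain strings as documents are assumptions made for this example only.

```python
from typing import Dict, Iterator


class InMemoryStore:
    """Hypothetical store (not a kodexa class) that keeps documents by name."""

    def __init__(self) -> None:
        self._documents: Dict[str, str] = {}

    def put(self, name: str, document: str) -> None:
        """Sink role: accept a document and persist it under a name."""
        self._documents[name] = document

    def __iter__(self) -> Iterator[str]:
        """Connector role: iterating the store yields its documents as a source."""
        return iter(self._documents.values())


raw = InMemoryStore()
raw.put("a.txt", "first draft")
raw.put("b.txt", "second draft")

processed = InMemoryStore()
for index, document in enumerate(raw):  # raw acts as the connector
    processed.put(f"doc-{index}", document.upper())  # processed acts as the sink

print(list(processed))  # ['FIRST DRAFT', 'SECOND DRAFT']
```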


Support for setting up and defining a taxonomy


Utilities to help support unit testing and test harnesses for Kodexa

Utilities for training actions using Kodexa