Package kodexa

Kodexa is a Python framework for flexible data engineering with semi-structured and unstructured documents and data.

Kodexa allows you to interact with:

  • Documents
    • Content- and feature-rich containers for semi-structured and unstructured content
  • Pipelines
    • Link steps together to build processing pipelines that promote reuse
  • Stores
    • Store documents with relationships in families
  • Assistants
    • Rich platform agents that can react to changes in stores and to events outside the platform

from .assistant import Assistant, AssistantContext, AssistantResponse
from .connectors import FileHandleConnector, FolderConnector, UrlConnector, add_connector, get_connector, \
    get_connectors, get_source, registered_connectors
from .model import ContentEvent, ContentFeature, ContentNode, Document, DocumentActor, DocumentFamily, DocumentMetadata, \
    DocumentStore, DocumentTransition, SourceMetadata, TransitionType
from .model.objects import Taxonomy
from .pipeline import Pipeline, PipelineContext, PipelineStatistics
from .platform import KodexaPlatform, RemoteStep, RemotePipeline, RemoteSession, KodexaClient
from .steps import NodeTagCopy, NodeTagger, RollupTransformer, TagsToKeyValuePairExtractor, TextParser
from .stores import LocalDocumentStore, LocalModelStore, RemoteDocumentStore, \
    RemoteModelStore, RemoteDataStore, TableDataStore



Support for setting up and defining assistants that you can use in Kodexa


The Kodexa Command-Line Interface


Connectors provide a way to access documents (files or otherwise) from a source, and they form the starting point for Pipelines


Mix-ins are an effective way to add helper functionality to Documents and ContentNodes based on their underlying features.
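
The mix-in idea can be illustrated with a minimal sketch in plain Python. This is not Kodexa's implementation: SketchNode, SpatialMixin, SpatialNode and the "bbox" feature name are all hypothetical, chosen only to show how helper methods can be layered onto a node whose features support them.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical stand-in for a content node; not Kodexa's ContentNode."""
    content: str
    features: dict = field(default_factory=dict)


class SpatialMixin:
    """Hypothetical mix-in: helpers that rely on an assumed 'bbox' feature."""

    def has_bbox(self) -> bool:
        return "bbox" in self.features

    def width(self) -> float:
        # The bbox is assumed to be a tuple of (x1, y1, x2, y2).
        x1, _, x2, _ = self.features["bbox"]
        return x2 - x1


class SpatialNode(SpatialMixin, SketchNode):
    """A node type that gains the spatial helpers through inheritance."""


node = SpatialNode("Invoice total", {"bbox": (10.0, 20.0, 110.0, 32.0)})
print(node.has_bbox(), node.width())
```

The helpers live apart from the node class, so a node without spatial features never needs to carry them.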


Model represents the core model at the heart of the Kodexa Content Model and architecture …


A Pipeline brings together a Connector, a set of steps, and a sink to perform data cleansing, normalization, analysis, and more.
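
The connector, steps, and sink arrangement described above can be sketched in a few lines of plain Python. This is only a conceptual illustration and does not use Kodexa's Pipeline class: run_pipeline and collapse_whitespace are hypothetical names, and plain strings stand in for documents.

```python
from typing import Callable, Iterable, List

# Plain strings stand in for documents; every name below is hypothetical.
Step = Callable[[str], str]


def run_pipeline(connector: Iterable[str], steps: List[Step], sink: List[str]) -> int:
    """Pull each document from the connector, apply every step in order, write to the sink."""
    processed = 0
    for document in connector:
        for step in steps:
            document = step(document)
        sink.append(document)
        processed += 1
    return processed


def collapse_whitespace(text: str) -> str:
    """Example cleansing step: squeeze runs of whitespace into single spaces."""
    return " ".join(text.split())


source = ["  Hello   World ", "KODEXA\tpipeline"]
results: List[str] = []
count = run_pipeline(source, [collapse_whitespace, str.lower], results)
print(count, results)
```

Because each step is just a callable that takes and returns a document, the same step can be reused across pipelines with different sources and sinks.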


Out-of-the-box integration with the Kodexa platform


Selectors allow you to work with a Kodexa document to find content
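
As a rough illustration of finding content in a document tree, the sketch below walks a small hypothetical tree and yields the nodes that match a predicate. It does not reproduce Kodexa's selector syntax or API: TreeNode and select are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class TreeNode:
    """Hypothetical node in a document tree; not Kodexa's ContentNode."""
    node_type: str
    content: str = ""
    children: List["TreeNode"] = field(default_factory=list)


def select(root: TreeNode, predicate: Callable[[TreeNode], bool]) -> Iterator[TreeNode]:
    """Yield every node at or below root that matches the predicate, in document order."""
    if predicate(root):
        yield root
    for child in root.children:
        yield from select(child, predicate)


page = TreeNode("page", children=[
    TreeNode("line", "Invoice 42"),
    TreeNode("line", "Total due: 100"),
    TreeNode("image"),
])
matches = [
    n.content
    for n in select(page, lambda n: n.node_type == "line" and "Total" in n.content)
]
print(matches)
```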


Common and reusable steps


Stores are persistence components for Documents. Typically, they can act as either a Connector or a Sink
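
The dual role of a store, as a source of documents or as a destination for them, can be sketched as follows. InMemoryStore is a hypothetical class written for this illustration and is not one of Kodexa's store implementations; strings again stand in for documents.

```python
from typing import Dict, Iterator


class InMemoryStore:
    """Hypothetical store: keeps documents by ID and can be read from or written to."""

    def __init__(self) -> None:
        self._documents: Dict[str, str] = {}

    def put(self, doc_id: str, document: str) -> None:
        # Sink role: accept a document for persistence.
        self._documents[doc_id] = document

    def __iter__(self) -> Iterator[str]:
        # Connector role: yield the stored documents as a source.
        return iter(self._documents.values())

    def __len__(self) -> int:
        return len(self._documents)


raw = InMemoryStore()
raw.put("a", "first draft")
raw.put("b", "second draft")

cleaned = InMemoryStore()
for index, document in enumerate(raw):                # raw acts as the connector
    cleaned.put(f"clean-{index}", document.upper())   # cleaned acts as the sink
print(list(cleaned))
```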


Utilities to help support unit testing and test harnesses for Kodexa

Utilities for training actions using Kodexa