Kodexa is a Python framework to enable flexible data engineering with semi-structured and unstructured documents and data.
Kodexa allows you to:
- Work with content- and feature-rich containers for semi-structured and unstructured content
- Link steps together to build reusable processing pipelines
- Store documents, along with their relationships, in families
- Build rich platform agents that react to changes both in stores and outside the platform
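A content- and feature-rich container can be pictured as a tree of typed nodes, each carrying content and a bag of features. The sketch below is a minimal, self-contained illustration under our own assumptions, not Kodexa's `Document`/`ContentNode` API. The `Node` class, its fields, the `(feature_type, name)` feature key, and the `add_feature`/`get_feature`/`all_content` helpers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical content node: typed content, features, and child nodes."""
    node_type: str
    content: str = ""
    # Features are keyed by (feature_type, name), e.g. ("tag", "invoice_number").
    features: Dict[Tuple[str, str], Any] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

    def add_feature(self, feature_type: str, name: str, value: Any) -> None:
        self.features[(feature_type, name)] = value

    def get_feature(self, feature_type: str, name: str) -> Optional[Any]:
        return self.features.get((feature_type, name))

    def all_content(self) -> str:
        # Join this node's content with its descendants' content, depth-first,
        # skipping nodes that carry no text.
        parts = [self.content] + [child.all_content() for child in self.children]
        return " ".join(p for p in parts if p)
```

Keeping features separate from content means extra knowledge (tags, positions, classifications) can be layered onto the same node tree without changing the text itself.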
"""
Kodexa is a Python framework to enable flexible data engineering with semi-structured and unstructured documents and data.

.. include:: ./documentation.md
"""
from .assistant import Assistant, AssistantContext, AssistantResponse
from .cloud import KodexaPlatform, RemoteAction, RemotePipeline, RemoteSession
from .connectors import FileHandleConnector, FolderConnector, UrlConnector, add_connector, get_connector, \
    get_connectors, get_source, registered_connectors
from .model import ContentEvent, ContentFeature, ContentNode, Document, DocumentActor, DocumentFamily, DocumentMetadata, \
    DocumentStore, DocumentTransition, RemoteStore, SourceMetadata, TransitionType
from .pipeline import Pipeline, PipelineContext, PipelineStatistics
from .sinks import FolderSink, InMemoryDocumentSink
from .steps import NodeTagCopy, NodeTagger, RollupTransformer, TagsToKeyValuePairExtractor, TextParser, \
    KodexaProcessingException
from .stores import DataStoreHelper, LocalDocumentStore, LocalModelStore, RemoteDocumentStore, \
    RemoteModelStore, RemoteTableDataStore, TableDataStore
from .taxonomy import RemoteTaxonomy, Taxon, Taxonomy
Support for setting up and defining assistants that you can use in Kodexa
The Kodexa Command-Line Interface
Out-of-the-box integration with the Kodexa platform, providing access to the content services available there
Connectors provide a way to access documents (files or otherwise) from a source; they form the starting point for Pipelines
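A connector can be thought of as an iterable source of documents. The function below is a hypothetical stand-in, not Kodexa's `FolderConnector` (whose constructor and behavior are not shown here): `iter_folder` walks a directory with `pathlib` and lazily yields `(file name, raw bytes)` pairs for files that match a glob pattern.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_folder(path: str, pattern: str = "*", recursive: bool = False) -> Iterator[Tuple[str, bytes]]:
    """Hypothetical connector: yield (file name, raw bytes) for each matching file."""
    root = Path(path)
    candidates = root.rglob(pattern) if recursive else root.glob(pattern)
    # Sort the paths for a deterministic order and skip directories that match.
    # File contents are still read one at a time, only as the caller iterates.
    for file_path in sorted(p for p in candidates if p.is_file()):
        yield file_path.name, file_path.read_bytes()
```

Because it is a generator, a downstream pipeline can consume a large folder one file at a time instead of loading every file up front.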
Mix-ins add helper functionality to Documents and ContentNodes based on their underlying features.
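As a rough illustration of the mix-in idea, the hypothetical `SpatialMixin` below adds geometry helpers to any node class whose instances carry a bounding-box feature. The class names, the `("spatial", "bbox")` feature key, and the `(x1, y1, x2, y2)` box layout are assumptions for this sketch, not Kodexa's mix-in API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

BBox = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


class SpatialMixin:
    """Hypothetical mix-in: helpers that rely on an optional bbox feature."""

    features: Dict[Tuple[str, str], Any]  # provided by the class that mixes this in

    def get_bbox(self) -> Optional[BBox]:
        return self.features.get(("spatial", "bbox"))

    def area(self) -> float:
        bbox = self.get_bbox()
        if bbox is None:
            return 0.0
        x1, y1, x2, y2 = bbox
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    def is_left_of(self, other: "SpatialMixin") -> bool:
        mine, theirs = self.get_bbox(), other.get_bbox()
        # Without both boxes there is no spatial relationship to report.
        return mine is not None and theirs is not None and mine[2] <= theirs[0]


@dataclass
class SpatialNode(SpatialMixin):
    """A plain node class that gains the helpers simply by inheriting the mix-in."""
    node_type: str
    features: Dict[Tuple[str, str], Any] = field(default_factory=dict)
```

The helpers degrade gracefully when the feature is absent, which is what lets a mix-in be attached to nodes regardless of whether the underlying feature was populated.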
Model represents the core model at the heart of the Kodexa Content Model and architecture …
A Pipeline brings together a Connector, a set of steps, and a Sink to perform data cleansing, normalization, analysis, and more.
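The Connector, steps, and Sink flow can be sketched in a few lines. This is a hypothetical, dependency-free illustration of the pattern, not the Kodexa `Pipeline` class: here documents are plain strings, a step is any callable that takes and returns a document, and the sink is any callable that receives each finished document. `MiniPipeline`, `add_step`, and `run` are names invented for the sketch.

```python
from typing import Callable, Iterable, List

Step = Callable[[str], str]


class MiniPipeline:
    """Hypothetical pipeline: pull from a connector, apply steps in order, push to a sink."""

    def __init__(self, connector: Iterable[str]) -> None:
        self.connector = connector
        self.steps: List[Step] = []

    def add_step(self, step: Step) -> "MiniPipeline":
        self.steps.append(step)
        return self  # allows chaining: pipeline.add_step(a).add_step(b)

    def run(self, sink: Callable[[str], None]) -> int:
        processed = 0
        for document in self.connector:
            for step in self.steps:
                document = step(document)
            sink(document)
            processed += 1
        return processed  # a minimal stand-in for pipeline statistics
```

For example, `MiniPipeline(["  Hello ", "World  "]).add_step(str.strip).add_step(str.upper).run(results.append)` appends `"HELLO"` and `"WORLD"` to a `results` list and returns `2`. Because each step is just a callable, the same step can be reused across many pipelines.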
Selectors let you search a Kodexa Document to find specific content
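To make the idea of selecting content concrete, here is a toy selector over a small node tree, loosely in the spirit of XPath-style paths. It is a hypothetical sketch, not Kodexa's selector implementation, and its syntax is deliberately tiny: `//type` matches all descendants of that type, `type` matches direct children, and `*` matches any type.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    node_type: str
    content: str = ""
    children: List["Node"] = field(default_factory=list)


def _descendants(node: Node) -> Iterator[Node]:
    # Depth-first in document order, excluding the starting node itself.
    for child in node.children:
        yield child
        yield from _descendants(child)


def select(node: Node, path: str) -> List[Node]:
    """Toy selector: '//type' matches descendants, 'type' matches direct children."""
    if path.startswith("//"):
        wanted, candidates = path[2:], _descendants(node)
    else:
        wanted, candidates = path, iter(node.children)
    return [n for n in candidates if wanted in ("*", n.node_type)]
```

Even this small version shows why selectors are useful: callers describe *which* nodes they want rather than writing tree-walking code each time.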
Sinks are the end point of a Pipeline; they store or write out the pipeline's final output
Common and reusable steps
Stores are persistence components for Documents. Typically, they can act as either a Connector or a Sink
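The dual role of a store, writable like a Sink and readable like a Connector, can be sketched with an in-memory dictionary. `InMemoryStore`, its `put`/`get` methods, and the use of `__iter__` as the connector side are hypothetical choices for this illustration, not the interface of Kodexa's `LocalDocumentStore` or `RemoteDocumentStore`.

```python
from typing import Dict, Iterator, Optional, Tuple


class InMemoryStore:
    """Hypothetical store: acts as a sink when written to and a connector when iterated."""

    def __init__(self) -> None:
        self._documents: Dict[str, str] = {}

    # Sink side: accept a finished document under a key (overwrites any existing one).
    def put(self, key: str, document: str) -> None:
        self._documents[key] = document

    def get(self, key: str) -> Optional[str]:
        return self._documents.get(key)

    # Connector side: iterate over (key, document) pairs in key order,
    # so a later pipeline can read what an earlier one wrote.
    def __iter__(self) -> Iterator[Tuple[str, str]]:
        return iter(sorted(self._documents.items()))

    def __len__(self) -> int:
        return len(self._documents)
```

One pipeline can fill the store through `put`, and a second pipeline can then iterate over the same store as its source.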
Support for setting up and defining a taxonomy
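At its simplest, a taxonomy is a tree of labels. The hypothetical `Taxon` dataclass and `paths` helper below sketch that structure, naming nested labels with `/`-joined paths. The field names and the path convention are assumptions for this illustration, not Kodexa's `Taxonomy`/`Taxon` definitions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Taxon:
    """Hypothetical taxonomy node: a label with optional child labels."""
    name: str
    children: List["Taxon"] = field(default_factory=list)

    def add(self, name: str) -> "Taxon":
        child = Taxon(name)
        self.children.append(child)
        return child


def paths(taxon: Taxon, prefix: str = "") -> List[str]:
    """List every label as a '/'-joined path from the root, depth-first."""
    path = f"{prefix}/{taxon.name}" if prefix else taxon.name
    result = [path]
    for child in taxon.children:
        result.extend(paths(child, path))
    return result
```

Flattening the tree into paths such as `invoice/header/invoice_number` gives each label a unique name that tagging or extraction steps could refer to.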
Utilities to help support unit testing and test harnesses for Kodexa
Utilities for training actions using Kodexa