The Cubicon Platform
Enabling a Semantic Net Environment

The Cubicon Semantic Net environment provides the following infrastructure capabilities to manage the complexities of a universal medium for data, information and knowledge exchange:

Semantic Engine Enables Transparent Computing
Knowledge Maps Support Machine-to-machine Context Processing
Software Transactional Memory Empowers multi/many-core Processors
Deep Packet Inspection Permits Intelligent Routing

These advanced capabilities will be difficult to achieve using current programming languages and databases, or XML and related W3C substrate technologies. These legacy technologies lack the performance, productivity, interoperability, agility, robustness and security characteristics to support a Semantic Net environment.

The following sections explain the requirement for each capability, its value to an advanced platform and a high-level technical description of the Cubicon platform architecture.




 Semantic Engine (SE) Enables Transparent Computing

Transparency Requirement

A Semantic Net environment needs to enable a developer to create sophisticated programs securely, without having to be concerned with physical execution memory (heap space). Current Java (JVM) and Microsoft (.NET) virtual machines perform these functions, but have inherent performance limitations that force a developer to work around the garbage collector (GC) event when the VM is applied in a high-performance environment. This problem exists at both ends of the network spectrum: routers and embedded devices. A router needs to perform Deep Packet Inspection (DPI) at wire speed, and many embedded environments are real-time in nature, requiring system sampling rates far faster than current VMs can poll and respond (e.g. engine monitoring, fly-by-wire aircraft). A hard-coded workaround also wreaks havoc on interoperability, destroying any semblance of an intrinsically secure computational model.

Garbage Collection Explanation

The name 'garbage collection' implies that objects no longer needed by the program are 'garbage' and can be thrown away. A more accurate and current metaphor might be 'memory recycling'. When a program no longer references an object, the heap space it occupies can be recycled so that space is made available for subsequent new objects. The GC must somehow determine which objects are no longer referenced by the program and make available the heap space occupied by such unreferenced objects. In the process of freeing unreferenced objects, current commercial GCs must also run any finalizers of the objects being freed.

In addition to freeing unreferenced objects, a GC must also combat heap fragmentation. This occurs during the course of normal program execution. New objects are allocated, and unreferenced objects are freed so that available portions of heap memory are left in between positions occupied by live objects. Requests to allocate new objects may have to be filled by extending the size of the heap even though there is enough total unused space in the existing heap. This will happen if there is not enough contiguous free heap space available into which the new object will fit. On a virtual memory system, the extra paging (or swapping) required to service an ever growing heap could degrade the performance of the executing program. On an embedded system with small memory, fragmentation could cause a virtual machine to run out of memory unnecessarily.
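
The fragmentation problem described above can be illustrated with a minimal sketch. The free-list model and the helper names (total_free, first_fit, compact) are assumptions made for illustration only; they do not describe the heap layout of any particular virtual machine.

```python
# Toy heap model: each free block is (start_address, size).
# All names here are hypothetical and exist only for illustration.

def total_free(free_blocks):
    """Sum of all free space, whether contiguous or not."""
    return sum(size for _, size in free_blocks)

def first_fit(free_blocks, request):
    """Return the start of the first free block able to hold `request`
    units, or None when no single block is large enough."""
    for start, size in free_blocks:
        if size >= request:
            return start
    return None

def compact(free_blocks, heap_size):
    """Model compaction: live objects slide together, so all free space
    ends up as one block at the top of the heap."""
    free = total_free(free_blocks)
    return [(heap_size - free, free)]

# Gaps left between live objects in a 100-unit heap.
free_blocks = [(10, 8), (30, 8), (55, 8), (80, 8)]
request = 20

print(total_free(free_blocks))                        # 32
print(first_fit(free_blocks, request))                # None
print(first_fit(compact(free_blocks, 100), request))  # 68
```

Although 32 units are free, a 20-unit request fails until the free space is made contiguous, which is why a collector without compaction may have to grow the heap instead.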

Garbage Collection Algorithms

Any GC algorithm must do two basic things. First, it must detect garbage objects. Second, it must reclaim the heap space used by the garbage objects and make that space available again to the program. A GC ordinarily accomplishes detection by defining a set of root objects and determining reachability from them. An object is reachable if there is some path of references from the roots by which the executing program can access the object. The roots are always accessible to the program. Any objects that are reachable from the roots are considered 'live'. Objects that are not reachable are considered garbage, because they can no longer affect the future course of program execution.
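
Reachability from a root set can be sketched as a simple graph traversal. This is a generic illustration rather than the algorithm of any specific collector; the object names and the dictionary representation of references are assumptions.

```python
def reachable(roots, references):
    """Return the set of objects reachable from `roots`.
    `references` maps each object name to the names it refers to."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(references.get(obj, ()))
    return live

# Hypothetical heap: d and e refer to each other, but nothing live refers to them.
heap = {
    "a": ["b"],
    "b": ["c"],
    "c": [],
    "d": ["e"],
    "e": ["d"],
}

live = reachable(roots=["a"], references=heap)
garbage = set(heap) - live
print(sorted(live))     # ['a', 'b', 'c']
print(sorted(garbage))  # ['d', 'e']
```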

Commercial algorithms use a variety of GC approaches in an attempt to obtain transparency in the widest applicable set of target devices. The two basic approaches to distinguishing live objects from garbage are called reference counting and tracing. In addition, the science is overloaded with further concepts such as mark and sweep, finalization, defragmentation, compaction, and copying, generational and adaptive collectors. GC research remains wide open since no current commercial implementation demonstrates the performance to meet the transparency requirements for a Semantic Net environment. A GC event in a JVM or .NET virtual machine can take over 20 milliseconds before control is returned to an application.
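
To make the difference between the two approaches concrete, the following sketch models a pure reference-counting collector. It is an illustrative simplification under assumed names, not the behavior of the JVM or .NET collectors, and it shows the well-known limitation that objects in a reference cycle never reach a count of zero.

```python
def count_references(roots, references):
    """Count incoming references per object; each root counts as one reference."""
    counts = {obj: 0 for obj in references}
    for root in roots:
        counts[root] += 1
    for targets in references.values():
        for target in targets:
            counts[target] += 1
    return counts

def refcount_garbage(roots, references):
    """Objects a pure reference-counting collector would free: free every
    object whose count is zero, then decrement whatever it referred to."""
    counts = count_references(roots, references)
    freed = set()
    pending = [obj for obj, n in counts.items() if n == 0]
    while pending:
        obj = pending.pop()
        freed.add(obj)
        for target in references[obj]:
            counts[target] -= 1
            if counts[target] == 0:
                pending.append(target)
    return freed

# Hypothetical heap: c is unreferenced; d and e form an unreachable cycle.
heap = {"a": ["b"], "b": [], "c": ["b"], "d": ["e"], "e": ["d"]}
print(sorted(refcount_garbage(["a"], heap)))  # ['c']
```

A tracing collector started from root a would also classify d and e as garbage, because neither is reachable from the root.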

CubeRun Semantic Engine Approach

A CubeRun garbage collection event is 1000x more efficient than JVM or .NET thus enabling effective utilization in router and embedded real time devices. This high performance is achieved through five fundamental and proprietary ways:

1) Context vs. object processing. The fundamental unit of processing discourse in Cubicon is a context rather than an object. This higher-order discourse provides additional metadata to the GC, enabling it to perform object tracking, compaction and virtualization in a dynamic, automated fashion over a wide spectrum of target devices.

Context Processing Model
Memory Manager Function Call Tree

2) CoreObject component model. All processing within a CubeRun is performed by a finite set of coreObjects. This regularity between the language and execution engine eliminates much of the complexity found in contemporary language environments and greatly increases algorithm efficiency. The following schema depicts the call relationships between functions within the Memory Manager module.

3) Cubicon iconic reference architecture. This reference architecture is a bit and pointer defined blueprint of the entire Memory Manager module. Iconic representation makes it possible to see and fine tune the algorithm in full living color prior to C coding.

 
Apportion Heap C Source Code
Apportion Heap Cubicon Design
 
2D Memory Heap Display

4) Innovative algorithm. CubeRun uses a novel heap compacting technique that is generational, highly adaptive and incremental. Bench test monitoring indicates a nominal GC event takes less than 1 microsecond.

 
Memory Manager Dashboard

5) CubeStudio visual memory model. This 'dashboard' enables the GC parameters to be easily fine-tuned for a particular target environment using feedback provided by visual characterization (2D and 3D) models.


 Knowledge Maps (KM) Support Machine-to-machine Context Processing

Requirement

"The semantic web is an evolving extension of the World Wide Web in which content is expressed not only in natural language, but also in a format that can be read and used by software agents, permitting them to locate, share and integrate information more easily. It reinforces W3C Director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information and knowledge exchange.

"At its core, the semantic web comprises a philosophy, a set of design principles, collaborative working groups and a variety of enabling technologies. Some elements of the semantic web are expressed as prospective future possibilities that have yet to be implemented or realized. Other elements of the semantic web are expressed in formal specifications. Some of these include Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms and relationships within a given knowledge domain.

"Humans are capable of using the Web to carry out tasks such as finding the Finnish word for car, to reserve a library book, or to search for the cheapest DVD and buy it. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious works involved in finding, sharing and combining information on the web."

An Emerging Knowledge Science

Knowledge Science is the discipline of understanding the mechanics through which humans and software-based machines know, learn, change and adapt their own behaviors. Throughout recorded history, knowledge has been made explicit through symbols, text and graphics on media such as clay, stone, papyrus, paper and, most recently, as digitally stored representations. The digital effort began in the early 1970s, when knowledge science was recognized as a vigorous field of study beginning with the development of natural language learning programs funded by the National Science Foundation (NSF). Now, knowledge science experts are engaged in a debate between two schools: a language-based school and a value-based (theory-based semantics) school.

Knowledge science and knowledge representations encompass philosophical, epistemological and ontological considerations. This article presents an overview of this new field of study, relying on historical data to provide insight and understanding, while it addresses the two schools of knowledge science.

The first, language-based school is driven by the W3C and the use of HTML and the accompanying HTTP protocol. The second, value-based (theory-based semantics) school was developed by Richard Ballard and utilizes a single relational database table with four fields designed to store unique identifier codes that are pre-defined, stored and later looked up to populate the value-model, object, attribute and value table fields. This original technology was called the Mark 2.

The language-based knowledge representation has progressed from RDF in 1999 to DAML and now the Web Ontology Language (OWL) and other current formats. Researchers are now working on a numerical (value-based) representation to define a language-independent structured knowledge exchange.

Ballard's value-based technology has been refined in the Mark 3 implementation towards a goal where a "software-based machine can faithfully represent every form of knowledge and reason with that knowledge the way people do." (Ballard)

Basic Topic Map

Topic Maps. Topic Maps is an ISO standard for the representation and interchange of knowledge, with an emphasis on the 'find-ability' of information. It combines concepts from the two knowledge representation schools while introducing several of its own.

A topic map can represent information using topics (representing any concept, from people, countries and organizations to software modules, individual files and events), associations (which represent the relationships between them) and occurrences (which represent relationships between topics and information resources relevant to them). They are thus similar to semantic networks and mind maps in many respects. In loose usage all those concepts are often used synonymously, though only topic maps are standardized.
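
The three constructs described above (topics, associations and occurrences) can be sketched as a small in-memory data structure. This is a minimal illustration, not the interchange syntax defined by the standard and not Cubicon's representation; the class names, the association kinds and the example.org address are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    # Occurrences: links to information resources relevant to the topic.
    occurrences: list = field(default_factory=list)

@dataclass
class Association:
    kind: str
    members: tuple  # names of the topics taking part in the relationship

class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def add_topic(self, name, occurrences=()):
        self.topics[name] = Topic(name, list(occurrences))

    def associate(self, kind, *members):
        for member in members:
            if member not in self.topics:
                raise KeyError(f"unknown topic: {member}")
        self.associations.append(Association(kind, members))

    def related(self, name):
        """Topics that share at least one association with `name`."""
        found = set()
        for assoc in self.associations:
            if name in assoc.members:
                found.update(m for m in assoc.members if m != name)
        return found

tm = TopicMap()
tm.add_topic("Puccini", ["http://example.org/puccini-biography"])
tm.add_topic("Tosca")
tm.add_topic("Rome")
tm.associate("composed-by", "Tosca", "Puccini")
tm.associate("takes-place-in", "Tosca", "Rome")
print(sorted(tm.related("Tosca")))  # ['Puccini', 'Rome']
```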


Cubicon Topic Map Approach

Cubicon's approach is value-based and cast in a topic model. This advanced knowledge representation model has the following distinguishing features:

Representation of both particular and universal knowledge. Cubicon can represent particular information and universal theory, and can interrelate these two fundamental knowledge forms. Particular information represents anything that exists in time and space and can be processed by the senses, measured or counted. Universal theory consists of the coherent general propositions that form the rational constraints justifying the relationships of concepts, ideas, thought patterns and their instances necessary to convey meaning.

Knowledge Forms

An entity (a person, or a body such as a company, organization or government) develops and maintains system components based upon its context within a community of practice. Cubicon is based on a converged structural and behavioral model that brings context and coherence to the systems world. Context incorporates conventional object-oriented software semantics; however, its model deals more precisely with information through higher-order cognitive constructs. Context in Cubicon is declared in four forms: Module, Plexus, Service and Topic. These forms share an identical composite data structure architecture yet play distinct system roles. The first three forms represent particular information:

A Module is a grove of objects and cells that represent something specific (e.g. an employee record in a database or an expression in a spreadsheet).

A Plexus is either logic/number-oriented (medium) or string-oriented (document) and is represented as an attribute of a Module object or Service (e.g. a JPEG, MPEG or a PDF, Word doc, HTML page).

A Service maps transaction data between heterogeneous systems controlled by different entities (e.g. financial transaction over the Net).

The Topic context form is unique in that its resource data is related to a universal theory. A topic represents a specific subject. The relationship between a topic and its subject is defined as reification. Reification of a subject allows internal resource data to be assigned, further defining the subject for a given topic. Occurrence resources provide a means to access external data and information related to a topic.

Concentric model. A Topic Map automatically organizes around a center study topic. Any topic that has a direct association with the study topic appears in the top focus group circle (dark blue). Additional degrees of separation appear in subsequent clockwise groups. Selection of another topic moves it to the center and refreshes the map to depict its particular topic associations. There will never be a map 'top'. Dynamic viewing provides a Cubist the flexibility to reason about subjects and their characteristics from any topic perspective.
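
The concentric arrangement can be thought of as grouping topics by their degree of separation from the selected study topic. The sketch below uses a breadth-first traversal for this; it is an assumption about how such a view could be computed, not a description of the CubeStudio implementation, and the topic names are hypothetical.

```python
from collections import deque

def concentric_groups(center, links):
    """Group topics by degrees of separation from `center`.
    `links` maps a topic to the topics it is directly associated with."""
    distance = {center: 0}
    queue = deque([center])
    while queue:
        topic = queue.popleft()
        for neighbor in links.get(topic, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[topic] + 1
                queue.append(neighbor)
    groups = {}
    for topic, d in distance.items():
        groups.setdefault(d, []).append(topic)
    return {d: sorted(topics) for d, topics in sorted(groups.items())}

links = {
    "engine": ["piston", "fuel"],
    "piston": ["engine", "alloy"],
    "fuel": ["engine"],
    "alloy": ["piston"],
}

print(concentric_groups("engine", links))
# {0: ['engine'], 1: ['fuel', 'piston'], 2: ['alloy']}

# Selecting another topic re-centers the map; there is no fixed 'top'.
print(concentric_groups("alloy", links))
# {0: ['alloy'], 1: ['piston'], 2: ['engine'], 3: ['fuel']}
```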

Cubicon Topic Map

Net navigation augments Web search. The Web requires a person to reflect upon a page and search by performing a click to a linked HTML page. A page contains no semantic information that would allow a machine to self-determine a search purpose, relevance or direction. A Cubicon Topic Map comprises semantically associated concepts that are navigated by a CubeRun semantic engine. Language text search is replaced by topic value-linked navigation. A Topic Map may consist of thousands of concepts (topics that are reified subjects), possibly linked to any number of other Topic Maps controlled by other entities. This myriad of interconnections evolves as an overlay on top of the Web and becomes the Semantic Net foundation.

An alternative way to view this replacement is to consider the Web as a grid of interconnected HTML pages. By contrast, the Semantic Net will consist of a grid of interconnected 'topics', either general or specific. Each topic refers to resources about the same subject spread over many nodes around the Net.

Net Navigation Augments Web Search
 
Resources Occur Around the Net

Resources occur around the Net. Besides the traditional URI resources, a topic can also extend through three other external reference forms that are native to Cubicon. They consist of document, medium and service occurrences. External resources are particular data that are status-based and change over time. By contrast, internal resources are occurrences of discourse maps and universal data declarations of theory that are concept-based. Cubicon refers to this linked resource data as 'hyperdata', an extension of the original 'hypertext' term.

 
Topic Frame

An efficient means to capture domain knowledge. A domain expert declares a map in CubeStudio, primarily through the topic frame. A topic participates in relationships called 'associations', in which it plays both client and server roles. A topic can also belong to one or more aspects. Thus, topics have three kinds of characteristics: resource occurrences, roles played in associations, and the aspects to which they belong. A topic along with its characteristics can also be declared by program behavior, enabling the Semantic Net to evolve, facilitated by reflective machine reasoning.

 
Topic Map Schema

Value-based concept encoding. A Topic Map is composed of a number of parts that are binary encoded for efficient representation and native CubeRun processing. Integration with the virtual machine enables situation service agents to navigate vast knowledge spaces at the 'speed of thought'.

 
Global Term/Name Space Identification

Global term/name space identification. Cubicon's multi-dimensional space identification technique provides universal discourse across countries and cultures by capturing the theory within concepts and the information within component contents. This identification space is represented in globally unique concatenated integer values. Each identifier may be linked to multiple concept terms or component names. A concept may have multiple term synonyms. A component and its subcomponents may have multiple community natural language declarations. In addition, an entity may declare a component dialect name for use within its own body.

Situation Map

Situation agents drive machine-driven reasoning. A situation map conveys knowledge about a set of topic maps by reasoning about their topic characteristics and content resources hosted by both native and alien entity repositories. A map supports a situation service, an agent behavior that navigates between topics to perform this inference.

A situation service is presented with predicate data called a premise. A premise consists of composite structures that are shared between topics. A situation service executing within a CubeRun analyzes both a topic's relationships with other topics and related occurrences of content resources. Based on this analysis, it eventually returns to its origin device. Once home, the service can 1) hand the contents of the suitcase to a calling service to consume, 2) return the contents to the user through the interface, or 3) browse forward to another Web page.

 

There are two service forms: situation services and dialog services.

Situation Service

A dialog service is an exchange between two entities in operations that usually perform a transaction process consisting of two or more individual back and forth steps over a period of time. Dialog services may be nested to perform subcontracting tasks.

A situation service may call either service form. A situation may be called by another situation, so situations can be nested to any depth as one service leverages the knowledge of another.

A situation service creates a scenario that navigates from topic-to-topic and map-to-map. These dynamic scenarios are archived and can be displayed to enable a Cubist to understand system behavior more effectively.

 
Situation Map Dynamic Scenario

Net navigation provides knowledge insight. Cubicon evolves the Internet from a Web browser to a Net navigation experience transforming information into knowledge. A Net Navigator map displays knowledge derived by accessing linked topics through situation service behaviors. These maps are declared in Cubicon and directly enable a domain expert to understand decision choices and weigh possible outcomes.

Concept Genealogy as well as Topic, Situation and Net Navigator maps are context components whose development will employ the next generation of Semantic Net Designers and return employment to domestic firms, where domain expertise will trump outsourced general-purpose labor.

Knowledge Maps

Knowledge metering provides a mechanism to create economic incentive. A fundamental business issue is to create an ecosystem where communities and entities have an incentive to invest their human capital to create knowledge maps. Cubicon provides a Topic Map producer the capability to charge a consumer for knowledge on a per-access basis for each situation service. A consumer request for access to a producer's Topic Map may come through a Net Navigator or be program-driven through behavior declared within a system module (grove).

Knowledge Metering

 Software Transactional Memory (STM) Empowers Multi/many-core Processors

Requirement

The near-term future of processor technology is multi/many-core, multi-thread, and parallelism. However, programmability remains a key barrier to exploit these emerging environments and realize their potential in the marketplace.

Numerous roadblocks stand in the way of proposed higher-level packages along with the many language extensions created to support concurrency. The burden of overcoming the complexity of context/synchronization analysis, exception handling, performance monitoring and tuning, not to mention the debugging of parallel networks, severely limits adoption and optimal utilization of multi/many-core processors to an elite field of expert programmers.

These advanced processors face a serious programmability problem. Concurrency quickly overwhelms humans. Almost universally we find it much more difficult to reason about concurrent code than about sequential code. Even the most skilled programmers can miss possible interleavings among simple collections of partially ordered operations. Manual parallel programming is error-prone.

The solution? As many have said before, optimal utilization of multi/many-core processors requires effective automation techniques that can go well beyond current language extensions. Automation will enable mainstream software developers, as well as users, to easily adopt new techniques.

Cubicon STM Approach

Cubicon comprehensively addresses parallel programming challenges facing the microprocessor industry through industrial application of STM technology. STM provides concurrency control among individual thread tasks as well as transaction commit and rollback sequencing on multicast content across the Internet.

Soft-lock Transaction

Soft-lock transaction eliminates deadlock. A transaction is a sequence of tasks executed by a single thread through a closure mechanism. Transactions are atomic: each transaction either commits (it takes effect) or aborts (its effects are discarded). Transactions are linearizable (or serializable): they appear to take effect in a one-at-a-time order.

Soft-lock transaction facilitates fine-grained synchronization. There is no need to track which locks protect which structures, and no need for elaborate deadlock-avoidance protocols. A soft-lock transaction functions as an alternative to lock-based synchronization and is implemented in a lock-free way. As a transaction closure executes a series of reads and writes to a shared memory, the reads and writes logically occur at a single instant in time. Intermediate states are not visible to other (successful) closures.

Optimistic parallel execution provides a transparent programming model. Unlike the locking techniques used in most modern multi-threaded applications, Cubicon is optimistic. Every thread completes its modifications to a shared memory without regard to what other threads might be doing. It records every read and write made to a structure within a closure.

Optimistic Parallel Execution

A closure operates within a session and includes a buffer that contains a copy of those object and cell structures controlled by a transaction. Instead of placing the onus on the writer to make sure a specific transaction does not adversely affect other operations in progress, it is placed on the reader. After completing an entire transaction, the reader verifies that other threads have not concurrently made changes to memory that the transaction of interest has accessed in the past.

In the final operation, called a commit, the changes of a transaction are validated and, if validation succeeds, made permanent. A commit writes a buffer back to a shared memory. A transaction may also abort at any time, causing all prior closure changes to be rolled back or undone. If a transaction cannot be committed due to conflicting changes, it is typically aborted and re-executed from the beginning until it succeeds.
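
The read, buffer, validate, commit and retry cycle described above can be sketched in a few lines. This is a generic optimistic STM illustration, not Cubicon's closure mechanism: the names TVar, Transaction and atomically are assumptions, and for brevity the commit step is serialized with one ordinary lock, so the sketch is not lock-free as the text describes.

```python
import threading

class TVar:
    """A shared memory cell with a version number bumped on every commit."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Conflict(Exception):
    """Raised when another transaction changed a cell this one has read."""

_commit_lock = threading.Lock()  # simplification: serializes snapshots and commits

class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> buffered new value (the private copy)

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        if self.reads.setdefault(tvar, version) != version:
            raise Conflict()  # the cell changed since we first read it
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value  # buffered, invisible to others until commit

    def commit(self):
        with _commit_lock:
            # Validate: nothing we read may have been committed over.
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    raise Conflict()
            # Write the buffer back to shared memory.
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1

def atomically(body):
    """Run `body(tx)`; on conflict, discard the buffer and re-execute."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            continue

a, b = TVar(200), TVar(0)

def transfer(tx):
    tx.write(a, tx.read(a) - 1)
    tx.write(b, tx.read(b) + 1)

def worker():
    for _ in range(50):
        atomically(transfer)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(a.value, b.value)  # 0 200
```

Because every transfer either commits as a whole or is retried from the beginning, the total across both cells stays at 200 regardless of how the four threads interleave.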

 
Module Processing by Core Pool

Immense performance gain on a large number of cores. The benefit of this optimistic approach is increased concurrency. No thread needs to wait for access to a resource, and different threads can safely and simultaneously modify disjoint parts of a shared memory module (grove) that would normally be protected under the same lock. Despite the overhead of retrying transactions that fail, in most realistic applications conflicts arise rarely enough that there is an immense performance gain over lock-based protocols. This is especially true when using large numbers of processor cores. Module processing is distributed among a number of core pools upon initialization of a node.

 
STM Memory Manager Integration

STM mechanism integrates with the memory manager. The performance hit of a closure relative to fine-grained lock-based systems is minor. The closure mechanism is an integral component of CubeRun and uses memory pointers for copy operations. This solution scales easily since the mechanism sits directly above the Memory Manager that oversees garbage collection, compaction and virtualization of structures.

 
Alien Entity Transaction

Alien entity transaction sessions are not compromised. The closure mechanism works over TCP/IP. This means that a set of remote sessions can perform transactions with an origin grove in a transparent manner. Structure access control is provided by an exchange set mechanism that establishes read and write privileges, thus allowing an alien entity to interact with grove structures as well as its own individual attributes.

 

The intent of Ajax is fully automated in the Net browser model. A closure is also used for creating interactive web applications, much like the Ajax (Asynchronous JavaScript and XML) model. Ajax allows small amounts of data to be exchanged with a server behind the scenes, so that an entire Net page does not have to be reloaded each time the user makes a change. A commit action returns only modified data and automatically updates other closures that share a particular structure.

Net Browser Model

Net browser enables simple responsive behavior. A net browser replaces traditional application window and web browser mechanisms with an integrated architecture that enables a client's page to interact with a server as though it is part of the native remote application.

Net Browser Core Components

This window/browser fusion brings a common user experience regardless of the client/server relationship. This visual architecture uses a window closure composed of core components. Transaction data interaction remains uniform all the way down to core processing.


 Deep Packet Inspection (DPI) Permits Intelligent Routing

Requirement

Internet Protocol (IP) routing is the foundation of the Internet and is considered one of the most important technologies of the past 20 years. IP routing technology forwards packets of data to the appropriate network destinations, currently making the most efficient use of aggregate network bandwidth.

As new uses for the Internet are created, the need for a 'smart' network becomes even more critical. Advanced capabilities will need to provide the intelligence on the network to offer a foundation for delivering robust services such as voice and video, quality of service, security and network-aware applications. These network services are helping to drive the growth of the Internet through the creation of new applications such as real-time trading, interactive support, on-demand media, multi-site communications and unified messaging. To stay competitive, more and more businesses are using sophisticated Internet applications that will rely on these types of advanced capabilities.

Three generations of routers:

DPI is a fundamental process required to effectively utilize third-generation capabilities. It will provide the ability to access and analyze structured content as it passes through a router at wire speed. However, this process is currently weighed down by document parsing and virus screening tasks that require many machine cycles to perform. Efficiency can be significantly improved by allocating these tasks to acceleration hardware. Offering effective DPI is just the bottom tier of a tall stack of functionality required to enable a future communication services architecture.

VAN Facilitates Service Deployment and Management

Virtual Active Networking (VAN) is a term used when referring to the advanced capabilities created by augmenting DPI. VAN needs to allow customized processing on packets in a stream. Packet processing can be applied to application-aware routing, information caching, multi-user communications as well as packet filtering. While a VAN may be owned and operated by a network provider, it allows its users to install and execute services on the network that may be simultaneously shared by many users. The primary benefit of this service automation is its ability to facilitate rapid deployment and provide flexible management.

Virtual Active Network (VAN)

Requirements for a VAN. Four fundamental concepts must come together to realize network-aware services:

* along with network switch, proxy server, soft switch, SAN or other core network device, as differentiated from edge devices.

VAN Design Environment Requirements. A VAN requires a design environment with the following characteristics, to be used by both the network provider and its users:

 

Comparison to a Virtual Private Network (VPN). The VAN captures the functionality and resources that a provider offers to a user. In the same way that an active network can be understood as a generalization of a traditional network, a VAN can be seen as a generalization of a traditional Virtual Private Network (VPN). Similar to a traditional VPN, a VAN can be used to run network services using a provider's physical infrastructure.

In contrast to a traditional VPN, however, a VAN gives a user a much higher degree of service flexibility and control. A VAN is a very 'natural' abstraction for users and allows them to create management functionality and oversee their services at run-time without interaction with the provider's management system - a capability that is very difficult and costly to achieve in current router environments. Furthermore, a VAN supports a mechanism to isolate users from one another in order to avoid interference between their services. This mechanism should reside inside each active network node, thus enabling the provision of resource partitioning and the capability of policing consumption along VAN boundaries.

Stateful Inspection

The concept of using a firewall to provide security functions is growing. Originally, only the packet's header would be examined, perhaps to exclude traffic from suspected bad actors. It is now important to examine the payload as well, for example to detect virus-like content. This DPI capability, now mostly used for intrusion detection, is a prerequisite for the performance of future network-aware services. The network routing device performing DPI must look into the payload and then make decisions based on the significance of that data.
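
The difference between header-only filtering and payload inspection can be shown with a minimal sketch. The Packet fields, the deny list and the byte signatures below are hypothetical placeholders, and real DPI engines use far more elaborate matching than a substring test.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    src: str
    dst_port: int
    payload: bytes

BLOCKED_SOURCES = {"203.0.113.9"}          # hypothetical deny list
SIGNATURES = [b"cmd.exe", b"/etc/passwd"]  # hypothetical payload signatures

def header_filter(pkt):
    """First-generation check: look only at header fields."""
    return pkt.src not in BLOCKED_SOURCES

def deep_inspect(pkt):
    """Header check plus a scan of the payload for known signatures."""
    if not header_filter(pkt):
        return False
    return not any(sig in pkt.payload for sig in SIGNATURES)

benign = Packet("198.51.100.20", 80, b"GET /index.html HTTP/1.0")
probe = Packet("198.51.100.20", 80,
               b"GET /scripts/..%255c../winnt/system32/cmd.exe?/c+dir HTTP/1.0")

print(header_filter(probe), deep_inspect(probe))    # True False
print(header_filter(benign), deep_inspect(benign))  # True True
```

The probe passes the header check because its source address is not on the deny list; only the payload scan rejects it. A signature split across two packets would evade this per-packet test, which is one reason inspection state has to be kept across a stream.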

A firewall performing stateful inspection, by contrast, analyzes packets at the network layers, as well as overseeing higher layers, in order to assess the overall packet. By combining information from various layers (transport, session and network) the firewall now is better able to understand the content it is inspecting. This advanced level of review enables establishment of virtual sessions in order to track connectionless protocols such as UDP- and RPC-based applications.
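
A virtual session for a connectionless protocol can be sketched as a table keyed by flow. The class name, the four-field flow key and the idle timeout are assumptions chosen for illustration, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class SessionTable:
    """Tracks virtual sessions for connectionless traffic by flow key."""

    def __init__(self, idle_timeout=30.0):
        self.idle_timeout = idle_timeout
        self.sessions = {}  # (src, sport, dst, dport) -> last-seen time

    def outbound(self, src, sport, dst, dport, now):
        """Record (or refresh) a flow initiated from inside the network."""
        self.sessions[(src, sport, dst, dport)] = now

    def allow_inbound(self, src, sport, dst, dport, now):
        """Admit an inbound packet only if it answers a live outbound flow."""
        key = (dst, dport, src, sport)  # a reply reverses the endpoints
        last = self.sessions.get(key)
        if last is None or now - last > self.idle_timeout:
            self.sessions.pop(key, None)  # unknown or expired session
            return False
        self.sessions[key] = now
        return True

table = SessionTable(idle_timeout=30.0)
table.outbound("10.0.0.5", 5353, "198.51.100.7", 53, now=0.0)

print(table.allow_inbound("198.51.100.7", 53, "10.0.0.5", 5353, now=1.0))    # True
print(table.allow_inbound("198.51.100.7", 53, "10.0.0.5", 6000, now=2.0))    # False
print(table.allow_inbound("198.51.100.7", 53, "10.0.0.5", 5353, now=100.0))  # False
```

The first reply matches the recorded flow, the second targets a port that never sent a request, and the third arrives after the session has been idle longer than the timeout.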

Emerging Need for Stateful Inspection Technology. Modern application services require firewalls that have the capability to gain a more intimate knowledge of the application payload. Emerging applications utilizing eXtensible Markup Language (XML) and Simple Object Access Protocol (SOAP) require the firewall to monitor the content within the packets at wire-speed. Additionally, services that can change their communication ports (perhaps in order to spoof outbound filtering) or those that tunnel within commonly allowed ports (such as TCP port 80) must be monitored in order to maintain security within a network. In order to meet these new demands, firewall stateful inspection technology must further evolve.

Traditional stateful inspection dictates that a router maintains packet header knowledge about a particular stream until a session ends. This capability must be extended into packet content as well, even to allow for the state of the data to be modified by a network-aware service.

Requirements for Advanced Intrusion Detection and Prevention. In order for the firewall to successfully provide intrusion detection and prevention, it must have DPI capabilities including:

Additionally, the firewall must be able to provide for wire-speed Secure Socket Layer (SSL) session inspection and filtering. This will require the ability to decrypt an SSL session and then re-establish it once the packets have been inspected.

Intrusion Detection Technologies Evolving to DPI. The need for DPI capability in firewalls stems from pervasive data-driven attacks that affect many network sectors. DPI represents the integration of stateful inspection alongside traditional intrusion detection and prevention capabilities. Current intrusion detection technology (IDT), while able to detect these attacks, provides very little preemptive capability.

A worm can infect a significant number of systems within a relatively short period of time. NIMDA's multi-vector infection routes posed serious difficulties for IDT in particular. While IDT provides some relief from each of these attacks, moving the detection and response directly into the firewalls using DPI allows for immediate termination of the intrusion by cutting the line of communication at a predetermined network demarcation point.

DPI Engine Requirements. A DPI engine will perform a combination of signature-matching and heuristic analysis of data in order to determine the impact on the communications stream. A DPI-capable firewall needs to maintain the state of the underlying network connection, in addition to the integrity of the application utilizing that communication channel. It must also perform statistical or anomaly analysis.
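As a rough illustration of how signature matching and anomaly analysis can be combined, consider the following Python sketch. The signature patterns, the three-standard-deviation threshold and the verdict names are invented for this example and do not describe any particular DPI engine. Simple substring search stands in for the multi-pattern matching that a wire-speed engine would need.

```python
import statistics

# Hypothetical signatures: byte patterns invented for illustration only.
SIGNATURES = {
    "example-worm": b"\x90\x90\x90\x90",
    "example-traversal": b"../../",
}


def match_signatures(payload: bytes) -> list:
    """Return the names of all signatures found anywhere in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def is_size_anomaly(size: int, history: list, threshold: float = 3.0) -> bool:
    """Flag a payload size more than `threshold` standard deviations
    away from the mean of recently observed sizes."""
    if len(history) < 2:
        return False                      # not enough data to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return size != mean
    return abs(size - mean) / stdev > threshold


def inspect(payload: bytes, history: list) -> str:
    """Combine both checks into a single verdict for the packet."""
    if match_signatures(payload):
        return "drop"
    if is_size_anomaly(len(payload), history):
        return "flag"
    return "forward"


recent_sizes = [100, 102, 98, 101, 99]
print(inspect(b"GET /../../etc/passwd", recent_sizes))  # drop
print(inspect(b"a" * 100, recent_sizes))                # forward
print(inspect(b"a" * 500, recent_sizes))                # flag
```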

The engine must consistently process packets at 'wire speed', a rate comparable to today's routers and switches. While the concepts of DPI may appear simple, they are not easy to achieve using current practices. Hardware acceleration utilizing modern network processing units (NPUs) may be beneficial in providing fast discrimination of content within packets.

XML and Web Services Pose Challenges. XML-based Web services present challenges to the realization of VAN capabilities. XML documents are verbose, particularly when composed in Unicode. It will be necessary to marshal, compress, encrypt, decrypt, decompress and parse XML documents before deep packet inspection can occur. This imposes serious processing demands on any current network router that is applying DPI.
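The processing chain described above can be sketched in a few lines of Python using only the standard library. The order document is hypothetical, UTF-8 is assumed as the encoding, and the encryption and decryption steps are omitted. The point is that an inspecting node must decompress and parse the whole document before it can examine a single field.

```python
import zlib
import xml.etree.ElementTree as ET

# Hypothetical order document, invented for illustration.
root = ET.Element("order", id="1001")
ET.SubElement(root, "quantity").text = "12"
ET.SubElement(root, "unitPrice").text = "19.95"

# Sender side: marshal the document to text, then compress it for transport.
xml_bytes = ET.tostring(root, encoding="utf-8")
wire_bytes = zlib.compress(xml_bytes)

# Inspecting node: decompress and parse before any field can be examined.
parsed = ET.fromstring(zlib.decompress(wire_bytes))
quantity = int(parsed.findtext("quantity"))

print(len(xml_bytes), len(wire_bytes), quantity)
```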

This is presently being 'solved' by brute force employing specialized proxy server XML acceleration hardware that processes the traffic quickly, but off-line.

Another challenge is that developing advanced services with the current disparate XML toolsets is difficult, and the toolsets are too poorly defined for such services to be widely deployed on generic network router platforms.

Cubicon DPI Approach

Cubicon uses a fundamental context-based protocol to transfer content between Internet nodes and uses a trusted binary stream that is inherently secure by architecture. This architecture eliminates the need for XML acceleration hardware and enables CubeRun to perform content inspection without document parsing and screening. A user has access to Service streams through high levels of visual abstraction to enable fine-grain content filtering as well as traversal, edit and transfer operations on Service structures.

Deep Packet Inspection Processing

Cubicon remains interoperable with XML services. Cubicon works optimally when all of the nodes involved in the network traffic (edge and routing devices) operate a CubeRun. However, it is important to note that Cubicon can be incrementally installed to work alongside conventional technologies (XML-based Web services, databases, client/server applications, etc.). It does not require an either/or decision for initial deployment or the dismantling of existing technologies.

Context processing is key to secure interchange. Context is the semantics (or meanings of things) that enables correct processing of content. Context refers to something we know about the content. With the formalization of context, a computer can reason about things independently without human intervention.

Context instance exchange. In an XML document as it is currently formed, content may be domain data or a service being interchanged between nodes in a network. The 'context' for processing current XML content is embodied in its markup tags, which must conform to a voluminous series of complex W3C specifications including XML Schema, Namespace, DOM, XPath, XPointer, XSLT, XML Query, WSDL, RDF and now OWL.

XML Exchange

Within conventional object-oriented processing, each node dynamically parses and analyzes documents by using a broad range of proprietary means. Markup tag parsing is a major source of current processing latency.

By contrast, Cubicon takes a different approach to context processing. Instead of embedding context within content, Cubicon handles it separately. In this new environment, context is a community resource that describes the common meaning (semantics) and is agreed to by community members. Cubicon expresses content as instances of a context.

Context is established once, when a schema is defined or enters the operating environment. It is then applied to individual instances at runtime. This step eliminates the need to parse and marshal the binary instance data in a native payload. Reforming traffic into native context instances also relieves demands on physical processing and memory, and enables on-the-fly wire-speed routing based on the content state.
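The separation of context from content can be illustrated with a small Python sketch. It does not reproduce Cubicon's actual binary format: the `ORDER_CONTEXT` structure, its field names and the fixed binary layout are assumptions made for this example. The context is defined once; each instance then carries only packed values, and a routing decision reads a field directly from the binary without parsing any markup.

```python
import struct

# Hypothetical context, defined once. Field names and layout are assumptions.
ORDER_CONTEXT = {
    "fields": ("order_id", "quantity", "unit_price"),
    "layout": struct.Struct("!IHd"),  # network byte order: uint32, uint16, float64
}


def encode_instance(context, values: dict) -> bytes:
    """Pack field values into a binary instance of the given context."""
    ordered = (values[name] for name in context["fields"])
    return context["layout"].pack(*ordered)


def read_field(context, payload: bytes, name: str):
    """Read one field from a binary instance without parsing any markup."""
    position = context["fields"].index(name)
    return context["layout"].unpack(payload)[position]


def route(context, payload: bytes) -> str:
    """Make a content-based routing decision on the binary instance."""
    if read_field(context, payload, "quantity") >= 100:
        return "bulk-queue"
    return "standard-queue"


instance = encode_instance(
    ORDER_CONTEXT, {"order_id": 1001, "quantity": 250, "unit_price": 19.95}
)
print(len(instance), route(ORDER_CONTEXT, instance))  # 14 bulk-queue
```

The instance carries no field names or tags, so the receiving node can interpret it only if it holds the same context.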

Context Instance Exchange

A transport protocol based upon a binary context instance is efficient for service extensions such as packet voice and content distribution. CubeRun is a highly unified network interaction engine that is optimized for processing of context instances. At the application protocol layer, the effect of the context approach will be similar to an XML accelerator, but without the hardware cost. A CubeRun engine needs only to be installed within a node similar to installing a Java virtual machine. A CubeRun drives each node and Community Repository discussed below.

General network nodes, including those containing a CubeRun, are protected, as a first line of defense, by traditional network security measures such as firewalls, VPNs, SSL, PKI, etc. In addition, CubeRun provides an intrinsically secure execution space that prevents malicious code from being introduced into the processing of context instances. All operational behaviors (programs) are maintained jointly by trading partners in a repository available to any particular community and create the basis for this security. At run time, context is transported to a node (which may cache it locally) as core instructions. It is virtually impossible to attack and insert viral or worm code into a core instruction stream.

Context Sourced from Community Repository

Context sourced from Community Repository. Cubicon defines and maintains context resources in a Community Repository. A node (that is any Cubicon-enabled network device) 'shares' only the content and can perform 'proprietary' processing based upon semantics of particular contexts found in a set of Community Repositories. Community members are entities that control nodes. Community formalization of a context specification enables entities to interchange content using shared protocols, when necessary, retaining independent processing behaviors. A Community Repository provides numerous mechanisms that automate a protocol lifecycle process.

The Cubicon platform can interoperate with extant information systems by transforming any well-formed schema into a set of context resources. Alternatively, a Cubist can declare native context resources through CubeStudio.

A Community Repository is the single source for a context type. In order to create or process an instance, a node must request context resources from a Community Repository. A requesting node receives a skeleton, input manifest and facets. Conversely, a responding node receives an output manifest and facets. A node has the option to cache these context resources in order to process subsequent instances. Automatic purging can be elected if no instance processing requests are made within a specified period.
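The caching and purging behavior described above might look like the following Python sketch. The `ContextCache` class, the `fake_repository` stand-in and the idle limit are hypothetical; the source does not specify how a node requests or stores context resources.

```python
import time


class ContextCache:
    """Caches context resources and purges entries that have gone unused."""

    def __init__(self, fetch, idle_limit: float):
        self._fetch = fetch            # callable standing in for a repository request
        self._idle_limit = idle_limit  # seconds without use before an entry is purged
        self._entries = {}             # context type -> (resources, last_used)

    def get(self, context_type: str, now=None):
        """Return cached resources, fetching them from the repository if needed."""
        now = time.monotonic() if now is None else now
        self.purge(now)
        if context_type in self._entries:
            resources, _ = self._entries[context_type]
        else:
            resources = self._fetch(context_type)
        self._entries[context_type] = (resources, now)
        return resources

    def purge(self, now=None):
        """Drop every entry that has been idle longer than the limit."""
        now = time.monotonic() if now is None else now
        stale = [key for key, (_, used) in self._entries.items()
                 if now - used > self._idle_limit]
        for key in stale:
            del self._entries[key]


requests = []


def fake_repository(context_type):
    """Stand-in for a Community Repository request (hypothetical)."""
    requests.append(context_type)
    return {"skeleton": b"", "manifest": [], "facets": []}


cache = ContextCache(fake_repository, idle_limit=60.0)
cache.get("Order", now=0.0)    # first use: fetched from the repository
cache.get("Order", now=30.0)   # served from the cache
cache.get("Order", now=200.0)  # purged after 170 idle seconds, fetched again
print(len(requests))           # 2
```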

Content is transported between requesting and responding nodes. It consists of an instance of a context that is comprised of a skeleton with affixed content data.

CubeNet architecture provides intrinsic security. A context instance contains only data, and is free of metadata or operating behavior. It is always in a binary canonical (that is, simplest or standard) form that assures data integrity. The nested tree structure of the binary itself is even obscured from pattern recognition sniffing techniques. While this binary is inherently secure for most applications (eliminating the need for additional process-intensive steps of encoding and decoding), selected structured collections or only semi-structured strings within a context instance may be additionally encrypted for security. The skeleton is much more lightweight than an XML document and displaces all markup tags. Numerical data remains in binary and text strings can be optionally compressed and/or encrypted. Alternatively, selected collections and even particular attributes may be excluded from a transmitted instance based upon a user privacy mask.
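The privacy mask can be pictured with a short Python sketch. The nested-dictionary representation of an instance and the dotted path convention are assumptions made for illustration; they are not Cubicon's internal format. The function returns a copy of the instance with the masked attributes and collections left out.

```python
def apply_privacy_mask(instance: dict, mask: set, prefix: str = "") -> dict:
    """Return a copy of `instance` without the paths listed in `mask`.

    Paths are dotted names such as "customer.card_number" (an assumed convention).
    """
    result = {}
    for name, value in instance.items():
        path = f"{prefix}{name}"
        if path in mask:
            continue                  # excluded attribute or whole collection
        if isinstance(value, dict):
            value = apply_privacy_mask(value, mask, prefix=path + ".")
        result[name] = value
    return result


order = {
    "order_id": 1001,
    "customer": {"name": "A. Buyer", "card_number": "0000"},
    "notes": {"internal": "x"},
}
mask = {"customer.card_number", "notes"}
print(apply_privacy_mask(order, mask))
# {'order_id': 1001, 'customer': {'name': 'A. Buyer'}}
```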

Cubicon security is designed with flexibility, ease of use and industry-standard practices in mind. The use of a proprietary obfuscation (to make obscure or unclear) technique increases data security significantly without the need to resort to process-intensive encryption. Additionally, constraining possible system behaviors to core operations further prevents traditional intrusions through buffer overflows, data destruction and modification, and data injection attacks.

Access is also strictly controlled through distributed Community Repositories that are well-known and well-understood context sources, thus reducing the need for complex credential coordination among multiple servers. This reduced complication translates into a more efficient, easier and more economical system for validating and maintaining credentials, removing a major problem currently found within large, complex knowledge-sharing systems.

Cubicon by its very nature is decentralized through its CubeNet topology of repositories. A Cubicon context can be sourced only from a Community Repository. This automatically guarantees the integrity and version of a program file to process a particular context instance. However, a Community Repository can be mirrored to provide alternative context servers to demanding nodes in case of network interruption or congestion.

Cubicon also effectively assures the integrity of dynamic content by knowing and controlling the insertion of executable binary files into a context instance. Binary files can be simply prohibited, or isolated for selective virus and worm filtering, without scanning the entire context instance.

Intrinsic core operating processing. Composite data structures can be traversed and manipulated by executing core operations. Core operations can also copy or move an attribute value to another collection structure. Alternatively, external system calls by a Java or C-based program can invoke innate manifest and process operations that perform the following advanced behaviors:

These mechanisms are made available to a Cubist through CubeStudio, which presents a high level of abstraction that requires little training and orientation for effective use. For example, the following sample fulfillment process is declared as an expression that uses the result of four Boolean core operations as operands that determine how a Service instance will be routed.
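In textual form, a routing expression of this kind might be sketched as follows in Python. The four Boolean operations, the attribute names and the route names are hypothetical stand-ins and are not the operations used in the sample process.

```python
# Four hypothetical Boolean operations over a Service instance (a plain dict here).
def credit_approved(service: dict) -> bool:
    return service["credit_ok"]


def in_stock(service: dict) -> bool:
    return service["quantity"] <= service["on_hand"]


def is_rush(service: dict) -> bool:
    return service["rush"]


def is_domestic(service: dict) -> bool:
    return service["country"] == "US"


def route_fulfillment(service: dict) -> str:
    """Choose a route from the results of the four Boolean operations."""
    if not credit_approved(service):
        return "credit-review"
    if not in_stock(service):
        return "back-order"
    if is_rush(service) and is_domestic(service):
        return "express-shipping"
    return "standard-shipping"


order = {"credit_ok": True, "quantity": 5, "on_hand": 20,
         "rush": True, "country": "US"}
print(route_fulfillment(order))  # express-shipping
```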

Fulfillment Processing

CubeRun Service Processing Configurations

CubeRun can process both foreign XML and native content through several engine configurations as follows:

Transport/processing efficiency. Cubicon provides significant improvements in the efficiency of transporting and processing of complex services:

Transport/Processing Efficiency

Bi-lateral run-time transformation integrity. Cubicon maintains its integrity with the XML document infrastructure by transforming a binary context instance into text through CubeRun's serializer prior to transmitting. Due to the high degree of semantic representation, this XML output is of exceptional quality, created quickly and very well formed for external processing.

Bi-lateral Run-time Transformation Integrity -
Cubicon-to-XML

Cubicon may receive an XML document that must first be converted into binary through a marshaller. There are several marshaller technologies now available that Cubicon will support. A marshaller writes into CubeRun through a manifest port. A manifest is a list of application calls from and to a programming language. A Community Repository for each Service context automatically maintains this API.
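The two directions of this transformation can be sketched in Python with the standard library. The sketch is a simplification under stated assumptions: `MANIFEST` here is a hypothetical table mapping XML element names to binary field types, not the list of application calls described above, and the fixed binary layout is invented for illustration.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical manifest: (XML element name, struct type code, converter).
MANIFEST = (
    ("orderId", "I", int),
    ("quantity", "H", int),
    ("unitPrice", "d", float),
)
LAYOUT = struct.Struct("!" + "".join(code for _, code, _ in MANIFEST))


def marshal(xml_text: str) -> bytes:
    """XML-to-binary: parse the document once and pack its values."""
    root = ET.fromstring(xml_text)
    values = [convert(root.findtext(tag)) for tag, _, convert in MANIFEST]
    return LAYOUT.pack(*values)


def serialize(payload: bytes, root_tag: str = "order") -> str:
    """Binary-to-XML: unpack the values and emit a well-formed document."""
    root = ET.Element(root_tag)
    for (tag, _, _), value in zip(MANIFEST, LAYOUT.unpack(payload)):
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


doc = ("<order><orderId>1001</orderId><quantity>12</quantity>"
       "<unitPrice>19.95</unitPrice></order>")
binary = marshal(doc)
print(len(doc), len(binary))
print(serialize(binary) == doc)  # True: the round trip reproduces the document
```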

Bi-lateral Run-time Transformation Integrity -
XML-to-Cubicon

Connector development simplification. The connection between a Web application and an XML document is typically performed through a DOM (or a DTD) and/or WSDL and a manifest. Cubicon simplifies this process for the connector developer through a manifest socket. This mechanism provides a calling program direct binary access to service attribute values.

Connector Development Simplification -
DOM / XML vs. manifest socket

Design transformation integrity. Cubicon is compatible with existing XML-based schemata. A particular schema is transformed by a grammar parser that creates a set of coreObjects that faithfully mirror the legacy specification.

The coreObjects that comprise a Service context are distilled into a set of binary composites (a skeleton, I/O manifests, and facets). These binary composites are used to natively process outbound and inbound service instance payloads in a CubeRun device.

Design Transformation Integrity

Native execution. Native Service processing can take place end-to-end with process behavior located in a pair of Module instances (groves). Individual core operations read and write attribute values between a Service instance and the grove objects.

Native Application


 

email: klausner@coretalk.net
Planning for a Deep Semantic Net
Contact: Sanford B. Klausner, Founder and CTO
408.621.4709


  © Copyright 1987-2008, Sanford B. Klausner