The Cubicon Platform
Enabling a Semantic Net Environment
Semantic Engine Enables Transparent Computing
Knowledge Maps Support Machine-to-machine Context Processing
Software Transactional Memory Empowers multi/many-core Processors
Deep Packet Inspection Permits Intelligent Routing
These advanced capabilities will be difficult to achieve with current programming languages and databases, or with XML and related W3C substrate technologies. These legacy technologies lack the performance, productivity, interoperability, agility, robustness and security characteristics needed to support a Semantic Net environment.
The following sections explain the requirement for each capability and its value to an advanced platform, and give a high-level technical description of the Cubicon platform architecture.
Semantic Engine (SE) Enables Transparent Computing
Transparency Requirement
A Semantic Net environment needs to let a developer create sophisticated programs securely, without having to manage physical execution memory (heap space). Current Java (JVM) and Microsoft (.NET) virtual machines perform these functions, but they have inherent performance limitations that force a developer to work around garbage collector (GC) events when the VM is applied in a high-performance environment. This problem exists at both ends of the network spectrum: routers and embedded devices. A router needs to perform Deep Packet Inspection (DPI) at wire speed, and many embedded environments are real-time in nature, requiring sampling rates far faster than current VMs can poll and respond (e.g. engine monitoring, fly-by-wire aircraft). A hard-coded workaround also wreaks havoc on interoperability, destroying any resemblance to an intrinsically secure computational model.
Garbage Collection Explanation
The name 'garbage collection' implies that objects no longer needed by the program are 'garbage' and can be thrown away. A more accurate and current metaphor might be 'memory recycling'. When a program no longer references an object, the heap space it occupies can be recycled so that space is made available for subsequent new objects. The GC must somehow determine which objects are no longer referenced by the program and make available the heap space occupied by such unreferenced objects. In the process of freeing unreferenced objects, current commercial GCs must run a finalizer on objects being freed.
In addition to freeing unreferenced objects, a GC must also combat heap fragmentation. This occurs during the course of normal program execution. New objects are allocated, and unreferenced objects are freed so that available portions of heap memory are left in between positions occupied by live objects. Requests to allocate new objects may have to be filled by extending the size of the heap even though there is enough total unused space in the existing heap. This will happen if there is not enough contiguous free heap space available into which the new object will fit. On a virtual memory system, the extra paging (or swapping) required to service an ever growing heap could degrade the performance of the executing program. On an embedded system with small memory, fragmentation could cause a virtual machine to run out of memory unnecessarily.
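The fragmentation problem described above can be illustrated with a small simulation. This is a minimal sketch in Python, not CubeRun code: the heap is modeled as a list of slots, and the names largest_free_run and can_allocate are invented for the example.

```python
def largest_free_run(heap):
    """Return the length of the longest run of free (None) slots."""
    best = run = 0
    for slot in heap:
        run = run + 1 if slot is None else 0
        best = max(best, run)
    return best


def can_allocate(heap, size):
    """A request fits only if some contiguous free run is large enough."""
    return largest_free_run(heap) >= size


# A 12-slot heap after several allocate/free cycles: live objects "A".."D"
# are interleaved with freed slots (None).
heap = ["A", None, None, "B", None, None, "C", None, None, "D", None, None]

total_free = heap.count(None)   # plenty of free space in total
fits = can_allocate(heap, 4)    # but no run of 4 contiguous slots

# Compaction slides live objects together, merging the free space.
compacted = [slot for slot in heap if slot is not None]
compacted += [None] * (len(heap) - len(compacted))
fits_after = can_allocate(compacted, 4)
```

Before compaction the four-slot request fails even though eight slots are free; after compaction the same request fits without growing the heap.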
Garbage Collection Algorithms
Any GC algorithm must do two basic things. First, it must detect garbage objects. Second, it must reclaim the heap space used by the garbage objects and make that space available again to the program. A GC ordinarily accomplishes the first task by defining a set of root objects and determining reachability from them. An object is reachable if there is some path of references from the roots by which the executing program can access the object. The roots are always accessible to the program. Any objects that are reachable from the roots are considered 'live'. Objects that are not reachable are considered garbage, because they can no longer affect the future course of program execution.
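Reachability from a root set can be sketched in a few lines. The example below is a generic mark-and-sweep illustration in Python, assuming objects are identified by name and references are held in a dictionary; it is not the CubeRun algorithm, and the function names are invented.

```python
def mark(roots, references):
    """Return the set of objects reachable from the roots ('live' objects)."""
    live = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in live:
            continue
        live.add(obj)
        stack.extend(references.get(obj, ()))
    return live


def sweep(heap, live):
    """Everything on the heap that was not marked is garbage."""
    return heap - live


heap = {"a", "b", "c", "d", "e"}
references = {"a": ["b"], "b": ["c"], "d": ["e"], "e": ["d"]}
roots = ["a"]

live = mark(roots, references)
garbage = sweep(heap, live)
```

Here "d" and "e" refer to each other, but no path leads to them from the root, so both are garbage.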
Commercial implementations use a variety of GC approaches in an attempt to obtain transparency across the widest applicable set of target devices. The two basic approaches to distinguishing live objects from garbage are called reference counting and tracing. In addition, the field is crowded with further concepts such as mark and sweep, finalization, defragmentation, compaction, and copying, generational and adaptive collectors. GC research remains wide open, since no current commercial implementation demonstrates the performance needed to meet the transparency requirements of a Semantic Net environment. A GC event in a JVM or .NET virtual machine takes over 20 milliseconds before control is returned to an application.
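The difference between reference counting and tracing shows up when objects refer to each other in a cycle. The sketch below, in Python with invented function names, simulates both approaches on the same toy heap. It illustrates the general techniques only and makes no claim about any particular virtual machine.

```python
def reachable(roots, references):
    """Tracing: follow references from the roots."""
    live, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            stack.extend(references.get(obj, ()))
    return live


def freed_by_reference_counting(heap, roots, references):
    """Reference counting: free an object when its count reaches zero,
    then decrement the counts of everything it referred to."""
    counts = {obj: 0 for obj in heap}
    for obj in roots:
        counts[obj] += 1
    for src in heap:
        for dst in references.get(src, ()):
            counts[dst] += 1
    freed = set()
    pending = [obj for obj in heap if counts[obj] == 0]
    while pending:
        obj = pending.pop()
        freed.add(obj)
        for dst in references.get(obj, ()):
            counts[dst] -= 1
            if counts[dst] == 0:
                pending.append(dst)
    return freed


heap = {"a", "b", "c", "d", "e", "f"}
roots = ["a"]
references = {"a": ["b"], "c": ["b"], "d": ["e"], "e": ["d"]}

by_counting = freed_by_reference_counting(heap, roots, references)
by_tracing = heap - reachable(roots, references)
leaked = by_tracing - by_counting   # garbage that counting alone never frees
```

The cycle between "d" and "e" keeps both counts above zero, so pure reference counting never reclaims them, while tracing does.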
CubeRun Semantic Engine Approach
A CubeRun garbage collection event is 1000x more efficient than one in a JVM or .NET virtual machine, enabling effective use in routers and embedded real-time devices. This high performance is achieved in five fundamental and proprietary ways:
1) Context vs. object processing. The fundamental unit of processing discourse in Cubicon is a context rather than an object. This higher-order discourse provides additional metadata to the GC, enabling it to perform object tracking, compaction and virtualization in a dynamic, automated fashion over a wide spectrum of target devices.
2) CoreObject component model. All processing within a CubeRun is performed by a finite set of coreObjects. This regularity between the language and execution engine eliminates much of the complexity found in contemporary language environments and greatly increases algorithm efficiency. The following schema depicts the call relationships between functions within the Memory Manager module.
3) Cubicon iconic reference architecture. This reference architecture is a bit and pointer defined blueprint of the entire Memory Manager module. Iconic representation makes it possible to see and fine tune the algorithm in full living color prior to C coding.
4) Innovative algorithm. CubeRun uses a novel heap compacting technique that is generational, highly adaptive and incremental. Bench test monitoring indicates a nominal GC event takes less than 1 microsecond.
5) CubeStudio visual memory model. This 'dashboard' enables the GC parameters to be easily fine-tuned for a particular target environment, using feedback from visual characterization models (2D and 3D).
Knowledge Maps (KM) Support Machine-to-machine Context Processing
Requirement
"The semantic web is an evolving extension of the World Wide Web in which content is expressed not only in natural language, but also in a format that can be read and used by software agents, permitting them to locate, share and integrate information more easily. It reinforces W3C Director Sir Tim Berners-Lee's vision of the Web as a universal medium for data, information and knowledge exchange.
"At its core, the semantic web comprises a philosophy, a set of design principles, collaborative working groups and a variety of enabling technologies. Some elements of the semantic web are expressed as prospective future possibilities that have yet to be implemented or realized. Other elements of the semantic web are expressed in formal specifications. Some of these include Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms and relationships within a given knowledge domain.
"Humans are capable of using the Web to carry out tasks such as finding the Finnish word for car, to reserve a library book, or to search for the cheapest DVD and buy it. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing and combining information on the web."
An Emerging Knowledge Science
"Knowledge Science is the discipline of understanding the mechanics through which humans and software-based machines know, learn, change and adapt their own behaviors. Throughout recorded history, knowledge has been made explicit through symbols, text and graphics on media such as clay, stone, papyrus, paper and most recently, as digitally stored representations. The digital effort began in the early 1970s when knowledge science was recognized as a vigorous field of study beginning with the development of natural language learning programs funded by the National Science Foundation (NSF). Now, knowledge science experts are engaged in a debate between:
- Meaning as represented by language-based propositions that adhere to universal truth-conditions, and
- The quantum relativist view, that meaning exists as a condition under which it can be verified and certified as acceptable without regard to universal truth-conditions.
Knowledge science and knowledge representations encompass philosophical, epistemological and ontological considerations. This article presents an overview of this new field of study, relying on historical data to provide insight and understanding, while it addresses the two schools of knowledge science."
The first, language-based school is driven by the W3C and by the use of HTML and the accompanying HTTP protocol. The second school, that of value-based, theory-based semantics, was developed by Richard Ballard. It uses a single relational database table with four fields that store unique identifier codes; the codes are pre-defined, stored, and later looked up to populate the value-model/object/attribute/value table fields. This original technology was called the Mark 2.
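The single-table idea can be sketched with an in-memory SQLite database. This is only an illustration of the general pattern (four integer code fields plus a code registry), not a reconstruction of the Mark 2: the column names, the registry and the sample facts are all assumptions made for the example.

```python
import sqlite3

codes = {}   # term -> identifier code (hypothetical registry)
terms = {}   # identifier code -> term


def define(term):
    """Pre-define a term, assigning the next unused identifier code."""
    if term not in codes:
        code = len(codes) + 1
        codes[term] = code
        terms[code] = term
    return codes[term]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE knowledge ("
    "model_id INTEGER, object_id INTEGER, attribute_id INTEGER, value_id INTEGER)"
)


def store(model, obj, attribute, value):
    """Store one fact as a row of four identifier codes."""
    row = tuple(define(term) for term in (model, obj, attribute, value))
    db.execute("INSERT INTO knowledge VALUES (?, ?, ?, ?)", row)


def look_up(model, obj, attribute):
    """Resolve codes back to terms; unknown terms map to code 0 (no match)."""
    key = tuple(codes.get(term, 0) for term in (model, obj, attribute))
    rows = db.execute(
        "SELECT value_id FROM knowledge "
        "WHERE model_id = ? AND object_id = ? AND attribute_id = ?",
        key,
    ).fetchall()
    return [terms[value_id] for (value_id,) in rows]


store("vehicle", "car", "wheels", "four")
store("vehicle", "car", "fuel", "petrol")
wheels = look_up("vehicle", "car", "wheels")
unknown = look_up("vehicle", "boat", "wheels")
```

Because the table holds only integer codes, the same rows can be rendered in any language by swapping the registry.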
The language-based knowledge representation has progressed from RDF in 1999 to DAML and now to the Web Ontology Language (OWL) and other current formats. Researchers are now working on a numerical (value-based) representation to define a language-independent structured knowledge exchange.
Ballard's value-based technology has been refined in the Mark 3 implementation towards a goal where a "software-based machine can faithfully represent every form of knowledge and reason with that knowledge the way people do." (Ballard)
Topic Maps. Topic Maps is an ISO standard for the representation and interchange of knowledge, with an emphasis on the 'find-ability' of information. It combines concepts from the two knowledge representation schools while introducing several of its own.
A topic map can represent information using topics (representing any concept, from people, countries and organizations to software modules, individual files and events), associations (which represent the relationships between them) and occurrences (which represent relationships between topics and information resources relevant to them). They are thus similar to semantic networks and mind maps in many respects. In loose usage all those concepts are often used synonymously, though only topic maps are standardized.
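The three building blocks named above (topics, associations and occurrences) map naturally onto a small data structure. The sketch below is a simplified Python illustration, not the ISO Topic Maps data model or Cubicon's encoding; the class names, role names and the example.org URL are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    occurrences: list = field(default_factory=list)  # links to resources


@dataclass(frozen=True)
class Association:
    kind: str
    members: tuple  # (role, topic name) pairs


class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def add_topic(self, name, *occurrences):
        topic = self.topics.setdefault(name, Topic(name))
        topic.occurrences.extend(occurrences)
        return topic

    def associate(self, kind, **roles):
        """Relate topics, each playing a named role in the association."""
        for name in roles.values():
            self.add_topic(name)
        self.associations.append(Association(kind, tuple(roles.items())))

    def related(self, name):
        """Topics that share at least one association with the given topic."""
        found = set()
        for association in self.associations:
            names = [topic for _, topic in association.members]
            if name in names:
                found.update(n for n in names if n != name)
        return found


tm = TopicMap()
tm.add_topic("Finland", "https://example.org/finland")
tm.associate("capital-of", capital="Helsinki", country="Finland")
tm.associate("member-of", member="Finland", organization="European Union")
related = tm.related("Finland")
```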
Cubicon Topic Map Approach
Cubicon's approach is value-based and cast in a topic model. This advanced knowledge representation model has the following distinguishing features:
Representation of both particular and universal knowledge. Cubicon can represent particular information and universal theory as well as interrelate between both fundamental knowledge forms. Particular information represents anything that exists in time and space that can be processed by the senses, measured or counted. Universal theory is the coherent general propositions that comprise rational constraints that justify the relationships of concepts, ideas, thought patterns and their instances necessary to convey meaning.
An entity (a person or a body such as a company, organization or government) develops and maintains system components based upon its context within a community of practice. Cubicon is based on a converged structural and behavioral model that brings context and coherence to the systems world. Context incorporates conventional object-oriented software semantics; however, its model deals more precisely with information through higher-order cognitive constructs. Context in Cubicon is declared in four forms: Module, Plexus, Service and Topic. These forms share an identical composite data structure architecture yet play distinct system roles. The first three forms represent particular information:
A Module is a grove of objects and cells that represent something specific (e.g. an employee record in a database or an expression in a spreadsheet).
A Plexus is either logic/number-oriented (medium) or string-oriented (document) and is represented as an attribute of a Module object or Service (e.g. a JPEG, MPEG or a PDF, Word doc, HTML page).
A Service maps transaction data between heterogeneous systems controlled by different entities (e.g. financial transaction over the Net).
The Topic context form is unique in that its resource data is related to a universal theory. A topic represents a specific subject. The relationship between a topic and its subject is defined as reification. Reification of a subject allows internal resource data to be assigned, further defining the subject for a given topic. Occurrence resources provide a means to access external data and information related to a topic.
Concentric model. A Topic Map automatically organizes around a center study topic. Any topic that has a direct association with the study topic appears in the top focus group circle (dark blue). Additional degrees of separation appear in subsequent clockwise groups. Selection of another topic moves it to the center and refreshes the map to depict its particular topic associations. There will never be a map 'top'. Dynamic viewing provides a Cubist the flexibility to reason about subjects and their characteristics from any topic perspective.
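Grouping topics by their degree of separation from the study topic is a breadth-first traversal. The following Python sketch shows one way to compute the focus groups. It assumes associations are simple undirected pairs, and the function name and sample topics are invented for the example.

```python
from collections import deque


def focus_groups(associations, center):
    """Group topics by degrees of separation from the center study topic."""
    neighbours = {}
    for a, b in associations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    distance = {center: 0}
    queue = deque([center])
    while queue:
        topic = queue.popleft()
        for other in neighbours.get(topic, ()):
            if other not in distance:
                distance[other] = distance[topic] + 1
                queue.append(other)

    groups = {}
    for topic, degree in distance.items():
        if degree > 0:
            groups.setdefault(degree, set()).add(topic)
    return groups


associations = [
    ("engine", "piston"),
    ("engine", "fuel"),
    ("piston", "ring"),
    ("fuel", "tank"),
]

around_engine = focus_groups(associations, "engine")
around_ring = focus_groups(associations, "ring")   # re-center on another topic
```

Re-centering on "ring" regroups the same topics around the new study topic, which is why the map never has a fixed 'top'.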
Net navigation augments Web search. The Web requires a person to reflect upon a page and search by clicking through to a linked HTML page. A page contains no semantic information that would allow a machine to determine a search purpose, relevance or direction for itself. A Cubicon Topic Map is composed of semantically associated concepts that are navigated by a CubeRun semantic engine. Language text search is replaced by topic value-linked navigation. A Topic Map may consist of thousands of concepts (topics that are reified subjects), possibly linked to any number of other Topic Maps controlled by other entities. This myriad of interconnections evolves as an overlay on top of the Web and becomes the Semantic Net foundation.
An alternative viewpoint to this replacement is to consider the Web as a grid of interconnected HTML pages. Conversely, the Semantic Net will consist of a grid of interconnected 'topics' either general or specific. Each topic refers to resources about the same subject spread over many nodes around the Net.
Resources occur around the Net. Besides the traditional URI resources, a topic can also extend through three other external reference forms that are native to Cubicon. They consist of document, medium and service occurrences. External resources are particular data that are status-based and change over time. By contrast, internal resources are occurrences of discourse maps and universal data declarations of theory that are concept-based. Cubicon refers to this linked resource data as 'hyperdata', an extension of the original 'hypertext' term.
An efficient means to capture domain knowledge. A domain expert declares a map using CubeStudio, primarily through the topic frame. Topics participate in relationships called 'associations', in which they play both client and server roles. A topic can also belong to one or more aspects. Thus, topics have three kinds of characteristics: resource occurrences, roles played in associations, and the aspects to which they belong. A topic, along with its characteristics, can also be declared by program behavior, enabling the Semantic Net to evolve through reflective machine reasoning.
Value-based concept encoding. A Topic Map is composed of a number of parts that are binary encoded for efficient representation and native CubeRun processing. Integration with the virtual machine enables situation service agents to navigate vast knowledge spaces at the 'speed of thought'.
Global term/name space identification. Cubicon's multi-dimensional space identification technique provides universal discourse across countries and cultures by capturing the theory within concepts and information within component contents. This identification space is represented in globally unique concatenated integer values. Each identifier may be linked to multiple concept terms or component names. A concept may have multiple term synonyms. A component and its subcomponents may have multiple community natural language declarations. In addition, an entity may declare a component dialect name for use within their body.
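One way to picture a concatenated integer identifier is as several integer dimensions packed into a single value, with terms in different languages attached to it. The Python sketch below is purely illustrative: the 16-bit field width, the two dimensions and the sample terms are assumptions, not Cubicon's actual encoding.

```python
FIELD_BITS = 16  # assumed width of each dimension


def make_identifier(*dimensions):
    """Concatenate integer dimensions into one identifier value."""
    identifier = 0
    for part in dimensions:
        if not 0 <= part < (1 << FIELD_BITS):
            raise ValueError("dimension out of range")
        identifier = (identifier << FIELD_BITS) | part
    return identifier


def split_identifier(identifier, count):
    """Recover the original dimensions from an identifier."""
    mask = (1 << FIELD_BITS) - 1
    parts = []
    for _ in range(count):
        parts.append(identifier & mask)
        identifier >>= FIELD_BITS
    return tuple(reversed(parts))


# Hypothetical concept: dimension values 3 and 1024 are invented.
car = make_identifier(3, 1024)

# One identifier, several terms: synonyms and natural-language variants.
terms = {car: {"en": ["car", "automobile"], "fi": ["auto"]}}


def terms_for(identifier, language):
    return terms.get(identifier, {}).get(language, [])
```

Because programs exchange only the integer, each community can attach its own synonyms or dialect names without changing the identifier.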
Situation agents drive machine-driven reasoning. A situation map conveys knowledge about a set of topic maps by reasoning about their topic characteristics and content resources hosted by both native and alien entity repositories. A map supports a situation service, an agent behavior that navigates between topics to perform this inference.
A situation service is presented with predicate data called a premise. A premise consists of composite structures that are shared between topics. A situation service executing within a CubeRun analyzes both a topic's relationships with other topics and related occurrences of content resources. Following this analysis, it eventually returns to its origin device. Once it is home, one of three things can happen: 1) a calling service consumes the contents of the suitcase, 2) the contents are returned to the user through the interface, or 3) browsing moves forward to another Web page.
There are two service forms: situation services and dialog services.
A dialog service is an exchange between two entities in operations that usually perform a transaction process consisting of two or more individual back and forth steps over a period of time. Dialog services may be nested to perform subcontracting tasks.
A situation service may call either service form. A situation may be called by another situation and therefore infinitely nested as one service leverages the knowledge of another.
A situation service creates a scenario that navigates from topic-to-topic and map-to-map. These dynamic scenarios are archived and can be displayed to enable a Cubist to understand system behavior more effectively.
Net navigation provides knowledge insight. Cubicon evolves the Internet from a Web browser to a Net navigation experience transforming information into knowledge. A Net Navigator map displays knowledge derived by accessing linked topics through situation service behaviors. These maps are declared in Cubicon and directly enable a domain expert to understand decision choices and weigh possible outcomes.
The development of Concept Genealogy, Topic, Situation and Net Navigator maps, all of which are context components, will employ the next generation of Semantic Net Designers and return employment to domestic firms, where domain expertise will trump outsourced general-purpose labor.
Knowledge metering provides a mechanism to create economic incentive. A fundamental business issue is to create an ecosystem in which communities and entities have an incentive to invest their human capital in creating knowledge maps. Cubicon gives a Topic Map producer the capability to charge a consumer for knowledge on a per-situation-service-access basis. A consumer's request for access to a producer's Topic Map may come through a Net Navigator or be program-driven through behavior declared within a system module (grove).
Software Transactional Memory (STM) Empowers multi/many-core Processors
Requirement
The near-term future of processor technology lies in multi/many-core designs, multi-threading and parallelism. However, programmability remains a key barrier to exploiting these emerging environments and realizing their potential in the marketplace.
Numerous roadblocks stand in the way of the proposed higher-level packages and the many language extensions created to support concurrency. The burden of overcoming the complexity of context/synchronization analysis, exception handling, performance monitoring and tuning, not to mention the debugging of parallel networks, severely limits adoption and optimal utilization of multi/many-core processors to an elite field of expert programmers.
These advanced processors face a serious programmability problem. Concurrency quickly overwhelms humans. Almost universally, we find it much more difficult to reason about concurrent code than about sequential code. Even the most skilled programmers can miss possible interleavings among simple collections of partially ordered operations. Manual parallel programming is error-prone.
The solution? As many have said before, optimal utilization of multi/many-core processors requires effective automation techniques that can go well beyond current language extensions. Automation will enable mainstream software developers, as well as users, to easily adopt new techniques.
Cubicon STM Approach
Cubicon comprehensively addresses parallel programming challenges facing the microprocessor industry through industrial application of STM technology. STM provides concurrency control among individual thread tasks as well as transaction commit and rollback sequencing on multicast content across the Internet.
Soft-lock transaction eliminates deadlock. A transaction is a sequence of tasks executed by a single thread through a closure mechanism. Transactions are atomic: each transaction either commits (it takes effect) or aborts (its effects are discarded). Transactions are linearizable (or serializable): they appear to take effect in a one-at-a-time order.
Soft-lock transaction facilitates fine-grained synchronization. There is no need to track which locks protect which structures, and no need for elaborate deadlock-avoidance protocols. A soft-lock transaction functions as an alternative to lock-based synchronization and is implemented in a lock-free way. As a transaction closure executes a series of reads and writes to shared memory, the reads and writes logically occur at a single instant in time. Intermediate states are not visible to other (successful) closures.
Optimistic parallel execution provides transparent programming model. Unlike locking techniques used in most modern multi-threaded applications, Cubicon is optimistic. Every thread completes its modifications to a shared memory without regard to what other threads might be doing. It records every read and write made to a structure within a closure.
A closure operates within a session and includes a buffer that contains a copy of those object and cell structures controlled by a transaction. Instead of placing the onus on the writer to make sure a specific transaction does not adversely affect other operations in progress, it is placed on the reader. After completing an entire transaction, the reader verifies that other threads have not concurrently made changes to memory that the transaction of interest has accessed in the past.
In the final operation, called a commit, the changes of a transaction are validated and, if validation succeeds, made permanent. A commit writes the buffer back to shared memory. A transaction may also abort at any time, causing all prior closure changes to be rolled back or undone. If a transaction cannot be committed due to conflicting changes, it is typically aborted and re-executed from the beginning until it succeeds.
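The read, buffer, validate, commit and retry cycle described above can be sketched as a toy optimistic STM in Python. This is a generic illustration, not the Cubicon closure mechanism. The names TVar, Transaction and atomically are invented, and the single internal lock around the validate-and-write step is a simplification standing in for the lock-free commit the text describes.

```python
import threading


class TVar:
    """A shared memory cell holding (value, version) as one snapshot."""

    def __init__(self, value):
        self.state = (value, 0)


_commit_lock = threading.Lock()  # guards only the validate-and-write step


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        value, version = tvar.state
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value  # buffered, invisible to other threads

    def commit(self):
        with _commit_lock:
            # Validate: nothing we read may have changed since we read it.
            for tvar, seen in self.reads.items():
                if tvar.state[1] != seen:
                    return False  # conflict: abort, buffer is discarded
            for tvar, value in self.writes.items():
                tvar.state = (value, tvar.state[1] + 1)
            return True


def atomically(closure):
    """Run the closure optimistically, retrying until its commit succeeds."""
    while True:
        tx = Transaction()
        result = closure(tx)
        if tx.commit():
            return result


a, b = TVar(1000), TVar(0)


def transfer(tx):
    tx.write(a, tx.read(a) - 1)
    tx.write(b, tx.read(b) + 1)


def worker():
    for _ in range(50):
        atomically(transfer)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

No thread ever blocks while running its closure; a thread that loses a race simply finds its read versions stale at commit time and re-executes.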
Immense performance gain on large numbers of cores. The benefit of this optimistic approach is increased concurrency. No thread needs to wait for access to a resource, and different threads can safely and simultaneously modify disjoint parts of a shared memory module (grove) that would normally be protected under the same soft-lock. Despite the overhead of retrying transactions that fail, in most realistic applications conflicts arise rarely enough that there is an immense performance gain over lock-based protocols. This is especially true when using large numbers of processor cores. Module processing is distributed among a number of core pools upon initialization of a node.
STM mechanism integrates with memory manager. The performance cost of a closure relative to fine-grained lock-based systems is minor. The closure mechanism is an integral component of CubeRun and uses memory pointers for copy operations. This solution scales easily, since the mechanism sits directly above the Memory Manager that oversees garbage collection, compaction and virtualization of structures.
Alien entity transaction sessions not compromised. The closure mechanism works over TCP/IP. This means that a set of remote sessions can now perform transactions with an origin grove in a transparent manner. Structure access control is provided with an exchange set mechanism providing the ability to establish read and write privileges, thus allowing an alien entity to interact with grove structures as well as their own individual attributes.
Intent of Ajax is fully automated in Net browser model. A closure is also used for creating interactive web applications, much like the Ajax (Asynchronous JavaScript and XML) model. Ajax allows small amounts of data to be exchanged with a server behind the scenes, so that an entire Net page does not have to be reloaded each time the user makes a change. A commit action returns only modified data and automatically updates other closures that share a particular structure.
Net browser enables simple responsive behavior. A net browser replaces traditional application window and web browser mechanisms with an integrated architecture that enables a client's page to interact with a server as though it is part of the native remote application.
This window/browser fusion brings a common user experience regardless of the client/server relationship. This visual architecture uses a window closure composed of core components. Transaction data interaction remains uniform all the way down to core processing.
Deep Packet Inspection (DPI) Permits Intelligent Routing
Requirement
Internet Protocol (IP) routing is the foundation of the Internet and is considered one of the most important technologies of the past 20 years. IP routing technology forwards packets of data to the appropriate network destinations, making what is currently the most efficient use of aggregate network bandwidth.
As new uses for the Internet are created, the need for a 'smart' network becomes even more critical. Advanced capabilities will need to provide the intelligence on the network to offer a foundation for delivering robust services such as voice and video, quality of service, security and network-aware applications. These network services are helping to drive the growth of the Internet through the creation of new applications such as real-time trading, interactive support, on-demand media, multi-site communications and unified messaging. To stay competitive, more and more businesses are using sophisticated Internet applications that will rely on these types of advanced capabilities.
Three generations of routers:
- First-generation networking devices could only store and forward data packets.
- Today, routers can recognize, classify, and prioritize network traffic, optimize routing, and support voice and video applications based upon examination of a packet header.
- Tomorrow's router will need to perform DPI of payload content to provide higher-level virus detection and smart, network-aware services.
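The difference between header-based classification and payload inspection can be shown with a toy packet model. The Python sketch below is illustrative only: the Packet fields, the port-to-class table and the byte signature are invented for the example, and a real router would do this work in optimized or hardware-assisted code.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    src: str
    dst: str
    dst_port: int
    payload: bytes


# Second generation: decisions are based on header fields only.
CLASS_BY_PORT = {5060: "voice", 554: "video"}   # SIP and RTSP default ports


def classify_by_header(packet):
    return CLASS_BY_PORT.get(packet.dst_port, "best-effort")


# Third generation: the payload itself is examined.
SIGNATURES = [b"EVIL-PAYLOAD"]   # hypothetical signature list


def deep_inspect(packet):
    """Return True if any known signature occurs in the payload."""
    return any(signature in packet.payload for signature in SIGNATURES)


def route(packet):
    if deep_inspect(packet):
        return "drop"
    return classify_by_header(packet)


voice = Packet("10.0.0.1", "10.0.0.2", 5060, b"INVITE sip:bob@example.org")
web = Packet("10.0.0.1", "10.0.0.3", 80, b"GET / HTTP/1.1")
infected = Packet("10.0.0.9", "10.0.0.3", 80, b"GET /?q=EVIL-PAYLOAD")

decisions = [route(voice), route(web), route(infected)]
```

The infected packet looks like ordinary web traffic to the header check and is caught only when the payload is examined.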
DPI is a fundamental process required to effectively utilize third-generation capabilities. It will provide the ability to access and analyze structured content as it passes through a router at wire speed. However, this process is currently weighed down by document parsing and virus screening tasks that require many machine cycles to perform. Efficiency can be significantly improved by allocating these tasks to acceleration hardware. Offering effective DPI is just the bottom tier of a tall stack of functionality required to enable a future communication services architecture.
VAN Facilitates Service Deployment and Management
Virtual Active Networking (VAN) is a term used when referring to the advanced capabilities created by augmenting DPI. VAN needs to allow customized processing on packets in a stream. Packet processing can be applied to application-aware routing, information caching, multi-user communications as well as packet filtering. While a VAN may be owned and operated by a network provider, it allows its users to install and execute services on the network that may be simultaneously shared by many users. The primary benefit of this service automation is its ability to facilitate rapid deployment and provide flexible management.
Requirements for a VAN. Four fundamental requirements must be met together in order to realize network-aware services:
- A user must be able to exploit customized packet processing.
- Users must be able to execute their own set of services on the provider's router.*
- The provider must be able to isolate different users.
- An interaction framework between providers and users must be in place.
* along with network switch, proxy server, soft switch, SAN or other core network device, as differentiated from edge devices.
VAN Design Environment Requirements. A VAN requires a design environment with the following characteristics, to be used by both the network provider and its users:
- Integration of service-specific functionality into the network layers.
- Common net-centric environment to enable service sharing.
- User's ability to rapidly co-develop and provision services.
- Provider's ability to allocate and meter node resource utilization.
Comparison to a Virtual Private Network (VPN). The VAN captures the functionality and resources that a provider offers to a user. In the same way that an active network can be understood as a generalization of a traditional network, a VAN can be seen as a generalization of a traditional Virtual Private Network (VPN). Similar to a traditional VPN, a VAN can be used to run network services using a provider's physical infrastructure.
In contrast to a traditional VPN, however, a VAN gives a user a much higher degree of service flexibility and control. A VAN is a very 'natural' abstraction for users and allows them to create management functionality and oversee their services at run-time without interaction with the provider's management system - a capability that is very difficult and costly to achieve in current router environments. Furthermore, a VAN supports a mechanism to isolate users from one another in order to avoid interference between their services. This mechanism should reside inside each active network node, thus enabling the provision of resource partitioning and the capability of policing consumption along VAN boundaries.
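Resource partitioning and policing inside a node can be pictured as a per-user quota checked before each user-installed service runs. The Python sketch below is a deliberately simple illustration; the ActiveNode class, the cost units and the quota values are all hypothetical.

```python
class QuotaExceeded(Exception):
    pass


class ActiveNode:
    """A node that runs user services within per-user resource quotas."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)
        self.used = {user: 0 for user in quotas}   # metered consumption

    def run(self, user, service, packet, cost=1):
        # Police consumption at the user's boundary before running anything.
        if self.used[user] + cost > self.quotas[user]:
            raise QuotaExceeded(user)
        self.used[user] += cost
        return service(packet)


def upper(packet):
    """A stand-in for a user-installed packet-processing service."""
    return packet.upper()


node = ActiveNode({"alice": 2, "bob": 5})
results = [node.run("alice", upper, b"a"), node.run("alice", upper, b"b")]

try:
    node.run("alice", upper, b"c")
    alice_blocked = False
except QuotaExceeded:
    alice_blocked = True

bob_result = node.run("bob", upper, b"d")   # unaffected by alice's overuse
```

The usage counters double as the metering data a provider would need to bill for node resources.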
Stateful Inspection
The use of a firewall to provide security functions is growing. Originally, only the packet's header would be examined, perhaps to exclude traffic from suspected bad actors. It is now important to examine the payload as well, for example to detect virus-like content. This DPI capability, now mostly used for intrusion detection, is a prerequisite for future network-aware services. The network routing device performing DPI must look into the payload and then make decisions based on the significance of that data.
A firewall performing stateful inspection, by contrast, analyzes packets at the network layers, as well as overseeing higher layers, in order to assess the overall packet. By combining information from various layers (transport, session and network) the firewall now is better able to understand the content it is inspecting. This advanced level of review enables establishment of virtual sessions in order to track connectionless protocols such as UDP- and RPC-based applications.
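A virtual session for a connectionless protocol can be sketched as a table keyed by address and port pairs, with an idle timeout. The Python example below is a minimal illustration; the class name, the 30-second timeout and the sample addresses (taken from documentation address ranges) are assumptions, and time is passed in explicitly to keep the example deterministic.

```python
class StatefulFirewall:
    def __init__(self, idle_timeout=30.0):
        self.idle_timeout = idle_timeout
        # (proto, inside addr, inside port, outside addr, outside port)
        #   -> time the session was last seen
        self.sessions = {}

    def outbound(self, proto, src, sport, dst, dport, now):
        """An outbound packet opens or refreshes a virtual session."""
        self.sessions[(proto, src, sport, dst, dport)] = now
        return True

    def inbound(self, proto, src, sport, dst, dport, now):
        """Allow an inbound packet only as a reply within a live session."""
        key = (proto, dst, dport, src, sport)
        last_seen = self.sessions.get(key)
        if last_seen is None or now - last_seen > self.idle_timeout:
            self.sessions.pop(key, None)
            return False
        self.sessions[key] = now
        return True


fw = StatefulFirewall(idle_timeout=30.0)
fw.outbound("udp", "10.0.0.5", 5000, "203.0.113.9", 53, now=0.0)

reply_ok = fw.inbound("udp", "203.0.113.9", 53, "10.0.0.5", 5000, now=1.0)
unsolicited = fw.inbound("udp", "198.51.100.7", 53, "10.0.0.5", 5000, now=2.0)
stale = fw.inbound("udp", "203.0.113.9", 53, "10.0.0.5", 5000, now=100.0)
```

UDP itself carries no connection state, so the session exists only in the firewall's table.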
Emerging Need for Stateful Inspection Technology. Modern application services require firewalls that have the capability to gain a more intimate knowledge of the application payload. Emerging applications utilizing eXtensible Markup Language (XML) and Simple Object Access Protocol (SOAP) require the firewall to monitor the content within the packets at wire-speed. Additionally, services that can change their communication ports (perhaps in order to spoof outbound filtering) or those that tunnel within commonly allowed ports (such as TCP port 80) must be monitored in order to maintain security within a network. In order to meet these new demands, firewall stateful inspection technology must further evolve.
Traditional stateful inspection dictates that a router maintains packet header knowledge about a particular stream until a session ends. This capability must be extended into packet content as well, even to allow for the state of the data to be modified by a network-aware service.
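The header-level session tracking described above can be sketched as a small flow table. This is a minimal illustration only, not a description of any particular firewall or of CubeRun: the names `SessionTable`, `FlowState` and `idle_timeout` are hypothetical, and the idle timeout stands in for the "virtual session" used to track connectionless protocols such as UDP.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical flow key: (source IP, source port, destination IP, destination port, protocol)
FlowKey = Tuple[str, int, str, int, str]


@dataclass
class FlowState:
    """Knowledge retained about one stream until its session ends."""
    packets: int = 0
    payload_bytes: int = 0
    last_seen: float = 0.0


class SessionTable:
    """Tracks per-flow state; an idle timeout approximates a virtual session."""

    def __init__(self, idle_timeout: float = 30.0) -> None:
        self.idle_timeout = idle_timeout
        self.flows: Dict[FlowKey, FlowState] = {}

    def observe(self, key: FlowKey, payload: bytes, now: float) -> FlowState:
        """Record one packet, starting a new session if the old one went idle."""
        state = self.flows.get(key)
        if state is None or now - state.last_seen > self.idle_timeout:
            state = FlowState()
            self.flows[key] = state
        state.packets += 1
        state.payload_bytes += len(payload)
        state.last_seen = now
        return state

    def expire(self, now: float) -> int:
        """Drop sessions that have been idle too long; return how many were dropped."""
        stale = [k for k, s in self.flows.items() if now - s.last_seen > self.idle_timeout]
        for k in stale:
            del self.flows[k]
        return len(stale)
```

Extending this state from headers into packet content, as the text proposes, would mean adding content-derived fields to `FlowState`.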
Requirements for Advanced Intrusion Detection and Prevention. In order for the firewall to successfully provide intrusion detection and prevention, it must have DPI capabilities including:
- Screening for viruses in-line and at wire-speed.
- Parsing, analysis and if necessary filtering of XML traffic.
- Dynamic proxy action for instant messaging (IM) services.
Additionally, the firewall must be able to provide for wire-speed Secure Socket Layer (SSL) session inspection and filtering. This will require the ability to decrypt an SSL session and then re-establish it once the packets have been inspected.
Intrusion Detection Technologies Evolving to DPI. The need for DPI capability in firewalls stems from pervasive data-driven attacks that affect many network sectors. DPI represents the integration of stateful inspection alongside traditional intrusion detection and prevention capabilities. Current intrusion detection technology (IDT), while able to detect these attacks, provides very little preemptive capability.
A worm can infect a significant number of systems within a relatively short period of time. NIMDA's multi-vector infection routes posed serious difficulties for IDT in particular. While IDT provides some relief from each of these attacks, moving the detection and response directly into the firewalls using DPI allows for immediate termination of the intrusion by cutting the line of communication at a predetermined network demarcation point.
DPI Engine Requirements. A DPI engine will perform a combination of signature-matching and heuristic analysis of data in order to determine the impact on the communications stream. DPI-capable firewalls need to maintain the state of the underlying network connection, in addition to the integrity of the application utilizing that communication channel. The engine must also perform statistical or anomaly analysis.
The engine must consistently process packets at 'wire speed', a rate comparable to today's routers and switches. While the concept of DPI may appear simple, it is not easy to achieve using current practices. Hardware acceleration utilizing modern network processing units (NPUs) may be beneficial in providing fast discrimination of content within packets.
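As a rough sketch of how signature matching and a statistical check can be combined, the example below scans a payload for byte patterns and flags unusually high byte entropy. The signature table, the finding names and the entropy threshold are illustrative assumptions, not real detection rules, and a production engine would use far more efficient matching than a linear scan.

```python
import math
from collections import Counter
from typing import List

# Hypothetical signatures: byte pattern -> finding name (illustrative only)
SIGNATURES = {
    b"cmd.exe": "example-command-execution",
    b"<script>": "example-script-injection",
}


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; 0.0 for empty input, 8.0 for uniformly spread bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def inspect(payload: bytes, entropy_threshold: float = 7.5) -> List[str]:
    """Return findings from signature matching plus a simple anomaly heuristic."""
    findings = [name for pattern, name in SIGNATURES.items() if pattern in payload]
    # Heuristic: long payloads with near-random bytes may be packed or encrypted content.
    if len(payload) >= 256 and shannon_entropy(payload) > entropy_threshold:
        findings.append("high-entropy-payload")
    return findings
```

A caller would act on a non-empty result, for instance by dropping the packet or terminating the session at the firewall.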
XML and Web Services Pose Challenges. XML-based Web services present challenges to the realization of VAN capabilities. XML documents are verbose, particularly when composed in Unicode. It will be necessary to marshal, compress, encrypt, decrypt, decompress and parse XML documents before deep packet inspection can occur. This imposes serious processing demands on any current network router that is applying DPI.
This is presently being 'solved' by brute force employing specialized proxy server XML acceleration hardware that processes the traffic quickly, but off-line.
Another challenge is that developing advanced services with the current disparate XML toolsets is difficult, and the toolsets are too poorly defined for such services to be widely deployed on generic network router platforms.
Cubicon DPI Approach
Cubicon uses a fundamental context-based protocol to transfer content between Internet nodes and uses a trusted binary stream that is inherently secure by architecture. This architecture eliminates the need for XML acceleration hardware and enables CubeRun to perform content inspection without document parsing and screening. A user has access to Service streams through high levels of visual abstraction to enable fine-grain content filtering as well as traversal, edit and transfer operations on Service structures.
Cubicon remains interoperable with XML services. Cubicon works optimally when all of the nodes involved in the network traffic (edge and routing devices) operate a CubeRun. However, it is important to note that Cubicon can be incrementally installed to work alongside conventional technologies (XML-based Web services, databases, client/server applications, etc.). It requires neither an either/or decision at initial deployment nor the dismantling of existing technologies.
Context processing is key to secure interchange. Context is the semantics (or meanings of things) that enables correct processing of content. Context refers to something we know about the content. With the formalization of context, a computer can reason about things independently without human intervention.
Context instance exchange. In an XML document as it is currently formed, content may be domain data or a service being interchanged between nodes in a network. The 'context' for processing current XML content is embodied in its markup tags, which must conform to a voluminous series of complex W3C specifications including XML Schema, Namespace, DOM, XPath, XPointer, XSLT, XML Query, WSDL, RDF and now OWL.
Within conventional object-oriented processing, each node dynamically parses and analyzes documents by using a broad range of proprietary means. Markup tag parsing is a major source of current processing latency.
By contrast, Cubicon takes a different approach to context processing. Instead of embedding context within content, Cubicon handles it separately. In this new environment, context is a community resource that describes the common meaning (semantics) agreed to by community members. Cubicon expresses content as instances of a context.
Context will be established once when a schema is defined or enters the operating environment. It is then applied to individual instances at runtime. This step eliminates the need to parse and marshal the binary instance data on native payload. This reforming of traffic into native context instances also relieves demands on physical processing and memory, as well as enables on-the-fly wire-speed routing based on the content state.
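The idea of defining context once and exchanging only binary instances can be illustrated with Python's standard `struct` module. This is a sketch under stated assumptions, not the CubeRun wire format: the `Context` class, the `PurchaseOrder` fields and the fixed-width layout are all hypothetical.

```python
import struct
from typing import Dict, List, Tuple


class Context:
    """Hypothetical context: field names and binary layout are agreed once, up front."""

    def __init__(self, name: str, fields: List[Tuple[str, str]]) -> None:
        self.name = name
        self.field_names = [field for field, _ in fields]
        # "!" selects network byte order with no padding between fields.
        self._codec = struct.Struct("!" + "".join(code for _, code in fields))

    def encode(self, **values) -> bytes:
        """Build an instance: values only, no tags or field names on the wire."""
        return self._codec.pack(*(values[f] for f in self.field_names))

    def decode(self, payload: bytes) -> Dict[str, object]:
        """Apply the shared context to an instance; no text parsing is involved."""
        return dict(zip(self.field_names, self._codec.unpack(payload)))


# The context is established once...
order = Context("PurchaseOrder", [("order_id", "I"), ("quantity", "H"), ("unit_price", "d")])

# ...and then applied to each instance at run time.
instance = order.encode(order_id=1001, quantity=3, unit_price=19.5)
fields = order.decode(instance)
```

Because every field sits at a known offset, a router holding the same context could branch on a value such as `fields["quantity"]` without scanning markup.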
A transport protocol based upon a binary context instance is efficient for service extensions such as packet voice and content distribution. CubeRun is a highly unified network interaction engine that is optimized for processing of context instances. At the application protocol layer, the effect of the context approach will be similar to an XML accelerator, but without the hardware cost. A CubeRun engine needs only to be installed within a node similar to installing a Java virtual machine. A CubeRun drives each node and Community Repository discussed below.
General network nodes, including those containing a CubeRun, are protected, as a first line of defense, by traditional network security measures such as firewalls, VPNs, SSL, PKI, etc. In addition, CubeRun provides an intrinsically secure execution space that prevents malicious code from being introduced into the processing of context instances. All operational behaviors (programs) are maintained jointly by trading partners in a repository available to any particular community; this shared maintenance is the basis for the security. At run time, context is transported to a node (which may cache it locally) as core instructions. It is virtually impossible to attack and insert viral or worm code into a core instruction stream.
Context sourced from Community Repository. Cubicon defines and maintains context resources in a Community Repository. A node (that is, any Cubicon-enabled network device) 'shares' only the content and can perform 'proprietary' processing based upon the semantics of particular contexts found in a set of Community Repositories. Community members are entities that control nodes. Community formalization of a context specification enables entities to interchange content using shared protocols while, when necessary, retaining independent processing behaviors. A Community Repository provides numerous mechanisms that automate a protocol lifecycle process.
The Cubicon platform can interoperate with extant information systems by transforming any well-formed schema into a set of context resources. Alternatively, a Cubist can declare native context resources through CubeStudio. The context resources are:
- A skeleton provides backbone for a context instance consisting of a minimum set of linked composite collections from which content is affixed.
- A manifest provides a programming interface to enable a requesting node to input content from an application into a service instance and a responding node to output content from a service instance into an application.
- A facet provides a collection or attribute characteristic that is common between all instances of a particular context.
A Community Repository is the single source for a context type. In order to create or process an instance, a node must request context resources from a Community Repository. A requesting node receives a skeleton, input manifest and facets. Conversely, a responding node receives an output manifest and facets. A node has the option to cache these context resources in order to process subsequent instances. Automatic purging can be elected if no instance processing requests are made within a specified period.
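The request, cache and purge behavior described above might look like the following sketch. Everything here is a hypothetical stand-in: `ContextCache`, the `fetch` callable and `fake_repository` simulate a Community Repository in memory and do not reflect an actual Cubicon interface.

```python
from typing import Callable, Dict, List, Tuple


class ContextCache:
    """Caches context resources per node and purges those idle past a set period."""

    def __init__(self, fetch: Callable[[str], dict], purge_after: float) -> None:
        self._fetch = fetch              # assumed call out to a Community Repository
        self.purge_after = purge_after   # idle period, in seconds, before purging
        self._entries: Dict[str, Tuple[dict, float]] = {}  # name -> (resources, last used)

    def get(self, name: str, now: float) -> dict:
        """Return cached resources, requesting them from the repository on a miss."""
        entry = self._entries.get(name)
        resources = self._fetch(name) if entry is None else entry[0]
        self._entries[name] = (resources, now)
        return resources

    def purge(self, now: float) -> List[str]:
        """Remove contexts with no processing requests within the purge period."""
        stale = [n for n, (_, used) in self._entries.items() if now - used > self.purge_after]
        for n in stale:
            del self._entries[n]
        return stale


fetch_log: List[str] = []


def fake_repository(name: str) -> dict:
    """Stand-in repository returning what a requesting node would receive."""
    fetch_log.append(name)
    return {"skeleton": b"", "input_manifest": [], "facets": {}}


cache = ContextCache(fake_repository, purge_after=60.0)
cache.get("PurchaseOrder", now=0.0)    # miss: resources requested from the repository
cache.get("PurchaseOrder", now=30.0)   # hit: served from the local cache
```

Time is passed in explicitly so the purge behavior is easy to reason about; a real node would presumably use a clock.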
Content is transported between requesting and responding nodes. It consists of an instance of a context that is comprised of a skeleton with affixed content data.
CubeNet architecture provides intrinsic security. A context instance contains only data and is free of metadata or operating behavior. It is always in a binary canonical (that is, simplest or standard) form that assures data integrity. The nested tree structure of the binary itself is even obscured from pattern-recognition sniffing techniques. While this binary is inherently secure for most applications (eliminating the need for additional process-intensive steps of encoding and decoding), selected structured collections or only semi-structured strings within a context instance may be additionally encrypted for security. The skeleton is much more lightweight than an XML document and displaces all markup tags. Numerical data remains in binary and text strings can be optionally compressed and/or encrypted. Alternatively, selected collections and even particular attributes may be excluded from a transmitted instance based upon a user privacy mask.
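The privacy mask mentioned above can be pictured as a filter over a nested structure. The sketch below is only an illustration: it models an instance as a nested dictionary and a mask as a set of dotted paths, both of which are assumptions rather than Cubicon's actual representation.

```python
from typing import Set


def apply_privacy_mask(instance: dict, mask: Set[str], path: str = "") -> dict:
    """Return a copy of the instance without the collections or attributes named in the mask."""
    result = {}
    for key, value in instance.items():
        full_path = f"{path}.{key}" if path else key
        if full_path in mask:
            continue  # excluded from the transmitted instance
        if isinstance(value, dict):
            # Recurse into a nested collection so individual attributes can be masked.
            result[key] = apply_privacy_mask(value, mask, full_path)
        else:
            result[key] = value
    return result


order = {"customer": {"name": "A. Buyer", "tax_id": "000-00-0000"}, "total": 125.0}
outbound = apply_privacy_mask(order, {"customer.tax_id"})
```

Masking `"customer"` instead of `"customer.tax_id"` would drop the whole collection, matching the text's distinction between collections and individual attributes.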
Cubicon security is inherently designed with flexibility, ease of use and industry standard practices in mind. The use of a proprietary obfuscation (to make obscure or unclear) technique increases data security significantly without the need to resort to process-intense encryption. Additionally, the constraints of possible system behaviors through core operations further prevent traditional intrusions through buffer overflows, data destruction and modification, and data injection attacks.
Access control is also strictly controlled by the use of distributed Community Repositories that are well known and understood context sources, thus reducing the need for complex credential co-ordination among multiple servers. This reduced complication translates into a more efficient, easier and economical system to validate and maintain credentials appropriately, subsequently removing a major problem currently found within large complex knowledge-sharing based systems.
Cubicon by its very nature is decentralized through its CubeNet topology of repositories. A Cubicon context can be sourced only from a Community Repository. This automatically guarantees the integrity and version of a program file to process a particular context instance. However, a Community Repository can be mirrored to provide alternative context servers to demanding nodes in case of network interruption or congestion.
Cubicon also effectively assures the integrity of dynamic content by knowing and controlling the insertion of executable binary files into a context instance. Binary files can be simply prohibited, or isolated for selective virus and worm filtering, without scanning the entire context instance.
Intrinsic core operating processing. Composite data structures can be traversed and manipulated by executing core operations. Core operations can also copy or move an attribute value to another collection structure. Alternatively, external system calls by a Java or C-based program can invoke innate manifest and process operations that perform the following advanced behaviors:
- Privilege - Establishes consumer access to a provider Module instance (grove) structure and can limit value read/write privileges to a particular composite structure and even an individual attribute.
- Hold back stem - Limits a Service instance transport to select branch and leaf collections. Limits content distribution on a need-to-know basis.
- Encrypt stem - Secures a composite branch and is especially useful in multi-hop dialogs where an intermediate node needs to be prohibited from reading a portion of a Service instance.
- Correlate attribute - Establishes invariant value relationship between Service attributes within and in between structures.
- Fulfillment and personalization marker - Qualifies manner in which provider entity processes Service based on analysis of consumer entity profile and instance content.
These mechanisms are made available to a Cubist through CubeStudio at a high level of abstraction that requires little training and orientation for effective use. For example, the following sample fulfillment process is declared as an expression that uses the result of four Boolean core operations as operands that determine how a Service instance will be routed.
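As a purely hypothetical illustration of such an expression (this is ordinary Python, not CubeStudio notation, and the four conditions and route names are invented), a routing decision built from four Boolean operands might look like this:

```python
def route(instance: dict) -> str:
    """Combine four Boolean tests on instance content into one routing decision."""
    in_stock = instance["quantity"] <= instance["available"]
    domestic = instance["country"] == "US"
    preferred = instance["customer_tier"] == "preferred"
    large_order = instance["total"] >= 10_000

    # The fulfillment expression: the four Boolean results are its operands.
    if in_stock and (domestic or preferred) and not large_order:
        return "auto-fulfill"
    return "manual-review"


decision = route(
    {"quantity": 2, "available": 10, "country": "US", "customer_tier": "standard", "total": 250}
)
```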
CubeRun Service Processing Configurations
CubeRun can process both foreign XML and native content through several engine configurations as follows:
Transport/processing efficiency. Cubicon provides significant improvements in the efficiency of transporting and processing of complex services:
- Instead of using a DOM and WSDL to transform an XML document, a manifest socket provides a transparent interface to read and write directly to all nested attribute values, regardless of the complexity of a composite, using pointer tree navigation. There is no separate DOM expansion in memory of the Service instance.
- There is no requirement to use a serializer or marshaller with a context instance prior to network transport, thus problems associated with document parsing and well-formedness checking are eliminated. This architecture removes the need to apply hardware acceleration to an incoming payload.
- A context instance is already in canonical form, thus making it far easier to apply a digest algorithm as well as PKI (Public Key Infrastructure) trust services.
- A Service or Plexus instance is typically 1/5 the size of an XML document, in most instances eliminating the need to apply compression and decompression.
- Service instances can be encapsulated in a SOAP wrapper, such that existing conventional XML communication channels can be utilized.
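The point about canonical form and digests can be demonstrated with the standard `hashlib` module. The binary layout below is a hypothetical stand-in for a context instance; the sketch only shows that a single fixed encoding yields a stable digest, whereas two equivalent XML texts hash differently unless they are canonicalized first.

```python
import hashlib
import struct


def canonical_instance(order_id: int, quantity: int) -> bytes:
    """Hypothetical fixed binary layout: one encoding per logical value."""
    return struct.pack("!IH", order_id, quantity)


# The same content always produces the same bytes, hence the same digest.
digest_a = hashlib.sha256(canonical_instance(1001, 3)).hexdigest()
digest_b = hashlib.sha256(canonical_instance(1001, 3)).hexdigest()

# Two XML serializations of the same content differ only in whitespace...
xml_compact = b"<order><id>1001</id><qty>3</qty></order>"
xml_pretty = b"<order>\n  <id>1001</id>\n  <qty>3</qty>\n</order>"

# ...yet their digests differ, so a canonicalization step is needed before signing.
xml_digests_match = (
    hashlib.sha256(xml_compact).hexdigest() == hashlib.sha256(xml_pretty).hexdigest()
)
```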
Bi-lateral run-time transformation integrity. Cubicon maintains its integrity with the XML document infrastructure by transforming a binary context instance into text through CubeRun's serializer prior to transmission. Due to the high degree of semantic representation, this XML output is of exceptional quality, created quickly and very well formed for external processing.
Cubicon-to-XML
Cubicon may receive an XML document that must first be converted into binary through a marshaller. There are several marshaller technologies now available that Cubicon will support. A marshaller writes into CubeRun through a manifest port. A manifest is a list of application calls from and to a programming language. A Community Repository for each Service context automatically maintains this API.
XML-to-Cubicon
Connector development simplification. The connection between a Web application and an XML document is typically performed through a DOM (or a DTD) and/or WSDL and a manifest. Cubicon simplifies this process for the connector developer through a manifest socket. This mechanism provides a calling program direct binary access to service attribute values.
DOM / XML vs. manifest socket
Design transformation integrity. Cubicon is compatible with existing XML-based schemata. A grammar parser transforms a particular schema by creating a set of coreObjects that faithfully mirror the legacy specification.
The coreObjects that comprise a Service context are distilled into a set of binary composites (a skeleton, I/O manifests, and facets). These binary composites are used to natively process outbound and inbound service instance payloads in a CubeRun device.
Native execution. Native Service processing can take place end-to-end with process behavior located in a pair of Module instances (groves). Individual core operations read and write attribute values between a Service instance and the grove objects.