Good Times With VirtualBox Networking


[Image: Oracle VirtualBox logo. Image Source: Oracle]

TL;DR version: if you run multiple VirtualBox VMs on the same desktop, set up three network interfaces on each such VM (one NAT, one host-only, one bridged).

Now for the long, more entertaining (hopefully!) version:

Recently I switched from VMware Workstation to Oracle VirtualBox for my personal virtualization needs. I’m very happy overall. VirtualBox seems faster to me – when I minimize a VM, do lots of other work, then restore the VM, it is responsive right away, where VMware would page for a minute or two. And each VirtualBox VM is in a separate host window, which I like more than VMware’s single tabbed window.

Still, I must say VMware’s networking was easier to deal with.  Here’s how I ended up with 3 IP addresses in each of my local VMs…

I have a CentOS VM running Alfresco and Oracle; a Fedora VM running Apache SOLR and IntelliJ IDEA; and a Windows Server 2012 VM running Active Directory. I need connectivity to each of them from my host desktop (Windows 8.1); they need connectivity to each other; and they need to be able to connect to Armedia’s corporate VMs. Plus, I’d rather not update my hosts file or IP settings every time I move between the office and home!

1st VirtualBox network: a NAT network (Network Address Translation), which lets the VMs talk to each other and make outbound connections, but does not accept inbound connections from the host desktop or any other machine. This meets Goal #2 (connectivity to each other). But Goals #1 and #3 are not met yet.

2nd VirtualBox network: a VirtualBox Host-Only network, which allows connectivity from the host desktop. Now Goals #1 (connectivity from the host) and #2 (connectivity to each other) are both met.

Also, both the NAT and the host-only networks offer stable IP addresses; whether at home or at work, my VMs get the same address each time, so I don’t spend 10 minutes updating IP references every time I switch location.

Danger! Here is where VirtualBox tricks you! It seems like Goal #3 (access to corporate VMs) is met too! With the NAT and host-only IP addresses, I can see our internal websites and copy smaller files to and from the data center VMs. But if I transfer a larger file, I get a Connection Reset error! Twice in the last month, I’ve spent hours tracking down the “defect” in the corporate network settings. (You’d think I’d remember the problem the second time around, but in my defense, the error manifested in different ways.)

Solution? Add the 3rd VirtualBox network: a bridged network (bridged to the host’s physical network adapter, so each VM gets its own IP address from the corporate/home DHCP server, just like the host does). Now the 3rd goal is really met! I can transfer files all day long, no worries.

Something to watch out for: when you disconnect a wired Ethernet cable, VirtualBox automatically re-binds the bridged network to your wireless interface. This is convenient, since your VMs automatically get new addresses. BUT! When you plug the Ethernet cable back in (which in my case deactivates the wireless), VirtualBox does NOT switch back to the wired interface! That happened to me this morning. I spent a few hours trying to figure out why my file uploads failed. Finally I saw where VirtualBox had re-bound my bridged network, changed it back to the wired interface, and all was well.

ArkCase: Introduction to Data Access Control

Background

ArkCase is a framework for developing case management applications.

Data Access Control ensures each user sees only the records he or she is authorized to see. It is applied to individual business objects, as opposed to role-based access control, which is based only on the user’s identity and roles.

Role-based access is usually applied to each URL in a REST-oriented application.  It ensures the user is authorized for that URL; for example, that only document approvers can invoke the “/approveDocument” URL.  But role-based access by itself means any document approver can approve any document.

Spring Security integrates easily with Spring MVC to enable URL-based access control. But how do we add the logic to ensure that only Document A’s approver can approve Document A, and only Document B’s approver can approve Document B? Not to mention ensuring that, until the document is approved, only users involved in the draft and approval process can even see it – so that it does not appear in anyone else’s search results or queries?
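
To make the gap concrete, here is a minimal sketch of URL-level security using current Spring Security and Spring MVC annotations. The controller, URL, and role names are illustrative assumptions, not actual ArkCase code:

    import org.springframework.security.access.prepost.PreAuthorize;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class DocumentApprovalController {

        // Role-based control: any user with the APPROVER role may call this URL.
        @PreAuthorize("hasRole('APPROVER')")
        @PostMapping("/approveDocument/{documentId}")
        public void approveDocument(@PathVariable Long documentId) {
            // Nothing here verifies the caller is on THIS document's approver
            // list; that per-object check is data access control, and role
            // checks alone cannot express it.
        }
    }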

Straightforward Custom Applications

If the application is written for a single customer and implements a straightforward process with the same rules for everyone, the easy path is to build these controls into the application code.  Embed the rules in the query logic such that the database only returns appropriate rows; add checks in the document approval logic to ensure the user is on the approver list for that document.
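
For example, here is a minimal sketch of such an embedded rule, assuming a hypothetical Document entity with an approvers collection (illustrative names, not a real ArkCase entity):

    import java.util.List;
    import javax.persistence.EntityManager;

    public class DocumentQueries {
        // The access rule lives in the query itself: the database returns only
        // documents whose approver list contains the current user.
        public List<Document> findDocumentsToApprove(EntityManager em, String user) {
            return em.createQuery(
                    "SELECT d FROM Document d WHERE :user MEMBER OF d.approvers",
                    Document.class)
                .setParameter("user", user)
                .getResultList();
        }
    }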

If you design and build this application very carefully, and you understand the customer very well, and their requirements do not change faster than you can update the application, then this approach can work. I’ve written many such applications, and they were very successful; the users were happy, and all was well.

Larger, More Diverse Customers

The larger the customer organization, and the more departments and geographic regions being served, the harder it gets to implement data access logic in code. I tried this approach for a large government agency where each region had slightly different rules. The implementation got pretty difficult. The queries became long, and the results came back much more slowly; the database isn’t really meant to apply sophisticated access control rules over very large data sets. Let’s just say the customer was less happy.

Many Different Customers with Different Problem Domains

Let’s extend the previous scenario to the ArkCase arena, where the framework has to satisfy entirely different customers, each of whom has radically different rules and even different types of business objects. Now the straightforward approach of implementing access logic in code amounts to a commitment to rewrite the entire application for each customer. Now my Armedia leadership team (the people who sign my paychecks!) is less happy!

The ArkCase Solution

In ArkCase, we have a flexible solution.  We have a fast way to implement the most common types of rules and we also provide a mechanism to implement more complicated rules.  And we have a fast, reliable way to evaluate the data access controls at runtime.

My next few blog posts will explore this solution in more detail.

In a nutshell, ArkCase requires the results of all data access control rules to be embodied in a database table: not the rules themselves, but the result of each rule as applied to each domain/business object. This table holds the access entries for each individual domain/business object. Each domain/business object’s access entries are also indexed in the Apache SOLR search engine. This allows ArkCase to encode the current user’s access rights (group membership and any other access tokens) as a set of Boolean restrictions in the SOLR search query. SOLR is designed to evaluate multiple Boolean search conditions efficiently, so we get fast search results that include only the domain/business objects the user is allowed to see.
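
As a concrete illustration of that query-time encoding, here is a hedged SolrJ sketch. The field names (allow_acl, deny_acl), the token syntax, and the collection URL are assumptions for illustration, not the actual ArkCase schema:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class AccessControlledSearch {
        // userAclTokens might look like: "user_jdoe OR group_approvers"
        public QueryResponse search(String userQuery, String userAclTokens) throws Exception {
            HttpSolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/arkcase").build();
            SolrQuery query = new SolrQuery(userQuery);
            // Boolean restrictions: the object must grant access to at least
            // one of the user's tokens, and must not deny any of them.
            query.addFilterQuery("allow_acl:(" + userAclTokens + ")");
            query.addFilterQuery("-deny_acl:(" + userAclTokens + ")");
            return solr.query(query);
        }
    }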

More to come – stay tuned!


Adding Full Text Search to Ark via Spring and JPA

What, No Full Text Search Already?

My project ArkCase is a Spring application that integrates with Alfresco (and other ECM platforms) via CMIS – Content Management Interoperability Services. ArkCase stores metadata in a database, and content files in the ECM platform. Our customers so far have not needed integrated full text search; plain old database queries have sufficed. But we’ve always known full text search would have to be addressed eventually. Why not now, since ArkCase has been getting some love? Plus, high quality search engines such as SOLR are free, documented in excellent books, and can provide more analytic services than plain old search.

Goals

What do we want from SOLR Search integration?

  1. We want both quick search and advanced search capabilities.  Quick search should be fast and search only metadata (case number, task assignee, …).  Quick search is to let users find an object quickly based on the object ID or the assignee.  Advanced search should still be fast, but includes content file search and more fields.  Advanced search is to let users explore all the business objects in the application.
  2. Search results should be integrated with data access control.  Only results the user is authorized to see should appear in the search results.  This means two users with different access rights could see different results, even when searching for the same terms.
  3. The object types to be indexed, and the specific fields to be indexed for each object type, should be configurable at run time.  Each ArkCase installation may trace different object types, and different customers may want to index different data.  So at runtime the administrator should be able to enable and disable different object types, and control which fields are indexed.
  4. Results from ArkCase metadata and results from the content files (stored in the ECM platform) should be combined in a seamless fashion.  We don’t want to extend the ECM full-text search engine to index the ArkCase metadata, and we don’t want the ArkCase metadata full text index to duplicate the ECM engine’s data (we don’t want to re-index all the content files already indexed by the ECM).  So we will have two indexes: the ArkCase metadata index, and the ECM content file index.  But the user should never be conscious of this; the ArkCase search user interface and search results should maintain the illusion of a single coherent full text search index.

Both Quick Search and Advanced Search

To enable both quick search and advanced search modes, I created two separate SOLR collections.  The quick search collection includes only the metadata fields to be searched via the Quick Search user interface.  The full collection includes all indexed metadata.  Clearly these two indexes are somewhat redundant since the full collection almost certainly includes everything indexed in the quick search collection.  As soon as we have a performance test environment I’ll try to measure whether maintaining the smaller quick search collection really makes sense.  If the quick search collection is not materially faster than the equivalent search on the full index, then we can stop maintaining the quick search collection.

Integration with Data Access Control

Data access control is a touchy issue since the full text search queries must still be fast, the pagination must continue to work, and the hit counts must still be accurate.  These goals are difficult to reach if application code applies data access control to the search results after they leave the search engine.  So I plan to encode the access control lists into the search engine itself, so the access control becomes just another part of the search query.  Search Technologies has a fine series of articles about this “early binding” architecture: https://www.searchtechnologies.com/search-engine-security.html.

Configurable at Runtime

ArkCase has a basic pattern for runtime-configurable options.  We encode the options into a Spring XML configuration file, which we load at runtime by monitoring a Spring load folder.  This allows us to support as many search configurations as we need: one Spring full-text-search config file for each business object type.  At some future time we will add an administrator control panel with a user interface for reading and writing such configuration files.  This Spring XML profile configures the business object to be indexed.  For business objects stored in ArkCase tables, this configuration includes the JPA entity name, the entity properties to be indexed, the corresponding SOLR field names, and how often the database is polled for new records.  For Activiti workflow objects, the configuration includes the Activiti object type (tasks or business processes), and the properties to be indexed.
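
For illustration only, here is a hypothetical Java bean that one of those dropped-in Spring XML files might configure for a JPA-backed business object; the property names are assumptions, not ArkCase’s actual configuration schema:

    import java.util.Map;

    public class JpaSearchIndexConfig {
        private String entityName;                 // e.g. "AcmComplaint"
        private Map<String, String> fieldMappings; // entity property -> SOLR field name
        private long pollIntervalMilliseconds;     // how often to poll for new records

        // Standard getters and setters omitted from this sketch.
    }

Dropping a new XML file that defines such a bean into the load folder would then enable indexing for one more business object type, with no code change.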

Seamless Integration of Database, Activiti, and ECM Data Sources

The user should not realize the indexed data is from multiple repositories.

Integrating database and Activiti data sources is easy: we just feed data from both sources into the same SOLR collection.

The ECM already indexes its content files.  We don’t want to duplicate the ECM index, and we especially don’t want to dig beneath the vendor’s documented search interfaces.

So in our application code, we need to make two queries: one to the ArkCase SOLR index (which indexes the database and the Activiti data), and another query to the ECM index.  Then we need to merge the two result sets.  As we encounter challenges with this double query and result set merging I may write more blog articles!
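
Here is a first-cut sketch of that merge, assuming both sides expose a relevance score. The Hit type is invented for this sketch; real merging must also reconcile pagination, hit counts, and duplicate objects:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class MergedSearchResults {
        // Minimal hit abstraction for this sketch; the real result types differ.
        public record Hit(String objectId, String source, double score) {}

        public List<Hit> merge(List<Hit> arkCaseHits, List<Hit> ecmHits) {
            List<Hit> merged = new ArrayList<>(arkCaseHits);
            merged.addAll(ecmHits);
            // Interleave by relevance so the user sees one coherent result list.
            merged.sort(Comparator.comparingDouble(Hit::score).reversed());
            return merged;
        }
    }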

Closing Thoughts

SOLR is very easy to work with. I may use it for more than straightforward full text search. For example, the navigation panels with the lists of cases, lists of tasks, lists of complaints, and so on include only data in the SOLR quick search collection. So in theory we should be able to query SOLR to populate those lists – versus calling JPA queries. Again, once we have a performance test environment I can tell whether SOLR queries or JPA queries are faster in general.


Mule ESB: How to Call the Exact Method You Want on a Spring Bean

The Issue

Mule ESB provides a built-in mechanism to call a Spring bean. Mule also provides an entry point resolver mechanism to choose the method that should be called on the desired bean. One such resolver is the property-entry-point-resolver, which means the incoming message includes a property that specifies the method name. It looks like this:

        <component doc:name="Complaint DAO">
            <property-entry-point-resolver property="daoMethod"/>
            <spring-object bean="acmComplaintDao"/>
        </component>

This snippet means the incoming message includes a property “daoMethod”; Mule will invoke the acmComplaintDao bean’s method named by this property.
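
For context, here is a hedged sketch of client code invoking such a flow with the Mule 3 client API; the endpoint URL, payload, and property value are illustrative assumptions:

    import java.util.HashMap;
    import java.util.Map;
    import org.mule.api.MuleContext;
    import org.mule.api.MuleMessage;
    import org.mule.module.client.MuleClient;

    public class ComplaintClient {
        public MuleMessage saveComplaint(MuleContext muleContext, Object complaint) throws Exception {
            MuleClient client = new MuleClient(muleContext);
            Map<String, Object> props = new HashMap<>();
            // The caller must know and supply the DAO method name as a property.
            props.put("daoMethod", "save");
            return client.send("vm://saveComplaint", complaint, props);
        }
    }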

I’ve had three problems with this approach. First, you can only specify the bean to be called, and you must hope Mule chooses the right method to invoke. Second, Mule is in charge of selecting and providing the method arguments; if the bean has several overloaded methods with the same name, you cannot control which one Mule picks. Third, only an incoming message property can be used to specify the method name. This means either the client code invoking the Mule flow must provide the method name (undesirable, since it makes that code harder to read), or the flow design must be deformed such that the main flow calls a child flow solely to provide the method name property.

How I Resolved the Issue

Last week I finally noticed Mule provides access to a bean registry which includes all Spring beans.  And I noticed Mule’s expression component allows you to add arbitrary Mule Expression Language to the flow.  Putting these two together results in much simpler code.  I could replace the above example with something like this:

        <expression-component>
            app.registry.acmComplaintDao.save(message.payload);
        </expression-component>

“app.registry” is a built-in object provided by the Mule Expression Language.

In my mind this XML snippet is much clearer and easier to read than the previous one. At a glance the reader can see which method of which bean is being called, and with which arguments. And it fits right into the main flow; there is no need to set up a separate child flow just to specify the method name.

A nice simple resolution to the issues I had with my earlier approach. And the new code is smaller and easier to read!  Good news all around.


Writing a Framework is Not Like Developing an Application!

ArkCase is both a framework and an application.  As a framework, ArkCase provides a scaffolding to write case management applications tailored to custom-fit a specific customer.  As an application, ArkCase provides pre-built web application archives (WAR files) suitable for generic customers.  For example, we provide a law enforcement WAR and an inspector general WAR.

So we’ve known for a long time ArkCase should be both a software development tool for writing case management applications and a standard out-of-the-box case management app. JIRA is similar: it is both a defect and issue tracking application and a tool for building your own specialized defect and issue tracking app. Rational ClearQuest is the same way: it is both a default, out-of-the-box application life cycle management solution and a tool for building your own such solution.

So how does ArkCase support these twin goals?  We all know it’s an order of magnitude easier to write a purpose-built app to specific requirements than it is to write a software development tool for other people to write their own purpose-built apps.

To write a purpose-built app we can naively map requirements to the technical architecture. Does Customer X need a case file attribute that will never be needed by any other customer? We just add the column to the database, add a field to our model objects, update our MVC controllers to pass the field around, and update our view to show it. When it comes time to deliver another version of the system to Customer 99, they get all of Customer X’s special fields… or else we have to make a special effort to remove those fields, perhaps by maintaining a branch per customer. Soon it becomes impossible to keep straight which customers get which fields. We don’t really have a framework at all; we have many different purpose-built apps, one per customer.

To write a framework for building case management applications, we have to add a Core Object Model.  Then we write a standard library of components, implemented in terms of this Core Object Model.  Then we write our pre-built case management WAR files using the standard library and the Core Object Model.

For ArkCase, our Core Object Model includes the following (a minimal Java sketch follows the list):

  • ArkBusinessObject – real world objects tracked by our customers (persons, organizations, firearms)
  • ArkContainer – case management objects that manage business objects (case files, documents)
  • ArkRelation – links (unidirectional or bidirectional) between business objects and other business objects, or between business objects and containers (a vehicle contained contraband)
  • ArkExternalEvent – real world events that form the history of a container (subject was arrested; a vehicle was stopped at the border)
  • ArkAction – user actions that change the state of business objects or containers (split one document into two; consolidate two case files into one)
  • ArkBusinessRule – constraints or guidelines that can be changed at runtime by business analysts (when a document is approved, declare a record in the records management application)
  • ArkBusinessProcess – guides the life cycle of a container or business object (documents must be approved)
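
Below is a hypothetical Java rendering of a few of these types; ArkCase’s real artifacts may differ, and the method names are illustrative only:

    import java.util.List;

    interface ArkBusinessObject {
        Long getId();
        String getObjectType();   // e.g. "PERSON", "ORGANIZATION", "FIREARM"
    }

    interface ArkContainer extends ArkBusinessObject {
        List<ArkBusinessObject> getContents();   // e.g. a case file's documents
    }

    interface ArkRelation {
        ArkBusinessObject getSource();
        ArkBusinessObject getTarget();           // the target may also be a container
        boolean isBidirectional();
    }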

This object model is the set of design artifacts used to write the standard component library and customer-specific applications. The object model is the extra conceptual layer that makes the ArkCase framework different from just naively developing a purpose-built application. The proof is in the pudding: we no longer have to maintain one branch per customer; we can have a single ArkCase framework source tree, with customer-specific projects that import the framework artifacts.