Secure HBase : Access Controls

October 11, 2010
By Eugene Koontz

Please also see the slides from our presentation on this topic at the NYC HBase Meetup, and the Trend Hadoop Group’s security branch of HBase on Github.

Building and maintaining an HBase cluster is a complex undertaking, especially if you build your own hardware infrastructure. Leasing from a cloud service such as Amazon EC2 allows you to avoid the expense and complexity of setting up your own hardware, but you’ll still need to know how to install, configure and tune your HBase cluster on top of your leased instances.

But what if you could simply connect to an HBase instance, hosted in a public cloud, and let someone else worry about HBase setup and maintenance? We believe there’s a group of potential HBase users who simply want to connect to a managed HBase cluster and start storing their data.

Another class of customers may be large organizations that want to centralize IT resources within a private cloud: a single company-internal cluster running HBase. Such organizations may have several departments, each of which is a tenant in the private cloud.

Both of these groups of potential HBase users want to keep their data secure in the presence of other tenants of the hosted HBase cluster: they want to own their own tables, provide defined access to users in their department, and perhaps even provide defined access to other tenants.

Users of Relational Database Management Systems (RDBMS) are used to having a security subsystem that controls access to the database amongst the system’s users. The newer generation of “NoSQL” database systems, on the other hand, generally assumes a simple model in which all connections are equally trusted. We at Trend Micro believe that adding security features to HBase may make it attractive to a larger group of users who want to host their Big Data on a managed HBase cluster.

Enter Secure HBase. Secure HBase adds support for table and column family ownership and access control. Secure HBase builds on recent development work in the Hadoop and HBase community, specifically:

Authentication

In our Secure HBase implementation, when HBase receives a command request via Remote Procedure Call (RPC), it runs the command through doAs(), a JAAS (Java Authentication and Authorization Service) method that is defined on the Subject class and, in the code below, is called on the connection’s ticket. JAAS can be configured to use Kerberos to authenticate, so that at the point of the doAs() call, the user is guaranteed to be correctly authenticated.

The following diff shows how HBaseServer::run() changed to allow HBase to assume a user’s identity and run code as that user. Note especially the added call to call.connection.ticket.doAs(), where we do the aforementioned doAs.

diff --git a/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java b/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
index 40473e3..2dec6b5 100644
--- a/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
+++ b/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java
@@ -58,6 +58,7 @@
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.LinkedBlockingQueue;
+import java.security.PrivilegedExceptionAction;

 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
@@ -1060,7 +1061,7 @@ public abstract class HBaseServer implements RpcServer {
       ByteArrayOutputStream buf = new ByteArrayOutputStream(buffersize);
       while (running) {
         try {
-          Call call = myCallQueue.take(); // pop the queue; maybe blocked here
+          final Call call = myCallQueue.take(); // pop the queue; maybe blocked here

           if (LOG.isDebugEnabled())
             LOG.debug(getName() + ": has #" + call.id + " from " +
@@ -1073,7 +1074,32 @@ public abstract class HBaseServer implements RpcServer {
           CurCall.set(call);
           // TODO: simple auth -- store user in context
           try {
-            value = call(call.connection.protocol, call.param, call.timestamp);             // make the call
+            Object obj = (Writable)call.connection.ticket.doAs(new PrivilegedExceptionAction<Object>() {
+                public Object run() throws IOException, InterruptedException {
+                  return call(call.connection.protocol, call.param,
+                              call.timestamp);
+                }
+              });
+
+            if (obj instanceof Writable) {
+              value = (Writable)obj;
+            }
+            else {
+              // doAs() return value could not be converted to Writable:
+              // probably should throw an exception here in that case.
+            }
           } catch (Throwable e) {
             LOG.debug(getName()+", call "+call+": error: " + e, e);
             errorClass = e.getClass().getName();
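
The effect of the doAs() call can be illustrated without Hadoop or Kerberos. The following is a minimal sketch, not the HBaseServer code: the class RunAsSketch, its methods, and the user name "hbase" are hypothetical stand-ins. It keeps the identity a thread is acting as in a ThreadLocal, runs an action under a given user name, and restores the previous identity afterwards. No authentication is performed here.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the identity switch performed by doAs().
// A real deployment would rely on Hadoop's security classes and Kerberos.
class RunAsSketch {
    // Identity the current thread is acting as; "hbase" is an assumed
    // name for the user that started the server process.
    private static final ThreadLocal<String> CURRENT_USER =
        ThreadLocal.withInitial(() -> "hbase");

    // Analogous to "whoami": who is this thread acting as right now?
    static String currentUser() {
        return CURRENT_USER.get();
    }

    // Run the action as `user`, then switch back to the previous identity.
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(user);
        try {
            return action.get();
        } finally {
            CURRENT_USER.set(previous); // restored even if the action throws
        }
    }
}
```

With this sketch, RunAsSketch.doAs("alice", RunAsSketch::currentUser) returns "alice", and currentUser() called afterwards on the same thread returns "hbase" again.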

Permissions and Authorization

Overall, HBase has a simpler feature set than relational databases, especially in terms of client operations. We don’t need to distinguish between an insert (new record) and an update (of an existing record), for example, as both collapse into a Put() in HBase. The following table shows the relation of permissions to HBase client operations.

Permissions Required    HBase Operations
Read                    Get, Scan, Exists
Write                   Put, Delete, Flush, Compact, Lock/Unlock Row
Read, Write             Increment column value, Check and Delete, Check and Put
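
The table above can be read as a lookup from operation to required permissions. As a sketch only (the enum names TablePermission and ClientOp are hypothetical, not taken from the Secure HBase code), it could be encoded like this, with an operation allowed when the granted set contains every required permission:

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical names; the table above is the only source for the mapping.
enum TablePermission { READ, WRITE }

enum ClientOp {
    GET(TablePermission.READ),
    SCAN(TablePermission.READ),
    EXISTS(TablePermission.READ),
    PUT(TablePermission.WRITE),
    DELETE(TablePermission.WRITE),
    FLUSH(TablePermission.WRITE),
    COMPACT(TablePermission.WRITE),
    LOCK_ROW(TablePermission.WRITE),
    UNLOCK_ROW(TablePermission.WRITE),
    INCREMENT(TablePermission.READ, TablePermission.WRITE),
    CHECK_AND_DELETE(TablePermission.READ, TablePermission.WRITE),
    CHECK_AND_PUT(TablePermission.READ, TablePermission.WRITE);

    private final Set<TablePermission> required;

    ClientOp(TablePermission first, TablePermission... rest) {
        this.required = EnumSet.of(first, rest);
    }

    // An operation is allowed only if every required permission is granted.
    boolean allowedWith(Set<TablePermission> granted) {
        return granted.containsAll(required);
    }
}
```

For example, a user holding only Write cannot perform a Check and Put, because that operation also reads the current value.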

User actions that require authorization can be divided according to whether they are row-oriented or schema-oriented. The former involve reading or writing data in a table, while the latter involve changing the table structure: e.g., adding a column or column family, or deleting the entire table. For row-oriented operations, we use coprocessor methods, described in the Coprocessors section below. Permissions for schema-oriented operations, on the other hand, are enforced by the master, which does not use the coprocessor framework. Nevertheless, note that the code shown above runs on both the master and the regionserver, so whether the user wishes to do something that requires in-master functioning (such as creating or deleting a table) or in-regionserver functioning (such as reading or writing rows in the table), that functionality is performed with that user’s identity.

Table Ownership

Ownership is represented as a key on a table’s TableDescriptor, which means that it’s stored in the regioninfo:owner column in the table’s corresponding .META. rows. We assume a simple owner authorization scheme whereby an owner can do anything on an owned table, including assigning permissions to it or disabling or deleting it. A table’s owner is initially set to the user who created it. The -ROOT- and .META. tables are owned by the HBase system user. All users have permission to read both of these system tables, which is required if clients are to access other tables. A designated superuser (which can be configured in hbase-site.xml) can change a table’s ownership.
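
The ownership rules can be sketched in a few lines. This is an illustration under stated assumptions, not the Secure HBase code: the class OwnershipSketch is hypothetical, ownership is held in a plain map rather than in .META., and the sketch assumes that only the configured superuser may reassign ownership.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of table ownership; real ownership lives in .META.
class OwnershipSketch {
    private final String superuser;
    private final Map<String, String> ownerByTable = new HashMap<>();

    OwnershipSketch(String superuser) {
        this.superuser = superuser;
    }

    // A table's owner is initially the user who created it.
    void createTable(String table, String creator) {
        ownerByTable.put(table, creator);
    }

    boolean isOwner(String user, String table) {
        return user.equals(ownerByTable.get(table));
    }

    // Assumption of this sketch: only the superuser may change ownership.
    void changeOwner(String actingUser, String table, String newOwner) {
        if (!actingUser.equals(superuser)) {
            throw new SecurityException(actingUser + " may not change owner of " + table);
        }
        ownerByTable.put(table, newOwner);
    }
}
```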

Coprocessors

As we mentioned above, Secure HBase depends on a Coprocessor framework (which we have developed here at Trend and have submitted to the HBase community for review). When a Coprocessor-enabled Regionserver starts, it loads the coprocessors specified in its configuration. Each loaded coprocessor implements the Coprocessor interface’s methods, which we discuss below. Coprocessors are loaded according to the value of the property hbase.coprocessor.default.classes in the configuration file hbase-site.xml. The following configuration fragment loads our Access Control coprocessor:

<property>
<name>hbase.coprocessor.default.classes</name>
<value>org.apache.hadoop.hbase.security.rbac.AccessController</value>
</property>

The Coprocessor interface specifies a set of pre- and post- methods, each of which corresponds to a client’s request to the Regionserver. For example, because there is a client method Get(), the Coprocessor interface specifies a preGet() and a postGet() method corresponding to this client request. A preX() or a postX() method may throw a CoprocessorException. Any number of coprocessors can be loaded, each of which will be run on the client request. Each coprocessor is associated with a priority that determines the order in which the coprocessors are run. The following figure illustrates a client request to a regionserver with three coprocessors loaded. If a CoprocessorException is thrown by any pre- or post- method in the pipeline, this exception will be communicated to the client, and no further servicing of the client request will occur.

[Figure: example of a client request with three coprocessors loaded]

We use the relevant pre- methods to do authorization. For example, if a client sends a Put(T,Row,...) to a regionserver, the regionserver checks in the authorization coprocessor’s prePut() method that the client user has permission to write to the table T. If the user has the necessary permission, the actual Put() occurs. Otherwise, the Put() does not occur, and an AccessDeniedException is thrown. Below, in the Example Request section, we give more detail on the complete lifecycle of an authorized client request.
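
The pre- hook pattern described above can be sketched as follows. This is a reduced illustration, not the actual Coprocessor interface: PutObserver, RegionSketch and WriteAccessObserver are hypothetical names, only prePut() is modelled, the exception is an unchecked stand-in for the one named in the text, and the sketch assumes that a lower priority number runs first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

// Unchecked stand-in for the AccessDeniedException named in the text.
class AccessDeniedException extends RuntimeException {
    AccessDeniedException(String message) { super(message); }
}

// Hypothetical, reduced observer interface: only the prePut() hook.
interface PutObserver {
    int priority();                          // assumed: lower runs first
    void prePut(String user, String table);  // throws to veto the Put
}

class RegionSketch {
    private final List<PutObserver> observers = new ArrayList<>();
    final List<String> appliedPuts = new ArrayList<>();

    void load(PutObserver observer) {
        observers.add(observer);
        observers.sort(Comparator.comparingInt(PutObserver::priority));
    }

    void put(String user, String table, String row) {
        // Every pre- hook runs first; any exception aborts the request.
        for (PutObserver observer : observers) {
            observer.prePut(user, table);
        }
        appliedPuts.add(table + "/" + row);  // the actual Put
    }
}

// Authorization hook: allows the Put only for known "user@table" pairs.
class WriteAccessObserver implements PutObserver {
    private final Set<String> writers;

    WriteAccessObserver(Set<String> writers) { this.writers = writers; }

    public int priority() { return 0; }

    public void prePut(String user, String table) {
        if (!writers.contains(user + "@" + table)) {
            throw new AccessDeniedException(user + " may not write to " + table);
        }
    }
}
```

Because the check happens in the pre- hook, a denied request never reaches the line that applies the Put.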

The following table provides an enumeration of Coprocessor interface methods and how each relates to Authorization.

Method   Permission checking   Required permissions
Get()    preGet()              Read
Scan()   preScan()             Read
Put()    prePut()              Write

We make use of the Coprocessor interface’s preX() methods for permission checking, as described above. But we also use two post- methods, namely postOpen() and postPut(), for permission storage, which we now describe in detail.

Storage of Permissions

Our implementation allows permissions to be granted on entire tables or on specific column families of a table, but for simplicity of explanation, we’ll only consider table-level permissions in this post. The permissions for a table table1 are stored in three places:

  • The row in .META. for the first region of table1
  • The node /hbase/acl/table1 of Zookeeper
  • The in-memory Permissions Mirror of every regionserver that serves table1

The arrows in the figure below indicate flow of permission information in the system.

[Figure: the three locations for permissions]

.META.

Our implementation creates a new column family, acl:, in the .META. table. For any table T, permissions for T are stored in the first .META. row for T, in the acl:U column, for each user U who has permissions on T. This is the canonical location for permission information, and the source from which the other locations are copied. When a Secure HBase cluster first comes online and .META. regions are first loaded into Regionservers, permission information is copied to Zookeeper and from there to the Regionservers’ permission mirrors. When permissions are modified by an administrator, the modifications similarly flow from .META. to Zookeeper and then to the Regionserver mirrors. The specifics of this copying are described below.

Zookeeper

Zookeeper acts as an intermediary between the canonical location in .META. on the one hand, and the Regionservers’ permission mirrors on the other. In Zookeeper, permission information for a table T is stored at the node /hbase/acl/T. For each such node, all regionservers that serve a region of T set a watch on it to be notified of permission changes. Zookeeper thus propagates permission information from the canonical .META. store to the regionservers as soon as it changes.

Regionserver in-memory mirror

Each regionserver maintains its own in-memory permission mirror for tables whose regions it serves. The mirror is populated by copying permissions from the relevant Zookeeper nodes when a region is opened, and consistency is maintained between Zookeeper and the Regionservers’ mirrors by Zookeeper Watches.

Zookeeper Permission Node population

When a regionserver opens a .META. region, the AccessController’s postOpen() method scans all rows to find the first row for each table represented in that .META. region. For each such row, representing a table T, a Zookeeper node /hbase/acl/T is created, and the row’s acl: columns are copied into this node as a serialized user => permissions map (see figure below).
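
The scan performed on opening a .META. region can be sketched as below. The classes MetaRow and AclNodePopulator are hypothetical: each .META. row is reduced to a table name, a flag saying whether it is the first row for that table, and its acl: columns. Permission values such as "RW" are an assumed encoding, and the serialization step is omitted, so the result is simply a map from Zookeeper path to a user => permissions map.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, reduced view of one .META. row.
class MetaRow {
    final String table;
    final boolean firstRegion;       // true for the table's first .META. row
    final Map<String, String> acl;   // acl:U column -> permissions, e.g. "RW"

    MetaRow(String table, boolean firstRegion, Map<String, String> acl) {
        this.table = table;
        this.firstRegion = firstRegion;
        this.acl = acl;
    }
}

class AclNodePopulator {
    // Returns the content of each /hbase/acl/T node to create,
    // keyed by node path. Only first rows carry permissions.
    static Map<String, Map<String, String>> buildAclNodes(List<MetaRow> rows) {
        Map<String, Map<String, String>> nodes = new TreeMap<>();
        for (MetaRow row : rows) {
            if (row.firstRegion) {
                nodes.put("/hbase/acl/" + row.table, new TreeMap<>(row.acl));
            }
        }
        return nodes;
    }
}
```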

[Figure: postOpen(.META.), the state after opening .META.]

Consistency Maintenance: from .META. to Zookeeper

An administrator’s change to a table T’s permissions is essentially a write (a Put()) to the first row in .META. that corresponds to table T. Such a write causes the Coprocessor interface’s postPut() method to be called. This method writes to the corresponding node in Zookeeper: /hbase/acl/T.

[Figure: postPut(.META.)]

Consistency Maintenance: from Zookeeper to Regionserver Permission Mirror

The figure below shows two regionservers, which serve a region of table1 and a region of tableN, respectively. Each has a ZKPermissionWatcher that implements the nodeCreated() and nodeChanged() methods. In both watcher methods, a change in the corresponding Zookeeper node causes a refresh of the Permissions Mirror.

[Figure: propagation from Zookeeper to the Regionservers’ Permissions Mirrors]
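
The propagation path from a permission write to the regionserver mirrors can be sketched with an in-memory stand-in for Zookeeper. Nothing here uses the ZooKeeper client API: FakeAclStore and PermissionMirror are hypothetical classes, permission strings such as "RW" are an assumed encoding, and callbacks stay registered permanently, whereas a standard ZooKeeper watch fires once and has to be set again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// In-memory stand-in for the /hbase/acl subtree; not the ZooKeeper API.
class FakeAclStore {
    private final Map<String, Map<String, String>> nodes = new HashMap<>();
    private final Map<String, List<BiConsumer<String, Map<String, String>>>> watchers =
        new HashMap<>();

    void watch(String path, BiConsumer<String, Map<String, String>> watcher) {
        watchers.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    // In this sketch, called where postPut() on .META. would write the node.
    void write(String path, Map<String, String> acl) {
        Map<String, String> snapshot = Map.copyOf(acl);
        nodes.put(path, snapshot);
        for (BiConsumer<String, Map<String, String>> watcher
                : watchers.getOrDefault(path, List.of())) {
            watcher.accept(path, snapshot);  // notify every watching mirror
        }
    }
}

// Per-regionserver cache consulted when authorizing requests.
class PermissionMirror {
    private static final String PREFIX = "/hbase/acl/";
    private final Map<String, Map<String, String>> byTable = new ConcurrentHashMap<>();

    // Plays the role of nodeCreated() / nodeChanged().
    void refresh(String path, Map<String, String> acl) {
        byTable.put(path.substring(PREFIX.length()), acl);
    }

    boolean allows(String user, String table, char permission) {
        String granted = byTable.getOrDefault(table, Map.of()).getOrDefault(user, "");
        return granted.indexOf(permission) >= 0;
    }
}
```

A mirror that watches /hbase/acl/table1 sees each write to that node immediately, while a mirror watching only another table’s node is unaffected.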

Note that a regionserver R uses only its local Permissions Mirror (not Zookeeper or .META.) as the basis for authorizing client requests. This keeps authorization of client requests fast, since the regionserver needs to look only at its own Permissions Mirror rather than at anything outside itself. However, for correct authorization to occur, the Regionservers’ Permissions Mirrors must always accurately reflect the permissions in .META. For this, we rely on the fact that the Permissions Mirror is populated before any client access, because a table T can only be opened after the .META. region corresponding to the first region of T is opened. This guarantees that a regionserver serving T will always have its permissions loaded prior to any client access to T.

To summarize the above discussion concerning permission locations, we quote from the relevant JIRA: “Coprocessor based simple access control” (HBASE-3025):

The acl: family in META serves as persistent storage for access policy and as the canonical interface for defining access permissions. ZooKeeper serves to immediately and atomically propagate policy changes into the local permissions caches of all nodes in the cluster. For the typical user operation neither .META. nor ZooKeeper need be consulted when determining if a user has sufficient access privilege; up to date information will be found in the local cache.

Note finally that changing permissions does not require the table to be disabled (go offline): permission changes to enabled tables are immediately and atomically propagated from .META. to affected regionservers via Zookeeper watches.

Example Request

We’ve just described how Permissions are stored. Now it’s time to look at how the regionservers use their local Permission Mirrors to perform authorization.

Here we show an end-to-end read request in Secure HBase. Below, a user U attempts a Get() on a table T. In the example, S represents the user that started the HBase regionserver process (and as whom it usually runs).

Step  Component     User    Action                                               Class             Method
1     client        U       Sends IPC call C: <U,'get',T> to the regionserver.   HTable            get()
2     regionserver  S       Receives C: <U,'get',T>.                             HBaseServer       processData()
3     regionserver  S => U  U.doAs('get',T)                                      HBaseServer       run()
4     regionserver  U       Checks the in-memory Permission Mirror:              AccessController  preGet()
                            UserPerms = getUserPermissions(
                                UserGroupInformation.getCurrentUser(), T)
5     regionserver  U       if (UserPerms imply Get) then return;                AccessController  preGet()
                            else throw AccessDeniedException;
6     client        U       Receives either the requested Get() return value     HTable            get()
                            (if permitted) or AccessDeniedException (if not).

In step 3, the code that we discussed in the “Authentication” section runs: the regionserver uses doAs() to transition from running as user S (the HBase “System” user) to user U (the client user). This is analogous to the Unix sudo command. In step 4, the regionserver calls getCurrentUser(), analogous to the Unix whoami command, and looks up that user’s permissions on T in its Permissions Mirror. In step 5, the regionserver determines whether U’s permissions imply that U can perform a Get. Note also that in step 5, in preGet(), the return allows the Get() call to continue (and eventually return the requested value), whereas throwing AccessDeniedException pre-empts the Get() call (that is, the Get() call will not happen).

Future work

There’s still plenty to do on Secure HBase, and we at Trend welcome you, as a member of the larger HBase community, to contribute. We encourage you to fork from our Github security branch of HBase. This blog post describes the implementation available in that branch.

Here are a few enhancements that we’re interested in:

Roles

Groups and roles are conceptually distinct. A user’s groups are stored in the object of type org.apache.hadoop.security.UserGroupInformation that represents that user. A role, on the other hand, is a set whose members may have varying types: users, groups, and even other roles. These sets, used in sophisticated Role-Based Access Control (RBAC) implementations available in RDBMSes such as PostgreSQL, are what we’d like to provide to users of Secure HBase.

As discussed in HBASE-3025, we intend roles to be defined internally to HBase: an HBase table would map each user to that user’s roles.
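
Since roles are not yet implemented, the following is purely a sketch of the idea of a role as a set with mixed member types. Everything in it is hypothetical: the class RoleSketch, the "user:", "group:" and "role:" prefixes, and the assumption that a role nested inside another role contributes all of its own members to it. A visited set guards against roles that contain each other.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of roles whose members are users, groups, or roles.
class RoleSketch {
    // role name -> members such as "user:alice", "group:analysts", "role:admin"
    private final Map<String, Set<String>> members = new HashMap<>();

    void addMember(String role, String member) {
        members.computeIfAbsent(role, r -> new HashSet<>()).add(member);
    }

    // True if the user belongs to the role directly, through one of the
    // user's groups, or through a role nested inside it.
    boolean hasRole(String user, Set<String> groups, String role) {
        return hasRole(user, groups, role, new HashSet<>());
    }

    private boolean hasRole(String user, Set<String> groups, String role,
                            Set<String> visited) {
        if (!visited.add(role)) {
            return false;  // already examined: avoids looping on cycles
        }
        for (String member : members.getOrDefault(role, Set.of())) {
            if (member.equals("user:" + user)) {
                return true;
            }
            if (member.startsWith("group:")
                    && groups.contains(member.substring("group:".length()))) {
                return true;
            }
            if (member.startsWith("role:")
                    && hasRole(user, groups, member.substring("role:".length()), visited)) {
                return true;
            }
        }
        return false;
    }
}
```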

Per-row or per-cell access controls

Our current implementation only enforces permissions at the table and column-family level, though per-row and per-cell permissions may be a desirable feature for later implementation.

Zookeeper ACLs

Currently, Zookeeper is a weak point in the security scheme presented here, since any client may connect to Zookeeper and modify the permissions encoded there. We intend to build and contribute a Kerberos authentication plugin for Zookeeper to supplement the existing Authentication and Authorization functionality in Zookeeper. This would conform to the documentation that Benjamin Reed supplied in “Document how to integrate 3rd party authentication into ZK server ACLs” (ZOOKEEPER-329).

Auditing trails

From [Discretionary Access Controls] Audit (HBASE-2014):

Important actions taken by subjects should be logged for accountability, a chronological record which enables the full reconstruction and examination of a sequence of events, e.g. schema changes or data mutations. Logging activity should be protected from all subjects except for a restricted set with administrative privilege, perhaps to only a single super-user.

As the JIRA mentions, the audit trail would likely integrate with Facebook’s Scribe or Cloudera’s Flume as an audit record sink.
