Thursday, July 24, 2014

Book Review: WildFly Performance Tuning

Packt Publishing recently published Arnold Johansson's and Anders Welén's WildFly Performance Tuning with the subtitle, "Develop high-performing server applications using the widely successful WildFly platform." As I discussed in my post Video Review: JBoss EAP Configuration, Deployment, and Administration, WildFly is the new name for the freely downloaded (community edition) version of the JBoss Application Server ("next generation of JBoss AS"). This post is my review of the PDF version of WildFly Performance Tuning.

Preface: Summary of Book's Contents and Who Book is Intended For

The Preface of Packt books is typically a useful place to quickly determine whether a Packt book is likely to address the subjects one is interested in and this continues to be the case with this book. The Preface of WildFly Performance Tuning contains brief summaries of what each of the book's ten chapters covers. It also explains what readers need to run many of the examples covered in the book (the book does cover some performance metrics tools not on this list because they are considered "optional" by the authors): Java SE 7 (I assume Java SE 8 will be okay), VisualVM (examples are based on VisualVM 1.3.6), and JMeter.

The "Who this book is for" section of the Preface describes the audience of this book and is a pretty good indicator of the content of the book: "This book is for anyone interested in what performance tuning is all about and how to do it using Java technologies, the WildFly Application Server, and other Open Source software." I also like that this section highlights something I was going to highlight in this review: Chapter 1 covers very general performance tuning principles, Chapter 2 and Chapter 3 cover performance tuning principles general to Java applications, and the remainder of the chapters focus on WildFly-specific performance tuning concepts and principles.

Chapter 1: The Science of Performance Tuning

The authors state in the initial chapter of WildFly Performance Tuning that "performance testing and tuning is a science." The chapter introduces, and describes in some detail, key performance-related terms such as response time, throughput, and scalability. The first chapter also covers some common high-level "performance tuning anti-patterns," many of which are more process-oriented and organizational than technical. Chapter 1 describes the software development cycle and how performance considerations need to be addressed in the architecture and design phases rather than just before shipping. There is a small amount of introductory WildFly-specific and Java/JMX-specific discussion in this chapter, but most of the chapter covers concepts at a higher level than a specific application server or even a specific programming language or platform.

Chapter 2: Tools of the Tuning Trade

The second chapter of WildFly Performance Tuning discusses using tools "from the open source ecosystem" for JVM profiling and sampling, for operating system monitoring, and for load testing. The tools covered include VisualVM (including remote access with jstatd and JMX), top, netstat, vmstat (and vm_stat), OS X Activity Monitor, WildFly Command Line Interface (CLI), WildFly Management Console (HAL), and Apache JMeter.

Although the focus of this second chapter is on tools for profiling, monitoring, and load testing, there is more WildFly-specific content in this chapter than in the first chapter. Several of the concepts and tools are described from a WildFly perspective.

Chapter 3: Tuning the Java Virtual Machine

Chapter 3 of WildFly Performance Tuning takes on the ambitious task of explaining basics of the JVM. Its particular focus is on tuning Oracle's HotSpot JVM and the authors state, "We focus on Java SE 7 based on the Oracle JDK 7 distribution in this book, we'll follow that trail by looking at its VM implementation named Hotspot." One consequence of the Java 7 JVM being covered is that changes to the JVM in Java 8 (such as removal of the permanent generation) are not covered. A single chapter is necessarily briefer coverage than one would find in a full book such as Java Performance.

Chapter 3's JVM coverage begins by describing the relationship between the concepts described by the terms Java Virtual Machine (JVM), Java Runtime Environment (JRE), and Java Development Kit (JDK). I would expect readers of a book on performance tuning related to a Java-based application server to already know the difference between the JDK and JRE, but this material is there for those who aren't aware of the difference. I think Oracle's JRE 8 Download page has about as clear an explanation of the difference between the JDK and JRE as I have seen for non-developers:

Do you want to run Java™ programs, or do you want to develop Java programs? If you want to run Java programs, but not develop them, download the Java Runtime Environment, or JRE™. If you want to develop applications for Java, download the Java Development Kit, or JDK™. The JDK includes the JRE, so you do not have to download both separately.

The third chapter talks about the JVM stacks, the JVM heap, and garbage collection (GC) with a HotSpot JVM focus. The chapter introduces some of the most commonly used HotSpot JVM options such as -XX:+PrintFlagsFinal, -client and -server, -Xss, -Xms, and -Xmx. After reviewing these options, the chapter looks at how to determine through analysis and monitoring what the settings should be for initial heap size and maximum heap size. The chapter looks at using heap options -XX:MaxNewSize, -XX:NewSize, and -XX:NewRatio together to manage the size of the young generation portion of the heap.
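
As a quick illustration of my own (not an example from the book), the following minimal class prints the heap sizes the current JVM is actually running with, which is handy when verifying the effect of -Xms and -Xmx settings:

HeapSettingsReporter.java - Illustrative Sketch
package dustin.examples.sketches;

import static java.lang.System.out;

/**
 * Prints the currently committed and maximum heap sizes so that the
 * effect of -Xms/-Xmx settings can be verified at runtime.
 */
public class HeapSettingsReporter
{
   public static void main(final String[] arguments)
   {
      final Runtime runtime = Runtime.getRuntime();
      // totalMemory() reflects the currently committed heap (influenced by -Xms);
      // maxMemory() reflects the configured ceiling (-Xmx).
      out.printf("Committed heap: %d MB%n", runtime.totalMemory() / (1024 * 1024));
      out.printf("Maximum heap:   %d MB%n", runtime.maxMemory() / (1024 * 1024));
   }
}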

Chapter 3's coverage of balancing eden space and survivor space on the JVM heap introduces the options -XX:+PrintTenuringDistribution, -XX:InitialTenuringThreshold, -XX:MaxTenuringThreshold, -XX:SurvivorRatio, and -XX:TargetSurvivorRatio. There is also a short description of permanent generation heap space with a side note that it is removed in Java 8.

The JVM option -XX:+UseLargePages for large memory pages is covered in Chapter 3 before the chapter moves on to discussion of OutOfMemoryError. This portion of the chapter has a highly useful breakdown of several of the different specific types of OutOfMemoryError and how to deal with them.

Chapter 3 also discusses memory leaks in Java and how they can be identified and resolved. An example illustrates using VisualVM to diagnose and address a memory leak.

The section of Chapter 3 that focuses on garbage collection strategies introduces and briefly describes the four "most common strategies of the Hotspot VM" (serial collector, parallel collector, concurrent collector, and G1 ["incremental parallel compacting"] collector). This section also briefly compares these collectors and describes the situations in which each fits best.

Chapter 3 introduces three ways to specify JVM options for the JVM in which an instance of WildFly is running. The chapter then goes on to introduce and briefly describe four virtual machine logging parameters that the authors assert "should always be set in your production environment to log relevant information" (-verbose:gc, -XX:+PrintGCDetails, -XX:+PrintTenuringDistribution, and -Xloggc). The authors list several available tools (such as GCViewer and the tools provided with the Oracle JDK) for analyzing this log output.
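
As a small supplement of my own (again, not from the book), the standard RuntimeMXBean can be used to confirm from inside the application that these logging flags were actually applied to the running JVM:

JvmArgumentsReporter.java - Illustrative Sketch
package dustin.examples.sketches;

import java.lang.management.ManagementFactory;

import static java.lang.System.out;

/**
 * Lists the arguments the current JVM was launched with, useful for
 * confirming that flags such as -verbose:gc and -XX:+PrintGCDetails
 * are in effect.
 */
public class JvmArgumentsReporter
{
   public static void main(final String[] arguments)
   {
      for (final String jvmArgument : ManagementFactory.getRuntimeMXBean().getInputArguments())
      {
         out.println(jvmArgument);
      }
   }
}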

Chapter 4: Tuning WildFly

Each of the first three chapters provided a little more WildFly-specific information than the one before it, but even the third chapter was more "general Java" than "specifically WildFly." Chapter 4, which begins about one-third of the way into the book, is the first chapter with a heavy emphasis specifically on WildFly. It is in this chapter that WildFly background information is provided. For example, the authors state, "WildFly is, as of 2013, the new name of the historically famous and well renowned JBoss Application Server (JBoss AS)."

The fourth chapter of WildFly Performance Tuning provides a history of WildFly as it evolved from Enterprise Java Bean Open Source Software (EJB-OSS) to JBoss Application Server to WildFly. A table in this section associates major releases of JBoss/WildFly with the version of J2EE/Java EE each implemented (albeit not always certified). This table indicates that WildFly 8 (the first version bearing the WildFly name) implements Java EE 7. There is a brief section of Chapter 4 on WildFly architecture that talks about WildFly's modular design with separate components and subsystems.

Chapter 4 describes each of the WildFly subsystems and their associated pools: thread pool executor with six of its offered thread pools (listed in Table of Contents), resource adapter subsystem (makes use of Java EE Connector Architecture [JCA] and Resource Archives [RAR files]), batch subsystem (implements Java EE 7 Batch API), remoting subsystem (based on NIO and XNIO), transaction subsystem, and logging subsystem. Several pages are devoted to configuring WildFly logging.

Chapter 5: EJB Tuning in WildFly

The fifth through ninth chapters of WildFly Performance Tuning each focus on tuning of a particular Java EE component type within WildFly. Chapter 5's focus is Enterprise JavaBean (EJB) tuning and the chapter begins with a history of EJB and a brief description of the various types of EJBs.

The "Performance tuning EJBs in WildFly" section of the fifth chapter begins by showing the CLI command to instruct WildFly to generate detailed EJB-related statistics. Several of the approaches listed in this chapter are true of Java EE optimization in general (not limited to WildFly). Examples of this include using pass-by-reference (@Local/shallow/not serialized) rather than pass-by-value (@Remote/deep/serialized) when possible and using coarse grained access for remote calls.

Chapter 5 contains numerous examples of WildFly-specific monitoring and tuning of different types of EJBs (stateless, stateful, singleton session, and message-driven). Many of the monitoring examples use CLI commands and JMX/JConsole and the tuning examples tend to be via XML configuration.

Chapter 6: Tuning the Persistence Layer

WildFly Performance Tuning's sixth chapter presents some WildFly-specific tuning information, but much of the information presented in this chapter is more general (database design, SQL statement design, JDBC, etc.). The chapter begins with an overview of normalization in relational database design, database partitioning, and indexes, all at a high level and in general terms rather than focusing on a specific relational database implementation.

Chapter 6 describes four approaches to tuning JDBC: connection pooling, setting the fetch size [ResultSet.setFetchSize(int) and Statement.setFetchSize(int)], batching [Statement.executeBatch()], and use of PreparedStatements. The sections on connection pooling and using PreparedStatements include information on how to specify and monitor these things in WildFly. There are short sections in this chapter on database isolation and low-level JDBC network tuning.
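
To make those levers concrete, here is a minimal sketch of my own (the product table and its columns are hypothetical) showing a PreparedStatement reused with batching for writes and an explicit fetch size for reads:

JdbcTuningSketch.java - Illustrative Sketch
package dustin.examples.sketches;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/**
 * Demonstrates three JDBC tuning levers: PreparedStatement reuse,
 * statement batching, and an explicit fetch size.
 */
public class JdbcTuningSketch
{
   public void demonstrate(final Connection connection) throws SQLException
   {
      // PreparedStatement: parsed and planned once, then reused with bind variables.
      try (PreparedStatement insert =
         connection.prepareStatement("INSERT INTO product (id, name) VALUES (?, ?)"))
      {
         for (int id = 1; id <= 1000; id++)
         {
            insert.setInt(1, id);
            insert.setString(2, "Product " + id);
            insert.addBatch();  // batching: queue rows rather than one round trip per row
         }
         insert.executeBatch(); // send the whole batch in a single round trip
      }

      try (PreparedStatement query =
         connection.prepareStatement("SELECT id, name FROM product"))
      {
         query.setFetchSize(100); // rows retrieved per network round trip
         try (ResultSet results = query.executeQuery())
         {
            while (results.next())
            {
               // process results.getInt("id") and results.getString("name") here
            }
         }
      }
   }
}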

A significant portion of Chapter 6 is focused on tuning Java Persistence API (JPA) and Hibernate. The authors state, "WildFly 8.0.0.Final makes use of the JPA 2.1 specification and the bundled module since the provider is Hibernate in Version 4.3.1." Numerous tactics and strategies related to JPA are covered here and the list of them is available in the listing for Chapter 6 in the Table of Contents (includes entity caching and named queries).
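
As one concrete, hypothetical illustration of my own of two of those tactics, a JPA entity can declare a named query (which the provider can parse and prepare once) and opt into second-level entity caching with @Cacheable:

Product.java - Illustrative Sketch
package dustin.examples.sketches;

import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.NamedQuery;

/**
 * Hypothetical entity demonstrating a named query (prepared once rather
 * than parsed per use) and second-level entity caching via @Cacheable.
 */
@Entity
@Cacheable
@NamedQuery(name = "Product.findByName",
            query = "SELECT p FROM Product p WHERE p.name = :name")
public class Product
{
   @Id
   private Long id;

   private String name;

   public Long getId()
   {
      return this.id;
   }

   public String getName()
   {
      return this.name;
   }
}

Such a query would then be executed with entityManager.createNamedQuery("Product.findByName", Product.class).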

Most of Chapter 6 applies to the persistence layer in any Java EE application server, but there are some sections that discuss WildFly specifics.

Chapter 7: Tuning the Web Container in WildFly

Chapter 7 introduces Undertow, described on its web page as "a flexible performant web server written in java, providing both blocking and non-blocking API’s based on NIO." The chapter provides some Undertow implementation details and also mentions several things that are not yet fully implemented, but are planned for the future. The seventh chapter moves on to tuning the Undertow Worker and buffer pool using the WildFly CLI and the WildFly Admin Console.

Chapter 7 introduces Jastow, which it describes as "the JSP engine in Undertow" and "a fork of the Apache Jasper project." This section provides a table with a subset of options available for tuning Jastow performance and demonstrates using CLI, JMX, and Admin Console to view these settings.

WildFly Performance Tuning's seventh chapter ends with a discussion of using Apache HTTPD as a "frontend" to WildFly. As part of this discussion, this section introduces Apache JServ Protocol (AJP) as an alternative to HTTP for communication between the Apache HTTPD server and the WildFly instance(s). Examples are included to demonstrate configuring Apache HTTPD and WildFly for this configuration.

Chapter 8: Tuning Web Applications and Services

Chapter 8 begins its coverage of web applications and web services with a history of web application development from the Common Gateway Interface (CGI) to Java servlets to JavaServer Pages (JSP) to JavaServer Faces (JSF). There is also some discussion regarding why one might choose to use or not use JSF, and the authors mention that "WildFly ships with the Mojarra JSF 2.2 implementation."

Chapter 8 has sections on tuning web applications and tuning web services. The section on tuning web applications presents performance considerations related to servlets and JSPs (with JSTL), JSF, and JSF with PrimeFaces 4.x. The chapter uses a "simple data table" example to illustrate performance considerations for all three approaches to building Java EE web applications. Some of the topics covered along the way include JSP scopes, session timeouts, asynchronous servlets, JSF client state versus server state, and JSF immediate. There are several topics specific to Undertow and JSF component libraries (PrimeFaces and RichFaces) also covered in the web application performance tuning section before brief discussion of Java support for WebSockets.
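
For the asynchronous servlets topic, a minimal sketch of my own (not the book's example) looks like the following; the point is that the container's request thread is released while slow work completes and the response is finished later:

SlowResourceServlet.java - Illustrative Sketch
package dustin.examples.sketches;

import java.io.IOException;

import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Asynchronous servlet: the request thread returns to the container's
 * pool immediately while the (simulated) slow work runs on another thread.
 */
@WebServlet(urlPatterns = "/slow", asyncSupported = true)
public class SlowResourceServlet extends HttpServlet
{
   @Override
   protected void doGet(final HttpServletRequest request,
                        final HttpServletResponse response)
   {
      final AsyncContext asyncContext = request.startAsync();
      asyncContext.start(new Runnable()
      {
         @Override
         public void run()
         {
            try
            {
               asyncContext.getResponse().getWriter().println("Done after slow work.");
            }
            catch (IOException ioException)
            {
               // a real application would log this
            }
            finally
            {
               asyncContext.complete(); // tell the container the response is finished
            }
         }
      });
   }
}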

Chapter 8's section on performance tuning web services covers the Java API for XML Web Services (JAX-WS) along with WildFly's implementation of JAX-WS [JBossWS and Apache CXF (JBossWS-CXF)] and JAXB. Recommendations for reducing XML messaging overhead are covered. The section on web services tuning also includes a very short section on REST-based web services [Java API for RESTful Web Services (JAX-RS)].

Chapter 9: JMS and HornetQ

WildFly Performance Tuning's ninth chapter focuses on Java Message Service (JMS) and HornetQ version 2.4.1.Final (WildFly's JMS broker provider). The chapter covers JMS basics such as JMS destination types and JMS message types. It discusses general JMS tuning ideas such as tuning the JMS session and tuning the JMS MessageProducer.
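
As a small sketch of my own of the kind of MessageProducer tuning mentioned (these are standard JMS API calls, though whether to use them depends on how expendable the messages are):

ProducerTuning.java - Illustrative Sketch
package dustin.examples.sketches;

import javax.jms.DeliveryMode;
import javax.jms.JMSException;
import javax.jms.MessageProducer;

/**
 * Applies throughput-oriented settings to a JMS MessageProducer,
 * trading delivery guarantees and metadata for speed.
 */
public final class ProducerTuning
{
   private ProducerTuning() {}

   public static void applyThroughputSettings(final MessageProducer producer)
      throws JMSException
   {
      producer.setDeliveryMode(DeliveryMode.NON_PERSISTENT); // no disk write per message
      producer.setDisableMessageID(true);                    // skip message ID generation
      producer.setDisableMessageTimestamp(true);             // skip per-message timestamping
   }
}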

A significant portion of Chapter 9 focuses on performance tuning options specific to HornetQ. These recommendations, which include "using input and output streams for handling large messages," are listed in the Table of Contents. The chapter concludes with examples of using WildFly-provided mechanisms to monitor HornetQ.

Chapter 10: WildFly Clustering

The tenth and final chapter of WildFly Performance Tuning focuses on WildFly clustering. The chapter begins with definitions of clustering and distributed computing terms and discussion of the basics of clustering. It opens up its coverage of WildFly clustering by pointing out that WildFly's clustering is built upon JGroups and Infinispan. The chapter then covers using each of these foundational open source projects. Chapter 10 then runs down the Java EE components and explains which have Java EE standard cluster support and which have WildFly proprietary cluster support.

General Observations
  • WildFly Performance Tuning covers both Java EE and JDBC standard tuning concepts as well as proprietary WildFly tuning concepts. Readers who are familiar with tuning other application servers and JDBC applications will likely be able to skim or skip chapters 1 through 3 and much of chapter 6. Readers who are new to performance tuning of Java EE (and Java SE) applications will likely find the inclusion of these more general chapters useful.
  • Chapter 3 ("Tuning the Java Virtual Machine") has very little WildFly-specific detail in it, but a related consequence of this is that this is a great introductory chapter for anyone interested in learning principles of JVM tuning that apply to various Java EE implementations and even to Java SE applications. This chapter provides a quick preview of the most important concepts in JVM tuning.
  • The PDF edition I reviewed has color screen snapshots.
  • Code examples are generally well-formatted and indented, but do not have any color syntax highlighting or line numbers.
  • With rare exceptions, the text flows well and sentences are generally easy to understand.
  • A relatively small number of items discussed in this book have been delayed to WildFly 9 or else are expected in the near future. The book does a nice job of calling these relatively infrequent cases out and making it clear what's in WildFly 8.x and what will be in WildFly 9.x.
Conclusion

WildFly Performance Tuning provides background information, descriptive text, and illustrative examples that allow the reader to learn principles, concepts, and approaches that can be applied to tuning a WildFly-hosted application. WildFly Performance Tuning mixes general JVM and Java EE tuning principles and tools with WildFly-specific tuning principles and tools to provide a composite view of how performance tuning of WildFly applications can be performed.

Monday, July 7, 2014

Custom Cassandra Data Types

In the blog post Connecting to Cassandra from Java, I mentioned that one advantage for Java developers of Cassandra being implemented in Java is the ability to create custom Cassandra data types. In this post, I outline how to do this in greater detail.

Cassandra has numerous built-in data types, but there are situations in which one may want to add a custom type. Cassandra custom data types are implemented in Java by extending the org.apache.cassandra.db.marshal.AbstractType class. The class that extends this must ultimately implement three methods with the following signatures:

public ByteBuffer fromString(final String stateName) throws MarshalException
public TypeSerializer getSerializer()
public int compare(Object, Object)

This post's example implementation of AbstractType is shown in the next code listing.

UnitedStatesState.java - Extends AbstractType
package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.serializers.MarshalException;
import org.apache.cassandra.serializers.TypeSerializer;

import java.nio.ByteBuffer;

/**
 * Representation of a state in the United States that
 * can be persisted to Cassandra database.
 */
public class UnitedStatesState extends AbstractType
{
   public static final UnitedStatesState instance = new UnitedStatesState();

   @Override
   public ByteBuffer fromString(final String stateName) throws MarshalException
   {
      return getStateAbbreviationAsByteBuffer(stateName);
   }

   @Override
   public TypeSerializer getSerializer()
   {
      return UnitedStatesStateSerializer.instance;
   }

   @Override
   public int compare(Object o1, Object o2)
   {
      if (o1 == null && o2 == null)
      {
         return 0;
      }
      else if (o1 == null)
      {
         return 1;
      }
      else if (o2 == null)
      {
         return -1;
      }
      else
      {
         return o1.toString().compareTo(o2.toString());
      }
   }

   /**
    * Provide standard two-letter abbreviation for United States
    * state whose state name is provided.
    *
    * @param stateName Name of state whose abbreviation is desired.
    * @return State's abbreviation as a ByteBuffer; will return "UK"
    *    if provided state name is unexpected value.
    */
   private ByteBuffer getStateAbbreviationAsByteBuffer(final String stateName)
   {
      final String upperCaseStateName = stateName != null ? stateName.toUpperCase().replace(" ", "_") : "UNKNOWN";
      String abbreviation;
      try
      {
         abbreviation =  upperCaseStateName.length() == 2
                       ? State.fromAbbreviation(upperCaseStateName).getStateAbbreviation()
                       : State.valueOf(upperCaseStateName).getStateAbbreviation();
      }
      catch (Exception exception)
      {
         abbreviation = State.UNKNOWN.getStateAbbreviation();
      }
      return ByteBuffer.wrap(abbreviation.getBytes());
   }
}

The above class listing references the State enum, which is shown next.

State.java
package dustin.examples.cassandra.cqltypes;

/**
 * Representation of state in the United States.
 */
public enum State
{
   ALABAMA("Alabama", "AL"),
   ALASKA("Alaska", "AK"),
   ARIZONA("Arizona", "AZ"),
   ARKANSAS("Arkansas", "AR"),
   CALIFORNIA("California", "CA"),
   COLORADO("Colorado", "CO"),
   CONNECTICUT("Connecticut", "CT"),
   DELAWARE("Delaware", "DE"),
   DISTRICT_OF_COLUMBIA("District of Columbia", "DC"),
   FLORIDA("Florida", "FL"),
   GEORGIA("Georgia", "GA"),
   HAWAII("Hawaii", "HI"),
   IDAHO("Idaho", "ID"),
   ILLINOIS("Illinois", "IL"),
   INDIANA("Indiana", "IN"),
   IOWA("Iowa", "IA"),
   KANSAS("Kansas", "KS"),
   LOUISIANA("Louisiana", "LA"),
   MAINE("Maine", "ME"),
   MARYLAND("Maryland", "MD"),
   MASSACHUSETTS("Massachusetts", "MA"),
   MICHIGAN("Michigan", "MI"),
   MINNESOTA("Minnesota", "MN"),
   MISSISSIPPI("Mississippi", "MS"),
   MISSOURI("Missouri", "MO"),
   MONTANA("Montana", "MT"),
   NEBRASKA("Nebraska", "NE"),
   NEVADA("Nevada", "NV"),
   NEW_HAMPSHIRE("New Hampshire", "NH"),
   NEW_JERSEY("New Jersey", "NJ"),
   NEW_MEXICO("New Mexico", "NM"),
   NORTH_CAROLINA("North Carolina", "NC"),
   NORTH_DAKOTA("North Dakota", "ND"),
   NEW_YORK("New York", "NY"),
   OHIO("Ohio", "OH"),
   OKLAHOMA("Oklahoma", "OK"),
   OREGON("Oregon", "OR"),
   PENNSYLVANIA("Pennsylvania", "PA"),
   RHODE_ISLAND("Rhode Island", "RI"),
   SOUTH_CAROLINA("South Carolina", "SC"),
   SOUTH_DAKOTA("South Dakota", "SD"),
   TENNESSEE("Tennessee", "TN"),
   TEXAS("Texas", "TX"),
   UTAH("Utah", "UT"),
   VERMONT("Vermont", "VT"),
   VIRGINIA("Virginia", "VA"),
   WASHINGTON("Washington", "WA"),
   WEST_VIRGINIA("West Virginia", "WV"),
   WISCONSIN("Wisconsin", "WI"),
   WYOMING("Wyoming", "WY"),
   UNKNOWN("Unknown", "UK");

   private String stateName;

   private String stateAbbreviation;

   State(final String newStateName, final String newStateAbbreviation)
   {
      this.stateName = newStateName;
      this.stateAbbreviation = newStateAbbreviation;
   }

   public String getStateName()
   {
      return this.stateName;
   }

   public String getStateAbbreviation()
   {
      return this.stateAbbreviation;
   }

   public static State fromAbbreviation(final String candidateAbbreviation)
   {
      State match = UNKNOWN;
      if (candidateAbbreviation != null && candidateAbbreviation.length() == 2)
      {
         final String upperAbbreviation = candidateAbbreviation.toUpperCase();
         for (final State state : State.values())
         {
            if (state.stateAbbreviation.equals(upperAbbreviation))
            {
               match = state;
            }
         }
      }
      return match;
   }
}

We can also provide an implementation of the TypeSerializer interface returned by the getSerializer() method shown above. A class implementing TypeSerializer is typically most easily written by extending one of the numerous existing implementations of TypeSerializer that Cassandra provides in the org.apache.cassandra.serializers package. In my example, my custom serializer extends AbstractTextSerializer and the only method I need to add has the signature public void validate(final ByteBuffer bytes) throws MarshalException. Both of my custom classes need to provide a reference to an instance of themselves via static access. Here is the class that implements TypeSerializer via extension of AbstractTextSerializer:

UnitedStatesStateSerializer.java - Implements TypeSerializer
package dustin.examples.cassandra.cqltypes;

import org.apache.cassandra.serializers.AbstractTextSerializer;
import org.apache.cassandra.serializers.MarshalException;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Serializer for UnitedStatesState.
 */
public class UnitedStatesStateSerializer extends AbstractTextSerializer
{
   public static final UnitedStatesStateSerializer instance = new UnitedStatesStateSerializer();

   private UnitedStatesStateSerializer()
   {
      super(StandardCharsets.UTF_8);
   }

   /**
    * Validates provided ByteBuffer contents to ensure they can
    * be modeled in the UnitedStatesState Cassandra/CQL data type.
    * This allows for a full state name to be specified or for its
    * two-letter abbreviation to be specified and either is considered
    * valid.
    *
    * @param bytes ByteBuffer whose contents are to be validated.
    * @throws MarshalException Thrown if provided data is invalid.
    */
   @Override
   public void validate(final ByteBuffer bytes) throws MarshalException
   {
      try
      {
         final String stringFormat = new String(bytes.array()).toUpperCase();
         final State state =  stringFormat.length() == 2
                            ? State.fromAbbreviation(stringFormat)
                            : State.valueOf(stringFormat);
      }
      catch (Exception exception)
      {
         throw new MarshalException("Invalid model cannot be marshaled as UnitedStatesState.");
      }
   }
}

With the classes for creating a custom CQL data type written, they need to be compiled into .class files and archived in a JAR file. This process (compiling with javac -cp "C:\Program Files\DataStax Community\apache-cassandra\lib\*" -sourcepath src -d classes src\dustin\examples\cassandra\cqltypes\*.java and archiving the generated .class files into a JAR named CustomCqlTypes.jar with jar cvf CustomCqlTypes.jar *) is shown in the following screen snapshot.

The JAR with the class definitions of the custom CQL type classes needs to be placed in the Cassandra installation's lib directory as demonstrated in the next screen snapshot.

With the JAR containing the custom CQL data type class implementations in the Cassandra installation's lib directory, Cassandra should be restarted so that it will be able to "see" these custom data type definitions.

The next code listing shows a Cassandra Query Language (CQL) statement for creating a table using the new custom type dustin.examples.cassandra.cqltypes.UnitedStatesState.

createAddress.cql
CREATE TABLE us_address
(
   id uuid,
   street1 text,
   street2 text,
   city text,
   state 'dustin.examples.cassandra.cqltypes.UnitedStatesState',
   zipcode text,
   PRIMARY KEY(id)
);

The next screen snapshot demonstrates the results of running the createAddress.cql code above by describing the created table in cqlsh.

The above screen snapshot demonstrates that the custom type dustin.examples.cassandra.cqltypes.UnitedStatesState is the type for the state column of the us_address table.

A new row can be added to the US_ADDRESS table with a normal INSERT. For example, the following screen snapshot demonstrates inserting an address with the command INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'New York', '10118');:

Note that while the INSERT statement inserted "New York" for the state, it is stored as "NY".

If I run an INSERT statement in cqlsh using an abbreviation to start with (INSERT INTO us_address (id, street1, street2, city, state, zipcode) VALUES (blobAsUuid(timeuuidAsBlob(now())), '350 Fifth Avenue', '', 'New York', 'NY', '10118');), it still works as shown in the output shown below.

In my example, an invalid state does not prevent an INSERT from occurring, but instead persists the state as "UK" (for unknown) [see the implementation of this in UnitedStatesState.getStateAbbreviationAsByteBuffer(String)].

One of the first advantages that comes to mind for implementing a custom CQL datatype in Java is the ability to employ behavior similar to that provided by check constraints in relational databases. For example, in this post, my sample ensured that any state column entered for a new row was either one of the fifty states of the United States, the District of Columbia, or "UK" for unknown. No other values can be inserted into that column's value.

Another advantage of the custom data type is the ability to massage the data into a preferred form. In this example, I changed every state name to an uppercase two-letter abbreviation. In other cases, I might want to always store in uppercase or always store in lowercase or map finite sets of strings to numeric values. The custom CQL datatype allows for customized validation and representation of values in the Cassandra database.

Conclusion

This post has been an introductory look at implementing custom CQL datatypes in Cassandra. As I play with this concept more and try different things out, I hope to write another blog post on some more subtle observations that I make. As this post shows, it is fairly easy to write and use a custom CQL datatype, especially for Java developers.

Friday, June 27, 2014

Next Generation Project Valhalla Proposed

Earlier this week, Brian Goetz proposed Project Valhalla on an OpenJDK mailing list. Goetz's proposal states:

In accordance with the OpenJDK guidelines, this project will provide a venue to explore and incubate advanced Java VM and Language feature candidates such as Value Types, Generic Specialization, enhanced volatiles (and possibly other related topics, such as reified generics.)

Note that Goetz's proposal should not be confused with the 1997 version of Oracle's Project Valhalla that is described as "the code name for Oracle's flexible Java development environment for building, debugging and deploying component-based applications for the network computing platform." This "other" Project Valhalla is also described in the 15 December 1997 edition of InfoWorld: "Code-named Project Valhalla, the Oracle AppBuilder for Java is slated to ship in the first quarter of 1998. The Java tool, which is based on Borland's JBuilder Java IDE, is designed for writing n-tier, thin client applications."

The "other" Project Valhalla became AppBuilder for Java and was included with JDeveloper Suite (15 April 1998). The Oracle Application Server (4.0 at the time) that was part of this same JDeveloper Suite has new company at Oracle with Oracle's acquiring WebLogic with the BEA acquisition and acquiring GlassFish with the Sun acquisition and JDeveloper is an Oracle-provided (free in the "free beer" sense, but not open source) Java IDE.

Returning to the "next generation" of Project Valhalla, the voting on this Goetz proposal closes on 7 July 2014. Several have already expressed "yes" in replies to Goetz's original e-mail post. In that forum, Patrick Wright asked, "how is this/will this be different from the Da Vinci Project/MLVM?" and John Rose answered that "the charter for Da Vinci is to incubate JVM features for languages other than Java ... or for general language support without associated Java language changes" while "Valhalla is intended to support the evolution of Java itself."

I was excited about Project Coin when it was announced and enjoyed reading and hearing about its progress, but Project Valhalla is much more ambitious and will be even more exciting if some or all of the proposed features are implemented.

Wednesday, June 25, 2014

Book Review: Penetration Testing with the Bash Shell

This post is my review of the Packt Publishing book Penetration Testing with the Bash shell by Keith Makan. I think it's important to emphasize its subtitle: "Make the most of the Bash shell and Kali Linux's command-line-based security assessment tools." As the main title and subtitle suggest, this book is about penetration testing with Bash on Kali Linux. The book has approximately 125 pages covering 5 chapters.

The Preface of Penetration Testing with the Bash shell begins with an articulation of why it is important for penetration testers to be familiar with command-line tools. The Preface also provides brief summaries of each of the five chapters in the book and some of the command-line tools covered in each chapter.

Under the "What you need for this book" section of Penetration Testing with the Bash shell's Preface, Makan writes, "The only software requirement for this book is the Kali Linux operating system, which you can download in the ISO format from http://www.kali.org." You don't necessarily need to use Kali Linux as many of the bash examples and described command-line tools are also available with or for other Linux distributions. The "Who this book is for" section of the Preface states, "Command line hacking is a book for anyone interested in learning how to wield their Kali Linux command lines to perform effective penetration testing."

Chapter 1: Getting to Know Bash

The initial chapter explains that "throughout the book, the bash environment or the host operating system that will be discussed will be Kali Linux" and that "Kali Linux is a distribution adapted from Debian." The chapter then goes on to explain, with text and examples, Bash concepts such as man pages and bash commands such as cd, ls, pwd, and find. The chapter also describes using Linux redirection (output and input) and pipes before concluding with a discussion of grep and regular expressions. As the author suggests, this chapter could generally be skipped by developers familiar with Linux, but would be must-read information for those new to Linux. Although Kali Linux is used for the examples, I didn't see anything in this initial chapter that seemed specific to Kali Linux.

Chapter 2: Customizing Your Shell

The second chapter of Penetration Testing with the Bash shell begins with coverage of terminal control sequences to change appearance of text in the terminal. The chapter builds upon this information to demonstrate customization of the terminal prompt. Chapter 2 also introduces aliases, customizing command history, and customizing tab completion. Like the first chapter, this second chapter covers additional general Bash concepts rather than specifics of penetration testing.

Chapter 3: Network Reconnaissance

Chapter 3 of Penetration Testing with the Bash shell transitions from coverage of Linux and bash to "move on to using the shell and the Kali Linux command-line utilities to collect information about the networks you find yourself in." The chapter examines tools such as whois, dig, dnsmap, Nmap, and arping.

Chapter 4: Exploitation and Reverse Engineering

Penetration Testing with the Bash shell's fourth chapter focuses on reverse engineering and tools that "may enable you to discover memory corruption, code injection, and general data- or file-handling flaws that may be used to instantiate arbitrary code execution vulnerabilities." Tools covered in this chapter include Metasploit (including command-line msfcli, mixing msfcli with bash commands and other command line tools, msfpayload, Meterpreter), objdump, and GDB.

Chapter 5: Network Exploitation and Monitoring

The fifth and final chapter of Penetration Testing with the Bash shell focuses "on the network exploitation available in Kali Linux and how to take advantage of it in the modern bash shell environment." This relatively longer chapter begins with discussion of "Media Access Control (MAC) spoofing" and "Address Resolution Protocol (ARP) abuse." The chapter looks at tools such as macchanger, ifconfig, and arpspoof.

Chapter 5 describes man-in-the-middle (MITM) attacks and provides a detailed introduction to ettercap. The chapter also explains server interrogation and describes tools for Simple Network Management Protocol (SNMP) interrogation (snmpwalk and Metasploit's snmp_enum and snmp_login) and for Simple Mail Transfer Protocol (SMTP) interrogation (smtp-user-enum).

There is a section in Chapter 5 on brute force authentication that demonstrates use of Medusa. The section of Chapter 5 on traffic filtering demonstrates use of TCPDump. SSLyze is demonstrated as a tool for "assess[ing] SSL/TLS implementations" and SkipFish and Arachni are demonstrated as tools for scanning web pages and web applications for vulnerabilities.

General Observations
  • The electronic version of Penetration Testing with the Bash shell that I reviewed included numerous colored screen snapshots.
  • I liked the "Further Reading" sections at the end of each chapter that provided links to online information with more detail about subjects covered in the chapter.
  • The first two chapters of Penetration Testing with the Bash shell have very little information specific to penetration testing but provide an overview of some bash features used in later chapters. The third, fourth, and fifth chapters are heavily focused on penetration testing and build upon the bash basics covered in the first two chapters.
  • Penetration Testing with the Bash shell provides background information on various security and assessment concepts as its illustrates the tools available that are related to those concepts.
  • There are some awkward sentence structures in Penetration Testing with the Bash shell and several of the sentences are too long for my taste (especially early in the book), but I was generally able to make out the intent of the writing. An example of the occasional awkward but understandable text is on the first page of the first chapter: "Why are discussing the bash shell?"
  • Although this book does mention Kali Linux specifically and frequently, the majority of the described tools are available as built-in tools or as separately installable tools in other flavors of Linux. With no or just a bit of effort, one could run the examples in different Linux implementations. It's also worth pointing out that the author repeatedly mentions that several commands run with no special effort in Kali Linux because things are run as root. In other Linux implementations, you may need to use sudo to apply these tools.
  • Although I am fairly familiar with bash, I still learned a few new things from the first two chapters (primarily the second chapter). I am far less experienced with penetration testing and learned quite a bit from the other three chapters.
Conclusion

Penetration Testing with the Bash shell: Make the most of the Bash shell and Kali Linux's command-line-based security assessment tools outlines how bash in general and Kali Linux in particular provide command-line security assessment tools. The book introduces the tools and how to apply them and explains security-related concepts along the way.

Monday, June 23, 2014

I Don't Think That Software Development Word Means What You Think It Means

There are several terms used inappropriately or incorrectly in software development. In this post, I look at some of these terms and the negative consequences of misuse of these terms.

"agile"

The Agile Manifesto started a movement that resonated with many software developers frustrated with inefficiencies and inadequacies of prevalent software development methodologies. Unfortunately, the relatively simple concepts of the Agile Manifesto were interpreted, changed, evangelized, commercialized, and sold in so many different ways that it became difficult to uniquely describe agile. To some, agile became synonymous with "no documentation" and to others agile meant going straight to coding without any process. So many disparate methodologies and practices are now sold as agile that it's become increasingly difficult to describe what makes something agile or not.

There have been several negative consequences of the multiple interpretations of what agile means. Implementation of so-called agile practices without understanding of agile can lead to failures that are blamed on agile but may have little or nothing to do with agile. Unrealistic expectations of agile and what it can do for development can lead to inevitable disappointment, as there still is no silver bullet. With so many different interpretations, it is difficult to help new developers, developers new to agile, managers, customers, and other stakeholders understand what agile is and how it may or may not be appropriate for them. I was at a presentation by an agile enthusiast several years ago when he suggested that agile was anything that was successful and was not anything that is not successful.

For me, "agile" means processes and approaches that closely match the values outlined in the Agile Manifesto (individuals and interactions, working software, customer collaboration, and responding to change). There are other approaches and methodologies out there that may be useful and positive, but if they aren't inspired by these values, it is difficult for me to hear them called "agile."

"REST"

Roy Fielding's dissertation Architectural Styles and the Design of Network-based Software Architectures popularized the term Representational State Transfer (REST). Unfortunately, many have used REST and HTTP interchangeably and in the process have muddled the conversations about both the REST architectural style and the Hypertext Transfer Protocol (HTTP).

It is easy to see from a historical perspective why REST and HTTP are often treated interchangeably. REST embraced the functionality already provided by HTTP as a significant part of its architectural style at a time when other popular architectural styles and frameworks were doing everything they could to hide or abstract away HTTP-specific details. REST leverages HTTP's stateless nature while others were trying to wrap HTTP with state. Although REST certainly played a major role in raising awareness of HTTP, REST is more than HTTP. I have found that many who think of REST and HTTP as one and the same don't appreciate the HATEOAS concept in REST. HATEOAS stands for Hypermedia as the Engine of Application State and refers to the concept of application state being embodied within the hypermedia exchanged between server and client rather than held in the client.

"refactoring"

I've known of clients and managers filled with dread when they hear a developer state that he or she is going to "refactor" something. The reason is that "refactoring" too often means the developer plans to change the code structure and "improve" or "fix" behavior as part of this. Refactoring is supposed to consist of code improvements that do not affect the results of the software but lead to more maintainable code. Too many developers are lured into making other changes "while they are in the code" that change results. Even when for the better, these changes are not in the spirit of refactoring, and when they have broken existing functionality, they have led to "refactoring" being seen in a bad light.

Comprehensive unit tests and other tests can help ensure that refactoring does not change any expected behavior, but developers should also clearly understand whether the goal is to maintain current functionality with improved code structure (refactor) or actually change/improve functionality and only use the term "refactoring" when appropriate to avoid confusion.
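
A contrived Java example of my own may make the distinction concrete: suppose both public methods below originally duplicated the expression amount - (amount * 0.10). Extracting applyDiscount() is a true refactoring because every caller still observes exactly the same results:

OrderPricing.java - Illustrative Sketch
package dustin.examples.sketches;

/**
 * After refactoring: the duplicated discount expression has been
 * extracted into applyDiscount() without changing any observable result.
 */
public class OrderPricing
{
   private static final double DISCOUNT_RATE = 0.10;

   public double priceWithDiscount(final double price)
   {
      return applyDiscount(price);
   }

   public double shippingWithDiscount(final double shipping)
   {
      return applyDiscount(shipping);
   }

   private double applyDiscount(final double amount)
   {
      return amount - (amount * DISCOUNT_RATE);
   }
}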

"premature optimization"

I generally agree with the principle behind the now famous quotation, "Premature optimization is the root of all evil." My interpretation of this is that one should not write less maintainable or less readable code in an attempt to achieve small expected performance gains. However, as I posted in When Premature Optimization Isn't, this term occasionally gets used as justification for not making good architecture and high-level design decisions just because they have a performance benefit associated with them. Some architectural decisions are difficult to change at a later point, and performance does need to be accounted for. Similarly, even at the implementation level, there are times when better-performing code is as readable and as easy to write as worse-performing code, and in those cases there is no good reason not to write the better-performing code.
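
A contrived example of my own: when concatenating many strings, the StringBuilder version below is no harder to read or write than += in a loop, but avoids creating a new String per iteration, so choosing it up front is hardly "premature optimization":

Joining.java - Illustrative Sketch
package dustin.examples.sketches;

import java.util.List;

/**
 * Builds one String from many parts; StringBuilder scales linearly
 * where repeated String concatenation in a loop does not.
 */
public final class Joining
{
   private Joining() {}

   public static String join(final List<String> parts)
   {
      final StringBuilder builder = new StringBuilder();
      for (final String part : parts)
      {
         builder.append(part);
      }
      return builder.toString();
   }
}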

NoSQL

The term NoSQL was an unfortunate one for a class of databases that probably would have been better labeled "Not Relational." As numerous "NoSQL databases" have adopted SQL (without adopting the relational model), alternative terms have been tried such as "Not Just SQL."

"open source"

The term "open source" has often led to confusion about whether the software in question is "free" in terms of "freedom" (libre/free speech) and/or "free" in terms of no monetary price (gratuit/free beer). There can even be confusion about the minor differences between "open source" and "free software." For me, "open source" means source code that I can look at and modify as necessary.

JavaScript

With JavaScript's increased popularity, its poorly chosen name does not seem to confuse as many people as it used to. However, I still do occasionally hear people who think that JavaScript must have some relationship to Java because "Java" is in both languages' names.

SLOC

I generally despise everything about the idea of counting source lines of code. The appeal of SLOC is the pretense that somehow lines of code can be counted the same as beans and widgets. All lines of code are not created equal, and there are differences in lines of code across different languages, across different developers, and across different functionality. Some have even gone so far as to think that more SLOC is always a good thing, whereas I've found that more concise code with fewer lines can often be preferable. I have blogged before on lines of code and unintended consequences.

SOAP

This one is not a big deal in terms of negative consequences from its misuse, but it is worth noting that SOAP no longer stands for Simple Object Access Protocol.

JDBC

This is another one that doesn't really lead to any problems even though it technically has never stood for Java Database Connectivity and is not even an acronym. The fact that it does indeed relate to connecting to databases, that it is Java-related, and that it is so widely said to stand for Java Database Connectivity means that this misuse of the term JDBC has no significant negative side effects. In fact, I suspect that Sun Microsystems folks intentionally wanted people to think of it as an acronym for Java Database Connectivity while explicitly stating that it was not an acronym because it allowed people to quickly understand what JDBC is via their awareness of ODBC.

Conclusion

The incorrect use of many of the terms discussed in this post could be described as largely pedantic, but misuse of a few of them can lead to miscommunication and general confusion. In some cases (such as "agile" and "refactoring"), the misuse of terms has led to negative experiences and soiled reputations for those terms. In other cases (such as using JDBC and SOAP as acronyms when they really are not acronyms), the confusion seems small and harmless as everyone discussing the falsely advertised "acronym" seems to understand what it implies.

Monday, June 2, 2014

Review: DZone's 2014 Guide to Mobile Development

DZone is launching the public release of its DZone's 2014 Guide to Mobile Development today. The 35-page PDF is intended to be used with the DZone Mobile Development Research site to "explore the mobile application development landscape, examining best-practice strategies and comparing tools and frameworks that accelerate mobile development."

For those desiring an "executive summary" of the state of mobile development as described in DZone's 2014 Guide to Mobile Development, a single page (page 3) contains a paragraph and five bullets summarizing what the guide covers along with four "key takeaways" from the mobile development research.

The "Key Research Findings" section (pages 4 and 5) includes summary text and colorful graphics to illustrate results of surveys of mobile developers. The findings include things such as types of mobile developers, types of mobile applications being developed, and common timeline for mobile app development.

Pages 6 and 7 of DZone's 2014 Guide to Mobile Development look at developing mobile apps as web apps, as native apps, and as hybrids of those. Pages 10 and 11 elaborate more on the "state" of apps that are native, web-based, and hybrid and present a table of 7 comparison factors for the three approaches.

Pages 14-15 of DZone's 2014 Guide to Mobile Development discuss issues related to integration of mobile applications with back-end enterprises. Pages 21-22 discuss perceived performance versus actual performance metrics.

The 23rd page of DZone's 2014 Guide to Mobile Development features a "Mobile Application Development Checklist." Pages 24-34 are the "Solutions Directory" and it is this section that may be of most interest to those trying to decide on tools, frameworks, and platforms to use in their new mobile development projects. Each "solution" has a brief description listed along with certain characteristics (the set of characteristics depend on whether it is a framework, mobile application development platform [MADP], or both).

The DZone 2014 Guide to Mobile Development also includes several pages of full-page advertisements as well as pages split in half with the top half containing text about a mobile development issue and the bottom half containing an advertisement for a related solution addressing the need discussed in text. These half-page text sections and associated advertisements are typically written by "research partners" (Progress/Progress Pacific, Outsystems/Outsystems Platform, ICEsoft/ICEmobile, Telerik/Telerik Platform, and New Relic Mobile Monitoring).

The DZone 2014 Guide to Mobile Development is a polished product that will likely be of most benefit to those considering starting mobile application development or those just entering mobile application development. Although it contains ideas likely to interest even more experienced mobile application developers, much of it is introductory in nature. Experienced mobile developers who want to see what others are doing will also find the research findings interesting and may find it useful if they are looking for an alternate framework or MADP to adopt. This report is not highly detailed, but instead focuses on high-level trends, survey results, and product offerings that affect mobile development.

The sections of DZone's 2014 Guide to Mobile Development that I found most interesting are the Summary and Key Takeaways, Key Research Findings, Cross-Platform Problems and Solutions, The State of Native vs. Hybrid vs. Web, the checklist, and the Solutions Directory. Reading this report has helped solidify some of my opinions by providing background and support, has helped me increase my understanding of a couple things, and helped me to think about some common mobile development issues from a different perspective. I also was able to read about some frameworks and MADPs that I had not previously been aware of. In short, this report provides a succinct and well presented summary of the overall current state of mobile development.

Tuesday, May 27, 2014

Connecting to Cassandra from Java

In my post Hello Cassandra, I looked at downloading the Cassandra NoSQL database and using cqlsh to connect to a Cassandra database. In this post, I look at the basics of connecting to a Cassandra database from a Java client.

Although there are several frameworks available for accessing the Cassandra database from Java, I will use the DataStax Java Client JAR in this post. The DataStax Java Driver for Apache Cassandra is available on GitHub. The datastax/java-driver GitHub project page states that it is a "Java client driver for Apache Cassandra" that "works exclusively with the Cassandra Query Language version 3 (CQL3)" and is "licensed under the Apache License, Version 2.0."

The Java Driver 2.0 for Apache Cassandra page provides a high-level overview and architectural details about the driver. Its Writing Your First Client section provides code listings and explanations regarding connecting to Cassandra with the Java driver and executing CQL statements from Java code. The code listings in this post are adaptations of those examples applied to my example cases.

The Cassandra Java Driver has several dependencies. The Java Driver 2.0 for Apache Cassandra documentation includes a page called Setting up your Java development environment that outlines the Java Driver 2.0's dependencies: cassandra-driver-core-2.0.1.jar (datastax/java-driver 2.0), netty-3.9.0-Final.jar (netty direct), guava-16.0.1.jar (Guava 16 direct), metrics-core-3.0.2.jar (Metrics Core), and slf4j-api-1.7.5.jar (slf4j direct). I also found that I needed to place the LZ4 library (which provides LZ4Factory) and snappy-java on the classpath.

The next code listing is of a simple class called CassandraConnector.

CassandraConnector.java
package com.marxmart.persistence;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.Session;

import static java.lang.System.out;

/**
 * Class used for connecting to Cassandra database.
 */
public class CassandraConnector
{
   /** Cassandra Cluster. */
   private Cluster cluster;

   /** Cassandra Session. */
   private Session session;

   /**
    * Connect to Cassandra Cluster specified by provided node IP
    * address and port number.
    *
    * @param node Cluster node IP address.
    * @param port Port of cluster host.
    */
   public void connect(final String node, final int port)
   {
      this.cluster = Cluster.builder().addContactPoint(node).withPort(port).build();
      final Metadata metadata = cluster.getMetadata();
      out.printf("Connected to cluster: %s\n", metadata.getClusterName());
      for (final Host host : metadata.getAllHosts())
      {
         out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
            host.getDatacenter(), host.getAddress(), host.getRack());
      }
      session = cluster.connect();
   }

   /**
    * Provide my Session.
    *
    * @return My session.
    */
   public Session getSession()
   {
      return this.session;
   }

   /** Close cluster. */
   public void close()
   {
      cluster.close();
   }
}

The above connecting class could be invoked as shown in the next code listing.

Code Using CassandraConnector
/**
 * Main function for demonstrating connecting to Cassandra with host and port.
 *
 * @param args Command-line arguments; first argument, if provided, is the
 *    host and second argument, if provided, is the port.
 */
public static void main(final String[] args)
{
   final CassandraConnector client = new CassandraConnector();
   final String ipAddress = args.length > 0 ? args[0] : "localhost";
   final int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;
   out.println("Connecting to IP Address " + ipAddress + ":" + port + "...");
   client.connect(ipAddress, port);
   client.close();
}

The example code in that last code listing specifies a default node of localhost and a default port of 9042. This port number is specified in the cassandra.yaml file located in the apache-cassandra/conf directory. The Cassandra 1.2 documentation has a page on The cassandra.yaml configuration file which describes the cassandra.yaml file as "the main configuration file for Cassandra." Incidentally, another important configuration file in that same directory is cassandra-env.sh, which defines numerous JVM options for the Java-based Cassandra database.

For the examples in this post, I will be using a MOVIES table created with the following Cassandra Query Language (CQL):

createMovie.cql
CREATE TABLE movies
(
   title varchar,
   year int,
   description varchar,
   mmpa_rating varchar,
   dustin_rating varchar,
   PRIMARY KEY (title, year)
);

The above file can be executed within cqlsh with the command source 'C:\cassandra\cql\examples\createMovie.cql' (assuming that the file is placed in the specified directory, of course) and this is demonstrated in the next screen snapshot.

One thing worth highlighting here is that the columns that were created as varchar datatypes are described as text datatypes by the cqlsh describe command. Although I created this table directly via cqlsh, I also could have created the table in Java as shown in the next code listing and associated screen snapshot that follows the code listing.

Creating Cassandra Table with Java Driver
final String createMovieCql =
     "CREATE TABLE movies_keyspace.movies (title varchar, year int, description varchar, "
   + "mmpa_rating varchar, dustin_rating varchar, PRIMARY KEY (title, year))";
client.getSession().execute(createMovieCql);

The above code accesses an instance variable client. A class in which that instance variable might exist is shown next.

Shell of MoviePersistence.java
package dustin.examples.cassandra;

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

import java.util.Optional;

import static java.lang.System.out;

/**
 * Handles movie persistence access.
 */
public class MoviePersistence
{
   private final CassandraConnector client = new CassandraConnector();

   public MoviePersistence(final String newHost, final int newPort)
   {
      out.println("Connecting to IP Address " + newHost + ":" + newPort + "...");
      client.connect(newHost, newPort);
   }

   /**
    * Close my underlying Cassandra connection.
    */
   private void close()
   {
      client.close();
   }
}

With the MOVIES table created as shown above (either by cqlsh or with Java client code), the next steps are to manipulate data related to this table. The next code listing shows a method that could be used to write new rows to the MOVIES table.

/**
 * Persist provided movie information.
 *
 * @param title Title of movie to be persisted.
 * @param year Year of movie to be persisted.
 * @param description Description of movie to be persisted.
 * @param mmpaRating MMPA rating.
 * @param dustinRating Dustin's rating.
 */
public void persistMovie(
   final String title, final int year, final String description,
   final String mmpaRating, final String dustinRating)
{
   client.getSession().execute(
      "INSERT INTO movies_keyspace.movies (title, year, description, mmpa_rating, dustin_rating) VALUES (?, ?, ?, ?, ?)",
      title, year, description, mmpaRating, dustinRating);
}

With the data inserted into the MOVIES table, we need to be able to query it. The next code listing shows one potential implementation for querying a movie by title and year.

Querying with Cassandra Java Driver
/**
 * Returns movie matching provided title and year.
 *
 * @param title Title of desired movie.
 * @param year Year of desired movie.
 * @return Desired movie if match is found; Optional.empty() if no match is found.
 */
public Optional<Movie> queryMovieByTitleAndYear(final String title, final int year)
{
   final ResultSet movieResults = client.getSession().execute(
      "SELECT * from movies_keyspace.movies WHERE title = ? AND year = ?", title, year);
   final Row movieRow = movieResults.one();
   final Optional<Movie> movie =
        movieRow != null
      ? Optional.of(new Movie(
           movieRow.getString("title"),
           movieRow.getInt("year"),
           movieRow.getString("description"),
           movieRow.getString("mmpa_rating"),
           movieRow.getString("dustin_rating")))
      : Optional.empty();
   return movie;
}

If we need to delete data already stored in the Cassandra database, this is easily accomplished as shown in the next code listing.

Deleting with Cassandra Java Driver
/**
 * Deletes the movie with the provided title and release year.
 *
 * @param title Title of movie to be deleted.
 * @param year Year of release of movie to be deleted.
 */
public void deleteMovieWithTitleAndYear(final String title, final int year)
{
   final String deleteString = "DELETE FROM movies_keyspace.movies WHERE title = ? and year = ?";
   client.getSession().execute(deleteString, title, year);
}

As the examples in this blog post have shown, it's easy to access Cassandra from Java applications using the Java Driver. It is worth noting that Cassandra is written in Java. The advantage of this for Java developers is that many of Cassandra's configuration values are JVM options that Java developers are already familiar with. The cassandra-env.sh file in the Cassandra conf directory allows one to specify standard JVM options used by Cassandra (such as heap sizing parameters -Xms, -Xmx, and -Xmn), HotSpot-specific JVM options (such as -XX:-HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath, garbage collection tuning options, and garbage collection logging options), enabling assertions (-ea), and exposing Cassandra for remote JMX management.

Speaking of Cassandra and JMX, Cassandra can be monitored via JMX as discussed in the "Monitoring using JConsole" section of Monitoring a Cassandra cluster. The book excerpt The Basics of Monitoring Cassandra also discusses using JMX to monitor Cassandra. Because Java developers are more likely to be familiar with JMX clients such as JConsole and VisualVM, this is an intuitive approach to monitoring Cassandra for Java developers.
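
As a minimal sketch of my own (assuming a local Cassandra instance with JMX exposed on its default port of 7199, as configured in cassandra-env.sh), a standard JMX client can read the same heap data JConsole displays:

CassandraJmxSketch.java - Illustrative Sketch
package dustin.examples.sketches;

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

import static java.lang.System.out;

/**
 * Connects to a locally running Cassandra instance over JMX and prints
 * its heap usage via the standard platform MemoryMXBean.
 */
public class CassandraJmxSketch
{
   public static void main(final String[] arguments) throws Exception
   {
      final JMXServiceURL url =
         new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
      try (JMXConnector connector = JMXConnectorFactory.connect(url))
      {
         final MBeanServerConnection connection = connector.getMBeanServerConnection();
         final MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
            connection, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
         out.println("Cassandra heap usage: " + memory.getHeapMemoryUsage());
      }
   }
}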

Another advantage of Cassandra's Java roots is that Java classes used by Cassandra can be extended and Cassandra can be customized via Java. For example, custom data types can be implemented by extending the AbstractType class.

Conclusion

The Cassandra Java Driver makes it easy to access Cassandra from Java applications. Cassandra also features significant Java-based configuration and monitoring and can even be customized with Java.