Friday, March 21, 2014

Elastic Search: Preparing for Prod Deployment

Elasticsearch ships with pretty good defaults, but they may not suit your prod environment. Below is a list of the most important configuration options, with the reason for each. It is by no means a complete list for every requirement, so please explore the other options that matter to you. Give serious thought to each option, based on the requirements of your application, before moving to prod.

Configuration
  • Set the cluster name to an application-specific name. It should differ from the cluster name in your QA/Dev environments so that unwanted nodes cannot join the prod cluster.
    • cluster.name: mycluster
  • Set node name to some meaningful name specific to application
    • node.name: mynode
  • Set paths to locations outside the ES installation directory so that nothing is overwritten when upgrading to a newer version
    • path.conf: path to directory containing elasticsearch.yml and logging.yml
    • path.data: path to directory where ES stores index related data
    • path.work: path to directory to hold temporary files
    • path.logs: path to directory where ES generates log files
  • Avoid deleting all indices by mistake
    • action.disable_delete_all_indices: true
  • If you do not want indices to be created automatically, set the following property to false. This may be desired if you want to add custom settings to an index, e.g. custom analyzers or the number of shards and replicas
    • action.auto_create_index: false
  • Set the following property to false if you want to disable the schemaless feature, which may be desired in prod. When it is set to false, ES will not automatically create mappings for unmapped types.
    • index.mapper.dynamic: false
  • Specify default field for query, search is performed against this field when no field is specified. By default it's _all
    • index.query.default_field: _all
  • Set type of storage, set it to non-blocking I/O type for file system based storage as given below
    • index.store: niofs
  • Set the number of shards of an index, 5 is by default. This value can be overridden by specifying it in index settings json while creating index via REST API.
    • index.number_of_shards: n
  • Set the number of replicas of an index, 1 is by default. This value can be overridden by specifying it in index settings json while creating index via REST API.
    • index.number_of_replicas: 1
  • Set the minimum number of nodes that should come up after a full restart of the cluster before they start replicating data. If this property is not set, nodes start swapping data as soon as they come up, which may result in very high I/O traffic in large clusters.
    • gateway.recover_after_nodes: n
  • Prefer unicast for node discovery and disable multicast. Add the names of a few hosts to the unicast seed list.
    • discovery.zen.ping.multicast.enabled: false
    • discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]
  • Set node discovery timeout to higher value for slower/congested network, default value is 3 seconds
    • discovery.zen.ping.timeout: 3s
  • Set the minimum number of master-eligible nodes that must form a quorum to elect a master, to guard against split brain. A partition of the cluster then cannot elect a master unless it contains more than half of the master-eligible nodes of the normal cluster. Here n is the total number of master-eligible nodes in the cluster. This is a dynamic setting, so its value can be changed on a live cluster as the number of nodes changes, using the cluster API, e.g. curl -XPUT localhost:9200/_cluster/settings -d '{"persistent" : {"discovery.zen.minimum_master_nodes": 3}}'
    • discovery.zen.minimum_master_nodes: (n/2) + 1
  • Disable memory swapping, swapping kills performance. Enabling mlockall tries to lock the process address space so it won’t be swapped.
    • bootstrap.mlockall: true
    • ulimit -l unlimited (unix/linux)        
  • The max. number of open file descriptors on the machine running ES should be set to a high value, e.g. 65535. Verify this using curl -XGET localhost:9200/_nodes/process?pretty
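
The quorum rule for discovery.zen.minimum_master_nodes given above is easy to get wrong for even node counts, so here is a minimal sketch of the arithmetic. The class and method names are my own and are not part of Elasticsearch; it only evaluates (n/2) + 1 with integer division.

```java
// Illustrative helper, NOT part of Elasticsearch: evaluates the (n/2) + 1 rule
// for discovery.zen.minimum_master_nodes.
class MasterQuorum {

    // Integer division is intentional: (n / 2) + 1 is the smallest majority of n.
    static int minimumMasterNodes(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return (masterEligibleNodes / 2) + 1;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " master-eligible nodes -> minimum_master_nodes: "
                    + minimumMasterNodes(n));
        }
    }
}
```

Note that 3 and 4 master-eligible nodes need a quorum of 2 and 3 respectively, so an even node count does not tolerate more failures than the odd count below it.
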
JVM Heap
Ideally, about half of the available memory can be given to the ES JVM heap. Set the ES_MIN_MEM and ES_MAX_MEM env variables to the same value in the ES script and enable mlockall as explained above. Verify this using curl -XGET localhost:9200/_nodes/jvm

Hardware
  • Prefer SSDs over spinning disks. SSDs can give hundreds of times faster IO ops.
  • The more CPU cores the better. Since ES is asynchronous it can use multiple cores efficiently.
  • Low latency network

Tuesday, February 14, 2012

Remote Debugging with Mule and Eclipse

One of my projects is still using Mule 2.2.1. I wanted to start it in debug mode so that I could do some remote debugging using Eclipse. It took me a couple of hours to figure it out.


Mule 2.x
It needs a change in wrapper.conf file of Mule. There is a commented line
#wrapper.java.additional.=-Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005

This line needs to be broken down into four lines, one for each Java argument. Make sure each line uses the next consecutive number for the wrapper.java.additional property in your file. They are 7, 8, 9 and 10 for me.

wrapper.java.additional.7=-Xdebug
wrapper.java.additional.8=-Xnoagent
wrapper.java.additional.9=-Djava.compiler=NONE
wrapper.java.additional.10=-Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005

Make sure the line given below is uncommented in the Mule executable script $MULE_HOME/bin/mule, or its counterpart $MULE_HOME/bin/mule.bat for Windows
JPDA_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"

Mule 3.x
It's easier in case of Mule 3.x. Run mule with debug option $MULE_HOME/bin/mule -debug
By default it listens on port 5005 that can be changed in wrapper.conf
 
Eclipse Debug Configuration  
You can change the debug port 5005 if you want, and configure the same port in the Eclipse debug configuration as explained in my other post Remote Debugging - Tomcat+Eclipse

Thursday, December 29, 2011

Graph Database: neo4j

If we look around us in real life, we find more and more things that take the form of a graph or a web of graphs. How are people connected with each other? How does money flow in a system? How are restaurants, hotels, and roads interconnected? How does a message flow on a social network? We end up with a graph if we try to draw any of them on a whiteboard.

We are dealing with a similar kind of domain model in our project. We got a flexible model by using some out-of-the-box design approaches over a relational database, but this flexibility came with tradeoffs imposed by the limits of the relational database. The growing size of the data is a concern; tens of millions of rows get added to a single table every couple of months. We are striving to come up with a better-fitting solution before we hit the wall a few years down the road. This provoked us to dive into the NoSQL movement and do some experimentation. This paradigm shift toward seeing things in their natural form is interesting, or at least food for thought, and it may help solve some of the problems we have been thinking about. I'll try to touch on NoSQL and neo4j in this post.

To be clear: the acronym NoSQL does not stand for “No SQL” or “Never SQL”; it stands for “Not Only SQL”. So most of the time it means working with a non-SQL persistence system alongside a SQL (relational) database, separating content between the two based on use case.

Different Types of NoSQL Data Models:

Key/Value Stores – fits well for very high volume of data having relatively low complexity e.g. Amazon Dynamo
Column Stores – fits well for high volume of data having fairly high complexity e.g. Hbase, Cassandra, BigTable
Document Databases – a middle path between high volume of data and high complexity e.g. Mongo, CouchDB
Graph Databases – fits well for fairly high volume of data having high complexity e.g. neo4j



Why a graph database (neo4j)?
  • Many domains are graph oriented and map poorly to tables. Why take the pain of squeezing a graph into a table?
  • Performance problems due to SQL joins for connected data
  • ACID and JTA compliant – the only NoSQL DB I know of so far that supports transactions like a relational DB
  • Relationships can be added dynamically if required
  • It can represent a one-to-one mapping from the real-life domain model
You might have heard about the Facebook Graph API and the Open Graph Protocol, which see data as a graph of different domains like people, places, businesses, and events.

Data Modeling in Graph Database (neo4j): 
-         Entities are nodes – all nodes have ids; an id is created automatically for every new node, is unique, and cannot be changed

-         Typed relationships connect nodes – each is uniquely identified by its type and direction

-         Properties (key/value pairs) – they can be attached to any node or relationship. Only Java primitives can be used for properties; objects go in as nodes.

This is a little similar to how we bind data (e.g. a single-level JSON object) to a node in the DOM and then access it later. For example, we do this using jQuery


var user = {'name': 'John', 'age': 28, 'department': 'IT'};

$("#user_div").data("userInfo", user);  // bind data to the element having id="user_div"

alert( $("#user_div").data("userInfo").name );  // alerts John


Spring Data Graph:
This is like JPA for graph databases: an annotation-driven, AspectJ-based domain layer framework from SpringSource for mapping objects to the graph.

Cypher:
Graph query language. Neo4j also supports a powerful native traversing mechanism to retrieve data from graph.

Real Life Use Cases:
-         Social Networks
-         Geo Spatial Data
-         Recommendation Engines

Performance:
Test Case – 1000 persons having 50 friends on average over 4 levels. What is query time to find out if any two persons, picked randomly, are friends?

Result – it is 2000 ms for the relational database and 2 ms for neo4j. Neo4j reports 2 ms even if the number of persons is increased to 1 million. Remember the search complexity of a tree from the graph theory and algorithms class in school?
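
To make the intuition behind that test case concrete, here is a small sketch of a depth-limited breadth-first search over an in-memory friendship graph. This is plain Java collections, not the neo4j API, and all names in it are made up for illustration. The point is only that the work depends on the neighbourhood that gets explored, not on the total number of persons stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory illustration; it does not use neo4j.
class FriendSearch {

    // Records an undirected friendship between a and b.
    static void addFriendship(Map<String, Set<String>> friends, String a, String b) {
        if (!friends.containsKey(a)) friends.put(a, new HashSet<String>());
        if (!friends.containsKey(b)) friends.put(b, new HashSet<String>());
        friends.get(a).add(b);
        friends.get(b).add(a);
    }

    // Breadth-first search that gives up after maxDepth hops.
    static boolean connectedWithin(Map<String, Set<String>> friends,
                                   String from, String to, int maxDepth) {
        if (from.equals(to)) return true;
        Set<String> visited = new HashSet<>();
        visited.add(from);
        List<String> frontier = new ArrayList<>();
        frontier.add(from);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String person : frontier) {
                Set<String> neighbours = friends.get(person);
                if (neighbours == null) continue;
                for (String friend : neighbours) {
                    if (friend.equals(to)) return true;
                    if (visited.add(friend)) next.add(friend); // only expand unseen persons
                }
            }
            frontier = next;
        }
        return false;
    }

    // Builds a chain A-B-C-D-E-F for the demo.
    static Map<String, Set<String>> sampleChain() {
        Map<String, Set<String>> friends = new HashMap<>();
        String[] names = {"A", "B", "C", "D", "E", "F"};
        for (int i = 0; i + 1 < names.length; i++) {
            addFriendship(friends, names[i], names[i + 1]);
        }
        return friends;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = sampleChain();
        System.out.println(connectedWithin(friends, "A", "E", 4)); // true: E is 4 hops away
        System.out.println(connectedWithin(friends, "A", "F", 4)); // false: F is 5 hops away
    }
}
```
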




Wednesday, July 13, 2011

Java 7: New Features

I attended the webcast of the Java 7 launch last week. A number of evolutionary new features have been added by Project Coin to make Java developers' lives easier. In addition, a more exciting step has been taken by opening the JVM to dynamic languages. Now languages like JavaScript, JRuby, Python and many more can dance within the JVM. What a confluence of back-enders and front-enders, static lovers and dynamic lovers! The stage is set for a new paradigm in the programming world ...

I would love to share some of them here.

1. Hurray... finally String can be used in a switch statement.
void printSubjectType(String subject) {
    String subjectType;

    switch (subject) {
        case "english": subjectType = "Language"; break;
        case "physics": subjectType = "Science"; break;
        case "algebra": subjectType = "Mathematics"; break;
        ...
        default: subjectType = "Unknown";
    }

    System.out.println( subjectType );
}


2. No need to repeat the generic type arguments on the right-hand side when initializing a variable.
//Old
Map<String, Integer> monthDays = new HashMap<String, Integer>();

//Java 7
Map<String, Integer> monthDays = new HashMap<>(); // equivalent to new HashMap<String, Integer>();

// few more in Java 7
List<? extends Number> numbers = new ArrayList<>(); // equivalent to new ArrayList<Number>();
List<?> list = new ArrayList<>(); // equivalent to new ArrayList<Object>();


3. Multi catch - specify more than one exception in a single catch clause. The alternatives must not be subclasses of one another, so FileNotFoundException and IOException cannot be combined this way.
// Old
try {
    new FileInputStream(new File("test.txt"));
} catch ( FileNotFoundException e ) {
    throw e;
} catch ( SecurityException e ) {
    throw e;
}

// Java 7
try {
    new FileInputStream(new File("test.txt"));
} catch ( FileNotFoundException | SecurityException e ) {
    throw e;
}

4. try-with-resources - New try clause syntax in which resources can be initialized. These resources are closed automatically when the try block completes, whether normally or with an exception.
//Java 7
try ( InputStream in = new FileInputStream("src"); OutputStream out = new FileOutputStream("dest") ) {
    ..........
    ..........
}
// no need to close in and out in a finally block, as we generally used to do.
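
To see the automatic closing in action, here is a small self-contained sketch. TrackedResource is a made-up class that only records when it is opened and closed; it shows that the resources are closed after the body, in the reverse order of their declaration.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesDemo {

    // Collects what happened, in order.
    static final List<String> events = new ArrayList<>();

    // Hypothetical resource that records when it is opened and closed.
    static class TrackedResource implements AutoCloseable {
        private final String name;

        TrackedResource(String name) {
            this.name = name;
            events.add("open " + name);
        }

        @Override
        public void close() {
            events.add("close " + name);
        }
    }

    static List<String> run() {
        events.clear();
        try (TrackedResource in = new TrackedResource("in");
             TrackedResource out = new TrackedResource("out")) {
            events.add("body");
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        // prints [open in, open out, body, close out, close in]
        System.out.println(run());
    }
}
```
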


Some more sophisticated features:
1. JVM supports dynamic languages like JavaScript, JRuby etc. using the invokedynamic and call site features.
2. New file APIs.
3. NIO2 - New IO part two APIs. Support for asynchronous IO has been added.
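
As a small taste of the new file APIs, here is a sketch that writes a few lines to a temporary file and reads them back using java.nio.file.Files. The class name, the method name and the temp-file prefix are my own choices for illustration; wrapping IOException in an unchecked exception is only there to keep the example short.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class NioFilesDemo {

    // Writes the lines to a temp file, reads them back, then deletes the file.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path file = Files.createTempFile("nio2-demo", ".txt");
            try {
                Files.write(file, lines, StandardCharsets.UTF_8);
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            // Simplification for the demo: surface I/O problems as unchecked.
            throw new IllegalStateException("temp file round trip failed", e);
        }
    }

    public static void main(String[] args) {
        // prints [first line, second line]
        System.out.println(roundTrip(Arrays.asList("first line", "second line")));
    }
}
```
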

Ref: http://www.oracle.com/us/corporate/events/java7/index.html

Tuesday, July 06, 2010

REST Client


This is a feature-rich REST client that I developed and released as open source under EPL 1.0 for developers to play with web services. It can be used to test any URL with all HTTP methods
  • GET
  • POST
  • PUT
  • DELETE
  • HEAD 
  • OPTIONS
  • TRACE

 

Main Features

  1. Simultaneous views of request, response and browser.
  2. Post raw data or file, text content or binary.
  3. Post params in either body or as part of URL (twitter style).
  4. Post multipart form data with same ease as of normal post.
  5. Handle response equally well even if it is binary e.g. image, pdf etc. No gibberish characters anymore.
  6. Play with headers and params.

Min. Requirement

  • Java 1.6
  • Eclipse 3.4 (for plugin)
  • HTTP 1.1 compatibility

Project Home:

http://tinyurl.com/rest-client

Approved by eclipse.org:

http://marketplace.eclipse.org/content/rest-client


Friday, May 28, 2010

Volatility of "volatile" in Java Multi-threading

Many people say the volatile keyword in Java is poorly understood and underused, and I am no exception. I will try to throw some light on it to make it simple to understand.

volatile has its meaning in context of multi-threading. Many explain it as

"If a variable is declared as volatile then it is guaranteed that any thread which reads it see the most recently written value."

First we need to understand that each thread has private memory (a cache) in addition to access to shared main memory. A thread keeps in its cache a copy of a shared object that lives in main memory. The cached value and the value in main memory are synchronized from time to time; this happens when a lock is obtained or released. This is not the case if the variable is declared volatile. Volatile variables are read from and written to main memory only, so no such synchronization is needed, and any thread reading the value reads it from main memory.

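Having said that, there is still the open question of deadlock. Access to a volatile variable is like accessing a synchronized block without holding a lock: since no lock is acquired, a volatile read or write cannot cause a deadlock, but compound operations such as count++ are not made atomic either.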
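
A typical use of volatile is a stop flag shared between threads. Here is a minimal sketch, with class and method names of my own, where one thread spins on a volatile flag and another thread clears it. Without volatile on the flag, the worker would not be guaranteed to ever see the update.

```java
class VolatileFlagDemo {

    // volatile guarantees the worker sees the write made by the stopping thread.
    private volatile boolean running = true;

    void stop() {
        running = false;
    }

    // Spins until another thread clears the flag; returns how often it looped.
    long work() {
        long iterations = 0;
        while (running) {
            iterations++;
        }
        return iterations;
    }

    // Starts a worker, signals it to stop, and reports whether it terminated.
    static boolean workerStopsAfterSignal() {
        final VolatileFlagDemo demo = new VolatileFlagDemo();
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                demo.work();
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
        worker.start();
        try {
            Thread.sleep(50);   // let the worker spin for a moment
            demo.stop();
            worker.join(2000);  // wait up to two seconds for it to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsAfterSignal());
    }
}
```

A flag like this works because it is a single write and a single read. A counter incremented from several threads would still need synchronized or an atomic class.
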

Monday, March 29, 2010

Ajax - Internal

The objective of this post is to expose the behind-the-scenes story of Ajax, but let's start with the classical definition of Ajax ...

1. What is Ajax?
Ajax is a web technology by which a web client (web page) can interact with a server asynchronously. Ajax stands for Asynchronous JavaScript and XML. The whole story revolves around the XMLHttpRequest object, which is an implementation, available in most browsers, of the XMLHttpRequest interface provided by the scripting engine. IE has a different name for this object, XMLHTTP, and instantiates it as an ActiveX object. Scripts can use this object to connect programmatically to their originating server via HTTP, either synchronously or asynchronously. Ajax exploits its asynchronous capability for interactive communication with the server.

2. Example
function ajaxCall(url) {
    var req;

    try {
        req = new XMLHttpRequest();
    } catch(e) {
        try {
            req = new ActiveXObject("Msxml2.XMLHTTP"); // IE 6.0+
        } catch(e) {
            try {
                req = new ActiveXObject("Microsoft.XMLHTTP"); // Older IE
            } catch(e) {
                throw new Error("Your browser doesn't support AJAX => " + e); // Doesn't support AJAX
            }
        }
    }

    req.onreadystatechange = function() {
        if(this.readyState == 4) {
            var data = req.responseText;
            document.getElementById('response').innerHTML = data; // create a div with id 'response'
            // $("#response").html(data); // if using jquery
        }
    };
    req.open("GET", url, true);
    req.send(null);
}

Working with GET
Parameters can be passed as part of URL with GET e.g. http://server.com/getUser?fname=susan&lname=hank

Working with POST (sending JSON object, sending DOM)
With POST, the send() method can take a Document, DOM string, JSON string or any simple text string.

◆ A new instance of the XMLHttpRequest object is usually created for each new request, because once readyState has changed it is not reset to 0 on its own. Calling open() again on the same object re-initializes it and aborts any request still in progress.
◆ Some browsers don't implement the no-argument send() method, so it is better to pass null if there is nothing to send.

3. XMLHttpRequest Explained

XMLHttpRequest has the following methods

Method - Description
abort() - Aborts the current request
getAllResponseHeaders() - Returns all of the HTTP headers as a string
getResponseHeader( headerName ) - Returns the value of the given HTTP header.
open( method, URL, async, userName, password ) - Opens a connection to the given URL using the given method.
  URL - The URL that you wish to connect to.
  method - The HTTP method you wish to communicate with.
  Possible methods:
    • GET - most common
    • POST
    • HEAD
    • PUT
    • DELETE - least common
  async - whether or not the connection should be asynchronous. For Ajax this is always true.
  userName - username IF login is required
  password - password IF login is required
send( content ) - Sends the request to the URL given in the open function. Content is any information that you wish to send to the server. This is typically null but depends on the method specified in the open function.
setRequestHeader( key, value ) - Adds a key-value pair to the HTTP header to be sent.

XMLHttpRequest has the following properties

Property - Description
onreadystatechange - A reference to an event handler for an event that fires every time the object changes state.
readyState - Returns the state of the object as follows:
  • 0 = uninitialized
  • 1 = open
  • 2 = sent
  • 3 = receiving
  • 4 = ready
responseText - The response from the server contained in a single string.
responseXML - The response from the server in XML format.
status - The HTTP status code as a number
statusText - The HTTP status as a string, e.g. "Not Found" or "OK"

4. Interface of XMLHttpRequest object

The XMLHttpRequest object is an implementation of the interface provided by the scripting engine, as specified by the W3C. Below is the interface


[NoInterfaceObject]
interface XMLHttpRequestEventTarget : EventTarget {
// for future use
};

[Constructor]
interface XMLHttpRequest : XMLHttpRequestEventTarget {
// event handler attributes
attribute Function onreadystatechange;

// states
const unsigned short UNSENT = 0;
const unsigned short OPENED = 1;
const unsigned short HEADERS_RECEIVED = 2;
const unsigned short LOADING = 3;
const unsigned short DONE = 4;
readonly attribute unsigned short readyState;

// request
void open(DOMString method, DOMString url);
void open(DOMString method, DOMString url, boolean async);
void open(DOMString method, DOMString url, boolean async, DOMString? user);
void open(DOMString method, DOMString url, boolean async, DOMString? user, DOMString? password);
void setRequestHeader(DOMString header, DOMString value);
void send();
void send(Document data);
void send([AllowAny] DOMString? data);
void abort();

// response
readonly attribute unsigned short status;
readonly attribute DOMString statusText;
DOMString getResponseHeader(DOMString header);
DOMString getAllResponseHeaders();
readonly attribute DOMString responseText;
readonly attribute Document responseXML;
};


More information is provided by the W3C at http://www.w3.org/TR/XMLHttpRequest/