Big Data

Today’s business organizations are collecting vast amounts of data that vary in structure, complexity and size. Many of them are discovering that the wealth of strategic value in this data is difficult to extract using traditional relational database management tools. Beyond strong technical competency, using big data tools requires a fundamental shift in how organizations view their data, its structure, its usage scenarios, and the roles and responsibilities of the IT and user organizations.

Our Big Data practice assists clients with both strategic upstream activities, such as evaluating and developing a big data road-map, and the implementation and support of large environments.

Our Advisory Services include:

Identifying/defining Big Data business/project initiatives
Developing a Big Data implementation road-map
Creating proofs of concept, white papers, and technology / tool evaluations
Providing a road-map to help clients choose appropriate technologies / frameworks / tools
Implementing best-practices and industry standards
Implementing new tools and technologies to provide innovative solutions

Our Execution Services include:

Planning, design and implementation of Hadoop and other Big Data environments
Developing/enhancing Java, C++ or LAMP-based applications on existing or new Hadoop implementations
Troubleshooting/performance optimization of existing Hadoop implementations
Data quality management and data harmonization projects
Testing/QA of big data applications, automation of data validations and regression test scenarios
Documentation, programmer training, reverse-engineering, upgrades, maintenance, migration and other steady-state services

Below are some of the technologies our Big Data practice works with:


Programming Languages	Java, Python, JavaScript (client-side as well as Node.js)
Distributed File Systems	Apache Hadoop HDFS, Tachyon
Key/Value Data Stores	Apache Accumulo, Berkeley DB, MemcacheDB, Redis, Amazon DynamoDB
Column-oriented Data Stores	Apache Hive, Apache HBase, Apache Cassandra, Amazon Redshift
Document-oriented Data Stores	MongoDB, CouchDB, Riak, RethinkDB
Graph-oriented Data Stores	Apache Giraph, Neo4j, Blueprints, OrientDB, GraphX
Relational Data Stores	Oracle, MySQL, PostgreSQL, MariaDB, Greenplum, Teradata, BlinkDB, Shark
Search Platforms	Apache Solr, Elasticsearch, GSA
Text Processing	Apache Tika, Apache Mahout, Apache Stanbol
In-memory/Realtime Processing	Apache Spark, Apache Spark Streaming, Apache Storm
Statistics, Visualization	Gnuplot, VizQL (Tableau), D3.js, Leaflet (maps)
Cloud Platforms	Amazon Web Services (AWS), OpenStack