Big Data
Today’s business organizations are collecting vast amounts of data that vary in structure, complexity and size. Many of them are discovering that the strategic value in this data is difficult to extract with traditional relational database management tools. Beyond strong technical competency, using big data tools requires a fundamental shift in how organizations view their data, its structure, its usage scenarios, and the roles and responsibilities of the IT and user organizations.
Our Big Data practice assists clients across the full lifecycle, from strategic upstream activities such as evaluating and developing a Big Data road-map to the implementation and support of large environments.
Our Advisory Services include:
- Identifying/defining Big Data business/project initiatives
- Developing a Big Data implementation road-map
- Creating proofs of concept, white papers, and technology / tool evaluations
- Providing a road-map to help clients choose appropriate technologies / frameworks / tools
- Implementing best practices and industry standards
- Implementing new tools and technologies to provide innovative solutions
Our Execution Services include:
- Planning, design and implementation of Hadoop and other Big Data environments
- Developing/enhancing Java-, C++- or LAMP-based applications on existing or new Hadoop implementations
- Troubleshooting/performance optimization of existing Hadoop implementations
- Data quality management and data harmonization projects
- Testing/QA of big data applications, automation of data validations and regression test scenarios
- Documentation, programmer training, reverse-engineering, upgrades, maintenance, migration and other steady-state services
Below are some of the technologies our Big Data practice works with:
Programming Languages | Java, Python, JavaScript (client-side as well as Node.js) |
Distributed File Systems | Apache Hadoop HDFS, Tachyon |
Key/Value Data Stores | Apache Accumulo, Berkeley DB, MemcacheDB, Redis, Amazon DynamoDB |
Column-oriented Data Stores | Apache Hive, Apache HBase, Apache Cassandra, Amazon Redshift |
Document-oriented Data Stores | MongoDB, CouchDB, Riak, RethinkDB |
Graph-oriented Data Stores | Apache Giraph, Neo4j, Blueprints, OrientDB, GraphX |
Relational Data Stores | Oracle, MySQL, PostgreSQL, MariaDB, Greenplum, Teradata, BlinkDB, Shark |
Search Platforms | Apache Solr, Elasticsearch, GSA |
Text Processing | Apache Tika, Apache Mahout, Apache Stanbol |
In-memory/Realtime Processing | Apache Spark, Apache Spark Streaming, Apache Storm |
Statistics, Visualization | Gnuplot, VizQL (Tableau), D3.js, Leaflet (maps) |
Cloud Platforms | Amazon AWS, OpenStack |