Running a Spark Job using PySpark on AWS EMR

Spark is one of the preferred data processing engines for a wide range of situations. Data scientists and application developers integrate Spark into their own implementations to transform, analyze, and query data at scale.

Simple iOS Image Caching Technique

Loading a large number of images asynchronously in a scrollable view like UITableView or UICollectionView is a common task. However, keeping the app responsive while scrolling as images download can be a challenge. In the worst cases, we have also experienced app crashes.

AWS Redshift Optimization

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse solution that uses columnar storage to minimize input/output (I/O), provide high data compression rates, and offer fast performance. As a typical data warehouse, it is primarily designed for Online Analytic Processing (OLAP) and Business Intelligence (BI), and is not designed for use as an Online Transaction Processing (OLTP) tool. It supports ANSI SQL and is a massively parallel processing database.

Implementing an Enterprise Data Lake using AWS – A Case Study

A leading independent investment management company was seeking to implement a cloud-based Enterprise Data Lake (EDL), Enterprise Data Warehouse (EDW), and Enterprise Data Pipeline (EDP) that leverage both AWS services and other complementary open source tools in the market.

An AWS Data Integration platform for Data Analytics and Machine Learning

The project is an AWS-based solution that provides a data integration platform to accelerate digital analytics capture of the customer journey and identify buying and cancellation patterns using Machine Learning (ML) approaches for one of the world’s most widely recognized cruise brands.

Implementing a Big Data Pipeline using AWS and Open Source Frameworks – A Case Study

The client is a top American company that specializes in using marketing to sell home care, health, and beauty products. The company’s global data platform has a legacy data warehouse that stores various marketing data for generating BI reports.

Data Lake Reference Architecture

A data lake is a single platform that combines data governance, analytics, and storage. It’s a secure, durable, and centralized cloud-based storage platform that lets you ingest and store structured and unstructured data. It also lets you apply transformations to the raw data assets as needed. A comprehensive portfolio of data exploration, reporting, analytics, machine learning, and visualization can be built on the data by using this data lake architecture.

Basics of K-Means Clustering

Machine learning is the practice of using algorithms to ingest data, learn from it, and then make a determination or prediction about something. So rather than hand-coding software routines with a specific set of instructions to accomplish a particular task, the machine is trained using large amounts of data and algorithms that give it the ability to learn how to perform the task.

Cloud Enabled DevOps Strategy on AWS

Any organization that is serious about frequently releasing iterations of bug-free software should have some level of DevOps process in place in its delivery pipeline. This post discusses how to implement a DevOps continuous delivery/deployment pipeline on AWS Cloud infrastructure.

Machine Learning with AWS

Machine learning often feels a lot harder than it should be to most developers because the process to build and train models, and then deploy them into production is too complicated and too slow. First, you need to collect and prepare your training data to discover which elements of your data set are important. Then, you need to select which algorithm and framework you’ll use.

AWS Kinesis Firehose – Real-time data streaming on AWS

AWS Kinesis Firehose is a fully managed service for transforming and delivering streaming data to a given destination. A destination can be an S3 bucket, a Redshift cluster, Splunk, or Elasticsearch Service. In the following tutorial I’ll walk you through the process of streaming CloudWatch Logs generated by an AWS Lambda function to an S3 bucket.

Building SaaS based Enterprise Cloud Applications – White Paper

According to NIST, “cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (NIST, 2011). The traditional approach incurs a huge upfront capital expenditure along with too much excess capacity, and does not allow capacity to be predicted based on market demand.

Setting up a testing cluster using ClusterRunner

So you have written some tests for your project, and now you are waiting for the test run to complete to see whether anything breaks because of the changes you made in your last commit. Finally, you can merge your changes into develop when everything is green on your CI. But as the number of tests increases, their execution time also increases.


