

Lambda architecture


The Lambda architecture is one of the most discussed architecture patterns in the data science space. This post is an attempt to explain it.

Lambda is a data processing architecture designed to address the robustness, scalability, and fault tolerance of big data systems.

In this study we focus on the batch and speed layers, which carry out the data processing.

As proposed in the summary, we will use Spark for both batch and stream processing.



What constitutes the lambda architecture for data processing?

Lambda is an architecture pattern.

The architecture we investigate here implements Lambda primarily with Spark for batch and stream processing, Cassandra for NoSQL database storage, Kafka for sending and receiving the stream data, and Zeppelin for visualizations.

Sample application workflow investigated -


In this sample application Spark is used for both batch and real-time processing.

We will persist the data in HDFS to perform batch operations against it.
[HDFS stands for Hadoop Distributed File System]

Because both layers use the same processing framework, we can reuse code, business logic, and the experience gained from implementing one layer when building the other.

In this case we have investigated storing the data in Cassandra, and we'll be using Zeppelin to visualize some of our data and run some Spark and Cassandra commands.

The application uses a clickstream producer to send data to Kafka in a few different formats.

The sample application we are going to create uses Spark to synchronise the data to HDFS and to perform the stream and batch processing.
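To make the batch and speed layers concrete, here is a minimal sketch in plain Python (no Spark, Kafka, or HDFS involved). The event tuples, the page names, and the functions `batch_view`, `speed_view`, and `query` are all hypothetical names chosen for illustration. It only shows the idea that a query is answered by merging a view computed over the full stored data with a view over events that arrived since the last batch run.

```python
from collections import Counter

# Hypothetical click events as (timestamp, page). Names are illustrative only.
master_dataset = [(1, "home"), (2, "cart"), (3, "home")]  # already persisted, e.g. in HDFS
recent_events = [(4, "home"), (5, "checkout")]            # arrived after the last batch run


def batch_view(events):
    """Batch layer: recompute page counts from the full stored dataset."""
    return Counter(page for _, page in events)


def speed_view(events, batch_cutoff):
    """Speed layer: count only events newer than the last batch run."""
    return Counter(page for ts, page in events if ts > batch_cutoff)


def query(batch, speed):
    """Serving step: merge both views so the answer covers all data."""
    return batch + speed


batch = batch_view(master_dataset)
speed = speed_view(recent_events, batch_cutoff=3)
merged = query(batch, speed)
print(dict(merged))
```

In the application described here both functions would be Spark jobs, which is what makes sharing logic between the two layers practical.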

What is Apache Spark?


Spark is a general-purpose cluster computing platform with components for scheduling and executing work against large datasets.

Spark is now in version 2.x, where there is more focus on Structured Streaming.

How does Spark fit into the Lambda architecture?


Spark is a general engine for large-scale data processing.
Like MapReduce, it scales horizontally; the major difference is speed.

Spark is one of the frameworks that were built to address some of the inefficiencies of MapReduce.

Spark builds its own directed acyclic graph (DAG) from your program and optimizes it, so that substantially less data hits disk and intermediate results are passed through memory instead.

Spark also has its own scheduling and optimizations for executing that DAG.

The core strength of Spark's performance compared to other frameworks is that it can use memory and cache objects efficiently, and that it keeps a lineage graph of your operations so it can recompute lost data on failure.

These two properties are fundamental to Spark's resilient distributed dataset (RDD) implementation.
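The lineage idea can be illustrated with a toy model in plain Python. This is not Spark's API or implementation; the class `LazyDataset` and its methods are hypothetical. Each transformation only records how to derive its rows from its parent, and an action replays that recorded chain, which is also what allows a lost result to be rebuilt.

```python
class LazyDataset:
    """Toy stand-in for an RDD: stores how to build its rows, not the rows."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source  # root data, only set on the first node
        self.parent = parent  # previous node in the lineage
        self.fn = fn          # transformation applied to the parent's rows

    def map(self, f):
        # Lazy: only adds a node to the lineage, nothing is computed yet.
        return LazyDataset(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred):
        return LazyDataset(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def compute(self):
        # Action: replay the lineage from the root down to this node.
        if self.parent is None:
            return list(self.source)
        return self.fn(self.parent.compute())

    def lineage_depth(self):
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth


numbers = LazyDataset(source=range(1, 6))
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

first = evens_squared.compute()
del first                          # simulate losing the computed result
second = evens_squared.compute()   # rebuilt purely from the recorded lineage
print(second, evens_squared.lineage_depth())
```

Real Spark tracks lineage per partition and only recomputes the partitions that were lost; the sketch replays everything for simplicity.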

What are the Spark components and how to perform scheduling?


A Spark download usually contains Spark Core, which includes the high-level APIs and an optimised engine that supports general execution graphs; Spark SQL for SQL and structured data processing; and Spark Streaming, which enables scalable, high-throughput, fault-tolerant processing of live data streams. Spark MLlib contains the machine learning libraries.

GraphX is for graph computations.

Spark also supports a variety of languages like Java, Python, Scala, and R.

Different Aggregations available in Spark

RDD API


The RDD is Spark's fundamental abstraction and building block. RDD stands for “Resilient Distributed Dataset”.

An RDD is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. RDDs are created by starting with a file in the Hadoop file system (or any other Hadoop-supported file system), or an existing Scala collection in the driver program, and transforming it. Users may also ask Spark to persist an RDD in memory, allowing it to be reused efficiently across parallel operations. Finally, RDDs automatically recover from node failures.
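As a rough illustration of "partitioned and operated on in parallel", the sketch below splits a local list into slices and processes each slice independently before combining the partial results. It uses only the Python standard library; the thread pool stands in for cluster nodes, and `partition` and `sum_of_squares` are hypothetical helper names, not Spark functions.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(data, num_partitions):
    """Split a collection into slices, one per simulated node."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def sum_of_squares(rows):
    """Work done independently on each partition."""
    return sum(x * x for x in rows)


data = list(range(1, 11))
parts = partition(data, num_partitions=3)

# Each partition is processed on its own worker, with no shared state.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(sum_of_squares, parts))

# Combine the per-partition results into the final answer.
total = reduce(lambda a, b: a + b, partial_sums)
print(total)
```

The per-partition step followed by a combine step mirrors the map-then-reduce shape of many RDD computations.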

 

DataFrame and Dataset APIs


Like an RDD, a DataFrame is an immutable distributed collection of data. Unlike an RDD, the data is organized into named columns, like a table in a relational database. Designed to make processing large data sets even easier, the DataFrame lets developers impose a structure onto a distributed collection of data, allowing a higher-level abstraction; it provides a domain-specific language API to manipulate your distributed data; and it makes Spark accessible to a wider audience, beyond specialized data engineers.

Spark 1.6 introduced the Dataset API, which extends the DataFrame API by adding type safety to the structured, tabular representation of data that DataFrames provide.
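To show what named columns buy you, here is a small plain-Python sketch of an aggregation over rows addressed by column name. The column names (`page`, `user`, `duration`) and the helper `group_by_sum` are assumptions made up for this example. It is roughly the kind of grouping and summing that the DataFrame API lets you express declaratively, minus the distribution and optimization.

```python
from collections import defaultdict

# Hypothetical clickstream rows with named columns.
rows = [
    {"page": "home", "user": "u1", "duration": 12},
    {"page": "cart", "user": "u1", "duration": 30},
    {"page": "home", "user": "u2", "duration": 8},
]


def group_by_sum(rows, key, value):
    """Group rows by one named column and sum another."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


time_per_page = group_by_sum(rows, key="page", value="duration")
print(time_per_page)
```

With plain dictionaries a misspelled column name only fails at run time; catching that class of error earlier is the kind of safety the typed Dataset API is aimed at.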

 

Caching


Spark also supports pulling data sets into a cluster-wide in-memory cache. This is very useful when data is accessed repeatedly, such as when querying a small “hot” dataset or when running an iterative algorithm like PageRank.
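The effect of caching on an iterative job can be sketched with a toy class in plain Python. `Dataset`, `expensive_load`, and `run_iterations` are hypothetical names and this is not how Spark implements its cache; the point is only that a cached dataset pays the loading cost once, while an uncached one pays it on every pass.

```python
class Dataset:
    """Toy dataset that re-runs its loader on every access unless cached."""

    def __init__(self, loader):
        self._loader = loader
        self._use_cache = False
        self._cached_rows = None
        self.loads = 0  # how many times the expensive load ran

    def cache(self):
        self._use_cache = True
        return self

    def rows(self):
        if self._use_cache and self._cached_rows is not None:
            return self._cached_rows
        self.loads += 1
        rows = self._loader()
        if self._use_cache:
            self._cached_rows = rows
        return rows


def expensive_load():
    # Stand-in for reading and parsing a file from distributed storage.
    return [3, 1, 4, 1, 5, 9, 2, 6]


def run_iterations(dataset, iterations):
    """Touch the data once per iteration, as an iterative algorithm would."""
    total = 0
    for _ in range(iterations):
        total += sum(dataset.rows())
    return total


uncached = Dataset(expensive_load)
cached = Dataset(expensive_load).cache()

uncached_total = run_iterations(uncached, 5)
cached_total = run_iterations(cached, 5)
print(uncached.loads, cached.loads)
```

Both runs produce the same result; only the number of loads differs.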
