Tuesday, December 17, 2019

DATA ANALYTIC LIFE CYCLE

               
                Hi friends, we have all heard about life cycles, especially the life cycle of a butterfly, in which a caterpillar changes into a butterfly. Software has one too: in the Software Development Life Cycle (SDLC), every phase of the process matters for implementing the project.
                                                                                     
Project Planning ------- Analysis ------- Design ------- Implementation
                                                                |
                                                                |
Maintenance ------------------ Testing & Integration



A big data analytics project goes through important phases of its own. A data scientist must know these phases for an analytics project to succeed.
 A successful analytics project also depends on several key roles:
  1. Business User
  2. Project Sponsor
  3. Project Manager
  4. Business Intelligence
  5. Data Engineer
  6. Database Administrator
  7. Data Scientist
Business User:
          The business user is the end user or client of the project. The client should state the requirements: what type of data they want, what kind of predictions the data must support, and so on.

Project Sponsor:
          The project sponsor funds the project and has a good idea of whether it is succeeding or not. The sponsor should have a marketing team and know the time frame for selling the project. The sponsor links the project to the business community and the decision-making groups.

Project Manager:
          The project manager carries a huge responsibility for the project. The manager leads the team along the right path and manages it well, checks whether every module in the project is correct, and reports continuously to the project sponsor. The manager also motivates the team to make the project successful.

Business Intelligence:
          The business intelligence analyst gives past and present data to the data engineer.

Data Engineer:
          The data engineer supports data extraction and ingestion into the analytic sandbox, and retrieves data from the sandbox.

Database Administrator:
          The database administrator provides access to the data, whether for public or private users.

Data Scientist:
          The data scientist is responsible for working with all of them: the project manager, the business intelligence analyst, the data engineer, and the database administrator.
         

 DATA ANALYTIC LIFE CYCLE:

Discovery:
                  First we need to identify what the data will be used for and what kind of data the project requires. Then we prepare the data in the next phase.

Data Preparation:
                   Preparation removes unwanted and noisy data, so that the remaining data can be used efficiently.
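
To make the idea of removing unwanted and noisy data concrete, here is a minimal sketch in Python. It assumes the data is a plain list of numeric readings where `None` marks a missing value; the function name `prepare`, the cutoff `k`, and the sample readings are made up for illustration, and a real project would pick its own cleaning rules.

```python
from statistics import median


def prepare(values, k=3.0):
    """Drop missing readings, then drop values far from the median."""
    # Step 1: remove unwanted entries (here, missing values marked as None).
    present = [v for v in values if v is not None]
    if not present:
        return []

    # Step 2: measure the typical distance from the median
    # (median absolute deviation), which outliers barely affect.
    mid = median(present)
    mad = median(abs(v - mid) for v in present)
    if mad == 0:
        return present

    # Step 3: keep only readings within k deviations of the median.
    return [v for v in present if abs(v - mid) <= k * mad]


# Hypothetical raw readings: two missing values and one noisy spike.
raw = [10.0, 11.0, None, 9.5, 10.5, 250.0, None, 10.2]
clean = prepare(raw)
print(clean)
```

The missing values and the spike of 250.0 are dropped, and the five ordinary readings remain.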

Model Planning: 
                   Select a particular algorithm to bring the data to life.
Examples: k-means clustering, Recursive, Association Rules, Classification, etc.
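
As an illustration of one of these algorithms, here is a small k-means clustering sketch in plain Python. It is only a teaching sketch under simple assumptions: the points are 2D tuples, the first `k` points are used as starting centroids (real implementations usually choose them more carefully), and the sample points are made up.

```python
def kmeans(points, k, iterations=10):
    """Group 2D points into k clusters around moving centroids."""
    # Assumption: start from the first k points (naive but deterministic).
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]

    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )

    return centroids, clusters


# Hypothetical data: two obvious groups of points.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Each pass repeats the same two steps, assigning points to the nearest centroid and then recomputing the centroids, until the groups stop changing or the iteration limit is reached.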

Model Building:
                   The model is built and configured for the end user's environment.

Communicate Results:
                    This phase gives sample outputs to the end users. If the results are not satisfactory, the data scientist has to rebuild the same module.

Operationalize:
                    This is the final completion of the project. It shows the worth of the project and the kind of effort the data scientists put in.





KEY ROLES OF DATA ECOSYSTEM:
  1. Deep Analytical Talent
  2. Data Savvy Professional
  3. Technology & Data Enablers
PROFILE OF DATA SCIENTIST:

Quantitative:
                Problem-solving skills in mathematical or statistical analysis.

Curious & Creative:
                The data scientist should be enthusiastic about the ideas in the data and explore concepts that matter to the end user.

Communicative & Collaborative: 
                Communication with the communities involved should be very clear. The data scientist must relate the data architecture to the data sets, and discuss the project with the project sponsor, so that the sponsor feels the money is being spent on a worthwhile project.

Skeptical: 
                Skeptical analysis gives developers a diagrammatic solution, which is easier to analyse than looking at the raw data.

Technical: 
                The data should be described in technical terms that are convenient for the end user. A good data scientist keeps updating their technical skills all the time.

R-STUDIO ENVIRONMENT:
                 
                   RStudio is a programming platform for analysing big data, built around the R programming language. Globally, 12% of developers are working on big data.
R is an interpreted language, so you can execute your code line by line. Believe it or not, the code is very easy to understand.

                   Let's move on to installing the software on Windows. Click the link to download the package for RStudio: Download Package
                   After downloading the package, install the application in your root directory (C:\).
                   When the installation completes, you can enter the RStudio environment.
                   We will see how to use RStudio in my next post, where I'm going to explain the R language hands-on.
                   If you have any queries about this content, comment or contact me at kamalkk54321@gmail.com
  

Saturday, December 14, 2019

BIG DATA ANALYTICS & DATA SCIENCE


                      Hi guys, I recently attended an EMC certification event about data science and big data analytics conducted by ICT Academy. I came away with a totally new idea about the emerging trends in big data. The certification provided was the EMCDSA (EMC Data Scientist Associate Professional).
             Nowadays, big data is the most trending technology and a very valuable platform for a data scientist. Even a fresher could earn a package of 17L per year. A software engineer may struggle to survive in the world, but a data scientist can rule over it. Let's see.
 Big data is data whose scale, distribution, diversity, and timeliness require the use of new technical architectures and analytics to enable insights that unlock new sources of business value.
These three golden concepts characterize big data:
  1. Volume
  2. Velocity
  3. Variety   
          All over the world, an enormous amount of data is shared every second. Take Facebook as an example of big data: each person has a personal account, everyone relies on the network, and everyone shares at least a few bytes of data every day. All of that data is stored on Facebook's servers. Facebook's data also satisfies the three V's. The data can be of any kind: text messages, videos, images, GIFs, etc.

          Most people ask: why does Facebook store all this data?
On most pages you see suggestions like "you may like this page", "you may like this post", or "you have a suggestion from your friend."
          Facebook analyses the activity of each individual user. The more analytical processes it runs, the more data it needs, so it gathers each person's data. That data may be useful or useless, but the organization stores all of it.
          At first, Facebook started with text messages only, then it received frequent updates to improve its performance. Currently it runs machine learning algorithms over millions of data items every second.

          Consider Amazon Prime, a streaming service for web series that consumes a lot of data. The company collects the web searches of individual users and uses them to enhance its whole platform. It personalizes the data of every user.

          Big data can also be used to analyse the profit and loss of a particular year with a particular technology.
To go deeper into the big data concept, we first need to study the following.

KEY CHARACTERISTICS OF BIG DATA
  1. Data Volume                 : The total amount of data stored in a container
  2. Processing Complexity : The processing of each data set needs to be easy and quick
  3. Data Structure               : A simple data structure lets information be retrieved properly and used conveniently

DATA STRUCTURE GROWTH

[Figure: Growth of data structure]

Structured: The data is arranged in a specified manner. Any data scientist can retrieve a particular piece of data for official use.

Semi-Structured: The data is arranged in formats such as JSON (JavaScript Object Notation) and XML (Extensible Markup Language), or stored in NoSQL databases. All of the data is laid out according to a specified file format.
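
To show what semi-structured data looks like in practice, here is a short Python sketch that reads the same record from JSON and from XML using only the standard library. The record itself (the user name, likes, and age) is invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical user record in two semi-structured formats.
json_text = '{"user": "asha", "likes": ["cricket", "music"], "age": 24}'
xml_text = '<user name="asha"><like>cricket</like><like>music</like></user>'

# JSON maps directly onto Python dicts and lists.
record = json.loads(json_text)

# XML is read as a tree of elements with attributes and text.
root = ET.fromstring(xml_text)
xml_likes = [node.text for node in root.findall("like")]

print(record["user"], record["likes"])
print(root.get("name"), xml_likes)
```

In both cases the structure travels with the data as keys or tags, which is what separates semi-structured data from a fixed table.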

Quasi-Structured: Data with an irregular format, such as the data behind a Google search. It can be searched using partial information, such as a query keyword.

Unstructured: Collections of any combination of text, video, audio, etc. Data in any format can be stored this way.

Two simple examples show the difference.
                A structured query URL looks like this:
https://navybird.blogspot.com/big-data-analytics-data-science/
                An unstructured query URL looks like this:
https://www.blogger.com/blogger.g?blogID=3920101252358251749/img?#4322
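
As a rough sketch of how a program sees these two URLs, the Python snippet below splits both with the standard `urllib.parse` module. The variable names are my own choices. The readable URL breaks into meaningful words, while the second one only yields an opaque ID and a fragment.

```python
from urllib.parse import urlsplit, parse_qs

readable_url = "https://navybird.blogspot.com/big-data-analytics-data-science/"
opaque_url = "https://www.blogger.com/blogger.g?blogID=3920101252358251749/img?#4322"

# The readable URL carries its meaning in the path itself.
structured = urlsplit(readable_url)
slug_words = structured.path.strip("/").split("-")

# The second URL has to be picked apart into query parameters and a fragment.
opaque = urlsplit(opaque_url)
params = parse_qs(opaque.query)

print(slug_words)
print(opaque.path, list(params), opaque.fragment)
```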
               


DATA REPOSITORIES

Data Islands




      - Data that is scattered in many separate places across the internet.

Data Warehouse



    - Collections of organized data made up of a number of libraries linked one after another in a chain. All the data sets are stored in one place.

Analytic Sandbox  




      - Pooled data stored in a single place, which the data analyst can use whenever needed.

BUSINESS DRIVERS:
  1. Desire to optimize
    1. To optimize sales and profitability
  2. Desire to identify business risk
    1. To reduce fraud and customer churn
  3. Predict new business opportunities
    1. To enhance cross-sell prospects
  4. Comply with laws or regulatory requirements
              Laws can change every year, which adds complexity and data requirements for the organization. These laws mainly concern Anti-Money Laundering.

BUSINESS INTELLIGENCE



                     Business intelligence mainly focuses on using a consistent set of metrics to measure past performance and inform business planning. It covers the analytics of past and present data sets.

  • It requires traditional sources, structured data, and manageable data sets.
  • It is based on standard and ad hoc reporting.
Some common questions that identify business intelligence:
      What happened last quarter?
      How many did we sell?
      Where are the problems? In which situations?
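
A question like "How many did we sell?" is answered by summing up past records. Here is a minimal Python sketch of that kind of report. The sales rows (quarter, region, units) are invented sample data, and a real BI tool would read them from a database instead.

```python
from collections import defaultdict

# Hypothetical past sales records: (quarter, region, units sold).
sales = [
    ("2019-Q2", "north", 120),
    ("2019-Q2", "south", 80),
    ("2019-Q3", "north", 150),
    ("2019-Q3", "south", 95),
]

# Standard report: total units per quarter.
units_by_quarter = defaultdict(int)
for quarter, region, units in sales:
    units_by_quarter[quarter] += units

# "What happened last quarter?" -> look up the most recent quarter.
last_quarter = max(units_by_quarter)
print(last_quarter, units_by_quarter[last_quarter])
```

The report only looks backward: it describes what already happened and makes no prediction.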

    DATA SCIENCE




                     Data science collects tons of stored past and present data to predict future data. Data science always includes business intelligence.
    It requires very large data sets.

    • It works with structured or unstructured data.
    • It provides optimization, forecasting, predictive modeling, and statistical analysis.
    Some common questions that identify data science:
        What if...?
        What's the optimal scenario for our business?
        What will happen next? What if these trends continue?
        Why is this happening?
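
To give a feel for "What if these trends continue?", here is a small forecasting sketch in Python. It fits a straight line to past values with ordinary least squares and extends it one step ahead. The history values are made up, and a straight-line trend is only an assumption; real forecasting models are usually richer.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for four past periods.
history = [100, 110, 120, 130]
slope, intercept = fit_line(history)

# Forecast for the next period, assuming the trend continues.
next_value = slope * len(history) + intercept
print(next_value)
```

This sketch uses the past to estimate a value that has not been observed yet, which is the forward-looking side of data science.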

                           Big data should solve its problems using its own historical data. Past and present data are used to predict future data and to calculate accurately whether the outcome is a profit or a loss.