Data Science is a field within Information Technology and Computer Science in which we analyze data using a range of programming models and tools.
It revolves around loading and processing data with current tools and technologies.
Data is a collection of raw facts that carry no meaning on their own; analyzing a data set turns those facts into meaningful information.
Data Science is not an entirely new field: for decades, information has been mined and extracted from data using traditional tools.
Data Science has several sub-areas.
1- Big Data Distribution across a Cluster.
3- Data Loading (ETL).
5- Real-Time Data Analysis.
6- Batch Data Analysis.
Tools and technologies used to process data.
1- Python Programming.
3- Scala Programming.
Various programming languages can be used to analyze data, but Python is growing rapidly, and many professionals and IT organizations are adopting it as their base language for data analysis.
Frameworks for Big Data Processing.
1- Apache Hadoop Ecosystem (Cloudera, IBM).
2- Apache Spark Ecosystem.
Hadoop and Spark are both data processing frameworks, each with a set of tools for processing different varieties of data sets. Hadoop processes data on disk, while Spark can process data both in RAM and on disk.
Apache Hadoop is not meant for small-scale data processing; it is a framework for Big Data (volumes of data too large to be processed with traditional tools).
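Hadoop's classic processing model is MapReduce. As a rough illustration of the idea only, the sketch below runs the map, shuffle, and reduce steps of a word count on a single machine in plain Python. It does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample text chunks are assumptions made for this example. On a real cluster, each chunk would live on a different node and the framework would handle the shuffle.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk stands in for a block of data stored on a separate node.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical sample input, split into two "blocks".
chunks = ["big data needs big tools", "spark and hadoop process big data"]
counts = word_count(chunks)
print(counts)
```

Because every chunk is mapped independently, the map step is the part that can be spread across many machines when the data is too large for one.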
Data Loading Tools.
1- Apache Sqoop
2- Apache Flume
1- Matplotlib (a Python library).
2- Apache Zeppelin Server.
Tools for Data Processing and Analysis.
3- Spark SQL.
Machine Learning is a part of data science in which we build a model for data analysis from the data itself. We train the ML model on a training data set (a portion of the actual data), and then test its accuracy on a test data set (which may also be a portion of the actual data). We then try to reduce the model's errors using gradient descent.
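The train, test, and gradient descent steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production approach: the data is synthetic (generated from y = 3x + 2 plus small noise), the model is a single-variable linear regression, and the helper names (train_test_split, fit_linear, mean_squared_error) as well as the learning rate and epoch count are choices made for this example.

```python
import random


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def mean_squared_error(rows, w, b):
    """Average squared difference between predictions w*x + b and targets y."""
    return sum((w * x + b - y) ** 2 for x, y in rows) / len(rows)


def fit_linear(rows, learning_rate=0.02, epochs=3000):
    """Fit y = w*x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in rows) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in rows) / n
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Synthetic data (an assumption for this sketch): y = 3x + 2 with small noise.
noise = random.Random(42)
data = [(i / 10, 3 * (i / 10) + 2 + noise.uniform(-0.1, 0.1)) for i in range(50)]

train, test = train_test_split(data)
w, b = fit_linear(train)
test_error = mean_squared_error(test, w, b)
print(f"w={w:.3f} b={b:.3f} test MSE={test_error:.5f}")
```

The model only ever sees the training rows while fitting; the test rows are used once at the end, so the reported error reflects data the model has not been trained on.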
2- Classification Techniques (Logistic Regression).
3- Clustering (K-Means Clustering).
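As an illustration of the clustering item above, here is a minimal K-Means sketch in plain Python. It is a simplified version under stated assumptions: the six sample points are made up, the first k points are used as starting centroids to keep the run deterministic (real implementations usually pick random or smarter starting points), and the helper names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, iterations=10):
    # Simplification: start from the first k points instead of random ones.
    centroids = list(points[:k])
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            centroid_of(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The two alternating steps (assign points, then recompute centroids) are repeated a fixed number of times here; a fuller version would stop as soon as the assignments no longer change.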
Reinforcement Learning is a part of Data Science in which the system or model tries to learn by itself. In this approach, the system learns from its mistakes and remembers the result of every action so that it avoids repeating those mistakes in the future.
Reinforcement Learning is a part of Artificial Intelligence.
Example: Q-Learning.
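To make the Q-Learning example concrete, here is a minimal tabular sketch in plain Python. Everything about the environment is an assumption made for illustration: a corridor of five cells, an agent that starts in cell 0, and a reward of 1 for reaching cell 4. The learning rate, discount factor, and exploration rate are likewise arbitrary choices. The table q plays the role of the "memory" in which the result of every action is stored.

```python
import random

N_STATES = 5          # corridor cells 0..4 (hypothetical toy environment)
GOAL = N_STATES - 1   # reaching the last cell ends the episode
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right


def step(state, action_index):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + ACTIONS[action_index], 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy choice; ties are broken randomly so the agent explores."""
    if rng.random() < epsilon or q_row[0] == q_row[1]:
        return rng.randrange(len(ACTIONS))
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-Learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-goal cells: the action index with the highest value.
policy = [max(range(len(ACTIONS)), key=lambda i: q_table[s][i]) for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should prefer stepping right in every non-goal cell, because moves toward the goal accumulate higher stored values than moves away from it.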