Big Data is one of the most valuable assets for companies and institutions, which rely on data analysis techniques to make informed decisions and to study solutions to problems. Many petabytes (PB) of public data published by government bodies are electronically accessible, and this volume is growing. The availability of big data is bringing a paradigm shift in how public opinion is understood and in how cities are planned and grown, with advances becoming more and more data-driven. Government officials, city planners, statisticians, computer scientists, and engineers have begun collaborating to tackle these data science problems. The amount and diversity of available data require innovative methods for data extraction, analysis, integration and visualization.
Data Science is an emerging discipline that applies data management, machine learning, data analysis, statistics and mathematics, computational science, artificial intelligence, data visualization and user experience design to big data. Data scientists develop methodologies, algorithms and software implementations that make data more accessible and useful.
WHAT IS DATA SCIENCE?
Data Science is the discipline that studies how to use data effectively. From a scientific point of view, data science consists of:
- data acquisition, representation, cleaning and integration;
- information/knowledge modelling and semantic and context-based enrichment;
- information retrieval, data mining, knowledge discovery;
- artificial intelligence methods and machine learning;
- topological data analysis and other statistical methods;
- data trustworthiness, prediction and recommendation;
- data visualization and exploration.
These phases form a data science pipeline, which is supported by general-purpose technologies and tools. The most important technologies are based on cloud and parallel computing, following the path traced by the giants of data science, such as Google, Facebook and Amazon, whose main asset is the analysis of socially produced big data content. Emerging platforms and tools include Hadoop, Hive, Spark and Flink, among many others.
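To make the idea of a pipeline concrete, the following is a minimal sketch in plain Python that chains a few of the phases listed above (acquisition, cleaning, integration and a simple statistical analysis). The records, field names and function names are hypothetical and chosen only for illustration; a real pipeline would typically read from external sources and run on a platform such as those named above.

```python
from statistics import mean

# Hypothetical raw records, standing in for data acquired from two sources.
SOURCE_A = [{"city": "Rome", "temp_c": "21.5"}, {"city": "Milan", "temp_c": ""}]
SOURCE_B = [{"city": "rome", "temp_c": "22.5"}, {"city": "Turin", "temp_c": "18.0"}]


def acquire():
    """Acquisition: gather the records from every source."""
    return SOURCE_A + SOURCE_B


def clean(records):
    """Cleaning: drop records with missing values, normalize names and types."""
    return [
        {"city": r["city"].strip().title(), "temp_c": float(r["temp_c"])}
        for r in records
        if r["temp_c"].strip()
    ]


def integrate(records):
    """Integration: group the measurements that refer to the same city."""
    merged = {}
    for r in records:
        merged.setdefault(r["city"], []).append(r["temp_c"])
    return merged


def analyze(merged):
    """Analysis: compute a simple statistic (the mean) per city."""
    return {city: mean(values) for city, values in merged.items()}


def run_pipeline():
    """Run the phases in order, each one consuming the previous output."""
    return analyze(integrate(clean(acquire())))


if __name__ == "__main__":
    print(run_pipeline())
```

Each phase is a separate function that consumes the output of the previous one, so a single phase can be replaced (for example, by a distributed implementation) without changing the others.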
WHY IS DATA SCIENCE INTRINSICALLY INTERDISCIPLINARY?
Scientists are increasingly recognizing the value of analyzing vast amounts of data to answer many interesting questions. Data science is closely connected to empirical evaluation and practical application to real-world case studies from diverse fields. The development of these new methods benefits several fields, such as Bio-Medicine, Molecular Genetics, Business Intelligence, Sociology and Media Analysis, which face similar challenges in dealing with the rapidly increasing amount of available data. Knowledge in the world continuously evolves, and scientists are challenged to keep pace with it by studying new and more robust knowledge extraction methods. To achieve this, it is necessary to a) work in parallel on multiple facets of the problem and b) bring together scientists who are specialized in different areas.