The Web and Data Science course focuses on the study of large-scale socio-technical systems associated with the World Wide Web. It considers the relationship between people and technology, the ways that society and technology complement one another, and the way they impact broader society. These analyses are inherently associated with Big Data management issues.

The course is taught at the Como Campus by Marco Brambilla and Emanuele Della Valle.

LESSON 1. Introduction, Scenarios (1) and Scenarios (2), Exam and project rules

LESSON 2. Introduction to Big Data. How Netflix has used Big Data since the mid-2000s. Vertical vs. horizontal scalability.

LESSON 3. Scaling storage horizontally with Key-Value pairs. Scaling processing horizontally with Map-Reduce.

LESSON 4. The logical architecture of a Big Data platform. Hands-on HIVE: context, readme, data (138 MB!). Innovating the Hadoop ecosystem: the approach of the Berkeley Data Analytics Stack. Introduction to Spark.

LESSON 5. Let’s try Databricks Community Edition. From RDDs, transformations, and actions to Datasets and SQL queries (Databricks’ notebooks). Notebooks developed in class: basics and word count (using hamlet.txt). Demo of the 100x effect using the same data as the HIVE hands-on in Lesson 4 (readme, 127 MB of data in Parquet).

LESSON 6. Practical statistics for Web Science. Intro to R and related tools. R introductory examples.

LESSON 7. Web APIs, REST APIs, and scraping for Web data collection, including source code of the examples (ZIP).

LESSON 8. Clustering and PCA



LESSON . Classification

LESSON . Recommendations

LESSON . Data Wrangling and Data Cleansing

LESSON . Web Search foundations

LESSON . Human Computation and Crowdsourcing

LESSON . Semantic Web and RDF, with exercise


LESSON . RDF-S and OWL practical cases. Solutions

LESSON . SPARQL, examples, and putting it all together

Additional resources:

Guidelines for the exam


40% of the course grade is based on the evaluation of a project (see also the slides presented in Lesson 1).

