The Web and Data Science course focuses on the study of large-scale socio-technical systems associated with the World Wide Web. It considers the relationship between people and technology, the ways in which society and technology complement one another, and their impact on society at large. These analyses are inherently associated with Big Data management issues.
The course is given at the Como Campus by Marco Brambilla and Emanuele Della Valle.
Up-to-date calendar of the course on Google Docs: calendar
Official course page on Polimi site: web page
This is a (possibly partial) list of the course materials:
LESSON 4. The logical architecture of a Big Data platform. Hands-on Hive: context, readme, data (138MB!). Innovating the Hadoop ecosystem: the approach of the Berkeley Data Analytics Stack. Introduction to Spark.
LESSON 5. Let’s try Databricks Community Edition. From RDDs, transformations, and actions to Datasets and SQL queries (Databricks’ notebooks). Notebooks developed in class: basics and word count (using hamlet.txt). Demo of the 100x effect using the same data as the Hive hands-on of Lesson 4 (readme, 127 MB of data in Parquet).
LESSON 8. Clustering and PCA
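As a companion to the word count exercise of Lesson 5, here is a minimal sketch of the same pipeline in plain Python. It is not the notebook developed in class and it does not use the Spark API: the function names (flat_map_words, map_to_pairs, reduce_by_key, word_count) are hypothetical and only mimic the usual flatMap, map, reduceByKey chain. The generators stay lazy, much like Spark transformations, and nothing is computed until the final dictionary is built. Note that in Spark reduceByKey is itself a transformation and an action such as collect triggers the computation; here the last step plays both roles. The two sample lines stand in for hamlet.txt.

```python
import re
from collections import defaultdict


def flat_map_words(lines):
    """flatMap analogue: lazily split every line into lowercase words."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield word


def map_to_pairs(words):
    """map analogue: lazily turn each word into a (word, 1) pair."""
    for word in words:
        yield (word, 1)


def reduce_by_key(pairs):
    """reduceByKey analogue: sum the values of each key.

    Unlike Spark, this step consumes the lazy generators immediately.
    """
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_count(lines):
    """Chain the three steps, as in the classic word count pipeline."""
    return reduce_by_key(map_to_pairs(flat_map_words(lines)))


# Two lines from Hamlet, used here instead of reading hamlet.txt.
sample = [
    "To be, or not to be, that is the question:",
    "Whether 'tis nobler in the mind to suffer",
]
counts = word_count(sample)
print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:3])
```

To run it on a real file, pass an open file object to word_count instead of the sample list; the file is then read line by line.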
PAST YEAR’S CLASSES:
LESSON . Classification
LESSON . Recommendations
LESSON . Data Wrangling and Data Cleansing
LESSON . Web Search foundations
LESSON . Human Computation and Crowdsourcing
LESSON . RDF-S and OWL
40% of the course grade is based on the evaluation of a project (see also the slides presented in the first lesson).
To access the data you will use for the project, you have to sign a non-disclosure agreement (NDA). Please download it from http://bit.ly/WebSci2018NDA, complete it in digital form (so that I can read your email address), print two copies and bring them to the lecturer. As soon as you submit the signed NDA, you will be able to access the data. Please note that you will also be asked to sign a receipt of delivery.
Use this form to submit your project proposal (http://bit.ly/WebSci2018SubmitPrjWork). The proposal includes the members of your group, a title, a short description of the project and the dataset you intend to use. Sending the form is not enough. Please do not start working on your proposal straight away; wait for an email from prof. Emanuele Della Valle confirming that your proposal is adequate.
The deadline for those who attend the lectures is October 16th, 2017. All others should contact prof. Emanuele Della Valle well ahead of the exam session they plan to attend. You have to submit the project one week before the exam session you want to attend.
Prof. Marco Brambilla obtained several Azure Passes from Microsoft. Please contact him to obtain one. The passes last 3 months, so do not ask for one until you intend to start the project.