Internet measurement and data analysis
2011 Fall Semester, Wednesday (14:45-16:15)
Faculty: Kenjiro Cho (kjc at sfc.keio.ac.jp)
TA: Yohei Kuga (sora at sfc.wide.ad.jp)
SA: Midori Kato (katoon at ht.sfc.keio.ac.jp), Ryo Nakamura (upa at sfc.wide.ad.jp)
Class home page: http://web.sfc.keio.ac.jp/~kjc/classes/sfc2011f-measurement/
Class support mail (Faculty, TA and SA): imda at sfc.wide.ad.jp
Now that the Internet has become part of society's infrastructure, it is
increasingly important to understand its current usage and behavior and
to predict its future, not only for technical purposes but also for
investment decisions and policy making.
However, the Internet is a gigantic and complex system, and it is
challenging to grasp as a whole: it is not realistic to perform
measurements covering the entire Internet, yet traditional sampling
methods often cannot be applied either. Moreover, there are various
technical, social, economic, and legal constraints, and we need to
solve problems under these constraints.
In this class, you will get an overview of Internet measurement and
large-scale data analysis, and learn the basic skills needed in the
forthcoming information society to obtain new knowledge from massive
data.
Theme, Goals, Methods
In this class, you will learn about Internet measurement and data
analysis methods, to obtain knowledge and understanding of networking
technologies and large-scale data analysis.
Each class covers specific topics, through which you will learn about
problems, constraints, and solutions. At the same time, you will learn
the technical and theoretical background of the topics, such as
networking technologies, statistics, and algorithms.
Each class consists of a lecture and exercises on data analysis.
The lecture slides will be provided online.
Reference books:
- Mark Crovella and Balachander Krishnamurthy. Internet Measurement: Infrastructure, Traffic and Applications. Wiley, 2006.
- Antonio Nucci and Konstantina Papagiannaki. Design, Measurement and Management of Large-Scale IP Networks: Bridging the Gap Between Theory and Practice. Cambridge University Press, 2008.
- Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison-Wesley, 2006.
- Raj Jain. The Art of Computer Systems Performance Analysis. Wiley, 1991.
Grading: two assignments and a final report.
The prerequisites for the class are basic programming skills and basic
knowledge of statistics.
In the exercises and assignments, you will need to write programs to
process large data sets, using the Ruby scripting language and the
Gnuplot plotting tool.
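As an illustration of the kind of processing the exercises involve, the following is a minimal Ruby sketch; it is not taken from the class materials. It assumes a hypothetical input file with one numeric value per line (blank lines and lines starting with "#" are skipped), computes the mean and the population standard deviation, and prints a fixed-width histogram as two columns that Gnuplot can read. The function names, the input format, and the bin width of 10 are all assumptions chosen for illustration.

```ruby
# Sketch: summary statistics and a histogram from one-number-per-line data.
# Assumption (hypothetical format): each non-blank, non-'#' line holds one
# numeric value, e.g. a response time extracted from a log.

# Arithmetic mean of a non-empty array of numbers.
def mean(values)
  raise ArgumentError, "empty data" if values.empty?
  values.sum.to_f / values.size
end

# Population standard deviation: square root of the mean squared
# deviation from the mean.
def stddev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / values.size)
end

# Count values into fixed-width bins; returns [[bin_start, count], ...]
# sorted by bin_start, a two-column form that gnuplot can read directly.
def histogram(values, width)
  values.group_by { |v| (v / width).floor * width }
        .map { |bin, vs| [bin, vs.size] }
        .sort
end

# Parse one number per line, skipping blank lines and '#' comment lines.
def read_values(io)
  io.each_line
    .map(&:strip)
    .reject { |line| line.empty? || line.start_with?("#") }
    .map(&:to_f)
end

# Run only when invoked with file arguments, e.g.
#   ruby stats.rb data.txt > hist.dat
if __FILE__ == $0 && !ARGV.empty?
  values = read_values(ARGF)
  printf("# n=%d mean=%.3f stddev=%.3f\n", values.size, mean(values), stddev(values))
  # Bin width 10 is an arbitrary choice for this sketch.
  histogram(values, 10).each { |bin, count| puts "#{bin} #{count}" }
end
```

For example, running "ruby stats.rb data.txt > hist.dat" (hypothetical file names) writes the histogram, and the Gnuplot command plot "hist.dat" using 1:2 with boxes draws it; Gnuplot treats the leading "# n=... mean=..." line as a comment.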
To understand the theoretical aspects, you will need basic knowledge
of algebra and statistics. However, the focus of the class is on
understanding how mathematics is applied in engineering.
Class 1 Introduction (9/28)
network measurement and Internet measurement,
network management tools,
network measurement tools,
exercise: introduction to the Ruby scripting language
Class 2 Measuring the size of the Internet (10/5)
the number of users and hosts,
the number of web pages,
precision, errors, significant digits,
how to make good graphs,
exercise: graph plotting with Gnuplot
Class 3 Data recording and log analysis (10/12)
log analysis methods,
exercise: log data and regular expressions
(access log data)
Class 4 Measuring the speed of the Internet (10/19)
inferring available bandwidth,
mean, standard deviation,
exercise: mean, standard deviation, linear regression
Class 5 Measuring the structure of the Internet (10/26)
exercise: topology analysis
Class 6 Measuring the characteristics of the Internet (11/2)
delay, packet loss, jitter,
correlation and multivariate analysis,
principal component analysis,
exercise: correlation analysis
Class 7 Measuring the diversity and complexity of the Internet (11/9)
exercise: histogram, CDF
Class 8 Distributions (11/16)
normal distribution and other distributions,
exercise: generating distributions, confidence intervals
Class 9 Discussion (11/18, Friday) 9:25-10:55 e11
Class 10 Discussion (11/18, Friday) 11:10-12:40 e11
Class 11 Measuring time series of the Internet (11/30)
Internet and time,
Network Time Protocol (NTP),
time series analysis,
exercise: time series analysis
No class (12/7)
Class 12 Measuring anomalies of the Internet (12/14)
exercise: anomaly detection
Class 13 Data mining (12/21)
(k-means.rb, km-1.txt, km-2.txt, km-3.txt)
Class 14 Scalable measurement and analysis (1/11)
distributed parallel processing
Class 15 Summary (1/18)
summary of the class,
Internet measurement and privacy issues
$Date: 2012/01/10 14:12:00 $