Internet measurement and data analysis
2011 Fall Semester, Wednesday (14:45-16:15)
Faculty: Kenjiro Cho (kjc at sfc.keio.ac.jp)
TA: Yohei Kuga (sora at sfc.wide.ad.jp)
SA: Midori Kato (katoon at ht.sfc.keio.ac.jp), Ryo Nakamura (upa at sfc.wide.ad.jp)
Class home page: http://web.sfc.keio.ac.jp/~kjc/classes/sfc2011f-measurement/
Class support mail (Faculty, TA and SA): imda at sfc.wide.ad.jp
Overview
Now that the Internet has become part of our social infrastructure,
it is increasingly important to understand its current usage and
behavior and to predict its future, not only for technical purposes
but also for investment decisions and policy making.
However, the Internet is a gigantic and complex system, and grasping
it is challenging: large-scale measurement covering the entire
Internet is not realistic, and traditional sampling methods often
cannot be applied. Moreover, there are various technical, social,
economic, and legal constraints, and we need to solve problems within
these constraints.
In this class, you will get an overview of Internet measurement and
large-scale data analysis, and learn the basic skills needed in the
coming information society to extract new knowledge from massive
amounts of information.
Syllabus
Theme, Goals, Methods
In this class, you will learn Internet measurement and data analysis
methods in order to gain knowledge and understanding of networking
technologies and large-scale data analysis.
Each class covers a specific topic through which you will learn about
problems, constraints, and solutions. At the same time, you will learn
the technical and theoretical background of the topic, such as
networking technologies, statistics, and algorithms.
Each class consists of a lecture and exercises on data analysis.
Textbooks, References
The lecture slides will be provided online.
Ruby: http://www.ruby-lang.org/
Gnuplot: http://gnuplot.info/
[1] Mark Crovella and Balachander Krishnamurthy.
Internet Measurement: Infrastructure, Traffic, and Applications.
Wiley, 2006.
[2] Antonio Nucci and Konstantina Papagiannaki.
Design, Measurement and Management of Large-Scale IP Networks:
Bridging the Gap Between Theory and Practice.
Cambridge University Press, 2008.
[3] Pang-Ning Tan, Michael Steinbach and Vipin Kumar.
Introduction to Data Mining.
Addison-Wesley, 2006.
[4] Raj Jain.
The Art of Computer Systems Performance Analysis.
Wiley, 1991.
Evaluation
Two assignments and a final report.
Prerequisites
The prerequisites for the class are basic programming skills and basic
knowledge of statistics.
In the exercises and assignments, you will need to write programs to
process large data sets, using the Ruby scripting language and the
Gnuplot plotting tool.
To understand the theoretical aspects, you will need basic knowledge
of algebra and statistics. However, the focus of the class is on
understanding how mathematics is used in engineering applications.
Schedule
Class 1 Introduction (9/28)
Network measurement and Internet measurement,
network management tools,
network measurement tools,
exercise: introduction to the Ruby scripting language,
lecture slides
Class 2 Measuring the size of the Internet (10/5)
the number of users and hosts,
the number of web pages,
precision, errors, significant digits,
how to make good graphs,
exercise: graph plotting with Gnuplot (marathon, stock prices),
lecture slides
Class 3 Data recording and log analysis (10/12)
data format,
log analysis methods,
exercise: log data and regular expressions (access log data, test data, scripts),
lecture slides
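The Class 3 exercise applies regular expressions to access log data. The class's own test data and scripts are not reproduced here; as a minimal sketch, assuming the log is in the Apache "combined" format, the Ruby code below extracts a few fields from each line and counts requests per HTTP status code. The pattern `LOG_RE` and the methods `parse_line` and `count_by_status` are hypothetical names for illustration.

```ruby
# Assumed format: Apache "combined" log lines, e.g.
#   host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
# Captures: host, timestamp, method, path, status, bytes ("-" when no body).
LOG_RE = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)/

# Returns a Hash of fields, or nil when the line does not match the pattern.
def parse_line(line)
  m = LOG_RE.match(line)
  return nil unless m
  { host: m[1], time: m[2], method: m[3], path: m[4],
    status: m[5].to_i, bytes: m[6] == "-" ? 0 : m[6].to_i }
end

# Count requests per HTTP status code, silently skipping malformed lines.
def count_by_status(lines)
  counts = Hash.new(0)
  lines.each do |line|
    rec = parse_line(line)
    counts[rec[:status]] += 1 if rec
  end
  counts
end

sample = [
  '192.0.2.1 - - [12/Oct/2011:14:45:00 +0900] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
  '192.0.2.7 - - [12/Oct/2011:14:45:02 +0900] "GET /missing HTTP/1.1" 404 - "-" "curl/7.21"',
  'not a log line'
]
p count_by_status(sample)
```

Returning `nil` for unmatched lines keeps a long run over real logs from aborting on a single corrupted entry.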
Class 4 Measuring the speed of the Internet (10/19)
bandwidth measurement,
inferring available bandwidth,
mean, standard deviation,
linear regression,
exercise: mean, standard deviation, linear regression,
assignment 1,
lecture slides
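For the Class 4 exercise on mean, standard deviation, and linear regression, a minimal Ruby sketch of the three computations might look like the following. The method names are illustrative assumptions, not the class's scripts. The standard deviation here divides by n (population form); dividing by n - 1 gives the sample estimate instead, and which one the class uses is not stated here. The regression is ordinary least squares for y = a x + b.

```ruby
# Arithmetic mean.
def mean(xs)
  xs.sum(0.0) / xs.size
end

# Population standard deviation: sqrt(sum((x - mean)^2) / n).
def stddev(xs)
  m = mean(xs)
  Math.sqrt(xs.sum(0.0) { |x| (x - m)**2 } / xs.size)
end

# Least-squares fit of y = a*x + b; returns [a, b].
#   a = sum((x - mx)(y - my)) / sum((x - mx)^2),  b = my - a*mx
def linear_regression(xs, ys)
  mx = mean(xs)
  my = mean(ys)
  sxy = xs.zip(ys).sum(0.0) { |x, y| (x - mx) * (y - my) }
  sxx = xs.sum(0.0) { |x| (x - mx)**2 }
  a = sxy / sxx
  [a, my - a * mx]
end

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = linear_regression(xs, ys)
puts format("mean=%.3f stddev=%.3f  fit: y = %.3f x + %.3f",
            mean(ys), stddev(ys), a, b)
```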
Class 5 Measuring the structure of the Internet (10/26)
Internet architecture,
network layers,
topologies,
graph theory,
exercise: topology analysis (dijkstra.rb, topology.txt),
lecture slides
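The Class 5 topology exercise uses the class's own dijkstra.rb and topology.txt, whose formats are not shown here. As a rough illustration only, the Ruby sketch below computes shortest-path costs with Dijkstra's algorithm over an edge list of `[from, to, cost]` triples. That input format, the undirected-link assumption, and the method name `dijkstra` are all assumptions rather than the class script.

```ruby
# Shortest-path cost from src to every reachable node (Dijkstra's algorithm).
# edges: array of [from, to, cost] with non-negative costs (assumed format);
# each link is treated as bidirectional.
def dijkstra(edges, src)
  graph = Hash.new { |h, k| h[k] = [] }
  edges.each do |u, v, cost|
    graph[u] << [v, cost]
    graph[v] << [u, cost]
  end

  dist = { src => 0 }   # best known cost to each reached node
  visited = {}
  loop do
    # Select the unvisited node with the smallest tentative cost.
    # This linear scan is O(V) per step; a priority queue scales better.
    node, d = dist.reject { |n, _| visited[n] }.min_by { |_, c| c }
    break if node.nil?
    visited[node] = true
    graph[node].each do |nbr, cost|
      alt = d + cost
      dist[nbr] = alt if dist[nbr].nil? || alt < dist[nbr]
    end
  end
  dist
end

edges = [["A", "B", 1], ["B", "C", 2], ["A", "C", 5], ["C", "D", 1]]
dijkstra(edges, "A").each { |node, cost| puts "#{node}: #{cost}" }
```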
Class 6 Measuring the characteristics of the Internet (11/2)
delay, packet loss, jitter,
correlation and multivariate analysis,
principal component analysis,
exercise: correlation analysis (correlation.rb),
lecture slides
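The correlation analysis in Class 6 centers on the correlation coefficient between two series, such as delay and packet loss. The class's correlation.rb is not shown; the following is a minimal Ruby sketch of Pearson's correlation coefficient, with the method name `correlation` chosen for illustration. It returns NaN when either series has zero variance.

```ruby
# Pearson correlation coefficient:
#   r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))
def correlation(xs, ys)
  raise ArgumentError, "series lengths differ" unless xs.size == ys.size
  n = xs.size
  mx = xs.sum(0.0) / n
  my = ys.sum(0.0) / n
  sxy = xs.zip(ys).sum(0.0) { |x, y| (x - mx) * (y - my) }
  sxx = xs.sum(0.0) { |x| (x - mx)**2 }
  syy = ys.sum(0.0) { |y| (y - my)**2 }
  sxy / Math.sqrt(sxx * syy)
end

delay = [10, 12, 15, 20, 30]
loss  = [0.1, 0.1, 0.3, 0.5, 0.9]
puts format("r = %.3f", correlation(delay, loss))
```

The delay and loss values in the usage lines are made-up numbers for illustration, not measurement data.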
Class 7 Measuring the diversity and complexity of the Internet (11/9)
sampling,
statistical analysis,
histogram,
exercise: histogram, CDF,
lecture slides
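For the Class 7 exercise on histograms and CDFs, one possible Ruby sketch is shown below: a fixed-width histogram keyed by each bin's lower edge, and an empirical CDF giving, for each distinct value x, the fraction of samples less than or equal to x. The bin convention and method names are assumptions made for this example.

```ruby
# Fixed-width histogram; each key is the lower edge of its bin.
def histogram(values, bin_width)
  counts = Hash.new(0)
  values.each { |v| counts[(v / bin_width).floor * bin_width] += 1 }
  counts.sort.to_h
end

# Empirical CDF as [value, fraction of samples <= value] pairs,
# one pair per distinct value, in ascending order.
def cdf(values)
  sorted = values.sort
  n = sorted.size.to_f
  result = []
  sorted.each_with_index do |v, i|
    if !result.empty? && result.last[0] == v
      result.last[1] = (i + 1) / n   # tie: keep the larger cumulative fraction
    else
      result << [v, (i + 1) / n]
    end
  end
  result
end

data = [1, 2, 12, 15, 27, 3, 12]
histogram(data, 10).each { |edge, c| puts "#{edge}\t#{c}" }
cdf(data).each { |v, f| puts "#{v}\t#{f}" }
```

Printing tab-separated columns keeps the output directly usable as a Gnuplot data file.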
Class 8 Distributions (11/16)
normal distribution and other distributions,
confidence intervals,
statistical tests,
exercise: generating distributions, confidence intervals,
assignment 2,
lecture slides
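The Class 8 exercise involves generating distributions and computing confidence intervals. As a minimal sketch under stated assumptions, the Ruby code below draws normally distributed samples with the Box-Muller transform and computes a 95% confidence interval for the mean using the normal approximation (z = 1.96). That approximation is only reasonable for fairly large samples; for small samples a Student's t quantile should replace 1.96. The method names and parameters are illustrative.

```ruby
# One sample from N(mu, sigma^2) via the Box-Muller transform.
# rng: any object responding to #rand (Random itself, or Random.new(seed)).
def normal_sample(mu, sigma, rng = Random)
  u1 = 1.0 - rng.rand   # in (0, 1], so Math.log(u1) is finite
  u2 = rng.rand
  z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  mu + sigma * z
end

# 95% confidence interval for the mean, normal approximation:
#   mean +/- 1.96 * s / sqrt(n), with s the sample standard deviation.
def confidence_interval_95(xs)
  n = xs.size
  m = xs.sum(0.0) / n
  s = Math.sqrt(xs.sum(0.0) { |x| (x - m)**2 } / (n - 1))
  half = 1.96 * s / Math.sqrt(n)
  [m - half, m + half]
end

rng = Random.new(2011)
samples = Array.new(1000) { normal_sample(10.0, 2.0, rng) }
lo, hi = confidence_interval_95(samples)
puts format("95%% CI for the mean: [%.3f, %.3f]", lo, hi)
```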
Class 9 Discussion (11/18, Friday) 9:25-10:55 e11
references (ref-1, ref-2)
Class 10 Discussion (11/18, Friday) 11:10-12:40 e11
Class 11 Measuring time series of the Internet (11/30)
Internet and time,
network time protocol,
time series analysis,
exercise: time series analysis (autocorr.rb, autocorr_5min_data.txt),
lecture slides
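The Class 11 exercise computes autocorrelation of a time series. The class's autocorr.rb is not reproduced here; a minimal Ruby sketch of the sample autocorrelation at lag k follows, with method names chosen for illustration. If the samples are 5-minute values, a daily cycle would appear as a peak near lag 288 (24 * 60 / 5).

```ruby
# Sample autocorrelation at a given lag k:
#   r(k) = sum_{t=0}^{n-k-1} (x_t - m)(x_{t+k} - m) / sum_{t=0}^{n-1} (x_t - m)^2
def autocorrelation(xs, lag)
  n = xs.size
  m = xs.sum(0.0) / n
  denom = xs.sum(0.0) { |x| (x - m)**2 }
  num = (0...(n - lag)).sum(0.0) { |t| (xs[t] - m) * (xs[t + lag] - m) }
  num / denom
end

# [lag, r(lag)] pairs for lags 0..max_lag, e.g. to look for periodicity.
def autocorrelogram(xs, max_lag)
  (0..max_lag).map { |k| [k, autocorrelation(xs, k)] }
end

series = [1, 3, 1, 3, 1, 3, 1, 3, 1, 3]
autocorrelogram(series, 4).each { |k, r| puts format("%d\t%.3f", k, r) }
```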
NO Class (12/7)
Class 12 Measuring anomalies of the Internet (12/14)
anomaly detection,
spam filters,
Bayes' theorem,
exercise: anomaly detection,
lecture slides
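Class 12 connects spam filters with Bayes' theorem. As a minimal illustration, assuming a single-word test, the posterior probability that a message is spam given that it contains a word w is P(spam|w) = P(w|spam)P(spam) / (P(w|spam)P(spam) + P(w|ham)P(ham)). The Ruby sketch below computes it, plus an add-one (Laplace) smoothed estimate of P(w|class) from document counts. The smoothing choice and method names are assumptions for this example, not taken from the lecture.

```ruby
# Posterior P(spam | word) by Bayes' theorem.
#   p_w_spam: P(word | spam), p_w_ham: P(word | ham), p_spam: prior P(spam)
def spam_probability(p_w_spam, p_w_ham, p_spam)
  p_ham = 1.0 - p_spam
  numer = p_w_spam * p_spam
  numer / (numer + p_w_ham * p_ham)
end

# Estimate P(word | class) from the number of training messages in that
# class containing the word, with add-one smoothing so that a word never
# seen in a class does not get probability exactly 0.
def word_likelihood(docs_with_word, total_docs)
  (docs_with_word + 1.0) / (total_docs + 2.0)
end

# Example: the word appears in 80% of spam and 10% of legitimate mail,
# and half of all mail is spam.
puts format("P(spam|word) = %.3f", spam_probability(0.8, 0.1, 0.5))
```

With these example figures the posterior is 0.4 / 0.45, about 0.89.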
Class 13 Data mining (12/21)
pattern extraction,
classification,
clustering,
exercise: clustering (k-means.rb, km-1.txt, km-2.txt, km-3.txt),
lecture slides
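The Class 13 clustering exercise uses k-means.rb and the km-*.txt data sets, which are not shown here. As a minimal sketch, assuming each data point is an array of numeric coordinates, the Ruby code below runs the standard k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points) until the assignment stops changing. It naively uses the first k points as initial centroids; practical implementations usually choose random or k-means++ initial centroids.

```ruby
# Squared Euclidean distance between two points.
def dist2(a, b)
  a.zip(b).sum(0.0) { |x, y| (x - y)**2 }
end

# k-means clustering; returns [centroids, assignment], where assignment[i]
# is the cluster index of points[i].
def kmeans(points, k, max_iter = 100)
  centroids = points.first(k).map(&:dup)   # naive init: first k points
  assignment = nil
  max_iter.times do
    # Assignment step: nearest centroid for each point.
    new_assignment = points.map do |p|
      (0...k).min_by { |c| dist2(p, centroids[c]) }
    end
    break if new_assignment == assignment   # converged
    assignment = new_assignment
    # Update step: each centroid moves to the mean of its members.
    k.times do |c|
      members = points.each_index.select { |i| assignment[i] == c }.map { |i| points[i] }
      next if members.empty?   # keep the old centroid if a cluster empties
      dim = members.first.size
      centroids[c] = (0...dim).map { |d| members.sum(0.0) { |m| m[d] } / members.size }
    end
  end
  [centroids, assignment]
end

pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(pts, 2)
p centroids, labels
```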
Class 14 Scalable measurement and analysis (1/11)
distributed parallel processing,
cloud technology,
lecture slides
Class 15 Summary (1/18)
summary of the class,
Internet measurement and privacy issues
$Date: 2012/01/10 14:12:00 $