1. (2 points) Confidentiality, integrity, and availability form what is often referred to as the ___.
Answer: CIA triad
2. (2 points) ___ security, also called infrastructure security, protects the information systems that contain data and the people who use, operate, and maintain the systems.
Answer: Physical
3. (2 points) When a distribution is negatively skewed, the mean is pulled to the right. (True/False)
Answer: False (the mean is pulled to the left, toward the longer tail)
4. (2 points) Histograms are used with numeric data rather than with categorical data. (True/False)
Answer: True
5. Write the R code that creates a numeric vector from 0 to 100 in increments of 1.
Answer: seq(0, 100, by = 1)
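As a quick check of the answer to question 5, the following sketch builds the vector and inspects it. The variable name x is only illustrative.

```r
# Build the vector from 0 to 100 in steps of 1
x <- seq(0, 100, by = 1)

# 0 through 100 inclusive gives 101 elements
length(x)

# Inspect both ends of the vector
head(x)
tail(x)

# The colon operator produces the same values
all(x == 0:100)
```

The shorter form 0:100 yields the same values and would normally be accepted as equivalent.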
Short-answer questions:
1. (5 points) What will be the output of the following R code?
Answer: The numeric summary of the “Height” column in the “01_heights_weights_genders1.csv” dataset. (1’) The numeric summary includes the minimum, 1st quartile, median, mean, 3rd quartile, and maximum. (4’)
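The code for this question is not reproduced in this key, so the following is only a sketch of the kind of call that produces such a summary. It assumes a data frame with a Height column; the data frame name heights and the values in it are made up for illustration and are not taken from the dataset.

```r
# Hypothetical stand-in for the dataset (values invented for illustration)
heights <- data.frame(Height = c(60, 62, 65, 68, 70, 72, 75))

# summary() on a numeric column returns the six-number summary
s <- summary(heights$Height)
print(s)

# The labels of the six statistics, in the order R reports them
names(s)
```

With the real file, the data frame would presumably come from read.csv() on “01_heights_weights_genders1.csv” instead.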
2. (5 points) What is security analytics? List and explain three common approaches used for Twitter spam detection.
Answer: We define security analytics as the adaptation of techniques from data science to security challenges. (2’)
Spam detection approaches based on data analytics:
(1) Detection based on syntax analysis: These methods fall into two categories: 1) key segment and 2) tweet content. Key segment methods collect indicative segments such as keywords, username patterns, and URLs to represent the context of tweets and posters. Tweet content methods focus on the text of the tweet; three major techniques are currently used to represent the textual content of tweets: TF-IDF (Term Frequency-Inverse Document Frequency), bag-of-words, and sparse learning. (1’)
(2) Detection based on feature analysis: These methods use statistical information and social graph information. Statistical information is extracted from tweet, account, and campaign statistics. Social graph information is extracted from the macroscopic attributes of graph nodes as well as the relationships between graph nodes. (1’)
(3) Detection based on blacklists: Blacklist detection methods rely on third-party blacklisting techniques. (1’)
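To make the bag-of-words and TF-IDF representations from approach (1) concrete, here is a minimal base-R sketch. The three tweets are invented for illustration, and the weighting uses one common variant (term frequency times log(N / document frequency)); other TF-IDF variants exist.

```r
# Hypothetical toy tweets (invented for illustration)
tweets <- c("win free prize now",
            "free prize click link",
            "meeting at noon today")

# Tokenize on whitespace and build the vocabulary
tokens <- strsplit(tolower(tweets), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Bag-of-words: one row per tweet, one column per term, entries are counts
bow <- t(vapply(tokens,
                function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
                integer(length(vocab))))
colnames(bow) <- vocab

# Term frequency: counts divided by the tweet length
tf <- bow / rowSums(bow)

# Inverse document frequency: log(N / number of tweets containing the term)
idf <- log(nrow(bow) / colSums(bow > 0))

# TF-IDF: scale each column of tf by the idf of its term
tfidf <- sweep(tf, 2, idf, `*`)
round(tfidf, 3)
```

Terms that appear in only one tweet (for example "meeting") get a higher weight than terms shared across tweets (for example "free"), which is what makes the representation useful as classifier input.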
3. (5 points) How do you read a boxplot in statistics?
Answer: A boxplot is read through the minimum, first quartile (Q1), median, third quartile (Q3), maximum, and any outliers. (5’)
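The values a boxplot displays can be computed directly, which may help when reading one. The sketch below uses a small made-up sample and R's boxplot.stats(), which applies the default rule of flagging points more than 1.5 times the box height beyond the box as outliers. The sample values are an assumption for illustration.

```r
# Hypothetical sample with one unusually large value
x <- c(2, 4, 5, 6, 7, 8, 9, 30)

b <- boxplot.stats(x)

# Lower whisker, lower hinge (about Q1), median, upper hinge (about Q3), upper whisker
b$stats

# Points drawn individually as outliers
b$out
```

Here the upper whisker stops at 9, the largest value that is not an outlier, and 30 is plotted as a separate point. Calling boxplot(x) draws the corresponding plot.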