XuetangX Data Science Theory and Applications Final Exam Answers

wangke, XuetangX Answers, 2025-03-20 15:04:43

Data Science Theory and Applications - Nanjing University - XuetangX

1. Single choice (3 points)

Which of the following areas of knowledge is NOT required of data scientists?

A. Computer science and information technology

B. Math and statistics

C. Business knowledge

D. Biomedical engineering

Correct answer: D

2. Single choice (3 points)

Which of the following is NOT a basic analytical approach in data science?

A. Regression

B. Classification

C. Descriptive statistics

D. Clustering

Correct answer: C

3. Single choice (3 points)

What is the main difference between supervised learning and unsupervised learning?

A. Supervised learning is done using ground truth

B. Unsupervised learning does not have labeled outputs

C. Supervised learning aims to learn a function

D. Unsupervised learning can infer the natural structure present in a set of data points

Correct answer: A

4. Single choice (3 points)

How many columns does a 10*10 long-format table have after one column is converted to wide format?

A. 17

B. 18

C. 19

D. 20

Correct answer: B
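
One reading that is consistent with the answer of 18 can be sketched in Python. This is an illustration under assumptions that the question does not state: the 10 columns are taken to be 8 identifier columns plus one key column and one value column, and the key column is assumed to hold 10 distinct values. The helper `long_to_wide` and all column names are hypothetical.

```python
def long_to_wide(rows, key, value):
    """Spread the distinct values of `key` into new columns filled from `value`.

    rows: list of dicts that all share the same column names.
    Returns (column_names, wide_rows).
    """
    # Every column other than the key/value pair identifies an observation.
    id_cols = [c for c in rows[0] if c not in (key, value)]

    # Each distinct key value becomes one new column (first-seen order).
    new_cols = []
    for r in rows:
        if r[key] not in new_cols:
            new_cols.append(r[key])

    wide = {}
    for r in rows:
        ident = tuple(r[c] for c in id_cols)
        record = wide.setdefault(
            ident,
            {**{c: r[c] for c in id_cols}, **{c: None for c in new_cols}},
        )
        record[r[key]] = r[value]
    return id_cols + new_cols, list(wide.values())


# Hypothetical 10x10 long table: 8 id columns, 1 key column, 1 value column.
rows = [
    {**{f"id{j}": i for j in range(8)}, "key": f"k{i}", "value": i * 1.5}
    for i in range(10)
]
columns, wide_rows = long_to_wide(rows, "key", "value")
print(len(columns))  # 8 id columns + 10 new columns
```

Under these assumptions the key and value columns disappear (10 - 2 = 8) and 10 new columns appear, giving 18 columns.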

5. Single choice (3 points)

Which of the following descriptions of the weighted mean is incorrect?

A. Calculated by multiplying the weight (or probability)

B. Associated with a particular event or outcome with its associated quantitative outcome

C. Very useful when calculating a theoretically expected outcome

D. Each outcome has a different probability of occurring

Correct answer: D
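
The weighted mean described in the options can be sketched in a few lines of Python. The function name `weighted_mean` and the payoff numbers are illustrative assumptions, not part of the exam. Two of the outcomes share the same probability, which shows that the probabilities do not have to differ.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical payoffs and their probabilities (illustrative numbers only).
payoffs = [0, 10, 50]
probabilities = [0.5, 0.25, 0.25]

expected_payoff = weighted_mean(payoffs, probabilities)
print(expected_payoff)  # 0*0.5 + 10*0.25 + 50*0.25 = 15.0
```

When the weights are probabilities that sum to 1, the weighted mean is the theoretically expected outcome.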

6. Single choice (3 points)

Univariate visualizations do not include ______.

A. Boxplot

B. Histogram

C. Line chart

D. Density estimate

Correct answer: C

7. Single choice (3 points)

What data type will the code "a=as.vector(c(list('a',1),list('afo',222)))" assign to a?

A. vector

B. list

C. NULL

D. The code will raise an error

Correct answer: B

8. Single choice (3 points)

What output will the code "as.integer(as.factor(c(0,1)))" produce?

A. [1] 0 1

B. [1] 1 0

C. [1] 1 2

D. [1] 2 1

Correct answer: C
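
The result follows from how R encodes factors: the levels are the sorted unique values, and converting a factor to integer returns each value's 1-based level position rather than the original number. The Python sketch below only mimics that behaviour for a simple vector; `factor_codes` is a hypothetical helper, not an R or Python library function.

```python
def factor_codes(values):
    """Mimic as.integer(as.factor(values)) for a simple vector.

    Levels are the sorted unique values; each value is replaced by the
    1-based position of its level.
    """
    levels = sorted(set(values))
    position = {level: i + 1 for i, level in enumerate(levels)}
    return [position[v] for v in values]


print(factor_codes([0, 1]))  # [1, 2]
print(factor_codes([1, 0]))  # [2, 1]
```

The values 0 and 1 become the level positions 1 and 2, which matches option C.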

9. Single choice (3 points)

Which of the following statements about tidy data is incorrect?

A. Every column is a variable

B. Every row is an observation

C. Every cell is a single numerical value

D. All of the above descriptions of tidy data are correct

Correct answer: C

10. Single choice (3 points)

Which function is most suitable for splitting the data by one or two categorical variables?

A. theme()

B. geom_bar()

C. facet_grid()

D. facet_wrap()

Correct answer: C

11. Single choice (3 points)

Which layer can provide a new perspective of data interpretation for visual analysis?

A. The facets layer

B. The theme layer

C. The coordinate layer

D. The statistics layer

Correct answer: C

12. Single choice (3 points)

If we want to change individual elements, such as the background color or the font of the title, which function can we use?

A. geom_bar()

B. theme()

C. facet_grid()

D. facet_wrap()

Correct answer: B

13. Single choice (3 points)

In regression analysis, there are _____ main hypothesis tests.

A. One

B. Two

C. Three

D. Four

Correct answer: D

14. Single choice (3 points)

If the significance level is 0.05, the corresponding confidence level is ( ).

A. 93%

B. 94%

C. 95%

D. 96%

Correct answer: C

15. Single choice (3 points)

Which of the following functions presents the results of a regression?

A. Anova()

B. Summary()

C. Confint()

D. Predict()

Correct answer: B

16. Single choice (3 points)

Which of the following algorithms is not a decision tree algorithm?

A. ID3

B. Yolo v5

C. CART

D. C4.5

Correct answer: B

17. Single choice (3 points)

Which of the following algorithms is not an example of an eager learner?

A. K-Nearest Neighbors

B. Logistic regression

C. Decision tree

D. Naive Bayes

Correct answer: A

18. Single choice (3 points)

The attribute selection measure used by CART is ______.

A. Information gain

B. Information gain ratio

C. Basic information entropy

D. Gini index

Correct answer: D
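
As a rough illustration of the Gini index as a split criterion, the Python sketch below computes the Gini impurity of a set of class labels and the size-weighted impurity of a binary split. The function names and the toy labels are assumptions made for this example; a real CART implementation involves much more than this.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_of_split(left, right):
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Toy labels (illustrative only).
parent = ["yes", "yes", "no", "no"]
print(gini_impurity(parent))                        # 0.5 for a 50/50 mix
print(gini_of_split(["yes", "yes"], ["no", "no"]))  # 0.0 for a pure split
```

A candidate split with lower weighted impurity is preferred, so the pure split here would be chosen over leaving the parent node unsplit.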

19. Single choice (3 points)

Which type of clustering uses the concept of a dendrogram?

A. Prototype-based clustering

B. Density-based clustering

C. Hierarchical clustering

D. Partitioning clustering

Correct answer: C

20. Single choice (3 points)

Which strategy or algorithm below belongs to hierarchical clustering?

A. AGNES

B. K-means

C. DBSCAN

D. SMC

Correct answer: A

21. Single choice (3 points)

What should you be aware of when using a collaborative filtering strategy?

A. It determines the features of items that can be used to measure their similarity.

B. It could be useless at the beginning since the records you have are not enough.

C. It won't recommend an item that hasn't been bought before.

D. The "over-specialization" problem still exists.

Correct answer: B

22. Multiple choice (4 points)

What skills do data scientists need to deal with data?

A. The machine learning algorithms

B. The knowledge of programming languages

C. Processing of financial statements

D. Data visualization knowledge

Correct answer: A, B, D (no credit for partial selection)

23. Multiple choice (4 points)

In general, histograms are plotted such that

A. Empty bins are included in the graph.

B. Bins are equal in width.

C. The number of bins is up to the user.

D. Bars are contiguous. That is, no empty space shows between bars unless there is an empty bin.

Correct answer: A, B, C, D (no credit for partial selection)

24. Multiple choice (4 points)

The common problems we can find with raw data include ______.

A. Missing data

B. Noisy data

C. Unstructured data

D. Inconsistent data

Correct answer: A, B, D (no credit for partial selection)

25. Multiple choice (4 points)

Which of the following belong to the auxiliary layers of the ggplot2 package?

A. Data

B. Facets

C. Statistics

D. Geometries

Correct answer: B, C (no credit for partial selection)

26. Multiple choice (4 points)

Which of the following belong to the classical OLS assumptions for linear regression? ( )

A. The regression model is linear in the coefficients and the error term.

B. All independent variables are uncorrelated with the error term.

C. The error term has a constant variance.

D. The error term is normally distributed.

Correct answer: A, B, C, D (no credit for partial selection)

27. Multiple choice (4 points)

Which of the following are advantages of decision tree algorithms?

A. Hard to overfit

B. Different attribute division methods have different preferences for attribute selection

C. Ability to fit data with irrelevant features and missing values

D. Easy to understand, explain and visually analyze

Correct answer: C, D (no credit for partial selection)

28. Multiple choice (4 points)

User-based and item-based filtering perform differently in different situations. Which choices below are correct?

A. User-based filtering is more suitable for time-sensitive items like news.

B. Item-based filtering is more suitable when items are simple and relatively stable.

C. User-based filtering is more suitable when the number of users is more significant than the items.

D. Item-based filtering is more suitable for tailoring to personal taste.

Correct answer: A, B, D (no credit for partial selection)

29. True/False (1 point)

Raw data is the original data provided by the users or collected through some techniques, such as crawlers.

Correct answer: False

30. True/False (1 point)

We can only talk about the correlation between the two variables.

Correct answer: False

31. True/False (1 point)

If an analysis requires data preprocessing, it must be done before data analysis.

Correct answer: True

32. True/False (1 point)

When we create a plot skeleton, we first need to think about how to map the data variables to the aesthetics in the graph.

Correct answer: False

33. True/False (1 point)

Hypothesis testing helps you prove if your data is statistically significant and unlikely to have occurred by chance alone.

Correct answer: True

34. True/False (1 point)

To address this concern, nearest-neighbor methods often use weighted voting or similarity moderated voting such that each neighbor's contribution is scaled by its similarity.

Correct answer: True
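
The weighted voting mentioned in the statement can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes that similarity is the inverse of Euclidean distance, and the function name `weighted_knn_predict` and the toy points are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, k=3):
    """Classify `query` by similarity-weighted voting among its k nearest neighbors.

    train: list of (point, label) pairs, where point is a tuple of numbers.
    Similarity is assumed to be 1 / distance.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        distance = math.dist(point, query)
        votes[label] += 1.0 / (distance + 1e-9)  # small constant avoids division by zero
    return max(votes, key=votes.get)


# Toy data: one close "red" neighbor and two distant "blue" neighbors.
train = [((0, 0), "red"), ((4, 0), "blue"), ((4, 1), "blue")]
print(weighted_knn_predict(train, (1, 0), k=3))  # red
```

With plain majority voting the two "blue" neighbors would win 2 to 1. With weights of about 1.0 for "red" against about 0.33 + 0.32 for "blue", the single close neighbor decides the prediction.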

35. True/False (1 point)

In hierarchical clustering, you can choose the number of clusters depending on the dendrogram it produces, and can always turn back after making the wrong decision.

Correct answer: False

36. True/False (1 point)

We can use correlation analysis to predict a driver's travel time by using miles traveled and number of deliveries. ( )

Correct answer: False

37. True/False (1 point)

In the narrow sense, a data science product is a product facilitated with a particular data science technique.

Correct answer: True