XuetangX Big Data Analysis (All-English) Final Exam Answers - Beijing Institute of Technology

    Big Data Analysis (All-English) - Beijing Institute of Technology - XuetangX

    1. True/False (2 points)

    The item-based collaborative filtering algorithm calculates item similarity according to item features. ( )

    2. True/False (2 points)

    A database provides the physical storage structure. ( )

    3. True/False (2 points)

    We can find one kind of tool to deal with all the data management problems of a database. ( )

    4. True/False (2 points)

    We can find one kind of tool to deal with all the data management problems of big data. ( )

    5. True/False (2 points)

    The data in HDFS is immutable. ( )

    6. True/False (2 points)

    In HDFS, each stored file is first divided into multiple data blocks with a flexible length according to the data size. ( )

    7. True/False (2 points)

    Hadoop is the only big data architecture. ( )

    8. True/False (2 points)

    HDFS supports batch read, write, and update operations. ( )

    9. True/False (2 points)

    A distributed file system (DFS) provides the logical storage structure of the data. ( )

    10. True/False (2 points)

    In customer-based collaborative filtering, similar users are defined by the items they have purchased in common. ( )
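To make the idea of similarity "based on common items" concrete, here is a minimal sketch that scores two users by the Jaccard overlap of their purchase sets. The user names, the purchase data, and the choice of Jaccard similarity are illustrative assumptions, not part of the course material.

```python
def jaccard(a: set, b: set) -> float:
    """Share of items two users have in common among all items either bought."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical purchase histories (illustrative data only).
purchases = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen", "mug"},
    "carol": {"tent", "stove"},
}

def most_similar(user: str) -> str:
    """Return the other user whose purchases overlap most with `user`."""
    others = [u for u in purchases if u != user]
    return max(others, key=lambda u: jaccard(purchases[user], purchases[u]))
```

Here `most_similar("alice")` picks `"bob"`, because the two share two of the four distinct items they bought between them.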

    11. Open-ended question (10 points)

    Recommendation System – matrix decomposition Task:

    You are given a dataset collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. It includes 100,000 ratings (1-5) from 943 users on 1682 movies.

    You need to use the matrix decomposition method to predict the missing values of the rating matrix to complete a recommendation system so that you can recommend movies to a user based on the predicted ratings.

    Submission Requirement:

    Submit 3 screenshots of the 3-fold cross-validation results of rating prediction in a single PDF file named ID_NAME.PDF

    matrix decomposition DATA.rar
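One possible shape for such a solution is sketched below: plain matrix factorization trained by stochastic gradient descent, evaluated with 3-fold cross-validation. It is not the course's reference answer. The function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the synthetic ratings are all assumptions made for illustration; the layout of the real data file is not assumed here, so loading it is left out.

```python
import math
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=30, seed=0):
    """Fit user and item factor matrices P, Q by SGD on observed (u, i, r) triples."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating, clipped to the 1-5 scale used by the dataset."""
    raw = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
    return min(5.0, max(1.0, raw))

def rmse(P, Q, ratings):
    """Root mean squared error of the predictions on the given triples."""
    total = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))

def cross_validate(ratings, n_users, n_items, folds=3, seed=0):
    """Shuffle once, split into `folds` parts, and report held-out RMSE per fold."""
    data = ratings[:]
    random.Random(seed).shuffle(data)
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [x for j, x in enumerate(data) if j % folds != f]
        P, Q = train_mf(train, n_users, n_items, seed=seed)
        scores.append(rmse(P, Q, test))
    return scores

# Synthetic stand-in for the rating file: (user, item, rating) triples.
rng = random.Random(42)
toy = [(u, i, rng.randint(1, 5))
       for u in range(20) for i in range(15) if rng.random() < 0.5]
scores = cross_validate(toy, n_users=20, n_items=15)
```

Once `P` and `Q` are fitted on all ratings, recommending movies to a user amounts to ranking the items that user has not rated by `predict`. Because the toy ratings are random, the RMSE values here say nothing about what to expect on the real data.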

    12. Single choice (2 points)

    The two main components of big data are ( ) and ( ). ( )

    A. Distributed Storage, Distributed Processing

    B. Distributed Collection, Distributed Processing

    C. Distributed Collection, Distributed Storage

    D. Distributed Collection, Distributed Application

    13. Single choice (2 points)

    MPP (Massively Parallel Processing) improves performance through ( ) parallelism. ( ) coordinates work with ( ), ( ) coordinates work with one or more ( ). ( ) process queries in parallel. ( ) have their own CPU, disk, and memory in a shared-nothing architecture. A high-speed interconnect provides continuous pipelining of data processing. ( )

    A. segment hosts, Master, segment host, Segment host, segment instances, Segment instances, Segment hosts

    B. segment instance, Master, segment host, Segment host, segment instances, Segment instances, Segment hosts

    C. segment instance, Master, segment host, Segment host, segment instances, Segment instances, Segment instances

    D. segment hosts, Master, segment host, Segment host, segment instances, Segment hosts, Segment hosts

    14. Single choice (2 points)

    Which of the following descriptions of the search interface of the deep web is NOT correct? ( )

    A. Has complex interfaces

    B. Supports queries on several attributes

    C. Extracts contents from databases

    D. Easy to find

    15. Single choice (2 points)

    The historical progression of harnessing data is: ( )

    1) ( ): reporting and human analysis can be performed on historical data

    2) ( ): can analyze current data to improve business transactions

    3) ( ): real-time analytics processing to make real-time decisions and improve real-time business response

    A. OLAP: Online Analytical Processing; OLTP: Online Transaction Processing; RTAP: Real-Time Analytics Processing

    B. OLTP: Online Transaction Processing; OLAP: Online Analytical Processing; RTAP: Real-Time Analytics Processing

    C. OLAP: Online Analytical Processing; RTAP: Real-Time Analytics Processing; OLTP: Online Transaction Processing

    D. OLTP: Online Transaction Processing; RTAP: Real-Time Analytics Processing; OLAP: Online Analytical Processing

    16. Single choice (2 points)

    Based on the requirements, we can build the business model, which includes ( ) and ( ). ( )

    A. Conceptual Model, Logic Model

    B. Logic Model, Physical Model

    C. Process model, Data model

    D. Process model, logical model

    17. Single choice (2 points)

    Which of the following big data characteristics best describes "Data in Many Forms"? ( )

    A. Volume

    B. Variety

    C. Veracity

    D. Velocity

    18. Single choice (2 points)

    The most often used internal data acquisition tool is ( )

    A. Data warehouse

    B. ETL (Extract, Transform, Load)

    C. Data Trigger

    D. Incremental data extraction
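For readers unfamiliar with the term, the three ETL stages named in option B can be sketched in a few lines. The CSV text, the `sales` table, and the function names below are hypothetical and exist only for this illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent spacing and capitalization.
RAW = "name,amount\n alice ,10\nBOB,5\nalice,7\n"

def extract(text):
    """Extract: read raw rows from the source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names and convert amounts to numbers."""
    return [(r["name"].strip().lower(), float(r["amount"])) for r in rows]

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
totals = conn.execute(
    "SELECT name, SUM(amount) FROM sales GROUP BY name ORDER BY name"
).fetchall()
```

After the load, the two differently spelled "alice" rows aggregate under a single key, which is the point of the transform step.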

    19. Single choice (2 points)

    Deep web content includes ( )

    1) Pages that search engines do not reference because no links point to them

    2) Non-web files accessible on the web, such as picture files, PDF and Word documents, etc.

    3) Dynamic pages obtained by filling in a form that queries a back-end online database

    4) Content that requires registration or is otherwise restricted

    A. 1234

    B. 124

    C. 123

    D. 234

    20. Single choice (2 points)

    TensorFlow allows developers to create ( ), structures that describe how data moves through a ( ), or a series of processing nodes. Each node in the graph represents a ( ), and each connection or edge between nodes is a ( ).

    A. Dataflow Graphs, Graph (DAG), multidimensional data array or tensor, mathematical operation

    B. Graph (DAG), Dataflow Graphs, mathematical operation, multidimensional data array or tensor

    C. Dataflow Graphs, Graph (DAG), mathematical operation, multidimensional data array or tensor

    D. Graph (DAG), Dataflow Graphs, mathematical operation, multidimensional data array or tensor
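The graph idea in this question can be illustrated without TensorFlow itself. The sketch below is a toy evaluator in plain Python, not TensorFlow's API: `Node`, `run`, and the example graph are invented for illustration, and the values flowing along the edges are plain floats rather than tensors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass
class Node:
    """One processing node: an operation plus the names of the nodes feeding it."""
    op: Callable[..., float]
    inputs: Tuple[str, ...] = ()

def run(graph: Dict[str, Node], target: str, feeds: Dict[str, float]) -> float:
    """Evaluate `target` by recursively pulling values along the incoming edges."""
    cache = dict(feeds)

    def value(name: str) -> float:
        if name not in cache:
            node = graph[name]
            cache[name] = node.op(*(value(i) for i in node.inputs))
        return cache[name]

    return value(target)

# y = (a + b) * b, with a and b supplied at run time.
graph = {
    "sum": Node(lambda x, y: x + y, ("a", "b")),
    "y": Node(lambda s, b: s * b, ("sum", "b")),
}
result = run(graph, "y", {"a": 2.0, "b": 3.0})
```

Describing the computation as a graph first and supplying the inputs later is what lets a framework analyze, optimize, or distribute the work before anything runs.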

    21. Single choice (2 points)

    Database connection programming interfaces such as ( ) can support SQL access by applications to the database, but they cannot provide complex functions such as transaction management, concurrent scheduling, buffer management, heterogeneous database conversion and inheritance in a distributed computing environment. This introduces the ( ). It is a layer of software that provides data exchange functions on top of the database. When the system is extended and needs to access cross-platform heterogeneous databases, the OS could be UNIX, Linux, or Windows; the data could take the form of mail, XML documents, EJB components, Web services, images, audio/video files, or other unstructured data; and the technologies of the big data application layer are also diverse and follow various standards. The design of the ( ) needs to be compatible with various standard technologies and products, which introduces the ( ).

    A. ODBC and JDBC; DAL data access layer; Unified data access interface; Unified data access interface

    B. ODBC and JDBC; DAL data access layer; DAL data access layer; Unified data access interface

    C. DAL data access layer; ODBC and JDBC; DAL data access layer; Unified data access interface

    D. ODBC and JDBC; DAL data access layer; Unified data access interface; DAL data access layer

    22. Single choice (2 points)

    Which of the following is NOT a dimensionality reduction method? ( )

    A. Wavelet transformation

    B. Attribute subset selection

    C. Principal component analysis

    D. Data Cube Aggregation

    23. Single choice (2 points)

    The ( ) annotation transparently translates your Python programs into TensorFlow graphs. ( )

    A. tf.keras

    B. tf.function

    C. Premade Estimators

    D. tf.data

    24. Single choice (2 points)

    The web crawler's crawling process is (B)

    a) Take a list of uniform resource addresses, called seed URLs, and use it as the entry point for crawling. When the crawler visits these seed URLs, it identifies all the needed links on the page and adds them to the queue to be crawled.

    b) Put the already downloaded URLs into the crawled URL list.

    c) Extract new URLs and put them into the to-be-crawled URL queue according to the crawling strategy.

    d) Take webpage links out of the to-be-crawled queue, read each URL, perform DNS resolution, and download the web pages into the downloaded web page library.

    e) The process ends when the queue to be crawled is empty.

    A. abcde

    B. adbce

    C. acbde

    D. adcbe
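The loop described in steps a) to e) can be sketched as follows. To stay self-contained, the sketch crawls a small in-memory link table instead of the real web; `WEB`, the page names, and the first-in-first-out queue policy are assumptions made for illustration.

```python
from collections import deque

# Hypothetical link structure standing in for real pages; no network access.
WEB = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["seed"],
    "c": [],
}

def crawl(seeds):
    to_crawl = deque(seeds)       # queue of URLs waiting to be fetched
    crawled = []                  # URLs already downloaded, in order
    seen = set(seeds)             # everything ever queued, to avoid repeats
    while to_crawl:               # stop once the queue is empty
        url = to_crawl.popleft()
        links = WEB.get(url, [])  # stands in for DNS resolution plus download
        crawled.append(url)       # record the URL in the crawled list
        for link in links:        # extract new URLs and enqueue them
            if link not in seen:
                seen.add(link)
                to_crawl.append(link)
    return crawled
```

Swapping `popleft()` for `pop()` would turn the queue into a stack and change the visiting order from breadth-first to depth-first, which is where the crawling strategy comes in.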

    25. Single choice (2 points)

    Which of the following statements about data reduction is NOT correct? ( )

    A. Data reduction technology is used to help obtain a condensed data set from the original huge data set, while keeping the condensed data set faithful to the integrity of the original data set.

    B. Data analysis on the condensed data set is obviously more efficient, and the results of the analysis are basically the same as those obtained by using the original data set.

    C. The time spent on data reduction could exceed or "offset" the time saved by analysis on the reduced data.

    D. The data obtained by the reduction is much smaller than the original data, but can produce the same or almost the same analysis results.

    26. Single choice (2 points)

    The execution model is based on the BSP (Bulk Synchronous Parallel) model. In this model, there are multiple processing units proceeding in parallel in a sequence of "supersteps". Within each "superstep", the processing sequence will be ( )

    a) each processing unit first receives all messages delivered to it from the preceding "superstep",

    b) when all processing units have finished message delivery (hence the synchronization point),

    c) may queue up the messages that it intends to send to other processing units.

    d) The queued messages will be delivered to the destination processing units but won't be seen until the next "superstep".

    e) manipulates its local data,

    f) the next superstep can be started,

    g) the cycle repeats until the termination condition has been reached.

    A. aedcbfg

    B. aecdbfg

    C. acedbfg

    D. adecbfg
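As an illustration of the superstep cycle, the sketch below simulates BSP-style message passing in a single process, using the familiar "propagate the maximum value" example. The function name, the graph, and the choice to end when no messages are pending are assumptions for this sketch; a real system would run the units on separate workers.

```python
def bsp_max(values, neighbors):
    """Propagate the maximum value through a graph using BSP-style supersteps."""
    values = dict(values)
    # Superstep 0: every unit announces its own value to its neighbors.
    inbox = {u: [] for u in values}
    for u in values:
        for v in neighbors[u]:
            inbox[v].append(values[u])
    supersteps = 1
    while any(inbox.values()):              # terminate when no messages are pending
        outbox = {u: [] for u in values}
        for u in values:                    # units act independently within a superstep
            received = inbox[u]             # messages sent during the previous superstep
            if received and max(received) > values[u]:
                values[u] = max(received)   # update local data
                for v in neighbors[u]:      # queue messages for the next superstep
                    outbox[v].append(values[u])
        inbox = outbox                      # barrier: queued messages become visible together
        supersteps += 1
    return values, supersteps

# Hypothetical chain a - b - c - d with one starting value per unit.
start = {"a": 3, "b": 6, "c": 2, "d": 1}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = bsp_max(start, links)
```

Messages queued during a superstep go into `outbox` and are only read after the swap, so no unit sees a value produced in the same superstep. That separation is what the synchronization point buys.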

    27. Single choice (2 points)

    ( ) extracts new or modified data in the database since the last extraction; at the same time, it normally does not have a big impact on the running business system. ( )

    A. Incremental data extraction

    B. Full extraction

    C. Timestamp Extraction

    D. Trigger
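One way to pick up only the rows changed since the last run is to remember a watermark and filter on a last-modified column. The sketch below assumes a hypothetical `orders` table with an integer `updated_at` column; triggers or log comparison are other ways to detect changes and are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, 100), (2, 25.0, 105), (3, 7.5, 110)],
)

def extract_since(conn, last_extracted_at):
    """Return rows changed after the watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_extracted_at,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_extracted_at
    return rows, new_watermark

first, mark = extract_since(conn, 0)       # initial run picks up every row
conn.execute("UPDATE orders SET amount = 30.0, updated_at = 120 WHERE id = 2")
second, mark = extract_since(conn, mark)   # only the modified row comes back
```

The second call reads a single row instead of the whole table, which is why this style of extraction puts little load on the source system.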

    28. Single choice (2 points)

    HANA improved data analysis performance in the data warehouse, but NOT because ( )

    A. It eliminates unnecessary complexity and latency.

    B. It accelerates through simplification.

    C. Leveraging the power of in-memory computing allows HANA to bring OLTP (transaction processing) and OLAP (data analytics) back together in one database.

    D. Specialized data warehouses for reporting and analytics required the moving, transformation, and pre-processing of transactional data, which introduces huge complexity: sometimes an enterprise may hold three different copies of the same data.

    29. Single choice (2 points)

    The process begins with the ( ) issuing a query that is then passed to the ( ). The ( ) contains information, such as the data dictionary and session information, which it uses to generate an ( ) designed to retrieve the needed information from each underlying node. Parallel execution represents the implementation of the ( ) through the parallel computing of Node 1 to Node n. The query results are then returned to the master node. ( )

    A. Client, Master Node, Master Node, execution plan, execution plan

    B. Master Node, Client, Master Node, execution plan, storing plan

    C. Client, Master Node, Client, execution plan, execution plan

    D. Master Node, Client, Master Node, execution plan, storing plan
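The flow in this question (a plan is generated, each node runs it on its own slice of the data, and the partial results are merged) can be mimicked on one machine. In the sketch below, threads stand in for nodes; `NODES`, `make_plan`, and the filter query are hypothetical and do not reflect any particular MPP product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data slices held by three nodes.
NODES = {
    "node1": [3, 8, 15],
    "node2": [21, 4],
    "node3": [9, 30, 1],
}

def make_plan(threshold):
    """Turn the query into a per-node task (a stand-in for an execution plan)."""
    return lambda rows: [r for r in rows if r > threshold]

def run_query(threshold):
    plan = make_plan(threshold)
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        partials = list(pool.map(plan, NODES.values()))   # nodes work in parallel
    return sorted(x for part in partials for x in part)   # merge the partial results

result = run_query(8)
```

Each node touches only its own rows, so adding nodes shrinks the slice each one has to scan.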

    30. Single choice (2 points)

    Among the many components of Spark, which is designed for machine learning? ( )

    A. Spark SQL

    B. Spark Streaming

    C. MLlib

    D. GraphX

    31. Single choice (2 points)

    Which description is NOT true of Jim Gray? ( )

    A. Relational database founder

    B. Nautical sport enthusiast

    C. Divided scientific research into four types of paradigms

    D. Big data scientist

    32. Single choice (2 points)

    The correct order of steps for reading data in HDFS is: ( )

    a) DistributedFileSystem makes an RPC call to the namenode to determine the location of the datanodes where the file is stored in the form of blocks. For each block, the namenode returns the addresses of the datanodes (metadata of blocks and datanodes) that have a copy of the block. Datanodes are sorted according to proximity (depending on network topology information).

    b) The client opens the file by calling the open() method on DistributedFileSystem.

    c) The client then calls read() on the stream. DFSInputStream, which has stored the datanode addresses for the first few blocks in the file, then connects to the first (closest) datanode for the first block in the file.

    d) DistributedFileSystem returns an FSDataInputStream (an input stream that supports file seeks) to the client for it to read data from. FSDataInputStream in turn wraps a DFSInputStream, which manages the datanode and namenode I/O.

    e) Data is streamed from the datanode back to the client (in the form of packets), and read() is called repeatedly on the stream by the client.

    f) When the client has finished reading, it calls close() on the FSDataInputStream.

    g) When the end of the block is reached, DFSInputStream closes the connection to the datanode, then finds the best datanode for the next block.

    4-2-2.jpg

    A. ABDCEGF

    B. BADCEGF

    C. BADCEFG

    D. BACDEGF

    33. Single choice (2 points)

    Which of the following stages is the main cause of big data? ( )

    A. Operation and business system

    B. User-generated content

    C. Perception stage

    D. Social media

    34. Single choice (2 points)

    Regarding the descriptions of data modeling design levels, which matching is correct? ( C )

    1) Based on the user's data and functional requirements, functions and association relationships are obtained, along with entity classes corresponding to the business elements and functions.

    2) More details of data entities, including primary keys, foreign keys, attributes, indexes, relationships, constraints, and even views, described in the form of data tables, data columns, value ranges, object-oriented classes, XML tags, and other forms.

    3) The storage implementation of data, including data partitions, data table spaces, and data integration.

    A. 1-Conceptual model design, 2-Physical model design, 3-Logical model design

    B. 1-Logical model design, 2-Physical model design, 3-Conceptual model design

    C. 1-Conceptual model design, 2-Logical model design, 3-Physical model design

    D. 1-Physical model design, 2-Conceptual model design, 3-Logical model design

    35. Single choice (2 points)

    Spark has several components to facilitate different types of computing tasks, such as streaming and graph processing. The components include ( )

    1) Spark Core API  2) Resilient distributed dataset (RDD)

    3) Spark SQL  4) Spark topology

    5) Spark Streaming  6) MLlib (Machine Learning Library)

    7) GraphX  8) Sklearn

    A. 12345

    B. 13456

    C. 13567

    D. 13578

    36. Single choice (2 points)

    The correct big data lifecycle is ( )

    A. Data governance, data collecting, data storing, and data analyzing

    B. Data collecting, data governance, data storing, and data analyzing

    C. Data collecting, data storing, data governance, and data analyzing

    D. Data collecting, data storing, data analyzing, and data governance

    37. Single choice (2 points)

    Data cleaning technology does not include ( )

    A. Data transformation

    B. Cleaning of missing data

    C. Deduplication of data

    D. Performing anomaly detection on the data set
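For a concrete picture of three of the operations listed above (deduplication, handling missing data, and anomaly detection), here is a small sketch. The record layout, the median fill, and the two-standard-deviation rule are illustrative choices rather than prescribed methods.

```python
from statistics import mean, median, pstdev

def clean(records):
    """Deduplicate, fill missing values with the median, and flag outliers."""
    # 1. Deduplication: keep the first occurrence of each (key, value) record.
    unique = list(dict.fromkeys(records))
    # 2. Missing data: replace None with the median of the observed values.
    observed = [v for _, v in unique if v is not None]
    fill = median(observed)
    filled = [(k, fill if v is None else v) for k, v in unique]
    # 3. Anomaly detection: flag values over 2 standard deviations from the mean.
    vals = [v for _, v in filled]
    mu, sigma = mean(vals), pstdev(vals)
    outliers = [k for k, v in filled if sigma and abs(v - mu) > 2 * sigma]
    return filled, outliers

# Hypothetical sensor readings with a duplicate, a gap, and one extreme value.
records = [
    ("a", 10.0), ("b", 12.0), ("b", 12.0), ("c", 11.0), ("d", 10.0),
    ("e", 12.0), ("f", 11.0), ("g", 100.0), ("h", None),
]
filled, outliers = clean(records)
```

The median is used for filling because a single extreme reading such as `"g"` would drag a mean-based fill far from the typical values.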

    38. Single choice (2 points)

    Which of the following is NOT a numerosity reduction method? ( )

    A. Principal component analysis

    B. Data Cube Aggregation

    C. Clustering

    D. Sampling

    39. Single choice (2 points)

    In the execution of graph parallel computing, which of the following describe the roles of the master? ( )

    1) Coordinates the execution of supersteps in sequence

    2) Signals the beginning of a new superstep to all workers after learning that all of them have completed the previous one

    3) Pings each worker to learn its processing status

    4) Periodically issues a "checkpoint" command to all workers, each of which then saves its partition to a persistent graph store

    A. 123

    B. 134

    C. 124

    D. 1234

    40. Single choice (2 points)

    Among the following, which one is about ideas, learning, notions, and concepts that are synthesized, compared, thought out, and discussed? ( )

    A. Data

    B. Information

    C. Wisdom

    D. Knowledge

    41. Single choice (2 points)

    Which of the following is a shared-nothing architecture? ( )

    A. SMP

    B. NUMA

    C. MPP

    D. None of them

    42. Single choice (2 points)

    There are only two kinds of operations on an RDD (Resilient Distributed Dataset): ( ). In ( ), data can be filtered, joined, mapped, or reduced, but no calculation is executed; only in ( ) is the calculation performed and a result value generated. ( )

    A. map and reduce, map, reduce

    B. transformations and actions, action, transformation

    C. transformations and actions, transformation, action

    D. map and reduce, reduce, map
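The "recorded now, computed later" behavior this question describes can be shown with a toy class in plain Python. `LazyDataset` below is not Spark's API; it is a hypothetical stand-in that only demonstrates the deferred-execution idea on a single machine.

```python
class LazyDataset:
    """Toy stand-in for an RDD: some calls are recorded, others trigger the work."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = steps

    # Recorded operations: return a new dataset and compute nothing yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Triggering operations: walk the recorded steps and produce a concrete value.
    def collect(self):
        items = iter(self._source)
        for kind, fn in self._steps:
            # Inside a method, map/filter refer to the Python built-ins.
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def count(self):
        return len(self.collect())

calls = []

def square(x):
    calls.append(x)   # records when the function actually runs
    return x * x

pipeline = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
calls_before = len(calls)    # nothing has executed at this point
result = pipeline.collect()  # the recorded steps run here
```

Deferring the work lets an engine see the whole chain before running it, so it can fuse steps or skip data that a later filter would discard.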

    43. Single choice (2 points)

    Which attribute subset selection method is shown in the diagram? ( )

    A. Forward stepwise attribute subset selection

    B. Backward stepwise attribute subset selection

    C. Combined forward selection and backward deletion

    D. Decision tree induction

    44. Single choice (2 points)

    Relational databases and NoSQL databases each have their own advantages and disadvantages and cannot replace each other. ( ) application scenarios: key business systems in telecommunications, banking, and other fields that need to ensure strong transactional consistency. ( ) application scenarios: non-critical business (such as data analysis) of Internet companies and traditional companies. ( )

    A. NoSQL database; Relational database

    B. Relational database; NoSQL database

    C. NoSQL database; NoSQL database

    D. Relational database; Relational database

    45. Single choice (2 points)

    ( ), a user-friendly API standard for machine learning, will be the central high-level API used to build and train models. ( )

    A. SavedModel

    B. TensorFlow Hub

    C. Premade Estimators

    D. tf.keras

    46. Single choice (2 points)

    Which of the following are attribute subset selection methods? ( C )

    1) Forward stepwise attribute subset selection

    2) Backward stepwise attribute subset selection

    3) Combined forward selection and backward deletion

    4) Principal component analysis

    5) Reduction based on statistical analysis

    6) Decision tree induction

    A. 12346

    B. 12345

    C. 12356

    D. 123456
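As an example of the first method in the list, forward stepwise selection can be written as a greedy loop: start from an empty set and keep adding whichever attribute improves a score the most. The sketch assumes the caller supplies the scoring function; `toy_score` and the `GAIN` table are made up so the result can be checked by hand.

```python
def forward_select(attributes, score):
    """Greedy forward selection: add the best attribute until the score stops improving."""
    selected, best = [], score([])
    remaining = list(attributes)
    while remaining:
        candidate = max(remaining, key=lambda a: score(selected + [a]))
        new_score = score(selected + [candidate])
        if new_score <= best:        # no attribute helps any more
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = new_score
    return selected

# Hypothetical scoring: per-attribute gain minus a fixed cost per attribute kept.
GAIN = {"age": 0.30, "income": 0.25, "zip": 0.02, "id": 0.0}

def toy_score(subset):
    return sum(GAIN[a] for a in subset) - 0.05 * len(subset)

chosen = forward_select(list(GAIN), toy_score)
```

Backward stepwise selection runs the same idea in reverse: start from the full set and repeatedly drop the attribute whose removal hurts the score least.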

    47. Single choice (2 points)

    According to Gartner, an estimated 20% of an organization's data is ( ) data; the remaining majority is ( ) data. ( )

    A. structured, unstructured

    B. unstructured, structured

    C. structured, semi-structured

    D. unstructured, semi-structured

    48. Single choice (2 points)

    In the following picture, what are the correct terms for each number? ( )

    test 1-6.jpg

    A. Data sources, Data storage, Data collection, Data Processing, Data Visualization, Report monitoring

    B. Data sources, Data collection, Data storage, Data Visualization, Data Processing, Report monitoring

    C. Data sources, Data collection, Data storage, Data Processing, Data Visualization, Report monitoring

    D. Data sources, Data collection, Data storage, Data Processing, Report monitoring, Data Visualization

    49. Single choice (2 points)

    Dealing with the fan-out URLs of seed URLs (that is, the links of a link) involves web crawler crawling strategies. Which one is NOT a commonly used crawling strategy? ( )

    A. Depth first

    B. Breadth first

    C. First In-First Out

    D. Partial PageRank Strategy

    50. Single choice (2 points)

    ( ) is responsible for resource monitoring and job scheduling. ( ) monitors the health status of all ( ) and jobs, and if it finds a failure, it transfers the corresponding tasks to other nodes. ( ) tracks task execution progress, resource usage, and other information, and informs the ( ), and ( ) selects the appropriate task to use these resources when resources become free. ( )

    A. JobTracker, JobTracker, TaskTrackers, JobTracker, TaskScheduler, TaskScheduler

    B. JobTracker, TaskTrackers, JobTracker, JobTracker, TaskScheduler, TaskScheduler

    C. JobTracker, JobTracker, JobTracker, TaskTrackers, TaskScheduler, TaskScheduler

    D. JobTracker, JobTracker, TaskTrackers, TaskScheduler, JobTracker, TaskScheduler

    51. Single choice (2 points)

    Which of the following is NOT a data transformation component? ( )

    A. Field mapping

    B. Data calculation

    C. Data split

    D. Eliminate duplication

