XuetangX (学堂在线) Big Data Analysis Final Exam Answers

    Big Data Analysis - Beijing Institute of Technology - XuetangX

    1. Single choice (2 points): In the general big data architecture, the data processing system has three parts. Which option best describes them? ( )

    A. Data storage, data processing algorithm, computing engine and platform

    B. Data storage, computing model, computing engine and platform

    C. Data processing algorithm, computing model, computing engine and platform

    D. Data processing algorithm, computing engine, platform

    Correct answer: C

    2. Single choice (2 points): Based on the requirements, we can build the business model, which includes ( ) and ( ). ( )

    A. Conceptual model, logical model

    B. Logical model, physical model

    C. Process model, data model

    D. Process model, logical model

    Correct answer: C

    3. Single choice (2 points): ( ) extracts only the data that is new or has been modified in the database since the last extraction; it normally does not have a big impact on the running business system. ( )

    A. Incremental data extraction

    B. Full extraction

    C. Timestamp extraction

    D. Trigger

    Correct answer: A
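The idea behind question 3 can be illustrated with a minimal sketch of incremental extraction. It assumes the source table carries a last-modified timestamp column; the table name `orders`, the column `updated_at`, and the helper `extract_incremental` are hypothetical names chosen for illustration, and real systems may detect changes with triggers or log comparison instead.

```python
import sqlite3

def extract_incremental(conn, last_extracted_at):
    """Return rows changed after the previous extraction (hypothetical schema)."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_extracted_at,),
    )
    return cur.fetchall()

# In-memory database standing in for the running business system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01T09:00:00"),
        (2, 25.5, "2024-01-02T12:30:00"),
        (3, 7.25, "2024-01-03T08:15:00"),
    ],
)

# Only rows modified after the last run are read, so the source table
# does not have to be copied in full on every extraction.
changed = extract_incremental(conn, "2024-01-01T23:59:59")
print(changed)
```

A full extraction would instead read every row each time, which is simpler but puts more load on the source system.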

    4. Single choice (2 points): Which of the following big data characteristics best describes "data in doubt" (uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations)? ( )

    A. Volume

    B. Variety

    C. Veracity

    D. Velocity

    Correct answer: C

    5. Single choice (2 points): Which of the following is NOT an advantage of NoSQL databases? ( )

    A. Can support ultra-large-scale data storage

    B. A flexible data model that supports Web 2.0 applications well

    C. Strong horizontal scaling capabilities

    D. Mathematical theoretical foundation

    Correct answer: D

    6. Single choice (2 points): MPP (Massively Parallel Processing) improves performance through ( ) parallelism. ( ) coordinates work with ( ), ( ) coordinates work with one or more ( ). ( ) process queries in parallel. ( ) have their own CPU, disk, and memory in a shared-nothing architecture. A high-speed interconnect provides continuous pipelining of data processing. ( )

    A. segment hosts, Master, segment host, Segment host, segment instances, Segment instances, Segment hosts

    B. segment instance, Master, segment host, Segment host, segment instances, Segment instances, Segment hosts

    C. segment instance, Master, segment host, Segment host, segment instances, Segment instances, Segment instances

    D. segment hosts, Master, segment host, Segment host, segment instances, Segment hosts, Segment hosts

    Correct answer: B

    7. Single choice (2 points): The right order of reading data in HDFS is ( )

    a) DistributedFileSystem makes an RPC call to the namenode to determine the locations of the datanodes where the file is stored in the form of blocks. For each block, the namenode returns the addresses of the datanodes (metadata of blocks and datanodes) that have a copy of the block. Datanodes are sorted according to proximity (depending on network topology information).

    b) The client opens the file by calling the open() method on DistributedFileSystem.

    c) The client then calls read() on the stream. DFSInputStream, which has stored the datanode addresses for the first few blocks in the file, then connects to the first (closest) datanode for the first block in the file.

    d) DistributedFileSystem returns an FSDataInputStream (an input stream that supports file seeks) to the client for it to read data from. FSDataInputStream in turn wraps a DFSInputStream, which manages the datanode and namenode I/O.

    e) Data is streamed from the datanode back to the client (in the form of packets), and read() is repeatedly called on the stream by the client.

    f) When the client has finished reading, it calls close() on the FSDataInputStream.

    g) When the end of the block is reached, DFSInputStream closes the connection to the datanode, then finds the best datanode for the next block.

    [Figure: 4-2-2.jpg]

    A. ABDCEGF

    B. BADCEGF

    C. BADCEFG

    D. BACDEGF

    Correct answer: B

    8. Single choice (2 points): Which attribute subset selection method is shown in the diagram? ( )

    [Figure: Test 3-5-2.jpg]

    A. Forward stepwise attribute subset selection

    B. Backward stepwise attribute subset selection

    C. Combination of forward selection and backward deletion

    D. Decision tree induction

    Correct answer: A
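Since the diagram for question 8 is not reproduced here, the following minimal sketch may help show what forward stepwise selection does: it starts from an empty set and greedily adds the attribute that improves a score the most, stopping when no attribute helps. The function names, the attribute names `A1` to `A6`, and the usefulness values are all made up for illustration; in practice the score would come from a statistical test or a model evaluation.

```python
def forward_stepwise_selection(attributes, score):
    """Greedy forward selection: start empty, add the best attribute
    each round, stop when no remaining attribute improves the score."""
    selected = []
    best_score = score(selected)
    remaining = list(attributes)
    while remaining:
        candidate_scores = [(score(selected + [a]), a) for a in remaining]
        top_score, top_attr = max(candidate_scores)
        if top_score <= best_score:
            break  # no remaining attribute improves the subset
        selected.append(top_attr)
        remaining.remove(top_attr)
        best_score = top_score
    return selected

# Hypothetical scoring: each attribute has a made-up usefulness,
# and every selected attribute costs a small fixed penalty.
USEFULNESS = {"A1": 0.5, "A2": 0.05, "A3": 0.0, "A4": 0.3, "A5": 0.02, "A6": 0.2}

def toy_score(subset):
    return sum(USEFULNESS[a] for a in subset) - 0.1 * len(subset)

chosen = forward_stepwise_selection(list(USEFULNESS), toy_score)
print(chosen)
```

Backward stepwise selection would run the same loop in reverse, starting from the full attribute set and removing the least useful attribute each round.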

    9. Single choice (2 points): According to the organization boundary, data resources can be divided into two categories. ( )

    A. Online data and offline data

    B. Organization data and government data

    C. Internal data and external data

    D. System data and IoT data

    Correct answer: C

    10. Single choice (2 points): ( ) is a user-friendly API standard for machine learning and will be the central high-level API used to build and train models. ( )

    A. SavedModel

    B. TensorFlow Hub

    C. Premade Estimators

    D. tf.keras

    Correct answer: D

    11. Single choice (2 points): The idea of distributed computing is to use ( ) to achieve ( ). ( )

    A. redundancy, reliability

    B. reliability, redundancy

    C. redundancy, performance

    D. reliability, performance

    Correct answer: A

    12. Single choice (2 points): The superstep execution process is ( )

    1) Send messages to other nodes, causing them to become active;

    2) Modify node and arc properties;

    3) Remove existing arcs or create new arcs;

    4) Receive messages from the inbox;

    5) Halt itself until a new message is received;

    A. 42513

    B. 12345

    C. 42135

    D. 42351

    Correct answer: A

    13. Single choice (2 points): Which of the following big data characteristics is like panning for gold in the sand? ( )

    A. Value

    B. Variety

    C. Veracity

    D. Velocity

    Correct answer: A

    14. Single choice (2 points): Which description of the term "big data" is NOT suitable? ( )

    A. Big data can be analyzed for insights that lead to better decisions and strategic business moves

    B. Just large

    C. Both structured and unstructured

    D. Hard-to-manage volumes of data

    Correct answer: B

    15. Single choice (2 points): In the general big data architecture, the data storage system has four parts. Which option best describes them? ( )

    A. Data collection, data modeling, data storage (including distributed file system and distributed database), unified data access interface

    B. Data collection, data preprocessing, data storage (including distributed file system and distributed database), unified data access interface

    C. Data preprocessing, data modeling, data storage (including distributed file system and distributed database), unified data access interface

    D. Data preprocessing, data modeling, distributed file system, distributed database

    Correct answer: A

    16. Single choice (2 points): In the following picture, what are the right terms for each number? ( )

    [Figure: test 1-6.jpg]

    A. Data sources, Data storage, Data collection, Data processing, Data visualization, Report monitoring

    B. Data sources, Data collection, Data storage, Data visualization, Data processing, Report monitoring

    C. Data sources, Data collection, Data storage, Data processing, Data visualization, Report monitoring

    D. Data sources, Data collection, Data storage, Data processing, Report monitoring, Data visualization

    Correct answer: C

    17. Single choice (2 points): ( ) is responsible for resource monitoring and job scheduling. ( ) monitors the health status of all ( ) and jobs, and if it finds a failure, it transfers the corresponding tasks to other nodes. ( ) tracks the task execution progress, resource usage, and other information, and informs the ( ); ( ) then selects the appropriate task to use these resources when resources become free. ( )

    A. JobTracker, JobTracker, TaskTrackers, JobTracker, TaskScheduler, TaskScheduler

    B. JobTracker, TaskTrackers, JobTracker, JobTracker, TaskScheduler, TaskScheduler

    C. JobTracker, JobTracker, JobTracker, TaskTrackers, TaskScheduler, TaskScheduler

    D. JobTracker, JobTracker, TaskTrackers, TaskScheduler, JobTracker, TaskScheduler

    Correct answer: A

    18. Single choice (2 points): According to Gartner, an estimated 20% of an organization's data is ( ) data; the remaining majority is ( ) data. ( )

    A. structured, unstructured

    B. unstructured, structured

    C. structured, semi-structured

    D. unstructured, semi-structured

    Correct answer: A

    19. Single choice (2 points): Which of the following is NOT a dimensionality reduction method? ( )

    A. Wavelet transformation

    B. Attribute subset selection

    C. Principal component analysis

    D. Data cube aggregation

    Correct answer: D

    20. Single choice (2 points): Redundant and repeated records belong to the data quality category ( )

    A. Single data resource, model level

    B. Single data resource, instance level

    C. Multiple data resources, model level

    D. Multiple data resources, instance level

    Correct answer: B

    21. Single choice (2 points): Storm is a native stream processing system; that is, stream data is processed one piece of data at a time, and its parallel computation is implemented based on a directed topology graph. A topology is composed of the data source ( ) and the processing unit ( ). The topology defines the ( ) of parallel computing; that is, it designs the calculation steps and processes from the perspective of function and architecture. ( )

    A. Bolt, Spout, physical model

    B. Spout, Bolt, physical model

    C. Bolt, Spout, logical model

    D. Spout, Bolt, logical model

    Correct answer: D
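The spout and bolt roles in question 21 can be mimicked with plain Python generators. This is only a conceptual sketch and does not use the Storm API; the function names `sentence_spout`, `split_bolt`, and `count_bolt` and the sample sentences are hypothetical.

```python
from collections import Counter

def sentence_spout():
    """Spout: the data source of the topology, emitting one tuple at a time."""
    for sentence in ["big data", "stream data", "big stream"]:
        yield sentence

def split_bolt(stream):
    """Bolt: a processing unit that turns each sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream):
    """Bolt: keeps a running word count, updated once per incoming tuple."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

# The topology is the logical wiring: spout -> split bolt -> count bolt.
word_counts = count_bolt(split_bolt(sentence_spout()))
print(word_counts)
```

The wiring on the last lines is the "logical model": it says which unit feeds which, without saying how many parallel instances run or on which machines.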

    22. Single choice (2 points): Why do we say data is like crude oil? Which of the following is NOT a reason? ( )

    A. It is valuable

    B. It needs to be refined

    C. One data set can be adapted for different purposes

    D. It can be sold

    Correct answer: D

    23. Single choice (2 points): Which of the following refers to organized, structured, categorized, useful, condensed, calculated data? ( )

    A. Data

    B. Information

    C. Wisdom

    D. Knowledge

    Correct answer: B

    24. Single choice (2 points): Which of the following descriptions of the graph parallel computing architecture is NOT correct? ( )

    A. The whole graph is broken down into multiple "partitions"

    B. Each partition contains a large number of nodes

    C. A partition is a unit of execution and typically has an execution thread associated with it

    D. A "worker" machine hosts one "partition"

    Correct answer: D

    25. Single choice (2 points): Spark has several components to facilitate different types of computing tasks, such as streaming and graph processing. The components include ( )

    1) Spark Core API

    2) Resilient distributed dataset (RDD)

    3) Spark SQL

    4) Spark topology

    5) Spark Streaming

    6) MLlib (Machine Learning Library)

    7) GraphX

    8) Sklearn

    A. 12345

    B. 13456

    C. 13567

    D. 13578

    Correct answer: C

    26. Single choice (2 points): Data modeling could include defining ( )

    1) Metadata

    2) Data structure

    3) Attributes

    4) Value range

    5) Association relationship

    6) Consistency

    7) Timeliness

    A. 12345

    B. 1234567

    C. 134567

    D. 123567

    Correct answer: B

    27. Single choice (2 points): Which of the following refers to idea, learning, notion, concept, synthesized, compared, thought-out, discussed? ( )

    A. Data

    B. Information

    C. Wisdom

    D. Knowledge

    Correct answer: D

    28. Single choice (2 points): Spark's advantages include ( )

    1) Fast processing

    2) Flexibility

    3) In-memory computing

    4) Real-time processing

    5) Better analytics

    6) Fault tolerance

    7) Need for extra persistent storage

    A. 123567

    B. 123456

    C. 134567

    D. 234567

    Correct answer: B

    29. Single choice (2 points): The difference between machine learning and deep learning: machine learning algorithms employ ( ) for pattern recognition, while deep learning is modeled using ( ); both can learn in a supervised or unsupervised way. ( )

    A. Statistical analysis techniques, neural networks

    B. Neural networks, statistical analysis techniques

    C. Statistical analysis techniques, statistical analysis techniques

    D. Neural networks, neural networks

    Correct answer: A

    30. Single choice (2 points): Which of the following refers to understanding, integration, applied, reflected upon, actionable, accumulated, principles, patterns, decision-making process? ( )

    A. Data

    B. Information

    C. Wisdom

    D. Knowledge

    Correct answer: C

    31. Single choice (2 points): The execution model is based on the BSP (Bulk Synchronous Processing) model. In this model, multiple processing units proceed in parallel in a sequence of "supersteps". Within each "superstep", the processing sequence is ( )

    a) each processing unit first receives all messages delivered to it from the preceding "superstep",

    b) when all the processing units finish the message delivery (hence the synchronization point),

    c) may queue up the messages that it intends to send to other processing units.

    d) The queued-up messages will be delivered to the destined processing units but won't be seen until the next "superstep".

    e) manipulates its local data

    f) the next superstep can be started,

    g) the cycle repeats until the termination condition has been reached.

    A. aedcbfg

    B. aecdbfg

    C. acedbfg

    D. adecbfg

    Correct answer: B
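As an illustration of the superstep sequence in question 31, here is a minimal single-process sketch that propagates the maximum value through a small graph. It is a toy simulation under stated assumptions, not the API of any graph engine; `run_bsp_max`, the vertex names, and the values are hypothetical.

```python
def run_bsp_max(values, neighbors):
    """Propagate the maximum vertex value through a graph in BSP supersteps."""
    values = dict(values)
    # Superstep 0: every vertex sends its own value to its neighbors.
    inbox = {v: [] for v in values}
    for v, val in values.items():
        for n in neighbors[v]:
            inbox[n].append(val)
    supersteps = 1
    while any(inbox.values()):            # (g) stop when no messages remain
        outbox = {v: [] for v in values}
        for v in values:
            msgs = inbox[v]               # (a) receive messages from the previous superstep
            if msgs and max(msgs) > values[v]:
                values[v] = max(msgs)     # (e) manipulate local data
                for n in neighbors[v]:
                    outbox[n].append(values[v])  # (c) queue outgoing messages
        inbox = outbox                    # (d), (b) deliver at the barrier; seen next superstep
        supersteps += 1                   # (f) the next superstep starts
    return values, supersteps

# Hypothetical chain graph a - b - c - d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final_values, steps = run_bsp_max({"a": 3, "b": 6, "c": 2, "d": 1}, graph)
print(final_values, steps)
```

A vertex that receives no larger value sends nothing, which corresponds to halting until a new message arrives.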

    32. Single choice (2 points): In HDFS, the name node and the data node have their own responsibilities. Select the responsibilities of the name node and the data node respectively. Name nodes ( ), data nodes ( )

    1) Realize the mapping of data blocks to the local file system of the data node

    2) Manage the file system namespace

    3) Store file data blocks

    4) Save the "file to data block to data node" mapping relationship

    5) Schedule client access to files

    6) Store the data blocks on the local disk

    7) Store the metadata in memory for quick access

    A. 1237, 456

    B. 2457, 136

    C. 1245, 367

    D. 2456, 137

    Correct answer: B
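The division of labor in question 32 can be sketched with two toy classes. This is a conceptual model only, not the Hadoop API; the class names, method names, block IDs, and file path are hypothetical.

```python
class NameNode:
    """Keeps metadata in memory: the namespace and the
    file -> block -> data node mapping."""
    def __init__(self):
        self.file_to_blocks = {}      # file path -> ordered list of block IDs
        self.block_to_datanodes = {}  # block ID -> list of data node names

    def add_file(self, path, block_locations):
        self.file_to_blocks[path] = list(block_locations)
        for block_id, nodes in block_locations.items():
            self.block_to_datanodes[block_id] = list(nodes)

    def locate(self, path):
        return [(b, self.block_to_datanodes[b]) for b in self.file_to_blocks[path]]

class DataNode:
    """Stores block contents locally (a dict stands in for the local disk)."""
    def __init__(self, name):
        self.name = name
        self.local_blocks = {}

    def store(self, block_id, data):
        self.local_blocks[block_id] = data

    def read(self, block_id):
        return self.local_blocks[block_id]

datanodes = {name: DataNode(name) for name in ("dn1", "dn2")}
namenode = NameNode()
datanodes["dn1"].store("blk_1", b"hello ")
datanodes["dn2"].store("blk_2", b"world")
namenode.add_file("/demo.txt", {"blk_1": ["dn1"], "blk_2": ["dn2"]})

# A client asks the name node where the blocks live, then reads the
# block contents directly from the data nodes.
content = b"".join(
    datanodes[nodes[0]].read(block_id)
    for block_id, nodes in namenode.locate("/demo.txt")
)
print(content)
```

Note that the name node never holds file contents here, only the mappings, which mirrors items 2, 4, 5, and 7 versus items 1, 3, and 6.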

    33. Single choice (2 points): ( ) evaluates the changed data in data extraction through the database's own log. ( )

    A. Log comparison

    B. Timestamp

    C. Triggers

    D. Full table comparison

    Correct answer: A

    34. Single choice (2 points): Which of the following are attribute subset selection methods? ( )

    1) Forward stepwise attribute subset selection

    2) Backward stepwise attribute subset selection

    3) Combination of forward selection and backward deletion

    4) Principal component analysis

    5) Reduction based on statistical analysis

    6) Decision tree induction

    A. 12346

    B. 12345

    C. 12356

    D. 123456

    Correct answer: C

    35. Single choice (2 points): Database connection programming interfaces such as ( ) can support SQL access by applications to the database, but they cannot provide complex functions such as transaction management, concurrent scheduling, buffer management, heterogeneous database conversion, and inheritance in a distributed computing environment. This introduces the ( ). It is a layer of software that provides data exchange functions on top of the database. When the system is extended and needs to access cross-platform heterogeneous databases, the OS could be UNIX, Linux, or Windows, and the data forms could be mails, XML documents, EJB components, Web services, images, audio/video files, or other unstructured data. The technology of the big data application layer is also diversified, with various standards. The design of the ( ) needs to be compatible with various standard technologies and products, which introduces the ( ).

    A. ODBC and JDBC; DAL (data access layer); unified data access interface; unified data access interface

    B. ODBC and JDBC; DAL (data access layer); DAL (data access layer); unified data access interface

    C. DAL (data access layer); ODBC and JDBC; DAL (data access layer); unified data access interface

    D. ODBC and JDBC; DAL (data access layer); unified data access interface; DAL (data access layer)

    Correct answer: B

    36. Single choice (2 points): ( ) uses ( ) to divide the amount of resources (CPU, memory, etc.) on this node. A task has a chance to run after it gets a ( ), and the role of the ( ) is to allocate idle ( ) on each ( ) to tasks. ( )

    A. JobTracker, slot, slot, Hadoop scheduler, slots, TaskTracker

    B. TaskTracker, slot, slot, Hadoop scheduler, slots, TaskTracker

    C. TaskTracker, slot, slot, Task scheduler, slots, TaskTracker

    D. TaskTracker, slot, slot, Hadoop scheduler, task, TaskTracker

    Correct answer: B

    37. Single choice (2 points): Attribute dependence belongs to the data quality category ( )

    A. Single data resource, model level

    B. Single data resource, instance level

    C. Multiple data resources, model level

    D. Multiple data resources, instance level

    Correct answer: A

    38. Single choice (2 points): The machine learning pipeline in Spark MLlib is ( )

    A. 1. Load/clean data, 2. Transformer, 3. Estimator, and 4. Evaluator

    B. 1. Load/clean data, 2. Feature extraction, 3. Model training, and 4. Model evaluation

    C. 1. Load/clean data, 2. Feature extraction, 3. Estimator, and 4. Model evaluation

    D. 1. Load/clean data, 2. Transformer, 3. Model training, and 4. Evaluator

    Correct answer: A

    39. Single choice (2 points): The correct chronological order of the four paradigms is ( )

    A. Empirical - Theoretical - Computational - Data exploration

    B. Theoretical - Empirical - Computational - Data exploration

    C. Empirical - Computational - Theoretical - Data exploration

    D. Empirical - Theoretical - Data exploration - Computational

    Correct answer: A

    40. Single choice (2 points): The right order of writing to the datanodes in HDFS is ( )

    a)DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it.

    b)The client creates the file by calling create() method on DistributedFileSystem.

    c)The list of datanodes forms a pipeline, and default replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline.

    d)The namenode performs various checks to make sure the file doesn’t already exist and the client has the right permissions to create the file. If all these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException. 

    e)The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to the datanodes. FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.

    f)As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas.

    g)Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.

    h)DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.

    i)When the client has finished writing data, it calls close() on the stream. It flushes all the remaining packets to the datanode pipeline and waits for acknowledgments before contacting the namenode to signal that the file is complete.

    j)The namenode already knows which blocks the file is made up of, so it only has to wait for blocks to be minimally replicated before returning successfully.

    [Figure: 4-2-1.jpg]

    A. badefcghij

    B. abdefcghij

    C. badefchgij

    D. badefcgihj

    Correct answer: A

    41. Subjective question (10 points): Dynamic web crawler

    Shrimp shopping website crawler requirement

    Task:

    You need to crawl kitchen knife information from the shrimp skin website and store it in search_keyword.csv. This information includes itemid-shopid-catid, commodity name, price, rating_star, and so on.

    Shrimp skin website: https://xiapi.xiapibuy.com/

    Reference: Manual and codes.

    Submission requirement:

    Please upload a screenshot of the crawled result (search_keyword.csv); the file name should be ID_NAME.PNG.

    [Figure: Dynamic Crawler 1.jpg]


    42. True/False (2 points): The data in HDFS is immutable. ( )

    Correct answer: True

    43. True/False (2 points): In customer collaborative filtering, similar users are defined based on the common items they purchased. ( )

    Correct answer: True
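One simple way to express "similar users share purchased items" is Jaccard similarity over purchase sets. The sketch below assumes that measure; the question does not name a specific one, and the user names, items, and helper functions are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two users as the overlap of the item sets they purchased."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

purchases = {  # hypothetical purchase histories
    "alice": {"knife", "pan", "board"},
    "bob": {"knife", "pan", "kettle"},
    "carol": {"novel", "lamp"},
}

def most_similar_user(target, purchases):
    """Return the other user whose purchases overlap most with the target's."""
    others = [u for u in purchases if u != target]
    return max(others, key=lambda u: jaccard(purchases[target], purchases[u]))

nearest = most_similar_user("alice", purchases)
print(nearest)
```

Items bought by the nearest user but not by the target (here, the kettle) would then be candidate recommendations.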

    44. True/False (2 points): In HDFS, each stored file is first divided into multiple data blocks with a flexible length according to the data size. ( )

    Correct answer: False
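The statement is false because HDFS splits files into blocks of a fixed, configured size. A minimal sketch of that splitting follows; the 128 MB default used here is an assumption matching the common Hadoop 2.x and later default, and the function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Return the sizes of the blocks a file of file_size bytes would occupy."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # only the last block may be smaller
    return sizes

MB = 1024 * 1024
blocks = split_into_blocks(300 * MB)
print([size // MB for size in blocks])
```

Every block except possibly the last has the same length, regardless of how large the file is.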

    45. True/False (2 points): We can find one kind of tool to deal with all the data management problems of big data. ( )

    Correct answer: False

    46. True/False (2 points): HDFS supports batch reading, writing, and updating operations. ( )

    Correct answer: False

    47. True/False (2 points): The item-based collaborative filtering algorithm calculates item similarity according to item features. ( )

    Correct answer: False
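Item-based collaborative filtering typically derives item similarity from user behavior, such as ratings or purchases, rather than from item features, which is the content-based approach. The sketch below assumes cosine similarity over rating vectors; the ratings and function names are hypothetical.

```python
from math import sqrt

ratings = {  # hypothetical user -> {item: rating}
    "u1": {"knife": 5, "board": 4},
    "u2": {"knife": 4, "board": 5, "lamp": 1},
    "u3": {"lamp": 5, "novel": 4},
}

def item_vector(item):
    """Ratings an item received, keyed by user."""
    return {user: r[item] for user, r in ratings.items() if item in r}

def cosine(item_a, item_b):
    """Cosine similarity of two items based only on who rated them and how."""
    va, vb = item_vector(item_a), item_vector(item_b)
    common = set(va) & set(vb)
    dot = sum(va[u] * vb[u] for u in common)
    norm = sqrt(sum(x * x for x in va.values())) * sqrt(sum(x * x for x in vb.values()))
    return dot / norm if norm else 0.0

print(cosine("knife", "board"), cosine("knife", "lamp"))
```

No description of the items themselves is used anywhere: the knife and the board come out as similar only because the same users rated both highly.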

    48. True/False (2 points): A DFS (distributed file system) provides the logical storage structure of the data. ( )

    Correct answer: False

    49. True/False (2 points): We can find one kind of tool to deal with all the data management problems of a database. ( )

    Correct answer: True

    50. True/False (2 points): A database provides the physical storage structure. ( )

    Correct answer: False

    51. True/False (2 points): Hadoop is the only big data architecture. ( )

    Correct answer: False


    "XuetangX Big Data Analysis Final Exam Answers" was compiled by 网课宝盒; please include a link to this article when reposting or sharing.
