
Big data

"Big data" is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processingapplication software. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.[2] Big data challenges include capturing datadata storagedata analysis, search, sharingtransfervisualizationquerying, updating, information privacy and data source. Big data was originally associated with three key concepts: volumevariety, and velocity.[3] Other concepts later attributed with big data are veracity (i.e., how much noise is in the data) [4] and value.[5]

Current usage of the term big data tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that's not the most relevant characteristic of this new data ecosystem."[6] Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on."[7] Scientists, business executives, medical practitioners, advertisers and governments alike regularly meet difficulties with large data sets in areas including Internet search, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[8] connectomics, complex physics simulations, biology and environmental research.[9]

Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing Internet of things devices such as mobile devices, aerial sensors (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.[10][11] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[12] as of 2012, every day 2.5 exabytes (2.5×10^18 bytes) of data were generated.[13] Based on an IDC report prediction, the global data volume will grow exponentially from 4.4 zettabytes in 2013 to 44 zettabytes in 2020.[14] By 2025, IDC predicts there will be 163 zettabytes of data.[15] One question for large enterprises is determining who should own big-data initiatives that affect the entire organization.[16]
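
As a quick check on the IDC figures just quoted, the implied average annual growth rate can be worked out directly. The short Python sketch below uses only the 4.4 ZB and 44 ZB values from the paragraph above; it is illustrative arithmetic, not part of the cited report.

# Growth from 4.4 ZB (2013) to 44 ZB (2020) is roughly tenfold over 7 years.
start_zb, end_zb, years = 4.4, 44.0, 7

# Compound annual growth rate: (end / start) ** (1 / years) - 1
cagr = (end_zb / start_zb) ** (1 / years) - 1
print(f"Implied compound annual growth: {cagr:.1%}")  # roughly 39% per year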

Relational database management systems, desktop statistics and software packages used to visualize data often have difficulty handling big data. The work may require "massively parallel software running on tens, hundreds, or even thousands of servers".[17] What qualifies as being "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."[18]

 

Definition

The term has been in use since the 1990s, with some giving credit to John Mashey for popularizing it.[19][20] Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time.[21] Big data philosophy encompasses unstructured, semi-structured and structured data; however, the main focus is on unstructured data.[22] Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many exabytes of data.[23] Big data requires a set of techniques and technologies with new forms of integration to reveal insights from data sets that are diverse, complex, and of a massive scale.[24]

A 2016 definition states that "Big data represents the information assets characterized by such a high volume, velocity and variety to require specific technology and analytical methods for its transformation into value".[25] Similarly, Kaplan and Haenlein define big data as "data sets characterized by huge amounts (volume) of frequently updated data (velocity) in various formats, such as numeric, textual, or images/videos (variety)."[26] Additionally, a new V, veracity, has been added by some organizations to describe it,[27] a revision challenged by some industry authorities.[28] The three Vs (volume, variety and velocity) have been further expanded to other complementary characteristics of big data:[29][30]

·         Machine learning: big data often doesn't ask why and simply detects patterns[31] (see the sketch after this list)

·         Digital footprint: big data is often a cost-free byproduct of digital interaction[30][32]
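
To illustrate the "detects patterns without asking why" point above, the following sketch clusters synthetic records with k-means: the algorithm groups similar records without any causal model of why the groups exist. The data, features and group structure are invented for the example and do not come from the cited sources.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic behavioral records: two features, two hidden groups.
group_a = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(500, 2))
group_b = rng.normal(loc=[4.0, 3.0], scale=0.3, size=(500, 2))
X = np.vstack([group_a, group_b])

# k-means only detects the pattern in the data; it offers no explanation of it.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("records per detected pattern:", np.bincount(labels))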

A 2018 definition states "Big data is where parallel computing tools are needed to handle data", and notes, "This represents a distinct and clearly defined change in the computer science used, via parallel programming theories, and losses of some of the guarantees and capabilities made by Codd's relational model."[33]
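
As a minimal sketch of the parallel-computing idea in this definition, the example below runs a map/reduce-style word count across worker processes with Python's standard multiprocessing module. Real big-data systems distribute the same pattern across many servers; the input chunks here are invented stand-ins for file blocks.

from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    # "Map" step: each worker counts words in its own slice of the data.
    return Counter(chunk.split())

if __name__ == "__main__":
    # Hypothetical input: in practice these chunks would be file blocks
    # spread over many machines rather than an in-memory list.
    chunks = [
        "big data needs parallel tools",
        "parallel tools scale out",
        "data data data",
    ]

    with Pool(processes=4) as pool:
        partial_counts = pool.map(count_words, chunks)

    # "Reduce" step: merge the partial results into one total count.
    total = sum(partial_counts, Counter())
    print(total.most_common(3))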

The growing maturity of the concept more starkly delineates the difference between "big data" and "Business Intelligence":[34]

·         Business Intelligence uses descriptive statistics with data with high information density to measure things, detect trends, etc.

·         Big data uses inductive statistics and concepts from nonlinear system identification[35] to infer laws (regressions, nonlinear relationships, and causal effects) from large sets of data with low information density,[36] in order to reveal relationships and dependencies, or to perform predictions of outcomes and behaviors, as sketched in the example below.
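
The contrast can be made concrete with a small example of inductive statistics: the sketch below fits a least-squares regression to many noisy synthetic observations in order to recover an underlying law. The data and parameters are invented for illustration only.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "low information density" data: many noisy observations,
# each individually uninformative, generated from y = 2x + 1 + noise.
x = rng.uniform(0, 10, 100_000)
y = 2.0 * x + 1.0 + rng.normal(0, 5.0, x.size)

# Inductive step: estimate the underlying law (slope and intercept)
# by least-squares regression over the whole data set.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"estimated law: y = {slope:.2f}*x + {intercept:.2f}")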

Characteristics

Figure: growth of big data's primary characteristics of volume, velocity, and variety.

Big data can be described by the following characteristics:

Volume

The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.

Variety

The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. Big data draws from text, images, audio, video; plus it completes missing pieces through data fusion.

Velocity

In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real-time. Compared to small data, big data are produced more continually. Two kinds of velocity related to big data are the frequency of generation and the frequency of handling, recording, and publishing.
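
One way to picture the two kinds of velocity described above is a micro-batch counter over a simulated event stream, shown below: events are processed as they arrive, and running totals are published after each small batch. The event source and batch size are assumptions for illustration, not a specific big-data product.

from collections import Counter
from itertools import islice

def simulated_event_stream(n):
    # Stand-in for a real high-velocity source (sensors, logs, clickstream);
    # events are handled as they are generated rather than stored first.
    for i in range(n):
        yield f"sensor-{i % 3}"

stream = simulated_event_stream(10_000)
totals = Counter()

# Process the stream in small micro-batches so running totals are
# available continuously while new data keeps arriving.
while True:
    batch = list(islice(stream, 1_000))
    if not batch:
        break
    totals.update(batch)
    print("running totals:", dict(totals))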

Veracity

An extension of the basic definition, veracity refers to the data quality and the data value. The quality of captured data can vary greatly, affecting accurate analysis.[40]

Data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information. For example, to manage a factory one must consider both visible and invisible issues with various components. Information generation algorithms must detect and address invisible issues such as machine degradation, component wear, etc. on the factory floor.
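
A minimal sketch of surfacing an "invisible" issue such as gradual machine degradation from sensor readings is given below, using a simple rolling-average threshold on synthetic data. The readings, window size and threshold are assumptions for illustration, not a production condition-monitoring method.

import random

random.seed(1)

# Synthetic vibration readings from one machine; the slow upward drift
# stands in for gradual component wear that is invisible to operators.
readings = [1.0 + 0.002 * t + random.gauss(0, 0.05) for t in range(1_000)]

baseline = sum(readings[:100]) / 100   # "healthy" reference level
window = 50                            # rolling-average window size (assumed)

for t in range(window, len(readings)):
    avg = sum(readings[t - window:t]) / window
    if avg > baseline * 1.5:           # degradation threshold (assumed)
        print(f"possible degradation detected at t={t}, level {avg:.2f}")
        break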


 
