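As big data analytics spreads rapidly across industries, which big data technologies are indispensable, and which hold the greatest potential value? Forrester Research's report TechRadar: Big Data, Q1 2016 evaluates the maturity and trajectory of 22 big data technologies across the data life cycle. This article presents the 10 top-ranked technologies, which may help big data practitioners decide where to take their careers.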
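- The 22 technologies fall into five phases of development:
  - Creation, Survival, Growth, Equilibrium, and Decline
  - Release frequency generally decreases from Creation through Equilibrium. Technologies in the Creation phase ship new versions most often, while those in Equilibrium are mature and update most slowly.
- In terms of business value, the 22 technologies fall into four levels:
  - Negative, Low, Medium, and High
  - Broadly, the more mature a technology, the more slowly it is updated and the higher its business value.
- In terms of trajectory (how successful a technology is likely to be), the 22 technologies fall into three levels:
  - Significant success (a major breakthrough), Moderate success, and Minimal success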
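- Predictive analytics: as hardware and software solutions have matured, many companies use big data technologies to collect massive amounts of data, train and optimize models, and deploy predictive models that improve business performance or mitigate risk.
- NoSQL databases: non-relational databases, including key-value stores (e.g., Redis), document databases (e.g., MongoDB), and graph databases (e.g., Neo4j).
- Search and knowledge discovery: supports the automatic extraction of information and insights from structured and unstructured data held in multiple sources, including file systems, databases, streams, APIs, and other platforms and applications.
- Stream analytics: software that cleans, aggregates, and analyzes high-throughput data from multiple sources in real time.
- In-memory data fabric: provides low-latency access to, and processing of, massive amounts of data by distributing the data across the dynamic random access memory (DRAM), Flash, or SSDs of a distributed computer system.
- Distributed file stores: a system with more than one storage node, multiple copies (replicas) of the data, and a high-performance network connecting the nodes.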
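- Data virtualization: delivers data from many kinds of sources, including massive data sets on Hadoop and distributed data stores, in real time or near real time through a unified access layer, rather than by copying the data into one place.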
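- Data integration: tools that orchestrate business data across solutions such as Amazon Elastic MapReduce (EMR), Hive, Pig, Spark, MapReduce, Couchbase, Hadoop, and MongoDB.
- Data preparation: software that cleanses and shapes data sources and shares diverse data sets, so that the data can be analyzed sooner.
- Data quality: validates large, high-velocity data sets in distributed stores and databases, removing invalid data and filling in missing values.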
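The predictive analytics entry describes a collect, train, optimize, and deploy workflow. As a minimal sketch of the train-then-predict core, assuming a single numeric feature and ordinary least squares (a deliberately simple stand-in for the richer models real platforms use), the hypothetical helpers `fit_linear` and `predict` below fit a line to historical data and score a new input. The data is made up for illustration.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Fit y = a + b*x by ordinary least squares (closed-form solution)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical history: monthly ad spend (x) vs. units sold (y).
history_x = [1, 2, 3, 4]
history_y = [3, 5, 7, 9]
model = fit_linear(history_x, history_y)
print(model)              # (1.0, 2.0)
print(predict(model, 5))  # 11.0
```

In a real deployment the "optimize" step would compare several candidate models on held-out data before one is published; this sketch stops at a single fit.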
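Search and knowledge discovery tools index content from many sources so that it can be queried in one place. Full-text engines such as Apache Lucene are built on inverted indexes. The sketch below is a toy inverted index, assuming naive lower-case tokenization and AND-only queries. The function names and the source-prefixed document IDs are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Start from the first token's postings and intersect the rest (AND query).
    result = set(index.get(tokens[0], ()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical documents; the ID prefix records which source each came from.
docs = {
    "file:report.txt": "Quarterly sales rose in Europe",
    "db:orders/42": "Order 42 shipped to Europe",
    "api:crm/7": "Customer asked about quarterly pricing",
}
index = build_index(docs)
print(sorted(search(index, "quarterly Europe")))  # ['file:report.txt']
print(sorted(search(index, "europe")))            # ['db:orders/42', 'file:report.txt']
```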
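Stream analytics engines such as Apache Flink and Spark Structured Streaming commonly aggregate events over time windows. The following is a minimal in-process sketch of one cleaning and aggregation step over tumbling (fixed, non-overlapping) windows. It assumes that events arrive in timestamp order and treats any non-numeric value as invalid. The function names and sensor data are hypothetical.

```python
from collections import defaultdict


def flush_window(window_start, totals):
    """Emit (window_start, source, count, mean) for each source, sorted by source."""
    for source in sorted(totals):
        count, total = totals[source]
        yield window_start, source, count, total / count


def tumbling_window_stats(events, window_seconds=60):
    """Clean and aggregate an event stream into fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, source, value), assumed to arrive
    in timestamp order. Results for a window are yielded as soon as an event
    from a later window shows up, so only the current window is kept in memory.
    """
    current_window = None
    totals = defaultdict(lambda: [0, 0.0])  # source -> [count, running sum]
    for ts, source, value in events:
        # Cleaning step: drop events whose value is missing or not numeric.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        window = ts - ts % window_seconds
        if current_window is not None and window != current_window:
            yield from flush_window(current_window, totals)
            totals.clear()
        current_window = window
        totals[source][0] += 1
        totals[source][1] += value
    if current_window is not None:
        yield from flush_window(current_window, totals)


# Hypothetical sensor readings: (seconds, source, value); None is a bad reading.
stream = [
    (0, "sensor-a", 10),
    (15, "sensor-b", 4),
    (30, "sensor-a", 20),
    (45, "sensor-a", None),
    (61, "sensor-a", 5),
]
for row in tumbling_window_stats(stream):
    print(row)
# (0, 'sensor-a', 2, 15.0)
# (0, 'sensor-b', 1, 4.0)
# (60, 'sensor-a', 1, 5.0)
```

Production engines also handle late and out-of-order events (for example with watermarks), which this sketch deliberately ignores.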
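Distributed file stores keep several copies of each block of data on different nodes. One way to pick those nodes deterministically is rendezvous (highest random weight) hashing. The sketch below uses it with a replication factor of 3, which is also the HDFS default. This is an illustrative technique, not how HDFS itself places blocks: HDFS uses rack-aware placement decided by its NameNode. The node and block names are hypothetical.

```python
import hashlib


def node_score(block_id, node):
    """Deterministic pseudo-random weight for a (block, node) pair."""
    digest = hashlib.sha256(f"{block_id}:{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(block_id, nodes, replicas=3):
    """Choose `replicas` distinct nodes for a block using rendezvous hashing.

    Every node is scored against the block and the highest-scoring nodes win,
    so any client can recompute the placement without a central lookup table.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda node: node_score(block_id, node), reverse=True)
    return ranked[:replicas]


# Hypothetical five-node cluster and block name.
cluster = ["node-1", "node-2", "node-3", "node-4", "node-5"]
placement = place_replicas("file.csv#block-0", cluster)
print(placement)  # three distinct nodes; the exact choice depends on the hashes

# If one replica's node fails, the other two copies still hold the data, and
# recomputing the placement on the surviving nodes keeps those two and adds one.
survivors = [n for n in cluster if n != placement[0]]
print(place_replicas("file.csv#block-0", survivors))
```

Because each node's score for a block does not depend on the other nodes, removing a node only affects the blocks that were stored on it, which keeps rebalancing cheap.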
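All ten technologies in this list are on the significant success trajectory and are past the early Creation phase. The first eight are in the Growth phase, and the last two are in the Survival phase. The first two are rated as having high business value, the third and fourth as having medium business value, and the remaining six as having low business value. This is unsurprising, since these six newer technologies are still evolving quickly and are not yet fully mature or reliable.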
Why did I add two technologies that are still in the Survival phase, data preparation and data quality, to the list of the hottest technologies? In the same report, Forrester also provides the following data from its Q4 2015 survey of 63 big data vendors:
What is the level of customer interest in each of the following capabilities? (% answering “very high”)
- Data preparation and discovery: 52%
- Data integration: 48%
- Advanced analytics: 46%
- Customer analytics: 46%
- Data security: 38%
- In-memory computing: 37%
While Forrester predicts that only a few standalone data preparation vendors will survive, it considers data preparation “an essential capability for achieving democratization of data,” or rather, of data analysis. It lets data scientists spend more time on modeling and discovering insights, and it allows more business users to have fun with data mining. Data quality encompasses the data security capability from the survey above, along with other features that ensure decisions are based on reliable and accurate data. Forrester “expects that data quality will have significant success in the coming years as firms formalize a data certification process. Data certification efforts seek to guarantee that data meets expected standards for quality; security; and regulatory compliance supporting business decision-making, business performance, and business processes.”
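Data preparation and data quality both come down to checking records against expected standards, rejecting what fails, and repairing what can be repaired. The sketch below shows that idea on a small scale. It is a minimal illustration, not Forrester's definition of data certification. The record schema, the plausible age range of 0 to 120, and median imputation for missing ages are all assumptions, and the function `clean_records` is hypothetical.

```python
from statistics import median


def clean_records(records):
    """Validate records, reject invalid ones, and impute missing ages.

    Hypothetical schema: {"customer_id": str, "age": int or None, "country": str}.
    Returns (clean, rejected); the input list and its dicts are not modified.
    """
    clean, rejected = [], []
    for rec in records:
        cid = rec.get("customer_id")
        age = rec.get("age")
        if not isinstance(cid, str) or not cid.strip():
            rejected.append(rec)  # required identifier missing or blank
        elif age is not None and not (type(age) is int and 0 <= age <= 120):
            rejected.append(rec)  # age present but not a plausible integer
        else:
            clean.append(dict(rec))  # shallow copy so the caller's data is untouched

    # Impute missing ages with the median of the ages that passed validation.
    known_ages = [r["age"] for r in clean if r.get("age") is not None]
    fill_value = median(known_ages) if known_ages else None
    for rec in clean:
        if rec.get("age") is None:
            rec["age"] = fill_value
    return clean, rejected


raw = [
    {"customer_id": "c1", "age": 34, "country": "DE"},
    {"customer_id": "c2", "age": None, "country": "FR"},
    {"customer_id": "", "age": 40, "country": "US"},
    {"customer_id": "c4", "age": 250, "country": "US"},
    {"customer_id": "c5", "age": 30, "country": "JP"},
]
clean, rejected = clean_records(raw)
print([(r["customer_id"], r["age"]) for r in clean])
# [('c1', 34), ('c2', 32.0), ('c5', 30)]
print(len(rejected))  # 2
```

Keeping the rejected records, rather than silently discarding them, makes it possible to report how much data failed certification and why.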
“Big Data” as a topic of conversation has probably reached mainstream audiences far more than any technology buzzword before it. That has not helped the discussion of this amorphous term, defined for the masses as “the planet’s nervous system” or, for technical audiences, simply as “Hadoop.” Forrester’s report helps clarify the term, defining big data as an ecosystem of 22 technologies, each with its specific benefits for enterprises and, through them, for consumers.
Big data, specifically one of its attributes, big volume, has recently given rise to a new general topic of discussion: artificial intelligence. The availability of very large data sets is one of the reasons deep learning, a subset of AI, has been in the limelight, from identifying Internet cats to beating a Go champion. In turn, AI may lead to the emergence of new tools for collecting and analyzing data.
Says Forrester: “In addition to more data and more computing power, we now have expanded analytic techniques like deep learning and semantic services for context that make artificial intelligence an ideal tool to solve a wider array of business problems. As a result, Forrester is seeing a number of new companies offering tools and services that attempt to support applications and processes with machines that mimic some aspects of human intelligence.”
Prediction is difficult, especially about the future, but it’s a (relatively) safe bet that the race to mimic elements of human intelligence, led by Google, Facebook, Baidu, Amazon, IBM, and Microsoft, all with very deep pockets, will change what we mean by “big data” in the very near future.