What are the tools for big data analysis?
1. Hadoop

Hadoop is a software framework for the distributed processing of large amounts of data, and it processes that data in a reliable, efficient, and scalable way. Hadoop is reliable because it assumes that computing elements and storage will fail, so it maintains multiple copies of working data and can redistribute processing away from failed nodes. It is efficient because it works in parallel, which speeds up processing. It is also scalable and can handle petabyte-scale data. In addition, Hadoop runs on commodity servers, so its cost is relatively low, and anyone can use it.
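The map/shuffle/reduce model that Hadoop parallelizes across a cluster can be sketched in miniature with plain Python. This is a single-process illustration of the classic word-count job only; a real Hadoop job distributes each of these phases across many nodes:

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for each word, as a Hadoop mapper would.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all values by key, as the framework does between phases.
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    # Reduce: aggregate the values for each key.
    return {word: sum(counts) for word, counts in grouped.items()}

def word_count(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(shuffle(pairs))
```

Because storage and compute elements are assumed to fail, Hadoop re-runs map or reduce tasks on other nodes from replicated input data; the functional style above (no shared mutable state between phases) is what makes that redistribution safe.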

2. HPCC

HPCC is the abbreviation of High Performance Computing and Communications. In 1993, the U.S. Federal Coordinating Council for Science, Engineering, and Technology submitted to Congress the report "Grand Challenges: High Performance Computing and Communications," also known as the HPCC program. It was a presidential scientific strategic initiative of the United States, aiming to solve a series of important scientific and technological challenges by strengthening research and development. HPCC was the American plan for implementing the information superhighway, and carrying it out was expected to cost tens of billions of dollars. Its main goals were to develop scalable computing systems and related software to support terabit-level network transmission performance, to develop gigabit network technology, and to expand the network connectivity of research and education institutions.

3. Storm

Storm is free open source software: a distributed, fault-tolerant real-time computation system. Storm can process huge streams of data very reliably, doing for real-time processing what Hadoop does for batch processing. Storm is simple, supports many programming languages, and is very enjoyable to use.
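The stream-processing model Storm is built around, spouts that emit tuples into bolts that transform and aggregate them, can be illustrated with a minimal single-process Python sketch of the canonical word-count topology. This is only an analogy: a real Storm topology runs these components in parallel across a cluster, with the framework handling fault tolerance and tuple replay:

```python
from collections import Counter
from typing import Iterable, Iterator

def sentence_spout(sentences: Iterable[str]) -> Iterator[str]:
    # Spout: the source of the stream; emits one tuple (here, a sentence) at a time.
    yield from sentences

def split_bolt(stream: Iterator[str]) -> Iterator[str]:
    # Bolt: transforms incoming tuples, splitting each sentence into words.
    for sentence in stream:
        yield from sentence.lower().split()

def count_bolt(stream: Iterator[str]) -> Counter:
    # Bolt: maintains running word counts as tuples arrive.
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
    return counts

# Wire the components into a pipeline: spout -> split bolt -> count bolt.
counts = count_bolt(split_bolt(sentence_spout(
    ["storm processes streams", "storm is fast"]
)))
```

Note how each stage consumes the previous stage's output incrementally; Storm generalizes exactly this dataflow, but over unbounded streams and many machines, which is why it pairs naturally with Hadoop's finite batch jobs.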

4. Apache Drill

To help enterprise users find more effective ways to speed up Hadoop data querying, the Apache Software Foundation recently launched an open source project called "Drill." Apache Drill implements Google's Dremel.

According to Tomer Shiran, product manager at Hadoop vendor MapR Technologies, "Drill" has already been run as an Apache incubator project and will continue to be promoted to software engineers all over the world.
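Drill exposes a REST API (by default on port 8047 when running in embedded mode) that accepts SQL queries over HTTP, and it can query raw files such as JSON in place without a predefined schema. The sketch below, using only the Python standard library, shows how such a query might be submitted; the endpoint follows Drill's documented default, while the file path and query are purely illustrative:

```python
import json
from urllib import request

# Assumed default REST endpoint of a locally running embedded Drill instance.
DRILL_URL = "http://localhost:8047/query.json"

def build_query_payload(sql: str) -> bytes:
    """Build the JSON request body Drill's REST API expects for a SQL query."""
    return json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")

def run_drill_query(sql: str) -> dict:
    """POST a SQL query to Drill and return the parsed JSON response."""
    req = request.Request(
        DRILL_URL,
        data=build_query_payload(sql),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running Drill instance; the path is hypothetical):
#   result = run_drill_query("SELECT name FROM dfs.`/tmp/people.json` LIMIT 5")
```

The appeal Dremel pioneered, and Drill follows, is exactly this: interactive ad hoc SQL over data where it already sits, rather than a batch MapReduce job per question.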

5. RapidMiner

RapidMiner is a world-leading data mining solution that makes extensive use of advanced technology. It covers a wide range of data mining tasks, including many kinds of data analysis, and can simplify the design and evaluation of data mining processes.

6. Pentaho BI

The Pentaho BI platform is different from traditional BI products: it is a process-centric, solution-oriented framework. Its purpose is to integrate a series of enterprise BI products, open source software, APIs, and other components to facilitate the development of business intelligence applications. Its emergence allows a series of independent business-intelligence-oriented products, such as JFreeChart and Quartz, to be integrated into complex, complete business intelligence solutions.