Saving CPU! Using Native Hadoop Libraries for CRC computation in HBase
by Apekshit Sharma, HBase contributor and Cloudera Engineer
TL;DR: Use the Hadoop Native Library for calculating CRCs and save CPU!
Checksums are used to check data integrity. HDFS computes and stores checksums for all files on write. One checksum is written per chunk of data (the chunk size can be configured using bytes.per.checksum) in a separate, companion checksum file. When data is read back, the file with the corresponding checksums is read back as well and is used to ensure data integrity. However, having two files results in two disk seeks when reading any chunk of data. For HBase, the extra seek while reading an HFileBlock results in extra latency. To work around the extra seek, HBase inlines checksums. HBase calculates checksums for the data in an HFileBlock and appends them to the end of the block itself on write to HDFS (HDFS then checksums the HBase data plus the inline checksums). On read, HDFS checksum verification is turned off by default, and HBase itself verifies data integrity.
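To make the "one checksum per chunk" layout concrete, here is a minimal Java sketch using java.util.zip.CRC32. This is not HDFS or HBase code; the class name and the default chunk size of 512 bytes are only illustrative.

```java
import java.util.zip.CRC32;

// Illustrative sketch only: split data into fixed-size chunks and keep one CRC per chunk,
// mirroring the chunked-checksum idea described above.
public class ChunkedCrcSketch {
  public static long[] checksumChunks(byte[] data, int bytesPerChecksum) {
    int numChunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
    long[] sums = new long[numChunks];
    CRC32 crc = new CRC32();
    for (int i = 0; i < numChunks; i++) {
      int off = i * bytesPerChecksum;
      int len = Math.min(bytesPerChecksum, data.length - off);
      crc.reset();
      crc.update(data, off, len);
      sums[i] = crc.getValue();   // one checksum per chunk of data
    }
    return sums;
  }

  public static void main(String[] args) {
    byte[] data = new byte[2000];                  // arbitrary example payload
    long[] sums = checksumChunks(data, 512);       // 512 bytes per checksum, example value
    System.out.println("chunks checksummed: " + sums.length);
  }
}
```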
Can we then get rid of HDFS checksum altogether? Unfortunately no. While HBase can detect corruptions, it can’t fix them, whereas HDFS uses replication and a background process to detect and *fix* data corruptions if and when they happen. Since HDFS checksums generated at write-time are also available, we fall back to them when HBase verification fails for any reason. If the HDFS check fails too, the data is reported as corrupt.
The related HBase configurations are hbase.hstore.checksum.algorithm, hbase.hstore.bytes.per.checksum, and hbase.regionserver.checksum.verify. HBase inline checksums are enabled by default.
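For reference, these properties can also be set programmatically. The sketch below only illustrates the three settings named above; the values shown (CRC32C, 16 KB per checksum, verification on) are example values, not tuning recommendations.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Illustrative sketch: the property names come from the post, the values are examples only.
public class ChecksumConfigSketch {
  public static Configuration checksumConf() {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.hstore.checksum.algorithm", "CRC32C");        // checksum type for HFile blocks
    conf.setInt("hbase.hstore.bytes.per.checksum", 16 * 1024);    // bytes covered by each checksum
    conf.setBoolean("hbase.regionserver.checksum.verify", true);  // let HBase verify inline checksums
    return conf;
  }
}
```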
Calculating checksums is computationally expensive and consumes a lot of CPU. When HDFS switched over to JNI + C for computing checksums, it saw big reductions in CPU usage.
This post is about replicating those gains in HBase by using Native Hadoop Libraries (NHL). See HBASE-11927.
We switched to using the Hadoop DataChecksum library, which under the hood uses NHL if available and otherwise falls back to the Java CRC implementation. Another alternative considered was the ‘Circe’ library. The following table highlights the differences between NHL and Circe and makes the reasoning for our choice clear.
NHL (Hadoop DataChecksum) | Circe |
Native code supports both crc32 and crc32c | Native code supports only crc32c |
Adds dependency on hadoop-common, which is reliable and actively developed | Adds dependency on an external project |
Interface takes a stream of data, a stream of checksums, and the chunk size as parameters, and can compute/verify checksums over the data in chunks. | Only supports calculation of a single checksum over all input data. |
Both libraries support use of the special x86 instruction for hardware calculation of CRC32C, if available (defined in the SSE4.2 instruction set). In the case of NHL, hadoop-2.6.0 or newer is required for HBase to get the native checksum benefit.
However, based on the data layout of HFileBlock, which has ‘real data’ followed by checksums on the end, only NHL supported the interface we wanted. Implementing the same in Circe would have been significant effort. So we chose to go with NHL.
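To make the choice concrete, here is a hedged sketch of driving hadoop-common's DataChecksum to compute and then verify chunked checksums over a buffer. This is not the HBASE-11927 patch itself; the chunk size, buffer sizes, and the "example-block" name are made up for illustration.

```java
import java.nio.ByteBuffer;
import org.apache.hadoop.util.DataChecksum;
import org.apache.hadoop.util.NativeCodeLoader;

// Illustrative use of the DataChecksum API; not the actual HBASE-11927 code.
public class DataChecksumSketch {
  public static void main(String[] args) throws Exception {
    int bytesPerChecksum = 512;  // example chunk size
    DataChecksum sum =
        DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, bytesPerChecksum);

    byte[] data = new byte[4 * bytesPerChecksum];   // stand-in for an HFileBlock payload
    ByteBuffer dataBuf = ByteBuffer.wrap(data);
    ByteBuffer sumsBuf = ByteBuffer.allocate(
        (data.length / bytesPerChecksum) * sum.getChecksumSize());

    // Compute one CRC32C per 512-byte chunk; uses the native implementation
    // when libhadoop is loaded and Hadoop is new enough (2.6.0+).
    sum.calculateChunkedSums(dataBuf, sumsBuf);

    // Verify the same data against the checksums; throws ChecksumException on mismatch.
    sum.verifyChunkedSums(dataBuf, sumsBuf, "example-block", 0);

    System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());
  }
}
```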
Since the metric to be evaluated was CPU usage, a simple configuration of two nodes was used. Node1 was configured to be the NameNode, Zookeeper and HBase master. Node2 was configured to be DataNode and RegionServer. All real computational work was done on Node2 while Node1 remained idle most of the time. This isolation of work on a single node made it easier to measure impact on CPU usage.
Configuration
Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-24-generic x86_64)
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
Socket(s) : 1
Core(s) per socket : 1
Thread(s) per core : 4
Logical CPU(s) : 4
Number of disks : 1
Memory : 8 GB
HBase Version/Distro *: 1.0.0 / CDH 5.4.0
*Since trunk resolves to hadoop-2.5.1, which does not have HDFS-6865, it was easier to use a CDH distro which already has HDFS-6865 backported.
Procedure
We chose to study the impact on major compactions, mainly because of the presence of CompactionTool, which (a) can be run offline and (b) allowed us to profile only the relevant workload. PerformanceEvaluation (bin/hbase pe) was used to build a test table, which was then copied to local disk for reuse.
% ./bin/hbase pe --nomapred --rows=150000 --table='t1' --valueSize=10240 --presplit=10 sequentialWrite 10
Table size: 14.4G
Number of rows: 1.5M
Number of regions: 10
Row size: 10K
Total store files across regions: 67
For profiling, Lightweight-java-profiler was used and FlameGraph was used to generate graphs.
For benchmarking, the Linux ‘time’ command was used. Profiling was disabled during these runs. A script repeatedly executed the following steps in order:
delete hdfs:///hbase
copy t1 from local disk to hdfs:///hbase/data/default
run compaction tool on t1 and time it
Profiling
CPU profiling of HBase not using NHL (figure 1) shows that about 22% of CPU is used for generating and validating checksums, whereas while using NHL (figure 2) it takes only about 3%.
Figure 1: CPU profile - HBase not using NHL (svg)
Figure 2: CPU profile - HBase using NHL (svg)
Benchmarking
Benchmarking was done for three different cases: (a) neither HBase nor HDFS use NHL, (b) HDFS uses NHL but not HBase, and (c) both HDFS and HBase use NHL. For each case, we did 5 runs. Observations from table 1:
Within a case, while real time fluctuates across runs, user and sys times remain the same. This is expected, as compactions are IO bound.
Using NHL only for HDFS reduces CPU usage by about 10% (A vs B).
Further, using NHL for HBase checksums reduces CPU usage by about 23% (B vs C).
All times are in seconds. This Stack Overflow answer provides a good explanation of real, user, and sys times.
run # | no native for HDFS and HBase (A) | no native for HBase (B) | native (C) |
1 | real 469.4 user 110.8 sys 30.5 | real 422.9 user 95.4 sys 30.5 | real 414.6 user 67.5 sys 30.6 |
2 | real 384.3 user 111.4 sys 30.4 | real 400.5 user 96.7 sys 30.5 | real 393.8 user 67.6 sys 30.6 |
3 | real 400.7 user 111.5 sys 30.6 | real 398.6 user 95.8 sys 30.6 | real 392.0 user 66.9 sys 30.5 |
4 | real 396.8 user 111.1 sys 30.3 | real 379.5 user 96.0 sys 30.4 | real 390.8 user 67.2 sys 30.5 |
5 | real 389.1 user 111.6 sys 30.3 | real 377.4 user 96.5 sys 30.4 | real 381.3 user 67.6 sys 30.5 |
Table 1
The Native Hadoop Library leverages the special processor instruction (if available) that does pipelining and other low-level optimizations when performing CRC calculations. Using NHL in HBase for heavy checksum computation allows HBase to make use of this facility, saving a significant amount of CPU time on checksumming.

Posted at 12:14AM Jun 06, 2015 by stack in General