Konstantin V. Shvachko: Hadoop

Presentations

HDFS texts

The Hadoop Distributed File System,
Apache Hadoop: the Scalability Update,
Scalability of the Hadoop Distributed File System,
Scaling Hadoop to 4000 nodes at Yahoo!
The Hadoop Distributed File System requirements.

Favorite Issues

NameNode Fine-Grained Locking via Metadata Partitioning. HDFS-14703.
Consistent Reads from Standby Node. HDFS-12943.
Segmented block reports proposal. HDFS-11313.
Interleaving block reports race. HDFS-10301.
HDFS truncate. HDFS-3107.
Snapshot support for truncate. HDFS-7056.
Coordinated replication of the namespace using ConsensusNode. HDFS-6469.
Introduce Coordination Engine. HADOOP-10641.
Warm HA NameNode going Hot. HDFS-2064.
Stress Test and Live Data Verification (S-Live) design. HDFS-708.
Sequential generation of block ids. HDFS-898.
Appending to an HDFS file. HDFS-265.
BackupNode maintains the up-to-date state of the namespace by receiving edits from the NameNode. HADOOP-4539.
DFSIO - a MapReduce-based benchmark that measures the performance of writes, appends, and sequential and random reads.
MAPREDUCE-4651, HDFS-663, HADOOP-193.
Slot utilization measures the actual job load on a map-reduce cluster and characterizes the overall cluster productivity.
The utilization is measured by analysing job history logs. HDFS-459.
File size distribution analysis. HDFS-461.
Quadruple memory size reduction for the name-node by
redesigning memory data structures (HADOOP-1687)
and removing checksum files from the name-node (HADOOP-1134).
Distributed cluster upgrade framework. HADOOP-1286.
Faster cluster startup. HADOOP-3022.
File system snapshots.
A snapshot of the previous state of the file system is taken during software upgrades in order to avoid data loss caused by software bugs or administrators' mistakes. HADOOP-702.
NNThroughputBenchmark - a pure name-node benchmark.
HADOOP-2149, HADOOP-3860.
Chain reaction caused by simultaneous failure of a few DataNodes.
HADOOP-572.
Safe mode is a read-only state of the name-node. HADOOP-306.
Integrity of HDFS cluster components. HADOOP-124.

Hadoop Release Management

Apache Hadoop 2.7.6 - 16 April, 2018

Apache Hadoop 2.7.5 - 14 December, 2017

Apache Hadoop 2.7.4 - 04 August, 2017

Apache Hadoop 0.22.0 - 10 December, 2011