HCatalog Manual
Apache Hive : HCatalog
Dec 12, 2024
HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools (Pig, MapReduce) to more easily read and write data on the grid.
This is the HCatalog manual. Contents: Using HCatalog; Installation from Tarball; HCatalog Configuration Properties; Load and Store Interfaces; Input and Output Interfaces; Reader and Writer Interfaces; Command Line Interface; Storage Formats; Dynamic Partitioning; Notification; Storage Based Authorization.
The old HCatalog wiki page has many other documents, including additional user documentation, further information on HBase integration, and resources for contributors.
Apache Hive : HCatalog Authorization
Dec 12, 2024
Contents: Storage Based Authorization (Default Authorization Model of Hive, Storage-System Based Authorization Model, Minimum Permissions, Unused DDL for Permissions); Configuring Storage-System Based Authorization; Creating New Tables or Databases; Known Issues.
Storage Based Authorization
Default Authorization Model of Hive
The default authorization model of Hive supports a traditional RDBMS style of authorization: users, groups, and roles are granted permissions to perform operations on a database or table.
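To make the grant model concrete, here is a small Python sketch of RDBMS-style authorization, in which a permission can reach a user directly, through one of the user's groups, or through one of the user's roles. The `GrantModel` class and the privilege and object names are hypothetical illustrations; this is not Hive's authorization code.

```python
from dataclasses import dataclass, field


@dataclass
class GrantModel:
    """Toy RDBMS-style authorization: privileges granted to users, groups, or roles."""

    # (principal_kind, principal_name) -> set of (privilege, object) pairs
    grants: dict = field(default_factory=dict)
    # user -> set of group names the user belongs to
    user_groups: dict = field(default_factory=dict)
    # user -> set of role names granted to the user
    user_roles: dict = field(default_factory=dict)

    def grant(self, kind, name, privilege, obj):
        """Grant `privilege` on `obj` to a principal of kind 'user', 'group', or 'role'."""
        self.grants.setdefault((kind, name), set()).add((privilege, obj))

    def is_allowed(self, user, privilege, obj):
        """True if the user holds the privilege directly or via a group or role."""
        principals = [("user", user)]
        principals += [("group", g) for g in self.user_groups.get(user, ())]
        principals += [("role", r) for r in self.user_roles.get(user, ())]
        return any((privilege, obj) in self.grants.get(p, set()) for p in principals)


# Usage: alice reads through her group, bob alters through his role.
model = GrantModel(user_groups={"alice": {"analysts"}}, user_roles={"bob": {"etl"}})
model.grant("group", "analysts", "SELECT", "default.sales")
model.grant("role", "etl", "ALTER", "default.sales")
print(model.is_allowed("alice", "SELECT", "default.sales"))  # True
print(model.is_allowed("alice", "ALTER", "default.sales"))   # False
```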
Apache Hive : HCatalog CLI
Dec 12, 2024
Apache Hive : HCatalog Command Line Interface
Contents: Set Up; HCatalog CLI (Owner Permissions, Hive CLI); HCatalog DDL (Create/Drop/Alter Table, Create/Drop/Alter View, Show/Describe, Create/Drop Index, Create/Drop Function, “dfs” Command and “set” Command, Other Commands); CLI Errors (Authentication, Error Log).
Set Up
The HCatalog command line interface (CLI) can be invoked as
    HIVE_HOME=hive_home hcat_home/bin/hcat
where hive_home is the directory where Hive has been installed and hcat_home is the directory where HCatalog has been installed.
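A minimal Python sketch of the invocation above, assuming the CLI accepts a single statement through its `-e` option. `build_hcat_command` and `run_hcat` are hypothetical helper names, and the paths in the usage comment are placeholders.

```python
import os
import subprocess


def build_hcat_command(hive_home, hcat_home, statement):
    """Return (argv, env) that run one statement through the hcat CLI.

    Mirrors `HIVE_HOME=hive_home hcat_home/bin/hcat`; passing the statement
    with `-e` is an assumption of this sketch.
    """
    env = dict(os.environ)
    env["HIVE_HOME"] = hive_home  # tells hcat where Hive is installed
    argv = [os.path.join(hcat_home, "bin", "hcat"), "-e", statement]
    return argv, env


def run_hcat(hive_home, hcat_home, statement):
    """Run the statement; raises CalledProcessError if hcat exits non-zero."""
    argv, env = build_hcat_command(hive_home, hcat_home, statement)
    return subprocess.run(argv, env=env, check=True, capture_output=True, text=True)


# Usage (requires a real installation; paths are placeholders):
# result = run_hcat("/opt/hive", "/opt/hive/hcatalog", "show tables;")
# print(result.stdout)
```

Separating command construction from execution keeps the environment handling testable without an HCatalog installation.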
Apache Hive : HCatalog Configuration Properties
Dec 12, 2024
HCatalog’s behaviour can be modified through a few configuration parameters specified in the jobs submitted to it. This document describes the settings available to users and what each one does.
Contents: Setup; Storage Directives; Cache Behaviour Directives; Input Split Generation Behaviour; Data Promotion Behaviour; HCatRecordReader Error Tolerance Behaviour.
Setup
The properties described in this page are meant to be job-level properties set on HCatalog through the jobConf passed into it.
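Error tolerance of the kind named in the contents can be pictured as a ratio check on bad records. The sketch below is a hypothetical model: the class, its parameters (`max_ratio`, `min_bad`), and the exact rule are assumptions for illustration, not HCatalog property names or defaults. In a real job the two values would come from the job-level configuration.

```python
class BadRecordTracker:
    """Hypothetical error-tolerance model for a record reader.

    max_ratio: largest tolerated fraction of bad records.
    min_bad:   bad records that must be seen before the ratio is enforced,
               so a single early bad record does not abort the task.
    """

    def __init__(self, max_ratio, min_bad):
        self.max_ratio = max_ratio
        self.min_bad = min_bad
        self.total = 0
        self.bad = 0

    def record(self, ok):
        """Count one record; raise RuntimeError if the tolerance is exceeded."""
        self.total += 1
        if ok:
            return
        self.bad += 1
        if self.bad >= self.min_bad and self.bad / self.total > self.max_ratio:
            raise RuntimeError(
                f"{self.bad} bad records out of {self.total} exceeds {self.max_ratio}"
            )
```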
Apache Hive : HCatalog DynamicPartitions
Dec 12, 2024
Apache Hive : HCatalog Dynamic Partitioning
Contents: Overview; External Tables; Hive Dynamic Partitions; Usage with Pig; Usage from MapReduce.
Overview
When writing data in HCatalog it is possible to write all records to a single partition. In this case the partition column(s) need not be in the output data.
The following Pig script illustrates this:
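When partition values come from the data instead of being fixed for the whole write (Hive's dynamic partitions), each record has to be routed to a partition. A minimal Python sketch of that routing, assuming records are plain dicts: `route_to_partitions` is a hypothetical helper that builds Hive-style `key=value` partition paths and drops the partition columns from the stored rows. Unlike Hive, it does not escape special characters in partition values.

```python
from collections import defaultdict


def route_to_partitions(records, partition_cols):
    """Group dict records into partitions keyed by Hive-style paths.

    The path for a record is built from its partition-column values, e.g.
    "ds=20110110/region=us". Partition columns are removed from the stored
    rows because the partition path already carries their values.
    """
    partitions = defaultdict(list)
    for rec in records:
        path = "/".join(f"{col}={rec[col]}" for col in partition_cols)
        row = {k: v for k, v in rec.items() if k not in partition_cols}
        partitions[path].append(row)
    return dict(partitions)


records = [
    {"user": "a", "ds": "20110110", "region": "us"},
    {"user": "b", "ds": "20110110", "region": "eu"},
    {"user": "c", "ds": "20110110", "region": "us"},
]
print(route_to_partitions(records, ["ds", "region"]))
# {'ds=20110110/region=us': [{'user': 'a'}, {'user': 'c'}], 'ds=20110110/region=eu': [{'user': 'b'}]}
```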
Apache Hive : HCatalog InputOutput
Dec 12, 2024
Apache Hive : HCatalog Input and Output Interfaces
Contents: Set Up; HCatInputFormat (API); HCatOutputFormat (API); HCatRecord; Running MapReduce with HCatalog (Authentication, Read Example, Filter Operators, Scan Filter, Write Filter).
Set Up
No HCatalog-specific setup is required for the HCatInputFormat and HCatOutputFormat interfaces.
Note: HCatalog is not thread safe.
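Because HCatalog is not thread safe, a common defensive pattern is to give each thread its own instance instead of sharing one. A minimal Python sketch using `threading.local`; the factory passed in stands for whatever client object your code creates and is a placeholder, not an HCatalog class.

```python
import threading


class PerThreadClient:
    """Hand each thread its own client, created lazily by `factory`."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()  # attributes here are per-thread

    def get(self):
        client = getattr(self._local, "client", None)
        if client is None:
            client = self._factory()  # first use in this thread
            self._local.client = client
        return client
```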
Apache Hive : HCatalog InstallHCat
Dec 12, 2024
Apache Hive : HCatalog Installation from Tarball
Contents: HCatalog Installed with Hive; HCatalog Command Line; HCatalog Client Jars; HCatalog Server.
HCatalog Installed with Hive
HCatalog is installed with Hive, starting with Hive release 0.11.0.
Hive installation is documented separately in the Hive installation guide.
HCatalog Command Line
If you install Hive from the binary tarball, the hcat command is available in the hcatalog/bin directory.
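A small Python sketch that resolves the `hcat` script from a Hive binary-tarball install, following the `hcatalog/bin` layout described above. `hcat_path` and `find_hcat` are hypothetical helper names.

```python
from pathlib import Path


def hcat_path(hive_home):
    """Expected location of the hcat script inside a Hive binary-tarball install."""
    return Path(hive_home) / "hcatalog" / "bin" / "hcat"


def find_hcat(hive_home):
    """Return the hcat script path, or raise FileNotFoundError if it is missing."""
    path = hcat_path(hive_home)
    if not path.is_file():
        raise FileNotFoundError(f"hcat not found at {path}")
    return path
```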
Apache Hive : HCatalog LoadStore
Dec 12, 2024
Apache Hive : HCatalog Load and Store Interfaces
Contents: Set Up; Running Pig; HCatLoader (Usage, HCatLoader Data Types, Running Pig with HCatalog, Load Examples); HCatStorer (Usage, Store Examples, HCatStorer Data Types); Data Type Mappings (Primitive Types, Complex Types).
Set Up
The HCatLoader and HCatStorer interfaces are used with Pig scripts to read and write data in HCatalog-managed tables.
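To give a feel for what the data type mappings do, the Python sketch below maps Hive column types to the Pig types that HCatLoader presents. It covers only a subset of common pairs; the Primitive Types and Complex Types tables give the full, version-specific list. `pig_type` is a hypothetical helper.

```python
# Subset of Hive -> Pig type mappings; not the complete table.
HIVE_TO_PIG = {
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "string": "chararray",
    "binary": "bytearray",
    "map": "map",
    "struct": "tuple",
    "array": "bag",
}


def pig_type(hive_type):
    """Map a Hive type name such as 'array<string>' to its Pig type name."""
    base = hive_type.split("<", 1)[0].strip().lower()  # drop type parameters
    try:
        return HIVE_TO_PIG[base]
    except KeyError:
        raise ValueError(f"no mapping listed for Hive type {hive_type!r}") from None
```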
Apache Hive : HCatalog Notification
Dec 12, 2024
Contents: Overview; Notification for a New Partition; Notification for a Set of Partitions; Server Configuration (Enable JMS Notifications, Topic Names).
Overview
Since version 0.2, HCatalog provides notifications for certain events happening in the system. This lets applications such as Oozie wait for those events and schedule the work that depends on them.
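The wait-then-schedule pattern can be sketched in Python with an in-process `queue.Queue` standing in for the JMS topic that a consumer such as Oozie would subscribe to. The message shape `(table, partition_spec)` and the function name are assumptions for illustration, not the HCatalog message format. Note that `timeout` applies to each individual wait, not to the call as a whole.

```python
import queue


def wait_for_partitions(events, table, required, timeout=None):
    """Block until every partition in `required` has been announced for `table`.

    `events` yields (table, partition_spec) messages. Returns the set of
    partitions received for `table`; raises queue.Empty if no message
    arrives within `timeout` seconds.
    """
    pending = set(required)
    seen = set()
    while pending:
        event_table, spec = events.get(timeout=timeout)
        if event_table != table:
            continue  # message about some other table
        seen.add(spec)
        pending.discard(spec)
    return seen  # all dependencies present: safe to schedule the dependent work
```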
Apache Hive : HCatalog ReaderWriter
Dec 12, 2024
Apache Hive : HCatalog Reader and Writer Interfaces
Contents: Overview; HCatReader; HCatWriter; Complete Example Program.
Overview
HCatalog provides a data transfer API for parallel input and output without using MapReduce. This API provides a way to read data from a Hadoop cluster or write data into a Hadoop cluster, using a basic storage abstraction of tables and rows.
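A minimal Python model of the parallel-read idea: a master step divides the table into splits, and workers read the splits independently. In HCatalog the master prepares a context that is distributed to the workers; here plain lists and a thread pool stand in for that, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def prepare_read(rows, num_splits):
    """Master step: divide the rows into num_splits independent splits (round-robin)."""
    return [rows[i::num_splits] for i in range(num_splits)]


def read_split(split):
    """Worker step: read every row of one split (here, simply copy it)."""
    return list(split)


def parallel_read(rows, num_splits):
    """Read all splits concurrently and concatenate the results in split order."""
    splits = prepare_read(rows, num_splits)
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        results = list(pool.map(read_split, splits))  # map keeps input order
    return [row for split_rows in results for row in split_rows]


print(prepare_read(list(range(5)), 2))  # [[0, 2, 4], [1, 3]]
```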
Apache Hive : HCatalog StorageFormats
Dec 12, 2024
Apache Hive : HCatalog Storage Formats
Contents: SerDes and Storage Formats; Usage from Hive; CTAS Issue with JSON SerDe.
SerDes and Storage Formats
HCatalog uses Hive’s SerDe class to serialize and deserialize data. SerDes are provided for RCFile, CSV text, JSON text, and SequenceFile formats. Check the SerDe documentation for additional SerDes that might be included in new versions.
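To illustrate what a SerDe does, here is a Python sketch of two toy serializer/deserializer pairs, one for CSV text and one for JSON text. The class names and the row representation (a list of column values) are assumptions; Hive's actual SerDe interface is a Java API and differs from this. The CSV variant shows one practical difference between the formats: every value comes back as a string, while JSON preserves numbers.

```python
import csv
import io
import json


class CsvSerDe:
    """Serialize a row (list of values) to one CSV text line and back."""

    def serialize(self, row):
        buf = io.StringIO()
        csv.writer(buf).writerow(row)  # quotes fields containing commas
        return buf.getvalue().rstrip("\r\n")  # drop the writer's line terminator

    def deserialize(self, line):
        return next(csv.reader([line]))  # values come back as strings


class JsonSerDe:
    """Serialize a row to a JSON object keyed by column name and back."""

    def __init__(self, columns):
        self.columns = columns

    def serialize(self, row):
        return json.dumps(dict(zip(self.columns, row)))

    def deserialize(self, line):
        obj = json.loads(line)
        return [obj.get(c) for c in self.columns]  # missing keys become None
```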
Apache Hive : HCatalog Streaming Mutation API
Dec 12, 2024
A Java API for mutating records (insert, update, delete) in transactional tables using Hive’s ACID feature. It was introduced in Hive 2.0.0 (HIVE-10165).
Contents: Background; Structure; Data Requirements; Streaming Requirements; Record Layout; Connection and Transaction Management; Writing Data; Dynamic Partition Creation; Reading Data; Example.
Background
In certain data processing use cases it is necessary to modify existing data when new facts arrive.
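The core idea (new facts arriving as inserts, updates, and deletes against existing rows) can be modeled in a few lines of Python. This in-memory sketch is hypothetical: it keys rows by an illustrative row id and does not reflect the API's classes, record layout, or transaction handling.

```python
def apply_mutations(table, mutations):
    """Apply (op, row_id, record) events to a copy of `table` (dict row_id -> record).

    op is "insert", "update", or "delete"; events are applied in the order given.
    """
    result = dict(table)  # leave the input table untouched
    for op, row_id, record in mutations:
        if op == "insert":
            if row_id in result:
                raise KeyError(f"row {row_id} already exists")
            result[row_id] = record
        elif op == "update":
            if row_id not in result:
                raise KeyError(f"row {row_id} does not exist")
            result[row_id] = record
        elif op == "delete":
            result.pop(row_id)  # raises KeyError if the row is absent
        else:
            raise ValueError(f"unknown operation {op!r}")
    return result
```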
Apache Hive : HCatalog UsingHCat
Dec 12, 2024
Apache Hive : HCatalog Usage
Contents: Version information; Overview (HCatalog Architecture, Interfaces, Data Model); Data Flow Example (First: Copy Data to the Grid, Second: Prepare the Data, Third: Analyze the Data); HCatalog Web API.
Version information
HCatalog graduated from the Apache incubator and merged with the Hive project on March 26, 2013.