Table Of ContentApache Hive Guide
ImportantNotice
©2010-2021Cloudera,Inc.Allrightsreserved.
Cloudera,theClouderalogo,andanyotherproductor
servicenamesorsloganscontainedinthisdocumentaretrademarksofClouderaand
itssuppliersorlicensors,andmaynotbecopied,imitatedorused,inwholeorinpart,
withoutthepriorwrittenpermissionofClouderaortheapplicabletrademarkholder.If
thisdocumentationincludescode,includingbutnotlimitedto,codeexamples,Cloudera
makesthisavailabletoyouunderthetermsoftheApacheLicense,Version2.0,including
anyrequirednotices.AcopyoftheApacheLicenseVersion2.0,includinganynotices,
isincludedherein.AcopyoftheApacheLicenseVersion2.0canalsobefoundhere:
https://opensource.org/licenses/Apache-2.0
HadoopandtheHadoopelephantlogoaretrademarksoftheApacheSoftware
Foundation.Allothertrademarks,registeredtrademarks,productnamesandcompany
namesorlogosmentionedinthisdocumentarethepropertyoftheirrespectiveowners.
Referencetoanyproducts,services,processesorotherinformation,bytradename,
trademark,manufacturer,supplierorotherwisedoesnotconstituteorimply
endorsement,sponsorshiporrecommendationthereofbyus.
Complyingwithallapplicablecopyrightlawsistheresponsibilityoftheuser.Without
limitingtherightsundercopyright,nopartofthisdocumentmaybereproduced,stored
inorintroducedintoaretrievalsystem,ortransmittedinanyformorbyanymeans
(electronic,mechanical,photocopying,recording,orotherwise),orforanypurpose,
withouttheexpresswrittenpermissionofCloudera.
Clouderamayhavepatents,patentapplications,trademarks,copyrights,orother
intellectualpropertyrightscoveringsubjectmatterinthisdocument.Exceptasexpressly
providedinanywrittenlicenseagreementfromCloudera,thefurnishingofthisdocument
doesnotgiveyouanylicensetothesepatents,trademarkscopyrights,orother
intellectualproperty.ForinformationaboutpatentscoveringClouderaproducts,see
http://tiny.cloudera.com/patents.
Theinformationinthisdocumentissubjecttochangewithoutnotice.Clouderashall
notbeliableforanydamagesresultingfromtechnicalerrorsoromissionswhichmay
bepresentinthisdocument,orfromuseofthisdocument.
Cloudera,Inc.
395PageMillRoad
PaloAlto,CA94306
[email protected]
US:1-888-789-1488
Intl:1-650-362-0488
www.cloudera.com
ReleaseInformation
Version:ClouderaEnterprise5.8.x
Date:February3,2021
Table of Contents
Using Apache Hive with CDH....................................................................................7
UseCasesforHive................................................................................................................................................7
Installation............................................................................................................................................................8
Upgrading.............................................................................................................................................................8
Configuration.......................................................................................................................................................8
The Metastore Database......................................................................................................................................8
HiveServer2..........................................................................................................................................................9
Hive on Spark.......................................................................................................................................................9
Hive and HBase....................................................................................................................................................9
Transaction(ACID)SupportinHive......................................................................................................................9
Managing Hive...................................................................................................................................................10
Ingesting Data with Hive....................................................................................................................................10
Tuning.................................................................................................................................................................10
HighAvailability..................................................................................................................................................11
Replication..........................................................................................................................................................11
Security..............................................................................................................................................................11
Troubleshooting.................................................................................................................................................12
Upstream Information for Hive..........................................................................................................................12
Hive Installation.....................................................................................................13
AboutHive..........................................................................................................................................................13
HiveServer2..........................................................................................................................................................................13
Upgrading Hive...................................................................................................................................................13
ChecklisttoHelpEnsureSmoothUpgrades.........................................................................................................................13
UpgradingHivefromaLowerVersionofCDH5..................................................................................................................14
Installing Hive.....................................................................................................................................................17
HeapSizeandGarbageCollectionforHiveComponents.....................................................................................................17
Configuration for WebHCat.................................................................................................................................................19
ConfiguringtheHiveMetastore.........................................................................................................................19
Metastore Deployment Modes............................................................................................................................................19
Supported Metastore Databases.........................................................................................................................................21
MetastoreMemoryRequirements.......................................................................................................................................22
Configuringthe Metastore Database..................................................................................................................................22
Configuring HiveServer2....................................................................................................................................32
HiveServer2 Memory Requirements....................................................................................................................................32
TableLockManager(Required)...........................................................................................................................................32
hive.zookeeper.client.port.....................................................................................................................................33
JDBC driver...........................................................................................................................................................................33
Authentication.....................................................................................................................................................................33
RunningHiveServer2andHiveServerConcurrently.............................................................................................................33
Starting the Metastore.......................................................................................................................................34
File System Permissions.....................................................................................................................................34
Starting,Stopping,andUsingHiveServer2.........................................................................................................35
UsingtheBeelineCLI............................................................................................................................................................35
StartingHiveServer1andtheHiveConsole........................................................................................................36
Using Hive with HBase.......................................................................................................................................36
UsingtheHiveSchemaTool...............................................................................................................................37
Schema Version Verification................................................................................................................................................37
Using schematool................................................................................................................................................................37
InstallingtheHiveJDBCDriveronClients..........................................................................................................39
Setting HADOOP_MAPRED_HOME....................................................................................................................40
ConfiguringtheMetastoretoUseHDFSHighAvailability..................................................................................40
Viewing the Hive Documentation......................................................................................................................40
Managing Hive.......................................................................................................41
Hive Roles...........................................................................................................................................................41
Hive Execution Engines......................................................................................................................................41
ManagingHiveUsingClouderaManager...........................................................................................................42
Running Hive on Spark.......................................................................................................................................43
Configuring Hive on Spark...................................................................................................................................................43
TroubleshootingHiveonSpark............................................................................................................................................43
HiveServer2 Web UI...........................................................................................................................................45
AccessingtheHiveServer2WebUI.......................................................................................................................................45
HiveServer2WebUIConfiguration......................................................................................................................................45
HiveTableStatistics............................................................................................................................................46
ManagingUser-DefinedFunctions(UDFs)withHiveServer2.............................................................................46
User-DefinedFunctions(UDFs)withHiveServer2UsingClouderaManager.......................................................................46
User-DefinedFunctions(UDFs)withHiveServer2UsingtheCommandLine.......................................................................48
Tuning Hive............................................................................................................50
HeapSizeandGarbageCollectionforHiveComponents...................................................................................50
MemoryRecommendations.................................................................................................................................................50
ConfiguringHeapSizeandGarbageCollection...................................................................................................................51
HiveServer2PerformanceTuningandTroubleshooting.....................................................................................52
SymptomsDisplayedWhenHiveServer2HeapMemoryisFull............................................................................................52
HiveServer2PerformanceBestPractices.............................................................................................................................55
Tuning Hive on Spark..............................................................................................59
YARN Configuration............................................................................................................................................59
SparkConfiguration............................................................................................................................................59
Hive Configuration.............................................................................................................................................61
Pre-warmingYARNContainers.............................................................................................................................................62
Hive Metastore High Availability............................................................................63
EnablingHiveMetastoreHighAvailabilityUsingClouderaManager.................................................................63
EnablingHiveMetastoreHighAvailabilityUsingtheCommandLine................................................................63
Configuring HiveServer2 High Availability in CDH...................................................66
EnablingHiveServer2HighAvailabilityUsingClouderaManager......................................................................66
ConfiguringHiveServer2toLoadBalanceBehindaProxyonUnmanagedClusters..........................................66
UnmanagedClusterswithKerberosEnabled.......................................................................................................................67
UnmanagedClustersWITHOUTKerberos............................................................................................................................68
Hive Replication.....................................................................................................71
NetworkLatencyandReplication.......................................................................................................................71
HostSelectionforHiveReplication....................................................................................................................71
HiveTablesandDDLCommands........................................................................................................................71
PerformanceandScalabilityLimitations............................................................................................................72
HiveReplicationinDynamicEnvironments........................................................................................................72
Configuring Replication of Hive Data.................................................................................................................72
ViewingReplicationSchedules...........................................................................................................................75
Enabling,Disabling,orDeletingAReplicationSchedule......................................................................................................77
Viewing Replication History...............................................................................................................................77
Hive Authentication...............................................................................................80
HiveServer2 Security Configuration...................................................................................................................80
EnablingKerberosAuthenticationforHiveServer2..............................................................................................................80
UsingLDAPUsername/PasswordAuthenticationwithHiveServer2....................................................................................81
ConfiguringLDAPSAuthenticationwithHiveServer2..........................................................................................................83
Pluggable Authentication....................................................................................................................................................83
TrustedDelegationwithHiveServer2...................................................................................................................................84
HiveServer2Impersonation..................................................................................................................................................84
SecuringtheHiveMetastore................................................................................................................................................85
DisablingtheHiveSecurityConfiguration...........................................................................................................................85
HiveMetastoreServerSecurityConfiguration...................................................................................................86
UsingHivetoRunQueriesonaSecureHBaseServer........................................................................................87
Configuring Encrypted Communication Between HiveServer2 and Client Drivers....88
ConfiguringEncryptedClient/ServerCommunicationUsingTLS/SSL................................................................88
UsingClouderaManager.....................................................................................................................................................88
UsingtheCommandLine.....................................................................................................................................................89
ConfiguringEncryptedClient/ServerCommunicationUsingSASLQOP.............................................................89
Hive SQL Syntax for Use with Sentry.......................................................................91
Column-level Authorization...............................................................................................................................91
CREATEROLEStatement.....................................................................................................................................92
DROP ROLE Statement.......................................................................................................................................92
GRANT ROLE Statement.....................................................................................................................................93
REVOKEROLEStatement....................................................................................................................................93
GRANT<Privilege>Statement............................................................................................................................93
GRANT<Privilege>ONURIs(HDFSandS3A).....................................................................................................93
REVOKE <Privilege> Statement..........................................................................................................................94
GRANT<Privilege>...WITHGRANTOPTION......................................................................................................94
SETROLEStatement...........................................................................................................................................95
SHOW Statement...............................................................................................................................................95
Example:UsingGrant/RevokeStatementstoMatchanExistingPolicyFile......................................................96
Troubleshooting Hive.............................................................................................98
HiveServer2PerformanceTuningandTroubleshooting.....................................................................................98
SymptomsDisplayedWhenHiveServer2HeapMemoryisFull............................................................................................98
HiveServer2PerformanceBestPractices...........................................................................................................................101
Appendix: Apache License, Version 2.0.................................................................105
UsingApacheHivewithCDH
Using Apache Hive with CDH
Hivedatawarehousesoftwareenablesreading,writing,andmanaginglargedatasetsindistributedstorage.Usingthe
Hivequerylanguage(HiveQL),whichisverysimilartoSQL,queriesareconvertedintoaseriesofjobsthatexecuteon
aHadoopclusterthroughMapReduceorApacheSpark.
UserscanrunbatchprocessingworkloadswithHivewhilealsoanalyzingthesamedataforinteractiveSQLor
machine-learningworkloadsusingtoolslikeApacheImpalaorApacheSpark—allwithinasingleplatform.
AspartofCDH,Hivealsobenefitsfrom:
• UnifiedresourcemanagementprovidedbyYARN
• SimplifieddeploymentandadministrationprovidedbyClouderaManager
• SharedsecurityandgovernancetomeetcompliancerequirementsprovidedbyApacheSentryandCloudera
Navigator
Continuereading:
• UseCasesforHive
• Installation
• Upgrading
• Configuration
• TheMetastoreDatabase
• HiveServer2
• HiveonSpark
• HiveandHBase
• Transaction(ACID)SupportinHive
• ManagingHive
• IngestingDatawithHive
• Tuning
• HighAvailability
• Replication
• Security
• Troubleshooting
• UpstreamInformationforHive
Use Cases for Hive
BecauseHiveisapetabyte-scaledatawarehousesystembuiltontheHadoopplatform,itisagoodchoicefor
environmentsexperiencingphenomenalgrowthindatavolume.TheunderlyingMapReduceinterfacewithHDFSis
hardtoprogramdirectly,butHiveprovidesanSQLinterface,makingitpossibletouseexistingprogrammingskillsto
performdatapreparation.
HiveonMapReduceorSparkisbest-suitedforbatchdatapreparationorETL:
• YoumustrunscheduledbatchjobswithverylargeETLsortswithjoinstopreparedataforHadoop.Mostdata
servedtoBIusersinImpalaispreparedbyETLdevelopersusingHive.
• Yourundatatransferorconversionjobsthattakemanyhours.WithHive,ifaproblemoccurspartwaythrough
suchajob,itrecoversandcontinues.
• Youreceiveorprovidedataindiverseformats,wheretheHiveSerDesandvarietyofUDFsmakeitconvenientto
ingestandconvertthedata.Typically,thefinalstageoftheETLprocesswithHivemightbetoahigh-performance,
widelysupportedformatsuchasParquet.
ApacheHiveGuide|7
UsingApacheHivewithCDH
Installation
OnaclustermanagedbyClouderaManager,HivecomesalongwiththebaseCDHinstallationanddoesnotneedto
beinstalledseparately.WithClouderaManager,youcanenableordisabletheHiveservice,buttheHivecomponent
alwaysremainspresentonthecluster.
Onanunmanagedcluster,youcaninstallHivemanually,usingpackagesortarballswiththeappropriatecommandfor
youroperatingsystem.
InstalltheappropriateHivepackagesusingtheappropriatecommandforyourdistribution.
OS Command
RHEL-compatible $ sudo yum install <pkg1> <pkg2> ...
SLES $ sudo zypper install <pkg1> <pkg2> ...
UbuntuorDebian $ sudo apt-get install <pkg1> <pkg2> ...
Thepackagesare:
• hive–basepackagethatprovidesthecompletelanguageandruntime
• hive-metastore–providesscriptsforrunningthemetastoreasastandaloneservice(optional)
• hive-server2–providesscriptsforrunningHiveServer2
• hive-hbase-optional;installthispackageifyouwanttouseHivewithHBase.
SeeInstallingHiveonpage17fordetailsaboutinstallingandconfiguringHivecomponents.
ToaccesstheHiveserverwithJDBCclients,suchasBeeline,installtheJDBCdriverforHiveServer2thatisdefinedin
org.apache.hive.jdbc.HiveDriver.
SeeInstallingtheHiveJDBCDriveronClientsonpage39fordetailsaboutinstallingtheJDBCdriversandtheconnection
URLstousetoconnecttoHiveServer2fromHiveclients.
Upgrading
UpgradeHiveonallthehostsonwhichitisrunningincludingbothserversandclients.
SeeUpgradingHiveonpage13fordetailsaboutdeprecatedversions,upgradingbestpractices,andinformationabout
upgradingtheHivemetastoreschema.
Configuration
Hiveoffersanumberofconfigurationsettingsrelatedtoperformance,filelayoutandhandling,andoptionstocontrol
SQLsemantics.Dependingonyourclustersizeandworkloads,configureHiveServer2memory,tablelockingbehavior,
andauthenticationforconnections.SeeConfiguringHiveServer2onpage32fordetailsaboutrequiredconfiguration
changesthatyoumustperform.
TheHivemetastoreservice,whichstoresthemetadataforHivetablesandpartitions,mustalsobeconfigured.See
ConfiguringtheHiveMetastoreonpage19fordetailsaboutdeploymentmodes,informationaboutsupported
metastoredatabases,andspecificconfigurationsforMySQL,PostgreSQL,andOracle.
The Metastore Database
ThemetastoredatabaseisanimportantaspectoftheHiveinfrastructure.Itisaseparatedatabase,relyingona
traditionalRDBMSsuchasMySQLorPostgreSQL,thatholdsmetadataaboutHivedatabases,tables,columns,partitions,
andHadoop-specificinformationsuchastheunderlyingdatafilesandHDFSblocklocations.
8|ApacheHiveGuide
UsingApacheHivewithCDH
Themetastoredatabaseissharedbyothercomponents.Forexample,thesametablescanbeinsertedinto,queried,
altered,andsoonbybothHiveandImpala.Althoughyoumightseereferencestothe“Hivemetastore”,beaware
thatthemetastoredatabaseisusedbroadlyacrosstheHadoopecosystem,evenincaseswhereyouarenotusingHive
itself.
Themetastoredatabaseisrelativelycompact,withfast-changingdata.Backup,replication,andotherkindsof
managementoperationsaffectthisdatabase.SeeConfiguringtheHiveMetastoreonpage19fordetailsabout
configuringtheHivemetastore.
ClouderarecommendsthatyoudeploytheHivemetastore,whichstoresthemetadataforHivetablesandpartitions,
in“remotemode.”InthismodethemetastoreservicerunsinitsownJVMprocessandotherservices,suchas
HiveServer2,HCatalog,andApacheImpalacommunicatewiththemetastoreusingtheThriftnetworkAPI.
SeeStartingtheMetastoreonpage34fordetailsaboutstartingtheHivemetastoreservice.
HiveServer2
HiveServer2isaserverinterfacethatenablesremoteclientstosubmitqueriestoHiveandretrievetheresults.It
replacesHiveServer1,whichhasbeendeprecatedandwillberemovedinafuturereleaseofCDH.HiveServer2supports
multi-clientconcurrency,capacityplanningcontrols,Sentryauthorization,Kerberosauthentication,LDAP,SSL,and
providesbettersupportforJDBCandODBCclients.
HiveServer2isacontainerfortheHiveexecutionengine.Foreachclientconnection,itcreatesanewexecutioncontext
thatservesHiveSQLrequestsfromtheclient.ItsupportsJDBCclients,suchastheBeelineCLI,andODBCclients.Clients
connecttoHiveServer2throughtheThriftAPI-basedHiveservice.
SeeConfiguringHiveServer2onpage32fordetailsonconfiguringHiveServer2andseeStarting,Stopping,andUsing
HiveServer2onpage35fordetailsonstarting/stoppingtheHiveServer2serviceandinformationaboutusingthe
BeelineCLItoconnecttoHiveServer2.FordetailsaboutmanagingHiveServer2withitsnativewebuserinterface(UI),
seeHiveServer2WebUIonpage45.
Hive on Spark
HivetraditionallyusesMapReducebehindthescenestoparallelizethework,andperformthelow-levelstepsof
processingaSQLstatementsuchassortingandfiltering.HivecanalsouseSparkastheunderlyingcomputationand
parallelizationengine.SeeRunningHiveonSparkonpage43fordetailsaboutconfiguringHivetouseSparkasits
executionengineandseeTuningHiveonSparkonpage59fordetailsabouttuningHiveonSpark.
Hive and HBase
ApacheHBaseisaNoSQLdatabasethatsupportsreal-timeread/writeaccesstolargedatasetsinHDFS.SeeUsingHive
withHBaseonpage36fordetailsaboutconfiguringHivetouseHBase.ForinformationaboutrunningHivequeries
onasecureHBaseserver,seeUsingHivetoRunQueriesonaSecureHBaseServeronpage87.
Transaction (ACID) Support in Hive
HiveinCDHdoesnotsupporttransactions(HIVE-5317).Currently,transactionsupportinHiveisanexperimental
featurethatonlyworkswiththeORCfileformat.ClouderarecommendsusingtheParquetfileformat,whichworks
acrossmanytools.MergeupdatesinHivetablesusingexistingfunctionality,includingstatementssuchasINSERT,
INSERT OVERWRITE,andCREATE TABLE AS SELECT.
ApacheHiveGuide|9
UsingApacheHivewithCDH
Managing Hive
ClouderarecommendsusingClouderaManagertomanageHiveservices,whicharecalledmanageddeployments.If
yoursisnotamanageddeployment,configureHiveServer2WebUItomanageHiveservices.
UsingClouderaManagertoManageHive
ClouderaManagerusestheHivemetastore,HiveServer2,andtheWebHCatrolestomanagetheHiveserviceacross
yourcluster.UsingClouderaManager,youcanconfiguretheHivemetastore,theexecutionengine(eitherMapReduce
orSpark),andmanageHiveServer2.
SeeManagingHiveUsingClouderaManageronpage42
UsingHiveServer2WebUItoManageHive
TheHiveServer2webUIprovidesaccesstoHiveconfigurationsettings,locallogs,metrics,andinformationaboutactive
sessionsandqueries.TheHiveServer2webUIisenabledinnewlycreatedclustersrunningCDH5.7andhigher,and
thoseusingKerberosareconfiguredforSPNEGO.ClustersupgradedfromapreviousCDHversionmustbeconfigured
toenablethewebUI;seeHiveServer2WebUIConfigurationonpage45.
Ingesting Data with Hive
Hivecaningestdataintoseveraldifferentfileformats,suchasParquet,Avro,TEXTFILE,orRCFile.Ifyouaresettingup
adatapipelinewhereApacheImpalaisinvolvedonthequeryside,useParquet.Ifacustomfileformatisrequired,
youcanextendtheHiveSerDes.SeetheApacheHivewikiforinformationabouttheHiveSerDesandhowtowrite
yourownforHive.
Important:
Theconfigurationpropertyserialization.null.formatissetinHiveandImpalaenginesas
SerDesortablepropertiestospecifyhowtoserialize/deserializeNULLvaluesintoastorageformat.
Thisconfigurationoptionissuitablefortextfileformatsonly.Ifusedwithbinarystorageformatssuch
asRCFileorParquet,theoptioncausescompatibility,complexityandefficiencyissues.
ColumnandTableStatisticsforQueryOptimization
StatisticsforHivecanbenumbersofrowsoftablesorpartitionsandthehistogramsofinterestingcolumns.Statistics
areusedbythecostfunctionsofthequeryoptimizertogeneratequeryplansforthepurposeofqueryoptimization.
SeeHiveTableStatisticsonpage46fordetailsaboutcollectingstatisticsforHive.
Tuning
TuningHiveconsistsofconfiguringnumerousHiveparametersforbetterperformanceandscalability.Themost
importantamongthesesettingsisconfiguringsufficientmemoryforHiveServer2andtheHivemetastore.Thisincludes
allocatingmemoryforheapsizebaseduponthenumberofconcurrentconnectionsthataretypicalforyourdeployment.
Configuringgarbagecollectionlimitsandkeepingthenumberoftablepartitionsbelowrecommendedlimitsarealso
importantwhentuningHiveperformance.SeeTuningHiveonpage50fordetailsaboutrecommendedlimitsandbest
practices.IfyouareusingSparkasyourexecutionengine,seeTuningHiveonSparkonpage59.
10|ApacheHiveGuide
Description:Apache Hive Guide .. Upgrading Hive from CDH 4 to CDH 5. Upgrading Hive from a Lower Version of CDH 5.