Run hive queries, and collect job information
Asked by: mathieu. Asked on: 2013-01-30T17:55:03

I would like to run a list of generated Hive queries. For each one, I would like to retrieve the MapReduce job_id (or job_ids, if the query runs in multiple stages), and then use it to collect statistics from the JobTracker (cumulative CPU time, bytes read, and so on).

How can I submit Hive queries from a bash or Python script and retrieve the job_id(s)?
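To make the first part concrete, here is a minimal Python sketch of one possible approach: run each query with `hive -e` and scan the CLI's log output for job IDs. It assumes the Hive CLI writes its progress messages (lines such as `Starting Job = job_..., Tracking URL = ...`) to stderr and that MRv1 job IDs match `job_<digits>_<digits>`; both should be checked against the Hive version in use. The names `extract_job_ids` and `run_hive_query` are hypothetical helpers, not part of Hive.

```python
import re
import subprocess

# Assumption: MRv1 job IDs look like job_<JobTracker start time>_<sequence number>.
JOB_ID_PATTERN = re.compile(r"job_\d+_\d+")


def extract_job_ids(log_text):
    """Return the distinct job IDs found in log_text, in order of first appearance."""
    seen = []
    for job_id in JOB_ID_PATTERN.findall(log_text):
        if job_id not in seen:
            seen.append(job_id)
    return seen


def run_hive_query(query, hive_cmd="hive"):
    """Run one query with `hive -e`; return (exit code, stdout, job IDs)."""
    result = subprocess.run(
        [hive_cmd, "-e", query],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Assumption: query results go to stdout, progress logs (with job IDs) to stderr.
    return result.returncode, result.stdout, extract_job_ids(result.stderr)


if __name__ == "__main__":
    # Hypothetical query list; replace with the generated queries.
    for query in ["SELECT COUNT(*) FROM some_table"]:
        code, _output, job_ids = run_hive_query(query)
        print(query, code, job_ids)
```

A query that Hive answers without launching MapReduce (a plain fetch, for example) would yield an empty list, and a multi-stage query would yield one ID per stage.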

For the second part (collecting statistics for each job), we're using an MRv1 Hadoop cluster, so I don't have the AppMaster REST API. I'm about to collect the data from the JobTracker web UI. Is there a better way?
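As an alternative to reading the web UI, one option worth testing is the `hadoop job -counter <job-id> <group-name> <counter-name>` command, which queries the JobTracker for a single counter. The sketch below wraps it in Python. The counter group and counter names are assumptions based on names commonly seen on MRv1 clusters and may differ by distribution; the sketch also assumes the counter value appears on a line by itself in the command's output. All function names are hypothetical.

```python
import subprocess

# Assumptions: group/counter names as often seen on MRv1; verify on your cluster.
CPU_COUNTER = ("org.apache.hadoop.mapred.Task$Counter", "CPU_MILLISECONDS")
HDFS_READ_COUNTER = ("FileSystemCounters", "HDFS_BYTES_READ")


def build_counter_command(job_id, group, counter, hadoop_cmd="hadoop"):
    """Build the argument list for `hadoop job -counter`."""
    return [hadoop_cmd, "job", "-counter", job_id, group, counter]


def parse_counter_value(output):
    """Return the last line that consists only of digits as an int, else None."""
    for line in reversed(output.splitlines()):
        line = line.strip()
        if line.isdigit():
            return int(line)
    return None


def get_counter(job_id, group, counter, hadoop_cmd="hadoop"):
    """Fetch one counter for job_id; return None if the command fails."""
    result = subprocess.run(
        build_counter_command(job_id, group, counter, hadoop_cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    return parse_counter_value(result.stdout)


if __name__ == "__main__":
    # Hypothetical job ID, for illustration only.
    job_id = "job_201301301755_0001"
    print("cpu ms:", get_counter(job_id, *CPU_COUNTER))
    print("hdfs bytes read:", get_counter(job_id, *HDFS_READ_COUNTER))
```

Passing the arguments as a list avoids shell interpretation of the `$` in the group name. Whether counters remain available after a job has completed depends on how long the JobTracker retains job information, so this would need to run soon after each query finishes.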

Copyright notice: Content author: mathieu. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/14601052/run-hive-queries-and-collect-job-information
