I would like to run a list of generated Hive queries.
For each one, I would like to retrieve the MR job_id (or job_ids, if the query has multiple stages).
Then, with those job_ids, I want to collect statistics from the JobTracker (cumulative CPU, bytes read, ...).
How can I send Hive queries from a bash or Python script and retrieve the job_id(s)?
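To make the first part concrete, here is a minimal Python sketch of what I have in mind. It assumes the hive CLI logs a line like `Starting Job = job_..., Tracking URL = ...` to stderr for each MR stage (that format may vary across Hive versions), and that `my_table` is a placeholder table name:

```python
import re
import subprocess

# Hive's MapReduce launcher logs one line per MR stage to stderr, e.g.
#   "Starting Job = job_201301281212_0001, Tracking URL = ..."
# (assumption: this log format is stable across the Hive versions we run)
JOB_ID_RE = re.compile(r"Starting Job = (job_\d+_\d+)")

def parse_job_ids(hive_stderr):
    """Extract every MR job id launched for a query, in stage order."""
    return JOB_ID_RE.findall(hive_stderr)

def run_query(query):
    """Run one query with the hive CLI; return (stdout, [job ids])."""
    proc = subprocess.Popen(
        ["hive", "-e", query],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    return out.decode(), parse_job_ids(err.decode())

if __name__ == "__main__":
    # hypothetical query list; replace with the generated queries
    for query in ["SELECT count(*) FROM my_table"]:
        _, job_ids = run_query(query)
        print(query, "->", job_ids)
```

A multi-stage query would simply produce several `Starting Job` lines, so `parse_job_ids` returns them all in order.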
For the second part (collecting stats for a job): we're on an MRv1 Hadoop cluster, so I don't have the ApplicationMaster REST API. My current plan is to scrape the data from the JobTracker web UI. Is there a better way?
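One alternative I'm considering instead of scraping the web UI: the MRv1 CLI command `hadoop job -status <job_id>` prints the job's counters as indented `name=value` lines. A rough sketch of parsing that output (the exact counter names, such as `CPU time spent (ms)` or `HDFS_BYTES_READ`, may differ by Hadoop version, and the job id below is a placeholder):

```python
import re
import subprocess

# "hadoop job -status <id>" (MRv1) prints counters as indented
# "name=value" lines, e.g.
#       CPU time spent (ms)=4120
#       HDFS_BYTES_READ=57344
# (assumption: counter names and indentation vary with the Hadoop version)
COUNTER_RE = re.compile(r"^\s+(.+?)=(-?\d+)\s*$", re.MULTILINE)

def parse_counters(status_output):
    """Return {counter name: int value} from 'hadoop job -status' output."""
    return {name: int(value) for name, value in COUNTER_RE.findall(status_output)}

def job_counters(job_id):
    """Fetch the counters of one finished MR job via the hadoop CLI."""
    out = subprocess.check_output(["hadoop", "job", "-status", job_id])
    return parse_counters(out.decode())

if __name__ == "__main__":
    stats = job_counters("job_201301281212_0001")  # placeholder id
    print(stats.get("CPU time spent (ms)"), stats.get("HDFS_BYTES_READ"))
```

This would avoid HTML scraping entirely, at the cost of one CLI invocation per job.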
Copyright notice: content by "mathieu", reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Original article: https://stackoverflow.com/questions/14601052/run-hive-queries-and-collect-job-information