Using FLUME to store data in Hadoop
Asked by Shivam on 2017-09-29T05:16:37


I have followed all the steps for Hadoop and Flume installation from tutorials. I am new to Big Data tools, and I am getting the following errors. I don't understand where the problem is.

I have also read a lot of posts on installation, but I am still facing this issue. My ultimate objective is to perform Twitter sentiment analysis using R.
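For context, the flume.conf that the agent loads (see the log below) presumably follows the standard Twitter-source example shipped with Flume. Here is a minimal sketch consistent with the agent, source, sink, and channel names that appear in the log; the actual file may differ, and the credential values are placeholders:

# Sketch only: names match the log below (TwitterAgent, Twitter, HDFS, MemChannel)
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Twitter source (org.apache.flume.source.twitter.TwitterSource, as in the log)
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumer-key>
TwitterAgent.sources.Twitter.consumerSecret = <consumer-secret>
TwitterAgent.sources.Twitter.accessToken = <access-token>
TwitterAgent.sources.Twitter.accessTokenSecret = <access-token-secret>

# HDFS sink writing text files under /user/flume/tweets, matching the paths in the log
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

# In-memory channel (type memory, as in the log)
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100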

17/09/29 02:25:39 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
17/09/29 02:25:39 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/home/shivam/apache-flume-1.6.0-bin/conf/flume.conf
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Processing:HDFS
17/09/29 02:25:39 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
17/09/29 02:25:39 INFO node.AbstractConfigurationProvider: Creating channels
17/09/29 02:25:39 INFO channel.DefaultChannelFactory: Creating instance of channel MemChannel type memory
17/09/29 02:25:39 INFO node.AbstractConfigurationProvider: Created channel MemChannel
17/09/29 02:25:39 INFO source.DefaultSourceFactory: Creating instance of source Twitter, type org.apache.flume.source.twitter.TwitterSource
17/09/29 02:25:39 INFO twitter.TwitterSource: Consumer Key:        'fRw12aumIqkAWD6PP5ZHk7vva'
17/09/29 02:25:39 INFO twitter.TwitterSource: Consumer Secret:     'K9K0yL2pwngp3JXEdMGWUOEB7AaGWswXcq72WveRvnD4ZSphNQ'
17/09/29 02:25:39 INFO twitter.TwitterSource: Access Token:        '771287280438968320-XnbtNtBt40cs6gUOk6F9bjgmUABM0qG'
17/09/29 02:25:39 INFO twitter.TwitterSource: Access Token Secret: 'afUppGRqcRi2p9fzLhVdYQXkfMEm72xduaWD6uNs3HhKg'
17/09/29 02:25:39 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
17/09/29 02:25:39 INFO node.AbstractConfigurationProvider: Channel MemChannel connected to [Twitter, HDFS]
17/09/29 02:25:39 INFO node.Application: Starting new configuration:{ sourceRunners:{Twitter=EventDrivenSourceRunner: { source:org.apache.flume.source.twitter.TwitterSource{name:Twitter,state:IDLE} }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@3012a48f counterGroup:{ name:null counters:{} } }} channels:{MemChannel=org.apache.flume.channel.MemoryChannel{name: MemChannel}} }
17/09/29 02:25:39 INFO node.Application: Starting Channel MemChannel
17/09/29 02:25:39 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: MemChannel: Successfully registered new MBean.
17/09/29 02:25:39 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: MemChannel started
17/09/29 02:25:39 INFO node.Application: Starting Sink HDFS
17/09/29 02:25:39 INFO node.Application: Starting Source Twitter
17/09/29 02:25:39 INFO twitter.TwitterSource: Starting twitter source org.apache.flume.source.twitter.TwitterSource{name:Twitter,state:IDLE} ...
17/09/29 02:25:39 INFO twitter.TwitterSource: Twitter source Twitter started.
17/09/29 02:25:39 INFO twitter4j.TwitterStreamImpl: Establishing connection.
17/09/29 02:25:39 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
17/09/29 02:25:39 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
17/09/29 02:25:42 INFO twitter4j.TwitterStreamImpl: Connection established.
17/09/29 02:25:42 INFO twitter4j.TwitterStreamImpl: Receiving status stream.
17/09/29 02:25:42 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
17/09/29 02:25:42 INFO hdfs.BucketWriter: Creating hdfs://localhost:9000/user/flume/tweets/FlumeData.1506632142370.tmp
17/09/29 02:25:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/29 02:25:44 WARN hdfs.HDFSEventSink: HDFS IO error
java.net.ConnectException: Call From maverick/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1480)
    at org.apache.hadoop.ipc.Client.call(Client.java:1407)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy13.create(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:296)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy14.create(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1623)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638)
    at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:448)
    at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:444)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:444)
    at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:387)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
    at org.apache.flume.sink.hdfs.HDFSDataStream.doOpen(HDFSDataStream.java:86)
    at org.apache.flume.sink.hdfs.HDFSDataStream.open(HDFSDataStream.java:113)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:246)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
    at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:707)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:370)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
    at org.apache.hadoop.ipc.Client.call(Client.java:1446)
    ... 34 more
17/09/29 02:25:45 INFO twitter.TwitterSource: Processed 100 docs
17/09/29 02:25:45 INFO hdfs.BucketWriter: Creating hdfs://localhost:9000/user/flume/tweets/FlumeData.1506632142371.tmp
17/09/29 02:25:45 WARN hdfs.HDFSEventSink: HDFS IO error
java.net.ConnectException: Call From maverick/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
17/09/29 02:25:48 INFO twitter.TwitterSource: Processed 200 docs
17/09/29 02:25:50 INFO twitter.TwitterSource: Processed 300 docs
17/09/29 02:25:50 INFO hdfs.BucketWriter: Creating hdfs://localhost:9000/user/flume/tweets/FlumeData.1506632142373.tmp
17/09/29 02:25:50 WARN hdfs.HDFSEventSink: HDFS IO error
java.net.ConnectException: Call From maverick/127.0.1.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Is there a complete solution to this? I am willing to set everything up again from scratch.
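The ConnectException above ("Call From maverick/127.0.1.1 to localhost:9000 failed ... Connection refused") indicates that nothing is listening on localhost:9000: the Twitter connection succeeds, but the HDFS NameNode that Flume tries to write to is either not running or bound to a different address. A minimal check, assuming a standard single-node Hadoop install with fs.defaultFS set to hdfs://localhost:9000:

# List running Hadoop daemons; NameNode and DataNode should appear
jps

# If they do not, start HDFS (assumes $HADOOP_HOME/sbin is on the PATH)
start-dfs.sh

# Confirm the NameNode answers on the address Flume uses
hdfs dfsadmin -report

If the NameNode still refuses connections, the port configured in fs.defaultFS (core-site.xml) may not be 9000, in which case the hdfs.path in flume.conf has to be changed to match it.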

Content author: Shivam. Reproduced under the CC BY-SA 4.0 license, with a link to the original source: https://stackoverflow.com/questions/46478453/using-flume-to-store-data-in-hadoop
