multiple flume twitter agents
Asked by: Kevin Wincott. Asked on: 2014-01-24T21:08:22

I'm learning Hadoop, Flume, etc., and one of the projects I started was sentiment analysis, which works OK. Now I'm trying to expand it by collecting multiple sets of data. This is my flume.conf:

    TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS HDFS2
    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = xxx
    TwitterAgent.sources.Twitter.consumerSecret = xxxx
    TwitterAgent.sources.Twitter.accessToken = xxx
    TwitterAgent.sources.Twitter.accessTokenSecret = xxxx
    TwitterAgent.sources.Twitter.keywords = bbc
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:8020/user/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100

What I'm hoping to achieve is to put all tweets about bbc in the above location, but also use the following config to put tweets about liverpool into a separate folder:

    TwitterAgent.sources.Twitter.keywords = liverpool
    TwitterAgent.sinks.HDFS2.channel = MemChannel
    TwitterAgent.sinks.HDFS2.type = hdfs
    TwitterAgent.sinks.HDFS2.hdfs.path = hdfs://xxx:8020/user/flume/tweets/liverpool/
    TwitterAgent.sinks.HDFS2.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS2.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS2.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS2.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS2.hdfs.rollCount = 10000
    TwitterAgent.channels.MemChannel2.type = memory
    TwitterAgent.channels.MemChannel2.capacity = 10000
    TwitterAgent.channels.MemChannel2.transactionCapacity = 10

This isn't working and I can't work out why. Can anyone point me in the right direction?
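
For reference, here is a minimal sketch of one possible layout, not a confirmed fix. Three things in the config above look suspect: `keywords` is set twice on the same source, so the later value probably replaces the earlier one; both sinks read from `MemChannel`, so each event would go to only one of them; and `MemChannel2` is never listed in `TwitterAgent.channels` or attached to anything. The sketch gives each keyword its own source, channel, and sink. The source name `Twitter2` is hypothetical. It also raises `transactionCapacity` to match the sinks' `batchSize` of 1000, on the assumption that a sink batch larger than the channel's transaction capacity would fail. Whether Twitter accepts two simultaneous streaming connections with the same credentials is not verified here.

    # Sketch only: Twitter2 and the wiring below are assumptions, not a tested config
    TwitterAgent.sources = Twitter Twitter2
    TwitterAgent.channels = MemChannel MemChannel2
    TwitterAgent.sinks = HDFS HDFS2

    # Source 1: tweets matching "bbc" go into MemChannel
    TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = xxx
    TwitterAgent.sources.Twitter.consumerSecret = xxxx
    TwitterAgent.sources.Twitter.accessToken = xxx
    TwitterAgent.sources.Twitter.accessTokenSecret = xxxx
    TwitterAgent.sources.Twitter.keywords = bbc

    # Source 2 (hypothetical name): tweets matching "liverpool" go into MemChannel2
    TwitterAgent.sources.Twitter2.type = com.cloudera.flume.source.TwitterSource
    TwitterAgent.sources.Twitter2.channels = MemChannel2
    TwitterAgent.sources.Twitter2.consumerKey = xxx
    TwitterAgent.sources.Twitter2.consumerSecret = xxxx
    TwitterAgent.sources.Twitter2.accessToken = xxx
    TwitterAgent.sources.Twitter2.accessTokenSecret = xxxx
    TwitterAgent.sources.Twitter2.keywords = liverpool

    # Sink HDFS drains only MemChannel
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://xxx:8020/user/flume/tweets/
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

    # Sink HDFS2 drains only MemChannel2
    TwitterAgent.sinks.HDFS2.channel = MemChannel2
    TwitterAgent.sinks.HDFS2.type = hdfs
    TwitterAgent.sinks.HDFS2.hdfs.path = hdfs://xxx:8020/user/flume/tweets/liverpool/
    TwitterAgent.sinks.HDFS2.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS2.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS2.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS2.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS2.hdfs.rollCount = 10000

    # One memory channel per source/sink pair
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 1000
    TwitterAgent.channels.MemChannel2.type = memory
    TwitterAgent.channels.MemChannel2.capacity = 10000
    TwitterAgent.channels.MemChannel2.transactionCapacity = 1000

Since the liverpool path is nested under the bbc path, anything that reads `tweets/` recursively would see both sets; sibling folders may be easier to work with.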

Copyright notice: Content author: Kevin Wincott. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/21333532/multiple-flume-twitter-agents
