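- Topology: the name for a real-time application running in Storm.
- Spout: the component that brings the source data stream into a topology.
  - Typically a spout reads data from an external data source and converts it into the topology's internal source data.
- Bolt: a component that receives data and processes it; users can perform whatever operations they need inside it.
- Tuple: the basic unit of message passing; one group of values sent together can be thought of as a tuple.
- Stream: the flow of data, that is, an unbounded sequence of tuples.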
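- Stream Grouping: the strategy that decides how a stream's tuples are distributed among the tasks of the downstream bolt.
  - Shuffle grouping: random distribution; tuples are spread as evenly as possible across the downstream bolt's tasks.
  - Fields grouping: grouping by field; tuples are partitioned by the values of the specified fields, so tuples with the same field values are sent to the same task.
  - All grouping: broadcast; every tuple is replicated to all tasks of the downstream bolt.
  - Global grouping: the entire stream goes to a single task of the bolt (the task with the lowest ID); often used when implementing transactional topologies.
  - None grouping: no grouping is specified; in current Storm versions this behaves the same as shuffle grouping.
  - Direct grouping: the producer of the tuple specifies which task of the consumer receives it.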
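To make the grouping strategies concrete, here is a minimal sketch in plain Java of how each one could choose target tasks. It is a toy model under stated assumptions, not Storm's actual routing code: the class `GroupingSketch` and its methods are hypothetical names, tasks are identified by the indices 0 to `numTasks - 1`, and the fields-grouping hash is simply the hash of the list of grouping values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Toy model of stream groupings; hypothetical names, not part of the Storm API. */
final class GroupingSketch {

    /** Shuffle grouping: pick a task at random so load evens out over many tuples. */
    static int shuffle(int numTasks) {
        return ThreadLocalRandom.current().nextInt(numTasks);
    }

    /** Fields grouping: hash the grouping-field values; equal values always map to the same task. */
    static int fields(List<?> groupingValues, int numTasks) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    /** All grouping: every task receives its own copy of the tuple. */
    static List<Integer> all(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }

    /** Global grouping: the whole stream goes to one task, modeled here as the lowest index. */
    static int global(int numTasks) {
        return 0;
    }

    public static void main(String[] args) {
        int numTasks = 4;
        for (String word : Arrays.asList("storm", "kafka", "storm", "redis")) {
            System.out.printf("%-5s shuffle=%d fields=%d all=%s global=%d%n",
                    word,
                    shuffle(numTasks),
                    fields(Arrays.asList(word), numTasks),
                    all(numTasks),
                    global(numTasks));
        }
    }
}
```

In the output, both "storm" rows show the same `fields` value while the `shuffle` value may differ between them. That property is why fields grouping suits per-key aggregation such as word counting.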
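II. General architecture of a stream-computing system
- Flume collects the data.
- Kafka stores the data temporarily.
- Storm performs the computation.
- Redis, an in-memory database, stores the data.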
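The division of labor among these four components can be sketched with in-memory stand-ins. This is only an illustration under assumptions: it uses no Flume, Kafka, Storm, or Redis client, the class `PipelineSketch` is a hypothetical name, word counting is an arbitrary choice of computation, and the stage order (collect, buffer, compute, store) is inferred from the order of the list above.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

/** In-memory stand-ins for the pipeline stages; hypothetical names, no real clients involved. */
final class PipelineSketch {

    static Map<String, Long> run(List<String> collectedLines) {
        // Collection (Flume's role) is represented by the incoming list of lines.
        // Buffering (Kafka's role): hold the lines until the compute stage consumes them.
        Queue<String> buffer = new ArrayDeque<>(collectedLines);

        // Storage (Redis's role): a key-value map that holds the computed counts.
        Map<String, Long> store = new TreeMap<>();

        // Computation (Storm's role): drain the buffer and count each word.
        String line;
        while ((line = buffer.poll()) != null) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    store.merge(word, 1L, Long::sum);
                }
            }
        }
        return store;
    }

    public static void main(String[] args) {
        // prints {disk=1, error=2, full=1, info=1, ok=1, timeout=1}
        System.out.println(run(Arrays.asList("error disk full", "info ok", "error timeout")));
    }
}
```

The buffer is what decouples the collector from the computation: lines can keep arriving even when the compute stage is temporarily slower than the source.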
0. Manage the project with Maven; add the following dependencies to pom.xml
pom.xml
<!-- apache storm core -->
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>1.0.3</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-rename-hack</artifactId>
    <version>1.0.3</version>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-hbase</artifactId>
    <version>1.0.3</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-redis</artifactId>
    <version>1.0.3</version>
</dependency>
Complete pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.wulei</groupId>
    <artifactId>Bigdata</artifactId>
    <version>1.0.0</version>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.encoding>UTF-8</maven.compiler.encoding>
        <hadoop.version>2.7.3</hadoop.version>
    </properties>

    <dependencies>
        <!-- Start-SQL connector -->
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.43</version>
        </dependency>

        <!-- Hadoop 2.7.3 -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>

        <!-- HBase -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase</artifactId>
            <version>1.3.1</version>
            <type>pom</type>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.3.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.mrunit</groupId>
            <artifactId>mrunit</artifactId>
            <version>1.1.0</version>
            <classifier>hadoop2</classifier>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.mockito</groupId>
            <artifactId>mockito-all</artifactId>
            <version>1.10.19</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>

        <!-- apache storm core -->
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.0.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-rename-hack</artifactId>
            <version>1.0.3</version>
        </dependency>

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-hbase</artifactId>
            <version>1.0.3</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-redis</artifactId>
            <version>1.0.3</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.1</version>
                <configuration>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer
                                    implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.bigdata.storm.WordCountTopology</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>