使用哈夫曼树实现文本编码、解码_实验题目1-使用哈夫曼树实现文本编码、解码程序注 4.0 导入实验工程huffmantree.-程序员宅基地

技术标签： Java java 霍夫曼树数据结构

一、需求分析

现在许多实际问题抽象出来的数据结构往往都是二叉树的形式。哈夫曼编码可以对日常数据量很大的数据，进行数据压缩技术来实现存储和传输。

所以在本程序中，需要构造一棵二叉树来存储一大串字符串，对给构造出来的树进行编码，再由已经编好的哈夫曼编码对给定的字符串进行编码，之后对编码的字符串进行解码，最后比较编/解码前后字符串是否相同。

二、设计方案

1、流程

第一，统计频率。计算给定字符串字符出现的频率；结果用map来存储，其中key=字符，value=出现次数。

第二，构造二叉树。把字符出现的频率当作树的权重，构造一棵二叉树。

第三，编造哈夫曼编码。根据二叉树，对每个叶节点进行编码；结果用map来储，其中key=叶节点，value=编码。

第四，编码。根据哈夫曼编码，对给定字符进行编码，返回结果字符串。

第五，解码。对字符串的编码进行解码，返回结果字符串；比较前后数据。

2、方法列表

定义节点类	HTNode.java
统计字符串频率	public static Map<Character,Integer> computeCharCount(String text)
构造二叉树	public static HTNode buildTree(List<HTNode> nodes)
编造哈夫曼编码	public static Map<Character, String> getCode(HTNode tree)
编码	public static String encode(String text, Map<Character, String> code)
解码	public static String decode(String text, Map<Character, String> code)

三、关键算法实现

1、定义节点类（HTNode）

（1）设置节点属性。定义成员变量，存放节点的数据、编码、权重、左孩子、右孩子、是否为叶子节点。

（2）编写成员变量的get、set方法。因为成员变量为私有属性，在其他类里不能直接操作，要通过调用get、set操作。

2、统计字符串中字符出现的次数

（1）把字符串作为实参，传入函数

（2）new一个map对象。map.key=字符，map.value=出现次数

（3）遍历字符串，通过map.containsKey(key)方法，判定字符在map中是否存在。如果存在，让其value+1；否则，将字符和其个数(1)存放到map中。

3、构造二叉树

（1）对节点的属性进行初始化设置，将每个节点存入链表nodes中。把nodes作为实参，传入函数。

（2）根据节点的权重从小到大排序。

（3）每一次取权重最小的节点，定义一个parent节点，将这两个节点作为parent的左右孩子，设置parent不为叶节点，从链表中移除两个节点，将parent节点放入链表。

（4）最后，链表里只剩根节点结束循环，返回根节点。

4、计算哈夫曼编码

（1）将返回的根节点作为实参传入函数。

（2）创建队列，将根节点存放在队列中；创建map，key=叶节点，value=编码。

（3）遍历队列，队列不为空时，使用poll()方法获取并移除队列的头。

（4）判定该节点是否为叶子节点。如果是将叶节点的数据和编码存入map；否则，判断是否有左右孩子，左孩子编码+0，右孩子编码+1。将左右孩子节点放入队列。

（5）直至所以叶节点都被找出，循环结束，反面结果集map对象。

5、对给定字符进行编码

（1）将上一步返回的map对象（对照表：存放叶节点及其编码）和给定的字符串作为实参传入函数。

（2）遍历字符串。将字符串中的每一个字符，作为map的key，通过map.get(key)获取到对应的value，将每一个value值存入字符串str中，返回str。

6、对编码好的字符串，进行解码

（1）将字符串的编码和map对象（对照表：存放叶节点及其编码）作为实参传入函数。

（2）创建队列，将字符串每个字符存入队列。

（3）定义一个临时字符串tmp、结果字符串result。

（4）遍历队列，通过peek获取队列的头，将其中追加到临时变量tmp中。遍历对照表map，获取map中的key和value。

（5）判断tmp是否与对照表中的值相同。如果相同，将对照表中的key存入结果字符串result中，清空tmp，移除队列的头；如果不同，接着往tmp中缀加队列中的元素，和value进行比较，直到有相同时。

四、测试数据

1、统计字符出现频率

2、构造二叉树

3、每个字符对应的哈夫曼编码

4、对给定字符串进行编码

5、对编码的字符串进行解码

五、遇到的问题与解决方法

问题：按照节点的权重从小到大排序。nodes.sort(bull)报错。

原因：jdk1.6支持nodes.sort(null)这条语句，可以进行排序；但我的电脑装的是jdk1.7，所以要使用Collections包装类，调用其中的sort()方法才可以进行排序。

收获：为了解决这个，我上网搜了很多java关于排序的方法，明白了使用这个排序的原理。要想实现按权重排序的功能，首先需要实现Comparable接口；其次要重写compareTo方法，在这个方法里面设置排序规则。

源程序：

package 哈夫曼树;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Queue;


/**
 * 哈夫曼树实现
 *
 */
public class HuffmanTree {
	public static void main(String[] args){
		
		String data = "In computer science and information theory, "
				+ "a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. "
				+ "The process of finding and/or using such a code proceeds by means of Huffman coding, "
				+ "an algorithm developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "
				+ "\"A Method for the Construction of Minimum-Redundancy Codes\".[1] "
				+ "The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source symbol "
				+ "(such as a character in a file). The algorithm derives this table from the estimated probability or frequency of occurrence"
				+ " (weight) for each possible value of the source symbol. As in other entropy encoding methods, more common symbols are generally "
				+ "represented using fewer bits than less common symbols. Huffman's method can be efficiently implemented, "
				+ "finding a code in linear time to the number of input weights if these weights are sorted.[2] However, "
				+ "although optimal among methods encoding symbols separately, Huffman coding is not always optimal among all compression methods.";
		
		//首先对字符串中的字符出现次数进行统计
		Map<Character, Integer> chars = computeCharCount(data);
		System.out.println("【字母频率：】"+chars);
		
		ArrayList<HTNode> nodes = new ArrayList<>();
		
		//把每个节点存入链表nodes
		for(Character c : chars.keySet()){
			HTNode node = new HTNode();
			node.setData(c);
			node.setWeight(chars.get(c));
			node.setLchild(null);
			node.setRchild(null);
			node.setLeaf(true);
			nodes.add(node);
		}
		
		//建造二叉树，tree为返回的根节点
		HTNode tree = buildTree(nodes);
		System.out.println("【树结构：】"+tree.toString());
		System.out.println("【树权重：】"+tree.getWeight());
		
		//对节点进行编码
		Map<Character, String> map_code = getCode(tree);
		for(Character c : map_code.keySet()){
			System.out.println("字符'"+c+"'的哈夫曼编码："+map_code.get(c));
		}
		
		String text = "In computer science and information theory";
		
		//根据已编好码的字符，对上面字符串进行编码（a--->0111）
		String coded = encode(text,map_code);
		System.out.println("字符串:"+text);
		System.out.println("被编码为："+coded);
		
		//根据编码，进行解码（0111-->a）
		String oriText = decode(coded,map_code);
		System.out.println("编码："+coded);
		System.out.println("被解码为："+oriText);
		
		System.out.println("比较结果："+oriText.equals(text));
	}
	
	
	/**
	 * 根据初始的结点列表，建立哈夫曼树，
	 * 并生成哈夫曼编码，保存在当前类的code对象中，
	 * 生成的树根结点，被保存在当前类的tree对象中。
	 * 可以反复生成哈夫曼树，每次重新构建树，将更新编码
	 * @param nodes
	 * @return
	 */
	public static HTNode buildTree(List<HTNode> nodes) {
		while (nodes.size() > 1) {
			Collections.sort(nodes);
			// nodes.sort(null);
			HTNode first = nodes.get(0);
			HTNode second = nodes.get(1);
			HTNode parent = new HTNode();
			parent.setLchild(first);
			parent.setRchild(second);
			parent.setWeight(first.getWeight() + second.getWeight());
			parent.setLeaf(false);//设置不为叶节点			
			nodes.remove(0);
			nodes.remove(0);// 需要删除两次
			nodes.add(parent);
		}
		return nodes.get(0);
	}
	
	/**
	 * 根据已建立的哈夫曼树根结点，生成对应的字符编码，
	 * 字符编码应为0，1字符串
	 * @param tree
	 * @return
	 */
	public static Map<Character, String> getCode(HTNode tree) {// 获取字符编码
		// TODO
		Map<Character, String> code = new HashMap<Character, String>();
		Queue que = new LinkedList();// 创建队列链表
		que.add(tree);

		while (!que.isEmpty()) {//当队列不为空		
			HTNode front = (HTNode) que.poll();// 获取并移除此队列的头
			String front_huffman = front.getHuffmancode();// 获取对应哈夫曼节点的编码
			if (front.isLeaf()) {
				code.put(front.getData(), front_huffman);// 如果是叶节点则直接放入数据和编码
			} else {
				if (front.getLchild() != null) {
					front.getLchild().setHuffmancode(front_huffman + "0");
				}
				if (front.getRchild() != null) {
					front.getRchild().setHuffmancode(front_huffman + "1");
				}
				que.add(front.getLchild());
				que.add(front.getRchild());
			}
		}
		return code;
	}

	
	/**
	 * 统计字符串中字符出现的频率
	 * @param text
	 * @return
	 */
	public static Map<Character, Integer> computeCharCount(String text) {
		// TODO
		Map<Character, Integer> map = new HashMap<Character, Integer>();
		// 把字符串传入数组
		char[] chararray = text.toCharArray();
		for (int i = 0; i < chararray.length; i++) {
			if (map.containsKey(chararray[i])) {
				int value = map.get(chararray[i]);
				map.put(chararray[i], value + 1);
			} else {
				map.put(chararray[i], 1);
			}
		}
		return map;
	}
	
	
	/**
	 * 使用指定的huffman编码来对文本进行编码
	 * @return
	 */
	public static String encode(String text, Map<Character, String> code) {
		char[] chararray = text.toCharArray();
		String str = "";
		for (int i = 0; i < chararray.length; i++) {
			str += code.get(chararray[i]);
		}
		return str;
	}

	
	/**
	 * 使用预先建立好的huffman树，
	 * 对编码后的文本进行解码
	 * @param text
	 * @return
	 */
	public static String decode(String text, Map<Character, String> code) {
		Queue que = new LinkedList();
		char[] chararray = text.toCharArray();
		
		for (int i = 0; i < chararray.length; i++) {
			que.add(chararray[i]);
		}
		
		String tmp = "";
		String result = "";
		//当队列不为空时
		while (!que.isEmpty()) {
			String tou = "" + que.peek();
			tmp += tou;
			
			for(Character key : code.keySet()){
				String value = code.get(key);
				if (tmp.equals(value)) {
					result += key;
					tmp = "";
				}
			}
			que.poll();
		}
		return result;
	}
	
}

package 哈夫曼树;

public class HTNode implements Comparable<HTNode>{
	public enum Code{
		ZERO('0'), ONE('1');
		private char code;
		private Code(char c){
			this.code = c;
		}
		public char getCode(){
			return code;
		}
	}
	
	//哈夫曼树的叶子结点数据
	private char data;

	//结点的编码，只有0和1两种可能
	private Code code;
	private double weight;
	private HTNode lchild;
	private HTNode rchild;
	private boolean isLeaf;
	//存放编码的字符串
	private String huffmancode = "";
	
	public String getHuffmancode() {
		return huffmancode;
	}
	public void setHuffmancode(String huffmancode) {
		this.huffmancode = huffmancode;
	}
	public char getData() {
		return data;
	}
	public void setData(char data) {
		this.data = data;
	}
	public double getWeight() {
		return weight;
	}
	public void setWeight(double weight) {
		this.weight = weight;
	}
	public HTNode getLchild() {
		return lchild;
	}
	public void setLchild(HTNode lchild) {
		this.lchild = lchild;
	}
	public HTNode getRchild() {
		return rchild;
	}
	public void setRchild(HTNode rchild) {
		this.rchild = rchild;
	}
	public boolean isLeaf() {
		return isLeaf;
	}
	public void setLeaf(boolean isLeaf) {
		this.isLeaf = isLeaf;
	}
	public Code getCode() {
		return code;
	}
	public void setCode(Code code) {
		this.code = code;
	}
	
	@Override
	public int compareTo(HTNode o) {
		if(this.weight<o.weight){
			return -1;
		}else{
			return 1;
		}
	}
	
	@Override
	public String toString() {
		return "HTNode [data=" + data + ", code=" + code + ", weight=" + weight
				+ ", lchild=" + lchild + ", rchild=" + rchild + ", isLeaf="
				+ isLeaf + ", huffmancode=" + huffmancode + "]";
	}
}

本文链接：https://blog.csdn.net/roc13718296531/article/details/122110284

原作者删帖不实内容删帖广告或垃圾文章投诉

智能推荐

class和struct的区别-程序员宅基地

文章浏览阅读101次。4.class可以有⽆参的构造函数，struct不可以，必须是有参的构造函数，⽽且在有参的构造函数必须初始。2.Struct适⽤于作为经常使⽤的⼀些数据组合成的新类型，表示诸如点、矩形等主要⽤来存储数据的轻量。1.Class⽐较适合⼤的和复杂的数据，表现抽象和多级别的对象层次时。2.class允许继承、被继承，struct不允许，只能继承接⼝。3.Struct有性能优势，Class有⾯向对象的扩展优势。3.class可以初始化变量，struct不可以。1.class是引⽤类型，struct是值类型。

android使用json后闪退,应用闪退问题：从json信息的解析开始就会闪退-程序员宅基地

文章浏览阅读586次。想实现的功能是点击顶部按钮之后按关键字进行搜索，已经可以从服务器收到反馈的json信息，但从json信息的解析开始就会闪退，加载listview也不知道行不行public abstract class loadlistview{public ListView plv;public String js;public int listlength;public int listvisit;public..._rton转json为什么会闪退

如何使用wordnet词典，得到英文句子的同义句_get_synonyms wordnet-程序员宅基地

文章浏览阅读219次。如何使用wordnet词典，得到英文句子的同义句_get_synonyms wordnet

系统项目报表导出功能开发_积木报表多线程-程序员宅基地

文章浏览阅读521次。系统项目报表导出导出任务队列表 + 定时扫描 + 多线程_积木报表多线程

ajax 如何从服务器上获取数据？_ajax 获取http数据-程序员宅基地

文章浏览阅读1.1k次，点赞9次，收藏9次。使用AJAX技术的好处之一是它能够提供更好的用户体验，因为它允许在不重新加载整个页面的情况下更新网页的某一部分。另外，AJAX还使得开发人员能够创建更复杂、更动态的Web应用程序，因为它们可以在后台与服务器进行通信，而不需要打断用户的浏览体验。在Web开发中，AJAX（Asynchronous JavaScript and XML）是一种常用的技术，用于在不重新加载整个页面的情况下，从服务器获取数据并更新网页的某一部分。使用AJAX，你可以创建异步请求，从而提供更快的响应和更好的用户体验。_ajax 获取http数据

Linux图形终端与字符终端-程序员宅基地

文章浏览阅读2.8k次。登录退出、修改密码、关机重启_字符终端

随便推点

Python与Arduino绘制超声波雷达扫描_超声波扫描建模 python库-程序员宅基地

文章浏览阅读3.8k次，点赞3次，收藏51次。前段时间看到一位发烧友制作的超声波雷达扫描神器，用到了Arduino和Processing，可惜啊，我不会Processing更看不懂人家的程序，咋办呢？嘿嘿，所以我就换了个思路解决，因为我会一点Python啊，那就动手吧！在做这个案例之前先要搞明白一个问题：怎么将Arduino通过超声波检测到的距离反馈到Python端？这个嘛，我首先想到了串行通信接口。没错！就是串口。只要Arduino将数据发送给COM口，然后Python能从COM口读取到这个数据就可以啦！我先写了一个测试程序试了一下，OK！搞定_超声波扫描建模 python库

凯撒加密方法介绍及实例说明-程序员宅基地

文章浏览阅读4.2k次。端—端加密指信息由发送端自动加密，并且由TCP/IP进行数据包封装，然后作为不可阅读和不可识别的数据穿过互联网，当这些信息到达目的地，将被自动重组、解密，而成为可读的数据。不可逆加密算法的特征是加密过程中不需要使用密钥，输入明文后由系统直接经过加密算法处理成密文，这种加密后的数据是无法被解密的，只有重新输入明文，并再次经过同样不可逆的加密算法处理，得到相同的加密密文并被系统重新识别后，才能真正解密。2．使用时，加密者查找明文字母表中需要加密的消息中的每一个字母所在位置，并且写下密文字母表中对应的字母。_凯撒加密

工控协议--cip--协议解析基本记录_cip协议embedded_service_error-程序员宅基地

文章浏览阅读5.7k次。CIP报文解析常用到的几个字段：普通类型服务类型:[0x00], CIP对象:[0x02 Message Router], ioi segments:[XX]PCCC（带cmd和func）服务类型:[0x00], CIP对象:[0x02 Message Router], cmd:[0x101], fnc:[0x101]..._cip协议embedded_service_error

如何在vs2019及以后版本(如vs2022)上添加添加ActiveX控件中的MFC类_vs添加mfc库-程序员宅基地

文章浏览阅读2.4k次，点赞9次，收藏13次。有时候我们在MFC项目开发过程中，需要用到一些微软已经提供的功能，如VC++使用EXCEL功能，这时候我们就能直接通过VS2019到如EXCEL.EXE方式，生成对应的OLE头文件，然后直接使用功能，那么，我们上篇文章中介绍了vs2017及以前的版本如何来添加。但由于微软某些方面考虑，这种方式已被放弃。从上图中可以看出，这一功能，在从vs2017版本15.9开始，后续版本已经删除了此功能。那么我们如果仍需要此功能，我们如何在新版本中添加呢。_vs添加mfc库

frame_size (1536) was not respected for a non-last frame_frame_size (1024) was not respected for a non-last-程序员宅基地

文章浏览阅读785次。用ac3编码，执行编码函数时报错入如下：[ac3 @ 0x7fed7800f200] frame_size (1536) was not respected for anon-last frame (avcodec_encode_audio2)用ac3编码时每次送入编码器的音频采样数应该是1536个采样，不然就会报上述错误。这个数字并非刻意固定，而是跟ac3内部的编码算法原理相关。全网找不到，国内音视频之路还有很长的路，音视频人一起加油吧～......_frame_size (1024) was not respected for a non-last frame

Android移动应用开发入门_在安卓移动应用开发中要在活动类文件中声迷你一个复选框变量-程序员宅基地

文章浏览阅读230次，点赞2次，收藏2次。创建Android应用程序一个项目里面可以有很多模块，而每一个模块就对应了一个应用程序。项目结构介绍_在安卓移动应用开发中要在活动类文件中声迷你一个复选框变量