Tinker资源热修复中的压缩美学
回顾
Tinker资源热修复实际上还是采用了instant-run的方式,通过反射创建新的AssetManager的方式去进行,而Tinker资源热修复中的特点就是会生成补丁包,然后补丁包下发到客户端后进行合并,而这篇文章主要想分析Tinker中对zip文件独特的处理
在Tinker进行资源合并的时候,其实可以看到Tinker并没有采用ZipOutputStream、ZipFile等这些系统提供的对压缩文件的处理api,Tinker采取的是字节实现的TinkerZipFile、TinkerZipOutputStream等这些自己实现的接口,这些对压缩文件的处理有什么特色呢?为什么要这么做呢?
ResDiffPatchInternal-> extractResourceDiffInternals
回到Tinker中资源合并的逻辑中,简化出压缩文件处理的相关代码
TinkerZipOutputStream out = new TinkerZipOutputStream(
new BufferedOutputStream(
new FileOutputStream(resOutput)));
TinkerZipFile apk = new TinkerZipFile(apkPath);
TinkerZipEntry entry = apk.getEntry(entryName);
TinkerZipUtil.extractTinkerEntry(apk, entry, out);
# TinkerZipUtil-> extractTinkerEntry
public static void extractTinkerEntry(TinkerZipFile apk, TinkerZipEntry zipEntry, TinkerZipOutputStream outputStream) throws IOException {
InputStream in = null;
try {
in = apk.getInputStream(zipEntry);
outputStream.putNextEntry(new TinkerZipEntry(zipEntry));
byte[] buffer = new byte[BUFFER_SIZE];
for (int length = in.read(buffer); length != -1; length = in.read(buffer)) {
outputStream.write(buffer, 0, length);
}
outputStream.closeEntry();
} finally {
if (in != null) {
in.close();
}
}
}
其实从TinkerZipUtil来看,这个流程和原本用原生的压缩文件处理工具相差不大,都是从apk中拿到指定entry的输入流,然后从输入流读出,再用输出流回写进去
所以谜题的答案肯定就不在这个Util类里了,应该在TinkerZipFile、TinkerZipEntry、TinkerZipOutputStream中了
TinkerZipFile
# TinkerZipFile(File file)
public TinkerZipFile(File file) throws ZipException, IOException {
this(file, OPEN_READ);
}
# TinkerZipFile(File file, int mode)
public TinkerZipFile(File file, int mode) throws IOException {
filename = file.getPath();
// 处理打开模式的冲突
if (mode != OPEN_READ && mode != (OPEN_READ | OPEN_DELETE)) {
throw new IllegalArgumentException("Bad mode: " + mode);
}
// 单独处理OPEN_DELETE的打开模式
if ((mode & OPEN_DELETE) != 0) {
fileToDeleteOnClose = file;
fileToDeleteOnClose.deleteOnExit();
} else {
fileToDeleteOnClose = null;
}
// 创建一个RandomAccessFile,可读模式
raf = new RandomAccessFile(filename, "r");
// 读取中央目录的内容
readCentralDir();
}
主要逻辑为:
- 处理打开模式的冲突
- 单独处理OPEN_DELETE的打开模式
- 创建一个可读模式的RandomAccessFile
- 读取中央目录的内容
# TinkerZipFile-> readCentralDir
private void readCentralDir() throws IOException {
// 初次尝试获取中央目录的位置
long scanOffset = raf.length() - 22;
if (scanOffset < 0) {
throw new ZipException("File too short to be a zip file: " + raf.length());
}
// 获取zip文件的文件头标识,判断是否正确
raf.seek(0);
final int headerMagic = Integer.reverseBytes(raf.readInt());
// 0x04034b50L是Zip文件头的校验字段Local file header signature的值
if (headerMagic != 0x04034b50L) {
throw new ZipException("Not a zip archive");
}
// 设置往前停止的位置,其实也就是在变相的表示了Eocd中comment的最大长度
long stopOffset = scanOffset - 65536;
if (stopOffset < 0) {
stopOffset = 0;
}
// 不停地尝试,目的就是为了找到代表Eocd部分开始的一个签名
while (true) {
raf.seek(scanOffset);
// 0x06054b50L是Eocd部分的签名字段End of central directory signature
if (Integer.reverseBytes(raf.readInt()) == 0x06054b50L) {
break;
}
// 失败了就往前走一步
scanOffset--;
// 长度越界,抛出异常
if (scanOffset < stopOffset) {
throw new ZipException("End Of Central Directory signature not found");
}
}
// 把Eocd读到内存中,之所以是Eocd的长度是16,原因是因为签名字段已经读过了,就不再读进来了
byte[] eocd = new byte[22 - 4];
raf.readFully(eocd);
// 小端方式进行排列,然后读出我们需要的字段
BufferIterator it = HeapBufferIterator.iterator(eocd, 0, eocd.length, ByteOrder.LITTLE_ENDIAN);
// 当前的位置
int diskNumber = it.readShort() & 0xffff;
// 中央目录的位置
int diskWithCentralDir = it.readShort() & 0xffff;
// entry的数量
int numEntries = it.readShort() & 0xffff;
// entry总数
int totalNumEntries = it.readShort() & 0xffff;
it.skip(4); // Ignore centralDirSize.
// 中央目录的偏移量
long centralDirOffset = ((long) it.readInt()) & 0xffffffffL;
// 注释长度
int commentLength = it.readShort() & 0xffff;
// 对读出的值有效性进行校验
if (numEntries != totalNumEntries || diskNumber != 0 || diskWithCentralDir != 0) {
throw new ZipException("Spanned archives not supported");
}
// 读取注释内容
if (commentLength > 0) {
byte[] commentBytes = new byte[commentLength];
raf.readFully(commentBytes);
comment = new String(commentBytes, 0, commentBytes.length, StandardCharsets.UTF_8);
}
// Seek to the first CDE and read all entries.
// We have to do this now (from the constructor) rather than lazily because the
// public API doesn't allow us to throw IOException except from the constructor
// or from getInputStream.
// 读到中央目录的位置
RAFStream rafStream = new RAFStream(raf, centralDirOffset);
BufferedInputStream bufferedStream = new BufferedInputStream(rafStream, 4096);
byte[] hdrBuf = new byte[46]; // Reuse the same buffer for each entry.
for (int i = 0; i < numEntries; ++i) {
TinkerZipEntry newEntry = new TinkerZipEntry(hdrBuf, bufferedStream, StandardCharsets.UTF_8,
(false) /* isZip64 */);
if (newEntry.localHeaderRelOffset >= centralDirOffset) {
throw new ZipException("Local file header offset is after central directory");
}
String entryName = newEntry.getName();
if (entries.put(entryName, newEntry) != null) {
throw new ZipException("Duplicate entry name: " + entryName);
}
}
}
看完这个方法,顿时膜拜起了Tinker的作者,真的太厉害了
其实上面这段代码并不陌生,它被用在Walle、VasDolly等这种兼容V2签名的多渠道打包方式中,而其实现在从Tinker结合Walle、VasDolly来看这段代码,看到的是Tinker的作者对Zip文件格式的理解
方法流程:
- 从整个zip文件的大小计算出Eocd的位置,这个位置可能是不准确的,因为可能有comment的存在
- 判断Zip文件的文件头的签名是否是正确的
- 因为第一步中计算出的Eocd的位置可能是不对,所以需要一个字节一个字节的往前尝试,而这里定义了Comment字段的最大长度为65535
- 不断地尝试,直到找到Eocd的头部签名
- 把Eocd读到内存中来,数组长度为16原因是签名字段已经读过了,可以忽略不计
- 从Eocd这个数据段中读出需要用到的字段
- 读取注释字段的内容
- 读到第一个中央目录的位置,然后把这些数据构建成一个个TinkerZipEntry并用LinkedHashMap缓存起来
方法解读:
-
其实从TinkerZipFile的构造函数可以看到,TinkerZip、TinkerZipEntry、TinkerZipOutputStream这一套对压缩文件的处理工具的目的其实是为了不经过任何解压和重压缩的过程实现apk的合并
-
所以TinkerZipEntry并不是我们日常用于解压的ZipEntry,而仅仅代表的是Zip文件中的一段压缩数据流
-
而关于为什么TinkerZipEntry的初始化是在TinkerZipFile的初始化过程中进行,而不是懒加载,作者通过注释给出了他的答案:
因为受到了Java继承的限制,getEntry方法不能抛出比父类更多的异常,所以如果是懒加载从TinkerZipFile中读取出TinkerZipEntry就会没有角度去抛出IOException友好地提示调用者这个过程出现了问题,所以Tinker的作者就把这个解析的工作放在了构造函数里去执行了
-
而TinkerZipEntry的构造函数做的事情其实也是对签名进行校验以及中央目录数据结构的抽像
-
Tinker的作者把这套压缩文件处理工具设计得和系统API高度的相似,感觉让调用者会很舒服,可谓不管从Zip文件格式的理解、资源合并的极致优化还是外层的封装,都有着很深的理解
下一阶段源码阅读方向
上半段我们从TinkerZipFile中看到了TinkerZipFile独特的处理,而下半段源码阅读的方向,肯定就是这套优化方案的重头戏:TinkerZipOutputStream的重写入过程
# TinkerZipOutStream-> putNextEntry
public void putNextEntry(TinkerZipEntry ze) throws IOException {
if (currentEntry != null) {
closeEntry();
}
// Did this ZipEntry specify a method, or should we use the default?
// 定义压缩方式
int method = ze.getMethod();
if (method == -1) {
method = defaultCompressionMethod;
}
// If the method is STORED, check that the ZipEntry was configured appropriately.
if (method == STORED) {
if (ze.getCompressedSize() == -1) {
ze.setCompressedSize(ze.getSize());
} else if (ze.getSize() == -1) {
ze.setSize(ze.getCompressedSize());
}
if (ze.getCrc() == -1) {
throw new ZipException("STORED entry missing CRC");
}
if (ze.getSize() == -1) {
throw new ZipException("STORED entry missing size");
}
if (ze.size != ze.compressedSize) {
throw new ZipException("STORED entry size/compressed size mismatch");
}
}
checkOpen();
// checkAndSetZip64Requirements(ze);
//zhangshaowen edit here, we just want the same time and modDate
// Tinker的作者张绍文在这里进行了修改,定义了全部一致的time和modDate
ze.comment = null;
ze.extra = null;
ze.time = TIME_CONST;
ze.modDate = MOD_DATE_CONST;
nameBytes = ze.name.getBytes(StandardCharsets.UTF_8);
checkSizeIsWithinShort("Name", nameBytes);
entryCommentBytes = BYTE;
if (ze.comment != null) {
entryCommentBytes = ze.comment.getBytes(StandardCharsets.UTF_8);
// The comment is not written out until the entry is finished, but it is validated here
// to fail-fast.
checkSizeIsWithinShort("Comment", entryCommentBytes);
}
// def.setLevel(compressionLevel);
ze.setMethod(method);
currentEntry = ze;
currentEntry.localHeaderRelOffset = offset;
entries.add(currentEntry.name);
// Local file header.
// http://www.pkware.com/documents/casestudies/APPNOTE.TXT
int flags = (method == STORED) ? 0 : TinkerZipFile.GPBF_DATA_DESCRIPTOR_FLAG;
// Java always outputs UTF-8 filenames. (Before Java 7, the RI didn't set this flag and used
// modified UTF-8. From Java 7, when using UTF_8 it sets this flag and uses normal UTF-8.)
flags |= TinkerZipFile.GPBF_UTF8_FLAG;
writeLongAsUint32(out, LOCSIG); // Entry header
writeIntAsUint16(out, ZIP_VERSION_2_0); // Minimum version needed to extract.
writeIntAsUint16(out, flags);
writeIntAsUint16(out, method);
// zhangshaowen edit here, we just want the same time and modDate
// if (currentEntry.getTime() == -1) {
// currentEntry.setTime(System.currentTimeMillis());
// }
writeIntAsUint16(out, currentEntry.time);
writeIntAsUint16(out, currentEntry.modDate);
if (method == STORED) {
writeLongAsUint32(out, currentEntry.crc);
writeLongAsUint32(out, currentEntry.size);
writeLongAsUint32(out, currentEntry.size);
} else {
writeLongAsUint32(out, 0);
writeLongAsUint32(out, 0);
writeLongAsUint32(out, 0);
}
writeIntAsUint16(out, nameBytes.length);
if (currentEntry.extra != null) {
writeIntAsUint16(out, currentEntry.extra.length);
} else {
writeIntAsUint16(out, 0);
}
out.write(nameBytes);
if (currentEntry.extra != null) {
out.write(currentEntry.extra);
}
}
主要的逻辑就是对传入的TinkerZipEntry的内容进行进一步的矫正,然后按照Local file header的格式进行写回即可
对CDE的抽离和写入已经分析完了,但是这个流程还没走完,还差数据段的写入逻辑,所以我们先回到起点TinkerZipUtil
# TinkerZipUtil-> extractTinkerEntry
public static void extractTinkerEntry(TinkerZipFile apk, TinkerZipEntry zipEntry, TinkerZipOutputStream outputStream) throws IOException {
InputStream in = null;
try {
in = apk.getInputStream(zipEntry);
outputStream.putNextEntry(new TinkerZipEntry(zipEntry));
byte[] buffer = new byte[BUFFER_SIZE];
for (int length = in.read(buffer); length != -1; length = in.read(buffer)) {
outputStream.write(buffer, 0, length);
}
outputStream.closeEntry();
} finally {
if (in != null) {
in.close();
}
}
}
非常明显,玄机就出来在这InputStream的获取、InputStream的读取操作以及这个TinkerZipOutputStream的closeEntry了
# TinkerZipFile-> getInputStream
public InputStream getInputStream(TinkerZipEntry entry) throws IOException {
// 从LinkedHashMap中获取对应的Entry
entry = getEntry(entry.getName());
if (entry == null) {
return null;
}
// Create an InputStream at the right part of the file.
RandomAccessFile localRaf = raf;
synchronized (localRaf) {
// We don't know the entry data's start position. All we have is the
// position of the entry's local header.
// http://www.pkware.com/documents/casestudies/APPNOTE.TXT
// 找到entry对应的lfh段的位置
RAFStream rafStream = new RAFStream(localRaf, entry.localHeaderRelOffset);
DataInputStream is = new DataInputStream(rafStream);
// 校验这个签名是否正确
final int localMagic = Integer.reverseBytes(is.readInt());
if (localMagic != 0x04034b50) {
throwZipException(filename, localRaf.length(), entry.getName(), entry.localHeaderRelOffset, "Local File Header", localMagic);
}
is.skipBytes(2);
// At position 6 we find the General Purpose Bit Flag.
int gpbf = Short.reverseBytes(is.readShort()) & 0xffff;
if ((gpbf & TinkerZipFile.GPBF_UNSUPPORTED_MASK) != 0) {
throw new ZipException("Invalid General Purpose Bit Flag: " + gpbf);
}
// Offset 26 has the file name length, and offset 28 has the extra field length.
// These lengths can differ from the ones in the central header.
is.skipBytes(18);
// 读取文件名的长度
int fileNameLength = Short.reverseBytes(is.readShort()) & 0xffff;
// 读取扩展区的长度
int extraFieldLength = Short.reverseBytes(is.readShort()) & 0xffff;
is.close();
// Skip the variable-size file name and extra field data.
// 跳跃文件名和扩展区的位置
rafStream.skip(fileNameLength + extraFieldLength);
/*if (entry.compressionMethod == ZipEntry.STORED) {
rafStream.endOffset = rafStream.offset + entry.size;
return rafStream;
} else {
rafStream.endOffset = rafStream.offset + entry.compressedSize;
int bufSize = Math.max(1024, (int) Math.min(entry.getSize(), 65535L));
return new ZipInflaterInputStream(rafStream, new Inflater(true), bufSize, entry);
}*/
// 计算这个压缩文件结束位置,计算规则是当前的偏移加上文件的size
if (entry.compressionMethod == TinkerZipEntry.STORED) {
rafStream.endOffset = rafStream.offset + entry.size;
} else {
rafStream.endOffset = rafStream.offset + entry.compressedSize;
}
return rafStream;
}
}
主要流程:
- 从LinkedHashMap中获取对应的Entry
- 找到entry对应的local file header文件头
- 校验签名是否正确
- 跳过文件名以及扩展区
- 然后计算结束位置并保存起来,这样读取的时候,就可以读出数据区的内容
# RAFStream-> read
@Override public int read(byte[] buffer, int byteOffset, int byteCount) throws IOException {
synchronized (sharedRaf) {
final long length = endOffset - offset;
if (byteCount > length) {
byteCount = (int) length;
}
sharedRaf.seek(offset);
int count = sharedRaf.read(buffer, byteOffset, byteCount);
if (count > 0) {
offset += count;
return count;
} else {
return -1;
}
}
}
根据在getInputStream中完成计算的当前偏移量以及结尾偏移量读出TinkerZipEntry所对应的这个数据区的内容
然后写入到buffer中
RAFStream的设计其实是为了对RandomAccessFile做一个保证,因为RandomAccessFile是可以被共享的,所以需要通过这个RAFStream对访问进行一定的控制
# TinkerZipOutputStream-> closeEntry
public void closeEntry() throws IOException {
checkOpen();
if (currentEntry == null) {
return;
}
// 计算local file header的最终位置
long curOffset = 30;
// Write the DataDescriptor
// 写入数据描述符
if (currentEntry.getMethod() != STORED) {
curOffset += 16;
// Data descriptor signature and CRC are 4 bytes each for both zip and zip64.
writeLongAsUint32(out, 0x08074b50L);
/*writeLongAsUint32(out, currentEntry.crc = crc.getValue());
currentEntry.compressedSize = def.getBytesWritten();
currentEntry.size = def.getBytesRead();*/
writeLongAsUint32(out, currentEntry.crc);
/*if (currentEntryNeedsZip64) {
// We need an additional 8 bytes to store 8 byte compressed / uncompressed
// sizes.
curOffset += 8;
writeLongAsUint64(out, currentEntry.compressedSize);
writeLongAsUint64(out, currentEntry.size);
} else {
writeLongAsUint32(out, currentEntry.compressedSize);
writeLongAsUint32(out, currentEntry.size);
}*/
writeLongAsUint32(out, currentEntry.compressedSize);
writeLongAsUint32(out, currentEntry.size);
}
// Update the CentralDirectory
// http://www.pkware.com/documents/casestudies/APPNOTE.TXT
int flags = currentEntry.getMethod() == STORED ? 0 : TinkerZipFile.GPBF_DATA_DESCRIPTOR_FLAG;
// Since gingerbread, we always set the UTF-8 flag on individual files if appropriate.
// Some tools insist that the central directory have the UTF-8 flag.
// http://code.google.com/p/android/issues/detail?id=20214
flags |= TinkerZipFile.GPBF_UTF8_FLAG;
writeLongAsUint32(cDir, CENSIG);
writeIntAsUint16(cDir, ZIP_VERSION_2_0); // Version this file was made by.
writeIntAsUint16(cDir, ZIP_VERSION_2_0); // Minimum version needed to extract.
writeIntAsUint16(cDir, flags);
writeIntAsUint16(cDir, currentEntry.getMethod());
writeIntAsUint16(cDir, currentEntry.time);
writeIntAsUint16(cDir, currentEntry.modDate);
// writeLongAsUint32(cDir, crc.getValue());
writeLongAsUint32(cDir, currentEntry.crc);
if (currentEntry.getMethod() == DEFLATED) {
/*currentEntry.setCompressedSize(def.getBytesWritten());
currentEntry.setSize(def.getBytesRead());*/
curOffset += currentEntry.getCompressedSize();
} else {
/*currentEntry.setCompressedSize(crc.tbytes);
currentEntry.setSize(crc.tbytes);*/
curOffset += currentEntry.getSize();
}
/*if (currentEntryNeedsZip64) {
// Refresh the extended info with the compressed size / size before
// writing it to the central directory.
Zip64.refreshZip64ExtendedInfo(currentEntry);
// NOTE: We would've written out the zip64 extended info locator to the entry
// extras while constructing the local file header. There's no need to do it again
// here. If we do, there will be a size mismatch since we're calculating offsets
// based on the *current* size of the extra data and not based on the size
// at the point of writing the LFH.
writeLongAsUint32(cDir, Zip64.MAX_ZIP_ENTRY_AND_ARCHIVE_SIZE);
writeLongAsUint32(cDir, Zip64.MAX_ZIP_ENTRY_AND_ARCHIVE_SIZE);
} else {
writeLongAsUint32(cDir, currentEntry.getCompressedSize());
writeLongAsUint32(cDir, currentEntry.getSize());
}*/
writeLongAsUint32(cDir, currentEntry.getCompressedSize());
writeLongAsUint32(cDir, currentEntry.getSize());
curOffset += writeIntAsUint16(cDir, nameBytes.length);
if (currentEntry.extra != null) {
curOffset += writeIntAsUint16(cDir, currentEntry.extra.length);
} else {
writeIntAsUint16(cDir, 0);
}
writeIntAsUint16(cDir, entryCommentBytes.length); // Comment length.
writeIntAsUint16(cDir, 0); // Disk Start
writeIntAsUint16(cDir, 0); // Internal File Attributes
writeLongAsUint32(cDir, 0); // External File Attributes
/*if (currentEntryNeedsZip64) {
writeLongAsUint32(cDir, Zip64.MAX_ZIP_ENTRY_AND_ARCHIVE_SIZE);
} else {
writeLongAsUint32(cDir, currentEntry.localHeaderRelOffset);
}*/
writeLongAsUint32(cDir, currentEntry.localHeaderRelOffset);
cDir.write(nameBytes);
nameBytes = null;
if (currentEntry.extra != null) {
cDir.write(currentEntry.extra);
}
offset += curOffset;
if (entryCommentBytes.length > 0) {
cDir.write(entryCommentBytes);
entryCommentBytes = BYTE;
}
currentEntry = null;
/*crc.reset();
def.reset();
done = false;*/
}
在closeEntry方法中主要进行了数据描述符以及中央目录的写入工作
因此我们再次回到TinkerZipUtil做最后的总结
TinkerZipUtil-> extractTinkerEntry
public static void extractTinkerEntry(TinkerZipFile apk, TinkerZipEntry zipEntry, TinkerZipOutputStream outputStream) throws IOException {
InputStream in = null;
try {
// 获取了对apk中RandomAccessFile有访问控制的输入流
in = apk.getInputStream(zipEntry);
// 修正TinkerZipEntry的内容以及写入local file header
outputStream.putNextEntry(new TinkerZipEntry(zipEntry));
// 从输入流中根据数据结尾以及local file header的结束位置中读出压缩文件数据
// 然后用TinkerZipOutputStream进行写入操作
byte[] buffer = new byte[BUFFER_SIZE];
for (int length = in.read(buffer); length != -1; length = in.read(buffer)) {
outputStream.write(buffer, 0, length);
}
// 写入数据描述符以及中央目录部分
outputStream.closeEntry();
} finally {
if (in != null) {
in.close();
}
}
}
所以其实最后回过头来看这个TinkrZipUtil,它代表了整个压缩迁移操作的流程,流程为:
- 获取了对apk中RandomAccessFile有访问控制的输入流
- 修正TinkerZipEntry的内容以及写入local file header
- 从输入流中根据数据结尾以及local file header的结束位置中读出压缩文件数据,然后用TinkerZipOutputStream进行写入操作
- 写入数据描述符以及中央目录部分
总结
到这里其实Tinker中这套压缩文件的处理技术也就分析完了,主要的核心思想我觉得是作者向我们展示对zip文件结构的理解,通过对文件结构的分析和理解,可以做到从一个zip文件通过引流引至另一个zip文件,从而有效地规避了zip解压再进行重压缩的这么一个过程,这种优化除了可以用在资源热修复的资源合并之外,其实还能用于MultiDex的优化,总所周知MultiDex的两个最大的性能瓶颈:首次解压的耗时(当然还包含了一次无意义的解压重压缩)、dexopt的耗时,下一篇文章我会结合Tinker这一套对压缩的理解,从数据测试以及实际进行MultiDex性能优化的记录