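【Posted】:2018-12-28 05:59:48
【Question】:
I'm writing a multipart/form-data parser in C++ because the available options seem very scarce.
My initial approach was to use istream::getline to buffer one line (or partial line) at a time so that I could detect boundaries. While this works for smaller files, it does not work for larger ones. For large (>50MB) files, cin's badbit occasionally gets set, and after clearing the istream I notice that I have lost bytes. I don't know why, which is the point of this question.
However, if I increase the buffer size to 4MB and use istream::read to dump the entire multipart/form-data request to a file, I don't lose any bytes and cin never sets an error bit. I can then reopen the dump file as an ifstream instead of using cin, and my original small-buffer getline approach works perfectly.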
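The dump-then-reparse workaround described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the asker's actual code: `dump_stream` and `slurp_file` are hypothetical helper names, the 1 MiB default chunk size is arbitrary, and only standard iostreams are used, so it says nothing about how fcgi_streambuf itself behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: copies everything from `in` to the file at `path`
// using fixed-size istream::read() calls.
// Returns the number of bytes written, or -1 on any I/O failure.
long long dump_stream(std::istream& in, const std::string& path,
                      std::size_t chunk_size = 1 << 20) {
    assert(chunk_size > 0);
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out.is_open()) return -1;

    std::vector<char> buf(chunk_size);
    long long total = 0;
    while (true) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();  // bytes actually read
        if (got > 0) {
            out.write(buf.data(), got);
            if (!out) return -1;                  // e.g. out of disk space
            total += got;
        }
        if (in.bad()) return -1;  // the underlying stream buffer failed
        if (!in) break;           // short or empty read: end of input
    }
    out.flush();
    return out ? total : -1;
}

// Hypothetical helper: reopens the dump as an ifstream and returns its bytes.
std::string slurp_file(const std::string& path) {
    std::ifstream in(path, std::ios::in | std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

In the scenario from the question, `in` would be the istream built on the FastCGI stream buffer, and the resulting file would then be reopened as an ifstream and handed to the line-based parser.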
Any insight into what is happening here? Could it be a side effect of FastCGI or Lighttpd?
Edit:
Here are the relevant code snippets:
#include <fcgio.h>
//...
int main()
{
    //...
    FCGX_Request request;
    FCGX_Init();
    FCGX_InitRequest(&request, 0, 0);
    const size_t LEN = 1024;
    vector<char> v(LEN); // Workaround for getting duplicates of every byte?
    while (FCGX_Accept_r(&request) == 0) {
        fcgi_streambuf cin_fcgi_streambuf(request.in, &v[0], v.size());
        //... (eventually calls _parseMultipartFormFieldFile)
    }
    //...
}
/*
    Extract a file from a multipart form section.
    The istream should already have the boundary and headers removed, up through the final "\r\n".
    Note that there are a lot of potential off-by-one errors here. Need to pay special attention
    to gcount() and what is present in the buffer in each given scenario. That is why you see:
        gcount
        gcount-1
        gcount-2
    These offsets are due to the null terminator sometimes being appended, sometimes not, and/or '\r' being present or not.
    It is possible for a few rare things to happen that will break this function:
    1. Malicious content length
       The client could lie about the content length and send much more than we have room for. Should count bytes eventually, but it is easy enough to configure the webserver to protect us.
*/
bool _parseMultipartFormFieldFile(
    Request & req,
    istream & input,
    const string & name,
    const string & upload_dir,
    const string & boundary,
    const string & end_boundary
)
{
    static unsigned int file_id = 0; //used to generate unique file names
    //Need fixed buffer size to prevent running out of RAM (malicious or not)
    char buf[4096];
    string file_name = upload_dir + ECPP_TMP_FILE + to_string(file_id++);
    ofstream f(file_name, std::ofstream::out | std::ofstream::binary);
    if (!f.is_open())
        return false;
    bool eof = false;
    while (!eof) {
        //Out of space in flash?
        if (!f.good())
            return false;
        f.flush();
        input.getline(buf, sizeof(buf));
        unsigned int gcount = input.gcount();
        if (input.bad()) {
            //Crap! If we're here, we have most likely lost a few bytes...
            input.clear();
            continue;
        }
        else if (input.eof()) {
            //If we are here, the multipart/form-data request was malformed
            f.close();
            remove(file_name.c_str()); //Delete malformed file
            return false;
        }
        else if (input.fail()) {
            //If we are in this condition, it means we encountered a line longer than our buffer
            //There is no null terminator in this case, so write out what we have
            f.write(buf, gcount);
            input.clear(); //clear fail flag
            continue;
        }
        if (gcount >= 2 && buf[gcount-2] == '\r') {
            string peek = peekLine(input); //uses putback - modifies gcount()
            if (peek == boundary || peek == end_boundary) {
                //If we are in here, it means we encountered the last line in the section
                //That means there is a trailing '\r' which we need to remove in addition to the null terminator
                f.write(buf, gcount-2); // Remove null terminator and \r before writing
                req.file[name] = file_name;
                eof = true;
                continue;
            }
        }
        //If we are here it means we read in the entire line.
        //Write out everything (minus the null terminator), and also add in the newline that was stripped by getline()
        f.write(buf, gcount-1);
        f.write("\n", 1);
    }
    return true;
}
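So, in short, the problem is that if I pass cin_fcgi_streambuf to _parseMultipartFormFieldFile, I lose bytes (badbit gets triggered), but if I indiscriminately dump cin_fcgi_streambuf to a file using char buf[4000000] + input.read(), and then pass an ifstream of that file to _parseMultipartFormFieldFile, everything works fine.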
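For reference, the stream states that the loop above branches on can be exercised in isolation with an in-memory stream. The following is a minimal sketch, not the code from the question: `copy_via_getline` and `round_trips` are hypothetical names, boundary detection is left out, and std::istringstream stands in for the FastCGI stream, so it does not reproduce the badbit problem. It only shows that with a well-behaved stream buffer the failbit path (line longer than the buffer) loses no bytes, whereas badbit is treated as a hard stop rather than cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: copies `in` into a string through a deliberately
// small getline() buffer, using gcount() to decide how much to append.
std::string copy_via_getline(std::istream& in, std::size_t buf_size = 8) {
    assert(buf_size >= 2);  // a 1-byte buffer could never make progress
    std::string out;
    std::string buf(buf_size, '\0');
    while (true) {
        in.getline(&buf[0], static_cast<std::streamsize>(buf.size()));
        const std::size_t n = static_cast<std::size_t>(in.gcount());
        if (in.bad()) {
            break;  // stream buffer error: not cleared, data may be lost
        }
        if (in.eof()) {
            // End of input: n bytes were stored, no delimiter was consumed.
            out.append(buf.data(), n);
            break;
        }
        if (in.fail()) {
            // Buffer filled before a '\n' was seen: n bytes stored,
            // no delimiter consumed. Clear failbit and keep reading.
            out.append(buf.data(), n);
            in.clear();
            continue;
        }
        // A full line was read: gcount() includes the consumed '\n',
        // which getline() does not store in the buffer.
        out.append(buf.data(), n - 1);
        out.push_back('\n');
    }
    return out;
}

// Convenience check: does `data` survive the round trip unchanged?
bool round_trips(const std::string& data, std::size_t buf_size) {
    std::istringstream in(data);
    return copy_via_getline(in, buf_size) == data;
}
```

If a round trip like this holds for an ifstream but not for the stream built on fcgi_streambuf, that points at the stream buffer rather than at the gcount() arithmetic.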
【Comments】:
-
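Show your code. 50MB isn't very large on a modern system, so I don't think you have a size-related problem. badbit is generally unrecoverable, so resetting it isn't safe unless you can handle the read that may have been lost.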
-
Are you using a library to handle the FastCGI protocol, or only your own code?
-
@FireLancer I'm using the "official" FastCGI library from the (now defunct) fastcgi.com (I put a copy on GitHub: github.com/RPGillespie6/FastCGI)
-
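@FireLancer I updated the question with the (hopefully) relevant code. I can provide more context if you think it would help, but that would just turn the question into a giant wall of code.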
-
@Gillespie Did you check fcgiapp.c, around line 2215: reqDataPtr->in = NewReader(reqDataPtr, 8192, 0); so I think you have to allocate a buffer of at most 8192 bytes.
Tags: c++ multipartform-data fastcgi lighttpd