【问题标题】:Report all possible combinations of columns报告所有可能的列组合
【发布时间】:2018-02-14 07:13:55
【问题描述】:

我有一个关于一般组合的问题,但在一个相当复杂的情况下,我还没有找到任何帮助。我正在尝试找到一种方法来报告数据集中所有可能的组合。

土地变化文献调查的数据报告,并指出每篇文章中报告了哪些直接和潜在的驱动因素。因此,行表示单个文章,而列都表示最接近和潜在的驱动程序。有六种类型的邻近驱动因素和五种类型的潜在驱动因素。对于每篇文章,在该文章中标识的驱动程序的列中放置一个 1,在未标识的驱动程序的列中放置一个 0。表格大致如下:

key | d1 | d2 |...| d6 | i1 |...| i5 |
--------------------------------------
A1  | 1  | 0  |...| 1  | 1  |...| 0  |
A2  | 0  | 1  |...| 0  | 0  |...| 1  |

其中文章 A1 将 d1 和 d6 标识为直接驱动程序,将 i1 标识为间接驱动程序等。

我想做的是找出报告直接驱动因素、间接驱动因素以及直接和间接驱动因素的所有可能组合的文章数量。因此,例如,有多少文章标识了 d1、d2 和 i1;有多少识别 d1、d2 和 i2;等等?我的学生有一个 Excel 文件中的表格,我在想也许 Calc 或 Base 可能具有自动化该过程的功能。有人知道我该怎么做吗?

谢谢!

【问题讨论】:

  • 所以您想识别所有 2^11 个组合并计算每个组合有多少?那是 2048 种不同的组合。
  • 这就是我希望简化流程的原因。这个想法是确定文献中最常出现的驱动因素组合。
  • 使用一点 UDF(或连接二进制数字)将驱动程序组合成一个条件字符串。然后使用数据透视表计算每个组合字符串的数量。

标签: mysql excel bash combinations summary


【解决方案1】:

我终于放弃了,采用了蛮力的方法。我将表格导出为文本并将其拉入 MySQL,然后使用 bash 脚本遍历选项。如果其他人有类似的问题,这里是 bash 脚本:

    # Generate array containing factors
faclis1=( d_inf d_com d_inm d_ind d_agr d_bos i_dem i_eco i_tec i_pol i_cul );
#faclis=( "d_inf" "d_com" "d_inm" );
a=0
#echo ${faclis[@]};


# Remove output file if exists
if [ -e permcounts.txt ];
    then
    rm permcounts.txt;
    fi;

# Cycle through list of factors
for f1 in ${faclis1[@]};
do
# only proceed if factor not null
if [ ${f1} ];
then
# print remaining array just to be sure
echo "factor list is ${faclis1[@]}";
#echo ${faclis[@]};
echo "Now on factor ${f1}";
echo "FACTOR ${f1}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1;" metamorelia >> permcounts.txt;
# create sub array without current factor, 2 factors
faclis2=( ${faclis1[@]/${f1}/} );
#set sub-counter
b=0
#echo "${faclis2[@]}";
# loop through sub array, two factors
for f2 in ${faclis2[@]};
do
if [ ${f2} ] && \
[ "${f1}" != "${f2}" ];
then
echo "FACTOR ${f1} \
AND ${f2}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis3=( ${faclis2[@]//${f2}} );
c=0
#echo "${faclis3[@]}";
# loop through sub-array
for f3 in ${faclis3[@]};
do
if [ ${f3} ] && \
[ "${f1}" != "${f3}" ] && \
[ "${f2}" != "${f3}" ];

then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis4=( ${faclis3[@]//${f3}} );
d=0
#echo "${faclis4[@]}";
# loop through sub-array
for f4 in ${faclis4[@]};
do
if [ ${f4} ] && \
[ "${f1}" != "${f4}" ] && \
[ "${f2}" != "${f4}" ] && \
[ "${f3}" != "${f4}" ];
then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3} \
AND ${f4}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1 and \
${f4} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis5=( ${faclis4[@]//${f4}} );
e=0
#echo "${faclis5[@]}";
# loop through sub-array
for f5 in ${faclis5[@]};
do
if [ ${f5} ] && \
[ "${f1}" != "${f5}" ] && \
[ "${f2}" != "${f5}" ] && \
[ "${f3}" != "${f5}" ] && \
[ "${f4}" != "${f5}" ];
then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3} \
AND ${f4} \
AND ${f5}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1 and \
${f4} = 1 and \
${f5} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis6=( ${faclis5[@]//${f5}} );
f=0
#echo "${faclis6[@]}";
# loop through sub-array
for f6 in ${faclis6[@]};
do
if [ ${f6} ] && \
[ "${f1}" != "${f6}" ] && \
[ "${f2}" != "${f6}" ] && \
[ "${f3}" != "${f6}" ] && \
[ "${f4}" != "${f6}" ] && \
[ "${f5}" != "${f6}" ];
then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3} \
AND ${f4} \
AND ${f5} \
AND ${f6}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1 and \
${f4} = 1 and \
${f5} = 1 and \
${f6} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis7=( ${faclis6[@]//${f6}} );
g=0
#echo "${faclis7[@]}";
# loop through sub-array
for f7 in ${faclis7[@]};
do
if [ ${f7} ] && \
[ "${f1}" != "${f7}" ] && \
[ "${f2}" != "${f7}" ] && \
[ "${f3}" != "${f7}" ] && \
[ "${f4}" != "${f7}" ] && \
[ "${f5}" != "${f7}" ] && \
[ "${f6}" != "${f7}" ];
then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3} \
AND ${f4} \
AND ${f5} \
AND ${f6} \
AND ${f7}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1 and \
${f4} = 1 and \
${f5} = 1 and \
${f6} = 1 and \
${f7} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis8=( ${faclis7[@]//${f7}} );
h=0
#echo "${faclis8[@]}";
# loop through sub-array
for f8 in ${faclis8[@]};
do
if [ ${f8} ] && \
[ "${f1}" != "${f8}" ] && \
[ "${f2}" != "${f8}" ] && \
[ "${f3}" != "${f8}" ] && \
[ "${f4}" != "${f8}" ] && \
[ "${f5}" != "${f8}" ] && \
[ "${f6}" != "${f8}" ] && \
[ "${f7}" != "${f8}" ];
then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3} \
AND ${f4} \
AND ${f5} \
AND ${f6} \
AND ${f7} \
AND ${f8}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1 and \
${f4} = 1 and \
${f5} = 1 and \
${f6} = 1 and \
${f7} = 1 and \
${f8} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis9=( ${faclis8[@]//${f8}} );
i=0
#echo "${faclis9[@]}";
# loop through sub-array
for f9 in ${faclis9[@]};
do
if [ ${f9} ] && \
[ "${f1}" != "${f9}" ] && \
[ "${f2}" != "${f9}" ] && \
[ "${f3}" != "${f9}" ] && \
[ "${f4}" != "${f9}" ] && \
[ "${f5}" != "${f9}" ] && \
[ "${f6}" != "${f9}" ] && \
[ "${f7}" != "${f9}" ] && \
[ "${f8}" != "${f9}" ];
then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3} \
AND ${f4} \
AND ${f5} \
AND ${f6} \
AND ${f7} \
AND ${f8} \
AND ${f9}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1 and \
${f4} = 1 and \
${f5} = 1 and \
${f6} = 1 and \
${f7} = 1 and \
${f8} = 1 and \
${f9} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis10=( ${faclis9[@]//${f9}} );
j=0
#echo "${faclis10[@]}";
# loop through sub-array
for f10 in ${faclis10[@]};
do
if [ ${f10} ] && \
[ "${f1}" != "${f10}" ] && \
[ "${f2}" != "${f10}" ] && \
[ "${f3}" != "${f10}" ] && \
[ "${f4}" != "${f10}" ] && \
[ "${f5}" != "${f10}" ] && \
[ "${f6}" != "${f10}" ] && \
[ "${f7}" != "${f10}" ] && \
[ "${f8}" != "${f10}" ] && \
[ "${f9}" != "${f10}" ];
then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3} \
AND ${f4} \
AND ${f5} \
AND ${f6} \
AND ${f7} \
AND ${f8} \
AND ${f9} \
AND ${f10}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1 and \
${f4} = 1 and \
${f5} = 1 and \
${f6} = 1 and \
${f7} = 1 and \
${f8} = 1 and \
${f9} = 1 and \
${f10} = 1;" metamorelia >> permcounts.txt;

# next sub-array
faclis11=( ${faclis10[@]//${f10}} );
k=0
#echo "${faclis11[@]}";
# loop through sub-array
for f11 in ${faclis11[@]};
do
if [ ${f11} ] && \
[ "${f1}" != "${f11}" ] && \
[ "${f2}" != "${f11}" ] && \
[ "${f3}" != "${f11}" ] && \
[ "${f4}" != "${f11}" ] && \
[ "${f5}" != "${f11}" ] && \
[ "${f6}" != "${f11}" ] && \
[ "${f7}" != "${f11}" ] && \
[ "${f8}" != "${f11}" ] && \
[ "${f9}" != "${f11}" ] && \
[ "${f10}" != "${f11}" ];
then
echo "FACTOR ${f1} \
AND ${f2} \
AND ${f3} \
AND ${f4} \
AND ${f5} \
AND ${f6} \
AND ${f7} \
AND ${f8} \
AND ${f9} \
AND ${f10} \
AND ${f11}" >> permcounts.txt;
mysql -u harvey -pdavid -e "select count(clave) from genfact where \
${f1} = 1 and \
${f2} = 1 and \
${f3} = 1 and \
${f4} = 1 and \
${f5} = 1 and \
${f6} = 1 and \
${f7} = 1 and \
${f8} = 1 and \
${f9} = 1 and \
${f10} = 1 and \
${f11} = 1;" metamorelia >> permcounts.txt;

unset faclis11[k];
k=$((${k} + 1));
fi;
done;
unset faclis10[j];
j=$((${j} + 1));
fi;
done;
unset faclis9[i];
i=$((${i} + 1));
fi;
done;
unset faclis8[h];
h=$((${h} + 1));
fi;
done;
unset faclis7[g];
g=$((${g} + 1));
fi;
done;
unset faclis6[f];
f=$((${f} + 1));
fi;
done;
unset faclis5[e];
e=$((${e} + 1));
fi;
done;
unset faclis4[d];
d=$((${d} + 1));
fi;
done;
unset faclis3[c];
c=$((${c} + 1));
fi;
done;
# Remove analyzed factors from vector
unset faclis2[b];
b=$((${b} + 1));
fi;
done;
# remove nth item from array (progressively remove one item)
unset faclis1[a];
# increment n for next round
a=$((${a} + 1));
echo ${n};
fi;
done;

这个脚本效率有点低,我认为我包含了很多不必要的操作,但它完成了工作。 (我认为确实如此。我的学生必须浏览输出文件以确保所有内容都在那里。)

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2011-12-09
    • 1970-01-01
    • 1970-01-01
    • 2017-01-22
    • 1970-01-01
    相关资源
    最近更新 更多