repartitionByRange


repartitionByRange(numPartitions, *cols) method of pyspark.sql.dataframe.DataFrame instance
    Returns a new :class:`DataFrame` partitioned by the given partitioning expressions. The
    resulting DataFrame is range partitioned.
    
    :param numPartitions:
        can be an int to specify the target number of partitions or a Column.
        If it is a Column, it will be used as the first partitioning column. If not specified,
        the default number of partitions is used.
    
    At least one partition-by expression must be specified.
    When no explicit sort order is specified, "ascending nulls first" is assumed.
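
To make "range partitioned" concrete, here is a minimal pure-Python sketch of the idea. It is not Spark's implementation: Spark estimates the range boundaries from a sample of the data, while this sketch sorts every key. The function name range_partition and the sample rows are hypothetical and used only for illustration. Each partition receives a contiguous range of key values, and null keys land in the first partition, mirroring "ascending nulls first".

```python
from bisect import bisect_left


def range_partition(rows, key, num_partitions):
    """Split rows into contiguous key ranges (illustrative only, not Spark's code)."""
    # Sort the non-null keys to choose range boundaries.
    # Assumption: Spark samples the data instead of sorting all of it.
    keys = sorted(r[key] for r in rows if r[key] is not None)
    n = len(keys)
    # Choose at most num_partitions - 1 upper bounds at evenly spaced positions.
    cuts = {
        keys[i * n // num_partitions - 1]
        for i in range(1, num_partitions)
        if i * n // num_partitions >= 1
    }
    bounds = sorted(cuts)
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        k = row[key]
        # Null keys go to the first partition ("ascending nulls first").
        idx = 0 if k is None else bisect_left(bounds, k)
        partitions[idx].append(row)
    return partitions


# Hypothetical sample data keyed by probeset_id.
rows = [{"probeset_id": v} for v in ["p07", "p01", None, "p04", "p09", "p02", "p05"]]
parts = range_partition(rows, "probeset_id", 3)
for i, part in enumerate(parts):
    print(i, [r["probeset_id"] for r in part])
```

Because the real method estimates boundaries by sampling, the partition sizes it produces are only approximately equal and can differ between runs.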

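import time

# merge_data (the source DataFrame) and f (the Delta table path) are assumed to be defined beforehand.
begin = time.time()
df = merge_data
# Range-partition into 10 partitions by probeset_id, then append to the Delta table at f.
df.repartitionByRange(10, "probeset_id").write.format("delta").mode("append").save(f)
print(time.time() - begin)  # elapsed time in seconds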

 
