  1. Sampling - Guide - Apache DataFu Pig

    Simple Random Sampling produces samples of a specific size, where each item has the same probability of being chosen. DataFu has scalable implementations of this that will generate samples …

  2. Guide - Apache DataFu Pig

    Sampling: simple random sample with/without replacement, weighted sample, sample by keys. Hashing: SHA and MD5. Link Analysis: PageRank. Assorted Macros: deduplication of tables, human-readable …

  3. Apache DataFu Pig - Getting Started

    Sampling: Simple random sampling with or without replacement, weighted sampling. Link Analysis: Run PageRank on a graph represented by a bag of nodes and edges. More: Other useful methods like …

  4. SimpleRandomSample (datafu-pig 1.6.1 API)

    It takes a bag of n items and a sampling probability p as the inputs, and outputs a simple random sample of size exactly ceil(p*n) in a bag, with probability at least 99.99%.

  5. SimpleRandomSample (DataFu 1.1.0)

    It takes a sampling probability p as input and outputs a simple random sample of size exactly ceil(p*n) with probability at least 99.99%, where n is the size of the population.

  6. datafu.pig.sampling (DataFu 1.2.0)

    Sampling UDFs, including weighted sample, reservoir sampling, sampling by key, etc.

  7. SampleByKey (datafu-pig 1.6.0 API)

    The method of sampling is to convert the key to a hash, derive a double value from this, and then test this against a supplied probability. The double value derived from a key is uniformly distributed …

  8. Class Hierarchy (datafu-pig 1.5.0 API)

    datafu.pig.sampling.ReservoirSample (implements org.apache.pig.Algebraic) datafu.pig.sampling.WeightedReservoirSample datafu.pig.sampling.WeightedReservoirSample …

  9. ReservoirSample (datafu-pig 1.6.1 API)

    java.lang.Object org.apache.pig.EvalFunc<T> org.apache.pig.AccumulatorEvalFunc<org.apache.pig.data.DataBag> …

  10. Index (DataFu 1.2.0)

    datafu.pig.sampling - package datafu.pig.sampling Sampling UDFs, including weighted sample, reservoir sampling, sampling by key, etc. datafu.pig.sessions - package datafu.pig.sessions
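
The SampleByKey snippet in result 7 describes hashing a key, deriving a double from the hash, and comparing it against a supplied probability. A minimal Python sketch of that idea follows. It is not DataFu's implementation: the use of MD5, the 53-bit scaling, the `salt` parameter, and the names `key_to_unit_interval` and `sample_by_key` are all assumptions made for illustration.

```python
import hashlib


def key_to_unit_interval(key: str, salt: str = "") -> float:
    """Map a key to a deterministic value in [0, 1).

    Assumption: MD5 is used only as a well-mixed hash; the snippet does not
    say which hash function the real UDF uses.
    """
    digest = hashlib.md5((salt + key).encode("utf-8")).digest()
    # Keep the top 53 bits of the first 8 bytes so the value fits a float
    # exactly and the result is strictly less than 1.0.
    value = int.from_bytes(digest[:8], "big") >> 11
    return value / (1 << 53)


def sample_by_key(records, key_fn, p: float, salt: str = ""):
    """Keep every record whose key hashes below the probability p.

    Records that share a key are either all kept or all dropped, because
    the decision depends only on the key.
    """
    return [r for r in records if key_to_unit_interval(key_fn(r), salt) < p]
```

Because the decision is a pure function of the key (and the hypothetical salt), repeated runs select the same keys, and changing the salt yields a different but equally reproducible sample.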