Msieve
msieve - Number Field Sieve implementation by Jason Papadopoulos
Install / Use
/learn @radii/MsieveREADME
MSIEVE: A Library for Factoring Large Integers Jason Papadopoulos
Introduction
Msieve is the result of my efforts to understand and optimize how integers are factored using the most powerful modern algorithms.
This documentation corresponds to version 1.46 of the Msieve library. Do not expect to become a factoring expert just by reading it. I've included a relatively complete list of references that you can and should look up if you want to treat the code as more than a black box to solve your factoring problems.
What Msieve Does
Factoring is the study (half math, half engineering, half art form) of taking big numbers and expessing them as the product of smaller numbers. If I find out 15 = 3 * 5, I've performed an integer factorization on the number 15. As the number to be factored becomes larger, the difficulty involved in completing its factorization explodes, to the point where you can invent secret codes that depend on the difficulty of factoring and reasonably expect your encrypted data to stay safe.
There are plenty of algorithms for performing integer factorization. The Msieve library implements most of them from scratch, and relies on optional external libraries for the rest of them. Trial division and Pollard Rho is used on all inputs; if the result is less than 25 digits in size, tiny custom routines do the factoring. For larger numbers, the code switches to the GMP-ECM library and runs the P-1, P+1 and ECM algorithms, expending a user-configurable amount of effort to do so. If these do not completely factor the input number, the library switches to the heavy artillery. Unless told otherwise, Msieve runs the self-initializing quadratic sieve algorithm, and if this doesn't factor the input number then you've found a library problem. If you know what you're doing, Msieve also contains a complete implementation of the number field sieve, that has helped complete some of the largest public factorization efforts known. Information specific to the quadratic sieve implementation is contained in Readme.qs, while the number field sieve variant is described in Readme.nfs
The maximum size of numbers that can be given to the library is hardwired at compile time. Currently the code can handle numbers up to 275 digits; however, you should bear in mind that I don't expect the library to be able to complete a factorization larger than about 120 digits by itself. The larger size inputs can only really be handled by the number field sieve, and only part of the NFS code (the final part) is efficient and robust enough to deal with problems that large.
Msieve was written with several goals in mind:
- To be as fast as possible. I claim (without proof) that for
completely factoring general inputs between 40 and 100 digits
in size, Msieve is faster than any other code implementing any
other algorithm. I realize that's a tall order, and that I'll
probably have to eat those words, but a *lot* of effort has gone
into making Msieve fast.
- To be as portable as possible. The code is written in C and is
completely self contained. It has its own basic multiple precision
library (which can be used in other applications) and is written
in as machine-independent a manner as possible. I've verified that
the source code compiles and runs correctly on 32- or 64-bit Intel
x86, 32- and 64-bit PowerPC, and 64-bit Alpha platforms. It's
reported to work in 32-bit mode on the RS6000. It works in Windows,
Linux (several flavors), Mac OS X, and AIX. Pretty much the only
requirement for building the code is that your compiler have a
native 64-bit data type.
- To be simple to use. The only input is the integer to be factored.
Everything else happens automatically.
- To be free (as in beer). The entire code base is released into the
public domain. This is hobby stuff for me, and the more it's used
the better.
If you choose to use Msieve, please let me know how it goes. I welcome bug reports, suggestions, optimizations, ports to new platforms, complaints, boasts, whatever.
Getting Msieve
The latest version of Msieve can be found on my web page, www.boo.net/~jasonp. A precompiled Windows binary using the latest source (optimized for the AMD athlon processor) is also available there.
The source distribution comes with a unix makefile you can use if you want to build msieve from source. If you have Microsoft Visual Studio 2007, Brian Gladman has kindly provided a set of build files that will generate Windows binaries.
Using Msieve
Just to be confusing, there are two things that I call 'Msieve' interchangeably. The source distribution builds a self-contained static library 'libmsieve.a', that actually performs factorizations, and also builds a 'msieve' demo application that uses the library. The library has a very lightweight inter- face defined in msieve.h, and can be used in other applications. While the demo application is (slightly) multithreaded, most the library is single- threaded and all of its state is passed in. The linear algebra code used in the quadratic- and number field sieve is multithread aware, and the entire library is supposed to be multithread-safe.
The demo application has only one job: to act as a delivery vehicle for integers to be factored. Numbers can come from a text file, from redirected input, from the command line directly, or can be manually typed in one at a time. Batch input is also supported, so that you can just point the application to a text file with a collection of numbers, one per line. By default, all output goes to a logfile and a summary goes to the screen. For the complete list of options, try 'msieve -h'.
Starting with v1.08, the inputs to msieve can be integer arithmetic expressions using any of the following operators:
-
- plus, minus (lowest priority) % integer remainder
- / multiply, integer divide ^ power ( ) grouping (highest priority)
Hence for example:
(10^53 - 1) / 9
gives the value:
11111111111111111111111111111111111111111111111111111
The integers used in an expression can be of any length but all intermediate results and the final result are restricted to 275 or less decimal digits.
While factoring an integer, the library can produce a very large amount of intermediate information. This information is stored in one or more auxiliary savefiles, and the savefiles can be used to restart an interrupted factorization. Note that factoring one integer and then another integer will overwrite the savefiles from the first integer.
The amount of memory that's needed will depend on the size of the number to be factored and the algorithm used. If running the quadratic sieve or the number field sieve, the memory requirements increase towards the end of a factorization, when all of the intermediate results are needed at the same time. For a 100-digit quadratic sieve factorization, most of the time Msieve needs 55-65MB of memory, with the last stage of the factorization needing 100-130MB. The final part of the number field sieve can use up incredible amounts of memory; for example, completing the factorization of a 512-bit number like an RSA key needs 2-3GB of memory.
Frequently Asked Questions
Q. I want to factor much bigger numbers. Can't Msieve solve problems larger than you say? Q. I'm trying to break my ex-girlfriend's RSA key with Msieve, and it's not working. What's wrong? A. The quadratic sieve really is not appropriate for factoring numbers over ~110 digits in size, and the number field sieve implementation isn't even close to done. On a fast modern CPU, a 110-digit factor- ization takes nearly 120 hours for Msieve, and the time increases steeply beyond that. If you have really big factorization needs, there are essentially only two packages that you can use: GGNFS and the NFS implementation by Chris Card. Both are hosted on SourceForge (see www.sf.net/projects/ggnfs and www.sf.net/projects/factor-by-gnfs). For the largest size problems, you have to use the number field sieve; in fact, you have to use GGNFS for the first part of the factorization and then msieve for the last part.
Q. Can you make Msieve network aware? Can you make it a client-server thingy? Can I use the internet to factor numbers? A. The demo application for the Msieve library is just that, a demo. I don't know anything about network programming and I'm not qualified to build a client-server application that's safe in the face of internet threats. If you have these kinds of smarts, you can use Msieve in your own code and I'll help as much as I can. The demo is good enough for people with a few machines on a small private LAN, and this is ~100% of the user community right now.
Q. How can I modify Msieve to work on a cluster? A. Distributed sieving is so easy that you don't need high-performance parallel programming techniques or message passing libraries to do it. If you're lucky enough to have a cluster then the batch scheduling system that comes with the cluster is more than enough to implement cluster-aware sieving. Of course if you have access to that much firepower you owe it to yourself to use an NFS package of some sort.
Q. Can you modify Msieve to run on multi-core processors? A. As described above, the really intensive part of the QS and NFS algorithms is the sieving, and it's a waste of effort to multithread that. You won't save any time compared to just running two copies of Msieve. The final stage can benefit from multithreading, and the intensive parts of that are already multithread-aware. This can be improved, but multithreading more parts of the library is a low priority for me.
Q. Why put Msieve into the public domain and not make it
