![]() |
|
#1
|
||||
|
||||
determinant algorithmsI'm running an algorithm that involves taking the determinants of 4 x 4 matrices with polynomial entries. The polynomials are in three variables and their inverses, over the finite field F2. I need to make it as fast as possible. The problem is that polynomials form a ring, not a field; In other words there is no division. Thus I can't use the standard algorithm of row reducing first and then taking the product of the diagonal. So far, the method of minors (breaking the matrix down into submatrices and taking their determinants) is the fastest thing I have tried. I'm wondering if anyone knows of a faster algorithm.
I've considered implementing the field of rational functions (ratios of polynomials), but then I would have to constantly get them in lowest terms to avoid winding up with huge denominators. I'm also aware that row reducing without division is possible, but much slower. Does anyone have any suggestions? I have been poking around on the internet for awhile, and I haven't found anything useful. |
||||
|
#2
|
||||
|
||||
Re: determinant algorithmsInteresting problem. Are you implementing this in C or C++? (That is, procedural or object-oriented?)
If the matrix size is fixed (4x4) and you do not need this solution to be extensible, why not hard-code the determinant formula into your algorithm? I.e., in pseudocode Code:
Matthew |
|
#3
|
||||
|
||||
Re: determinant algorithmsThat brings up another interesting issue that I forgot to mention. I multiplied out the determinant, and then exploited the relations between the polynomials (they're not just any polynomials) to cut the number of multiplications down to 23 by factoring and conjugating things. Then I wrote an algorithm to test the new method that counts the number of calls to all the relevant functions. The result was that everything (copy constructor calls, addition/subtraction calls, multiplication calls, conjugation calls, etc.) was significantly less. However, the number of iterations of the inner loop on the multiplication operator was much larger for all but the smallest polynomials. As a result, that method, even with reduced number of multiplications, was slower. That is another mystery that I have yet to solve.
I'm working in C++. I wrote a polynomial class with overloaded +, -, and * operators. It uses an stl map with the list of exponents as a key (sorted lexicographically) and the coefficient as a value. Thus inserting runs in logarithmic time. Thanks for the help. |
|
#4
|
||||
|
||||
Re: determinant algorithmsYou mean the inner loop of the polynomial multiplication operator, right? Because the method I suggested above requires no loop (other than that).
So in any case, by hard-coding the expansion you should still get some (small) speedup over actually implementing the method of minors, independent of whether you try to optimize the poly multiplications or not. The slowdown you mention in the poly multiplication is interesting... but rather hard to grasp without some more detail. Can you post code? or is your work restricted/proprietary? Matthew |
|
#5
|
||||
|
||||
Re: determinant algorithmsYes, I mean the inner loop of the * operator.
I have done what you suggested and hard coded the determinant. That was actually slower than the method of minors. I'm not entirely sure why. The new method I mentioned involved first multiplying out the determinant, and then factoring it to reduce the number of multiplications. Here's how it works: All 16 entries in the matrix are generated from 4 polynomials A, B, C, and D in variables a, b, c and their inverses. The formula for the determinant is: AAxAyA + BBxByBz + CCxCyCz + DDxDyDz - AxAyBBzc^-1 - AAzBxByc -AAxCyCza - AyAzCCxa^-1 - AAyDxDzb - AxAzDDyb^-1 - BByCxCzb - BxBzCCyb^-1 - BBxDyDza^-1 -ByBzDDxa - CxCyDDzc - CCzDxDyc^-1 + AxBCzDyc^-1 + AyBzCDxc^-1 + ABxCyDzc + AzByCxDc + AByCzDxab + AzBxCDya^-1b^-1 + AxBzCyDab^-1 + AyBCxDza^-1b For a polynomial P in a,b,c,a^-1,b^-1,c^-1, Px means invert the exponents on b and c but leave a, Py means invert the exponents on a and c but leave b, and Pz means invert the exponents on a and b but leave c. It is easy to varify that (Px)y = (Py)x = Pz, (Pz)x = (Px)z = Py, P(y)z = (Pz)y = Px, and (Px)x = (Py)y = (Pz)z = P Using this I can write the determinant as the sum of conjugates of 12 polynomials P0 - P11, defined as follows: AAxAyAz = P0 BBxByBz = P1 CCxCyCz = P2 DDxDyDz = P3 -AxAyBBzc-1 = P4x -AAzBxByc = P4 -AAxCyCza = P5y -AyAzCCxa-1 = P5 -AAyDxDzb = P6 -AxAzDDyb-1 = P6x -BByCxCzb = P7x -BxBzCCyb-1 = P7 -BBxDyDza-1 = P8 -ByBzDDxa = P8y -CxCyDDzc = P9x -CCzDxDyc-1 = P9 AxBCzDyc-1 = P10z AyBzCDxc-1 = P10 ABxCyDzc = P10y AzByCxDc = P10x AByCzDxab = P11z AzBxCDya-1b-1 = P11 AxBzCyDab-1 = P11y AyBCxDza-1b = P11x Thus the determinant is P0 + P1 + P2 + P3 + P4 + P4x + P5 + P5y + P6 + P6x + P7 + P7x + P8 + P8y + P9 + P9x + P10 + P10x + P10y + P10z + P11 + P11x + P11y + P11z Now, there is also redundancy in the formulas for the Pi's that I can exploit. Define the following 8 polynomials: T0 = AAx T1 = BBx T2 = CCx T3 = DDx T4 = AzBx T5 = CDy T6 = AyDx T7 = BzC Then I can write P0 through P11 as P0 = T0T0y P1 = T1T1y P2 = T2T2y P3 = T3T3y P4 = -T4T4zc P5 = -T0yT2a-1 P6 = -T6T6yb P7 = -T7T7yb-1 P8 = -T1T3ya-1 P9 = -T5T5zc-1 P10 = T6T7c-1 P11 = T4T5a-1b-1 Note that each Ti is used exactly twice. Finally, there is still more redundancy in the addition. Let Q0 = P10 + P11 Q1 = P4 + P6 + P7 + P9 + Q0 Q2 = P5 + P8 Then the determinant becomes P0 + P1 + P2 + P3 + Q1 + Q2 + Q1x + (Q0 + Q2)y + Q0z. That is how the "fast" method works. I could post the code if you want to see it, but I think that explanation will be more helpfull. |
|
#6
|
||||
|
||||
Re: determinant algorithmsThat does seem odd. Although, I assume that you are compiling with all speed optimizations on, in which case some of the redundancy that you manually eliminated should be likewise automatically eliminated by the compiler.
For example, if I write CPP / C++ / C Code:
CPP / C++ / C Code:
As for your other mystery, I can't say that I have any insight. If you can post code, I'd be interested in browsing through it. Mostly for my own curiosity. Matthew |
|
#7
|
||||
|
||||
Re: determinant algorithmsThe kind of optomization I'm doing is (for example) if I have
CPP / C++ / C Code:
then I calculate CPP / C++ / C Code:
and then use CPP / C++ / C Code:
Notice that the number of * operators is down from 9 to 5. Do you know if the compiler will do that automatically? Here's the code. The polynomial class: CPP / C++ / C Code:
and the determinant function: CPP / C++ / C Code:
|
|
#8
|
||||
|
||||
Re: determinant algorithmsHi, blake. Sorry for the delay. I think I can help you boost the speed of your code. Using g++ v4, I observed a 1.7 times speedup with a relatively few changes. I also have a reworked version of the code that runs about a little more than 2 times faster on my setup, but it includes some bigger changes.
My tests used the following polynomials: A = 2x + 1 B = 3y + 4z C = 5xy + 6yz D = 7xz + 8xyz There are three basic things I did to speed up the process. First, I eliminated the construction of temporary pair object wherever possible. For example, instead of CPP / C++ / C Code:
CPP / C++ / C Code:
Another important change is the use of operator+= instead of operator+, or operator*= instead of operator*. For example, instead of CPP / C++ / C Code:
CPP / C++ / C Code:
The third change involves using C++ references wherever possible in the determinant function. That is, instead of CPP / C++ / C Code:
CPP / C++ / C Code:
I'll post you full code with my changes at the end. For the >2 times speedup mentioned above, I change methods like invert() to return void and operator directly on the calling object, and created getInverse() which implements the existing behavior of making a temporary object. I also replaced std::map with std::set and made the expList class become a Term class which includes the coefficient in it. Here are some diagnostic outputs that illustrate where the speedup was achieved. Original: Code:
Code:
Code:
Here is the changed code: CPP / C++ / C Code:
CPP / C++ / C Code:
A final note: I removed the numTerms variable in Polynomial since the terms map already has a size() method. I also removed the PolynomialIterator class and replaced it with a simpler typedef. There are many similar style changes that can be done to make the code more readable and maintainable. Also, some of your for loops can probably be replaced by STL algorithms. The is already an algorithm for lexicographical comparison Cheers, Matthew P.S. Forgot to mention that I also made user of std::vector in your expList class. No sense in managing memory explicitly if you don't have to, I say. |
|
#9
|
||||
|
||||
Re: determinant algorithmsWow, that's an amazing speed up! Thank you VERY much for the help. It looks to me like it's time for another major overhaul of the polynomial class. This time I'll write a monomial class that stores the exponent, and use an stl set instead of a map as you suggested. I'll also define the * operator on monomials to replace the multByVars function, which assumes the exponent is one. I should probably add a special case in the * operator for polynomials that have one term as well, since that won't change the order (thus it will run in linear time instead of nlog(n)).
I think I'm also going to switch from lex to deg-lex ordering on the monomials. In deg-lex, for terms A and B, we say A < B if the total degree of A is less than the total degree of B. If A and B have the same total degree, then we do a lexicographic compare on the exponents. If I have a data member that keeps track of the total degree, I think that will be a little faster. Again, thank you so much for the help! |
|
#10
|
||||
|
||||
Re: determinant algorithmsMy pleasure.
The deg-lex ordering does sound like it could be faster; you'll get the biggest gain with polynomials in many variables. Also, you will certainly find out by testing, but I suspect a special case for multiplication by a single term will not give much speedup. True that std::map insertion is at most log(n) complexity, but you can ensure that it is merely constant time if the insertion order is the same as the sort order, by using insert(pos, x) where pos is simply end(). I suppose handling it as a special case would save n-1 logical comparisons. Probably worth it if you absolutely have to make this as fast as possible. A profiler could help pinpoint where the bulk of the processing is done (though you may already know that much). That'll tell you where to spend your efforts trying to improve efficiency. Another thing you might consider is using std::deque instead of std::set. You lose the automatic ordering and check for duplicates that set gives you, but by manually sorting and checking for duplicates only at those times when you know it needs to be done (e.g., it doesn't need to be done when multiplying by a monomial!) you might be able to save some cycles. Do you have a specific goal in mind for the execution time? When will you stop trying to optimize? Matthew |
Recent GIDBlog
Problems with the Navy (Enlisted) by crystalattice
| Thread Tools | Search this Thread |
| Rate This Thread | |
|
|
Network Sites: GIDNetwork · GIDWebHosts · GIDSearch · Learning Journal by J de Silva, The