Table Of ContentG3: Genes|Genomes|Genetics Early Online, published on August 1, 2018 as doi:10.1534/g3.118.200340
Recursive algorithms for modeling genomic ancestral
origins in a fixed pedigree
ChaozhiZheng∗,1,MartinP.Boer∗andFredA.vanEeuwijk∗
∗Biometris,WageningenUniversityandResearch,Wageningen,TheNetherlands
ABSTRACT The study of gene flow in pedigrees is of strong interest for the development of quantitative KEYWORDS
trait loci (QTL) mapping methods in multiparental populations. We developed a Markovian framework for Junctiontheory
modelingancestraloriginsalongtwohomologouschromosomeswithinindividualsinfixedpedigrees. Ahighly Identical by
beneficial property of our method is that the size of state space depends linearly or quadratically on the descent
numberofpedigreefounders,whereasthisincreasesexponentiallywithpedigreesizeinalternativemethods. Ancestral infer-
TocalculatetheparametervaluesoftheMarkovprocess,wedescribetwonovelrecursivealgorithmsthat ence
differwithrespecttothepedigreefoundersbeingassumedtobeexchangeableornot. Ouralgorithmsapply QTL mapping
equally to autosomes and sex chromosomes, another desirable feature of our approach. We tested the resolution
accuracyofthealgorithmsbyamillionsimulationsonapedigree. Wedemonstratedtwoapplicationsofthe Collaborative
recursivealgorithmsinmultiparentalpopulations: designabreedingschemeformaximizingtheoveralldensity Cross(CC)
ofrecombinationbreakpointsandthustheQTLmappingresolution,andincorporatepedigreeinformation Recombinant
intohiddenMarkovmodelsinancestralinferencefromgenotypicdata;theconditionalprobabilitiesandthe InbredLine(RIL)
recombinationbreakpointdataresultingfromancestralinferencecanfacilitatefollow-upQTLmapping. The Multiparent
results show that the generality of the recursive algorithms can greatly increase the application range of AdvancedGener-
geneticanalysissuchasancestralinferenceinmultiparentalpopulations. ation InterCross
(MAGIC)
INTRODUCTION
ManycomplicatedexperimentalcrosseshaverecentlybeenproducedformappingQTL,particularlyinplants
(e.g.Koveretal.2009;Sannemannetal.2015). Incontrasttotraditionalbiparentalpopulations,multipleparents
have been used to increase genetic diversity and thus QTL segregation probability, and many generations of
intercrossmatinghavebeenusedtoincreaseaccumulatedrecombinationbreakpointsinsampledoffspringin
thefinalgenerationandthusQTLmappingresolution. Theexpecteddensityofrecombinationpointsisoneof
keyquantitiesforoptimizingexperimentaldesignspriortocollectinggenotypicandphenotypicdata(Rockman
andKruglyak2008). Furthermore,theidentificationofrecombinationbreakpointsfromgenotypicdataprovides
usefulinformationforincreasingdetectionpowerandmappingresolution(Xu2013;Lietal.2015). Theprimary
Copyright ©2018ChaozhiZheng etal.
Manuscriptcompiled:Wednesday1stAugust,2018%
1Biometris,WageningenUniversityandResearch,POBox16,6700AAWageningen,TheNetherlands.Email:[email protected]
VolumeX | August2018 | 1
© The Author(s) 2013. Published by the Genetics Society of America.
aimsofthispaperaretodevelopthetheoryofgeneflowinafixedpedigreewitharbitrarystructuretocalculate
thepriordistributionofrecombinationpoints,andtoapplythetheoryforancestralinferenceanddetectionof
recombinationpointsfromgenotypicdatainmultiparentalpopulations.
Thetheoryofone-locusgeneflowinapedigreehasbeenwelldeveloped. Thehaploidgenomesofpedigree
foundersaredesignatedbydistinctlabels,calledfoundergenomelabels(FGLs). Asetofgenesareidenticalby
descent (IBD) if they carry the same FGLs. The identity state for two genes is either IBD or non-IBD. The IBD
probabilityfortwodistinctgeneswithinanindividualistheinbreedingcoefficient,anditisthekinshipcoefficient
forthetwogenesrandomlysampledfromtwodistinctindividuals. Rostron(1978)developedarecursivealgorithm
forcalculatingthetwo-genecoefficientsofinbreedingandkinship. Karigl(1981)developedarecursivealgorithm
forcalculatingthecoefficientsofthefifteengeneidentitystatesforfourgenesbetweentwoindividuals. Thompson
(1983)showedthatasimilaralgorithmappliestotheprobabilitiesofjointdescentofmultiplegenesfromaspecific
FGL. Bauman et al. (2008) and Zhou et al. (2012) developed a recursive algorithm for calculating the ancestral
coefficientssuchastheprobabilityofonegenedescendingfromagivenFGLandtheprobabilityoftwogenes
descendingfromtwogivenFGLs(distinctornot).
The theory of gene flow at two linked loci in a pedigree has also been developed. Weir and Cockerham
(1969)andCockerhamandWeir(1973)developedarecursivealgorithmforcalculatingthetwo-locusinbreeding
coefficientofanindividualthatisdefinedasalinearfunctionoftheprobabilitiesofthefifteenidentitystatesfor
fourgenes,twoateachoftwolinkedloci. Thompson(1988)developedarecursivealgorithmforcalculatingthe
two-locuskinshipbetweentwoindividualsdefinedastheprobabilityofIBDatbothloci,seeHillandWeir(2007)
forcalculatingmulti-locusIBDprobabilitiesinrandommatingpopulations. Incontrasttotheidentitycoefficients,
theancestralcoefficientattwolociisdefinedasthetwo-locusdiplotypeprobabilities,andthereare4L possible
diplotypesforfourgenesinanindividualwithrespectto LdistinctFGLs. Thus,thecalculationofthetwo-locus
ancestral inferences has been developed only for simple breeding systems including selfing, brother-sister (or
sibling)mating,andparent-offspringmating(HaldaneandWaddington1931;JohannesandColome-Tatche2011;
Broman2012).
Therehasbeenmuchinterestindevelopingthetheoryofgeneflowinapedigreewithchromosomesassumed
tobeagenomiccontinuum. Donnelly(1983)developedaformalmathematicalframework,wherethecrossover
processesinachromosomepedigreefollowjointlyacontinuoustimeMarkovrandomwalkontheverticesofa
hypercube,thetimeparameterbeingpositionalonghomologouschromosomes. Hereavertexrepresentsoneof
possiblegenetransmissionsfromfounderstoallnon-founders,andthenumberofverticesinad-dimensional
2 | ChaozhiZheng etal.
hypercube is 2d, d being the number of non-founders. This framework has been used in many kinds of small
pedigreesystems(e.g.BickebollerandThompson1996;Stefanov2000;MartinandHospital2011). However,itis
impracticaltoapplyDonnelly’sMarkovianframeworktoalargecomplexpedigreesincethenumberofstates
(hypercubevertices)increasesexponentiallywiththepedigreesize.
Alternatively, the theory of junctions has been developed to study the gene flow for a genomic continuum
(Fisher1949,1954). Ajunctionisdefinedasaboundarypoint(recombinationbreakpoint)onachromosomewhere
twodistinctFGLsmeet. Fisher(1949,1954)andBennett(1953,1954)developedthetheoryofjunctionsforsimple
breedingsystemsincludingselfing,brother-sistermating,andparent-offspringmating. Ithasbeenextendedto
populations(Stam1980;Bairdetal.2003;ChapmanandThompson2003;MacLeodetal.2005;Zhengetal.2014;
Zheng2015). Inparticular,Zhengetal.(2014)andZheng(2015)developedaMarkovianframeworkformodeling
ancestraloriginswithinanindividualwiththestatespacebeingthepossiblepairsofFGLs,andthusthenumber
ofstatesismuchsmallerthanthatofDonnelly’sMarkovianframework(Donnelly1983). Ontheotherhand,our
Markovianframework(Zhengetal.2014;Zheng2015)isrestrictedbytwoassumptions: thematingschemefrom
onegenerationtothenextisrandommating,andtheFGLsareassumedtobeexchangeablesothattheprobability
distributionofancestraloriginsinoffspringisinvarianttoallpossiblepermutationsoftheFGLs. However,the
exchangeabilitygenerallydoesnotholdforamappingpopulationproducedviaanarbitrarybreedingpedigree.
Inthispaper,werelaxtherestrictionofourpreviousMarkovianframeworkbyextendingittoafixedpedigree.
Wefirstdescribearecursivealgorithm(denotedbyEXCH)undertheassumptionofFGLsbeingexchangeable,
whichisapplicablefortheconstructionofMarkovianframeworkinsimplebreedingschemes. Thenwedevelop
a recursive algorithm (denoted by NON-EXCH) for modeling ancestral origins to relax the exchangeability
assumption. Thetworecursivealgorithmsapplytobothautosomesandsexchromosomesiftheyexist. Theresults
ofbothrecursivealgorithmsarecomparedwiththosefromextensivesimulationsonaclassicalpedigree. Wefirst
applythetwoalgorithmstocomparedifferentpopulationdesigns(orbreedingpedigrees)priortoexperiments,
forexample,intermsoftheoveralldensityofjunctions(recombinationbreakpoints),animportantfactorofQTL
mappingresolution. Inaddition,thenon-exchangeabilitycanbeillustratedundervariousbreedingdesigns. Then
weapplythetwoalgorithmstoincorporatepedigreeinformationforancestralinferencefromgenotypicdatain
simulatedandrealcollaborativecross(CC)populations,resultinginconditionalprobabilitiesthatarenecessary
fordownstreamQTLmapping.
VolumeX August2018 | Modelingancestraloriginsinpedigrees | 3
RECURSIVE ALGORITHM EXCH
Notationsandoverview
ThesymbolsarebrieflyexplainedinTable1,andsomeofthemareillustratedinFigure1. Pedigreememberswith
unspecifiedmotherandfatherarecalledthefoundersofthepedigree,andtheothermembersarenon-founders.
Therecursivealgorithmpresupposesthatpedigreemembersareorderedundertheconstraintthatparentsalways
precede children. We denote by subscripts a, b, c the members of a pedigree, and denote by a > b individual
a comes after individual b ((cid:54)= a). We denote by superscripts m the maternally derived genes or chromosomes,
and pforthepaternallyderived. Wedenotebysuperscriptso,o ,o ,ando theunspecifiedparentalorigins(m
1 2 3
or p)ofgenesorchromosomes. Forthesakeofbrevity, f(a) = {AA,XX,XY} denotesahorizontalpiece-wise
equation,sothat f(a) = AA,XX,andXYfortheautosomesofindividuala,XXchromosomesoffemalea,andXY
chromosomesofmale a,respectively.
Inthederivationsofrecurrencerelationsofaquantity,wewillfocusonaparticularorderinga > bifthequantity
concerns genes in two individuals; we will always trace the maternally derived gene or chromosome within
non-founder a back to its two parental genes or chromosomes if the maternally derived gene or chromosome
concernsthequantity,andotherwisetracethepaternallyderivedgeneorchromosome. Throughoutthispaper,
♂
and alwaysdenotethefatherandmotherofindividual a,ratherthananyother,respectively.
♀
Theidentitycoefficientφo1o2(12)denotesthattwogenesatasinglelocushaveidentitystate(12),wherethefirst
ab
geneisinindividual aandhasparentalorigino ,andthesecondgeneisinindividualbandhasparentalorigino
1 2
(Figure1A).Thereareonlytwotwo-geneidentitystates: IBD(11)ornon-IBD(12),andφo1o2(11) = 1−φo1o2(12).
ab ab
Similarly,wedenotebyφo1o2o3(123)thethree-geneidentitycoefficientwiththeidentitystatebeingthenon-IBD
abc
state(123),whichisrequiredtogetherwithtwo-geneidentitycoefficientsforderivingtherecursiverelationsfor
junctiondensities.
Anexpectedidentityjunctiondensityisdefinedastheexpectednumberofthespecifiedtypeofrecombination
breakpoints per Morgan along two homologous chromosomes. Here expectation concerns the stochasticity of
geneflowfromfounderstodescendantsonafixedpedigree. Andidentityindicatesthatonlytheidentitypatterns
(notspecificFGLs)onthetwosidesofjunctionsmatter. IntherecursivealgorithmNON-EXCH,wewillintroduce
ancestral junction densities where the specific FGLs do matter. For example, Jo1o2(1232) denotes that the four
ab
genes,twoateachsideofabreakpointalongtwochromosomes,havegeneticidentitystate(1232)(Figure1B).
Herethefirstandthirdgenesareontheleftandrightsidesofabreakpoint,respectively,andtheyareinindividual
aandhaveparentalorigino . Andthesecondandfourthgenesareontheleftandrightsides,respectively,and
1
4 | ChaozhiZheng etal.
theyareinindividualbandhaveparentalorigino . WeadoptthenotationofNadotandVayssiex(1973)foran
2
identitystate: thegenesarelabeledbynaturalintegersstarting1,andthesameintegerisassignedtothegenethat
isIBDwithapreviousgene.
Following the previous framework (Zheng et al. 2014; Zheng 2015), we model ancestral origins along two
homologouschromosomeswithinanindividualbyacontinuoustimeMarkovprocess,whichcanbedescribed
by an initial distribution π of ancestral origins and a rate matrix Q. Under the assumption of exchangeability
amongFGLs,theinitialdistributionπ ofindividual aisdeterminedbytheidentitycoefficientφmp(11),andthe
aa
ratematrixQofindividual aisdeterminedbythefiveexpectedidentityjunctiondensities Jmp(1122), Jmp(1211),
aa aa
Jmp(1213), Jmp(1222),and Jmp(1232);seeZheng(2015)foranillustrativeconstructionofratematrixQfromthe
aa aa aa
fivejunctiondensities.
Inthefollowing,wedescribearecursivealgorithmforcalculatingtheidentitycoefficientsandtheexpected
junctiondensitiesforanindividualinagivenpedigree. Inshort,let1 beanindicatorfunctionthatequals1if
S
statementSistrueand0otherwise,andlet0S2 = 1−1 1 beanindicatorfunctionthatequals0ifstatementsS1
S1 S1 S2
andS2aretrueand1otherwise.
Identitycoefficients
Therecurrencerelationsforthetwo-andthree-geneidentitycoefficientsaresimilartotherecursivealgorithm
of Karigl (1981). However, the calculation proceeds by tracing back genes instead of individuals, see also e.g.
Jacquard(1974),NadotandVayssiex(1973),Denniston(1974),andGarcia-Cortes(2015). Thus,wecanaccountfor
theasymmetrybetweenmaternallyandpaternallyderivedchromosomes,andaccountforautosomesandsex
chromosomessimultaneously. Throughoutthispaper,weassumethattherearenocrossoversbetweenXandY
sexchromosomeswithinamale. Inaddition,wefocusonnon-IBDstate(12)insteadofIBDstate(11)tosimplify
therecurrencerelations.
WederivetherecurrencerelationsaccordingtotheMendelianinheritancerules: (1)amaternally(orpaternally)
derived autosomal gene descends from the two genes within the mother (or father, respectively) with equal
probability1/2,(2)similarto(1)foramaternallyderivedX-linkedgene,and(3)apaternallyderivedsex-linked
genewithinafemale(ormale)mustdescendfromtheX-linked(orY-linked,respectively)geneofthefather. The
recurrencerelationsofthetwo-geneidentitycoefficientφo1o2(12)fornon-founder aaregivenby
ab
1 (cid:104) (cid:105)
φmo(12) = φmo(12)+φpo(12) 0m=o,a ≥ b
ab 2 b b a=b
φpo(12) =(cid:26)1 (cid:104)♀φmo(12)+♀φpo (12)(cid:105),φmo(12),φpo (12)(cid:27)0p=o,a ≥ b,
ab 2 b b b b a=b
♂ ♂ ♂ ♂
VolumeX August2018 | Modelingancestraloriginsinpedigrees | 5
φo1o2(12) =φo2o1(12),a < b,
ab ba
for o,o ,o ∈ {m,p}. Here the first equation follows from the rules 1 and 2 for the maternally derived gene
1 2
withinindividual a,thesecondequationfollowsfromtherules1and3forthepaternallyderivedgenewithin
individual a,andthelastequationisobtainedbyreversingtheorderingoftwogenes. Thesymmetriessuchas
φo1o2o3(123) = φo1o3o2(123) = ... = φo3o2o1(123) apply to three-gene identity coefficients by permuting the three
abc acb cba
genes,andthusweconsideronlyaparticularordering a ≥ b ≥ c,andwehave
1 (cid:104) (cid:105)
φmo1o2(123) = φmo1o2(123)+φpo1o2(123) 0m=o10m=o20o1=o2,
abc 2 bc bc a=b a=c b=c
(cid:26) (cid:27)
φapboc1o2(123) = 12 (cid:104)♀φmob1co2(123)+♀φpo1boc2(123)(cid:105),φmob1co2(123),φpo1boc2(123) 0ap==bo10ap==co20bo1==co2,
♂ ♂ ♂ ♂
for o ,o ∈ {m,p},wherethefirst(second)equationtracesthematernally(orpaternally,respectively)derived
1 2
genewithinindividual a.
TheboundaryconditionsoftheserecurrencerelationsaregivenaccordingtotheassignmentofFGLstothe
foundersinapedigree. IfweassigndistinctFGLstoeachhaploidgenomeofanoutbredfounder,allmulti-gene
non-IBDprobabilitiesare1ifanypairofgenesisnotfromthesamehaploidgenome,and0otherwise. Four-gene
or higher order identity coefficients may be derived similarly, but they are not required for the derivations of
junctiondensities.
Expectedidentityjunctiondensities
Let Rm (Rp)bethematernal(paternal)mapexpansion,theexpecteddensityofrecombinationbreakpointsonthe
a a
maternally(paternally)derivedchromosomeofindividual a. Theimplicittwo-geneidentitystateforthemap
expansionsis(12). Itholdsthat
Rm =Jmp(1122)+ Jmp(1121)+ Jmp(1222)+ Jmp(1232),
a aa aa aa aa
Rp =Jmp(1122)+ Jmp(1112)+ Jmp(1211)+ Jmp(1213),
a aa aa aa aa
where Jmp(1121) = Jmp(1222)and Jmp(1112) = Jmp(1211)accordingtothereversibilityofchromosomedirection
aa aa aa aa
(Zheng 2015). The expected overall junction density for individual a is given by ρmp = Rm +Rp − Jmp(1122).
aa a a aa
In the following, we derive the recurrence relations for the five expected junction densities Rm, Rp, Jmp(1122),
a a aa
Jmp(1213),and Jmp(1232),fromwhich Jmp(1211)and Jmp(1222)canbederivedaccordingtotheaboveequations
aa aa aa aa
6 | ChaozhiZheng etal.
ofmapexpansions. Therecurrencerelationsforthetwomapexpansionsofnon-founder aarerelativelysimple,
andtheyaregivenby(Zheng2015)
1 (cid:16) (cid:17)
Rm = Rm+Rp +φmp(12),
a 2
(cid:26)1 (cid:16)♀ ♀ (cid:17) ♀♀ (cid:27)
Rp = Rm +Rp +φmp (12),Rm,Rp ,
a
2
♂ ♂ ♂♂ ♂ ♂
accordingtothetheoryofjunctionsthatanewidentityjunctionisformedwheneverarecombinationeventoccurs
betweentwohomologouschromosomesthatarenon-IBDatthelocationofacrossover(Fisher1954).
Since chromosomes are modeled as a genomic continuum, the shared junction type (1122) between two
chromosomescanbeformedonlybyduplicatingthechromosomesegmentharboringtherecombinationbreakpoint.
Fortheexpectedjunctiondensity Jo1o2(1122)ofnon-founder a,wehave
aa
Joo(1122) =Ro,
aa a
1 (cid:104) (cid:105)
Jmp(1122) =Jpm(1122) = Jpm(1122)+ Jpp(1122) ,
aa aa 2 a a
1 (cid:104) ♀ (cid:105) ♀
Jmo(1122) = Jmo(1122)+ Jpo(1122) ,a > b,
ab 2 b b
(cid:26)1♀(cid:104) ♀ (cid:105) (cid:27)
Jpo(1122) = Jmo(1122)+ Jpo (1122) ,Jmo(1122),Jpo (1122) ,a > b,
ab 2 b b b b
♂ ♂ ♂ ♂
Jo1o2(1122) =Jo2o1(1122),a < b,
ab ba
foro,o ,o ∈ {m,p},whereandthelastequationisobtainedbyreversingtheorderingofthetwohaplotypes. In
1 2
thefirstequation, Joo(1122)referstothesamechromosomeinindividual a,andthusequals Ro bydefinition. We
aa a
tracethematernallyderivedhaplotypewithinnon-founder abacktoitstwoparentalhaplotypesifthematernally
derivedhaplotypeconcernsthejunctiondensity,andotherwisetrackthepaternallyderivedhaplotype.
We derive the recurrence relations for Jo1o2(1213) and Jo1o2(1232) jointly. The recombination breakpoint for
ab ab
(1232)occursonthefirstchromosomeofahomologouspair,anditisonthesecondchromosomefor(1213). The
junctiontype(1213)becomes(1232)whenreversingtheorderingoftwohaplotypeofjunctiontype(1213). We
have
1 (cid:104) (cid:105)
Jmo(1232) = Jmo(1232)+ Jpo(1232) 0m=o+φmpo(123)0m=o,a ≥ b,
ab 2 b b a=b b a=b
Jpo(1232) =(cid:26)1♀(cid:104)Jmo(1232)+♀Jpo (1232)(cid:105)+φmp♀o♀(123),Jmo(1232),Jpo (1232)(cid:27)0p=o,a ≥ b,
ab 2 b b b b b a=b
♂ ♂ ♂♂ ♂ ♂
Jo1o2(1232) =Jo2o1(1213),a < b,
ab ba
VolumeX August2018 | Modelingancestraloriginsinpedigrees | 7
1 (cid:104) (cid:105)
Jmo(1213) = Jmo(1213)+ Jpo(1213) 0m=o,a ≥ b,
ab 2 b b a=b
Jpo(1213) =(cid:26)1♀(cid:104)Jmo(1213)+♀Jpo (1213)(cid:105),Jmo(1213),Jpo (1213)(cid:27)0p=o,a ≥ b,
ab 2 b b b b a=b
♂ ♂ ♂ ♂
Jo1o2(1213) =Jo2o1(1232),a < b,
ab ba
foro,o ,o ∈ {m,p}. Therearenocontributionsfromthethree-genenon-IBDprobabilitiesintheequationsfor
1 2
Jmo(1213)and Jpo(1213),becausecrossoversbetweenthetwoparentalhaplotypesofthetracinghaplotypewithin
ab ab
individual aareinvisible.
TheboundaryconditionsaregivenbytheassignmentofFGLstothefounders. Thewithin-andbetween-founder
identityjunctiondensitiesare0ifasingleFGLisassignedtothewholehaploidgenomeofeachfounder.
RECURSIVE ALGORITHM NON-EXCH
Notationsandoverview
Ω
Let denote the set of distinct FGLs assigned to the founders of a pedigree. We adopt the same notations in
therecursivealgorithmwithFGLexchangeabilityexceptforthefollowingchanges. AsshowninFigure1,we
replaceidentitystatesbyancestralstates: (1122) → (iijj),(1211) → (ijii),(1213) → (ijik),(1222) → (ijjj),and
(1232) → (ijkj)forthetwo-chromosomeexpectedjunctiondensities,andRm → Rm(ij)andRp → Rp(ij)explicitly
a a a a
forthemapexpansions,wherei,j,k ∈ Ωaredifferentfromeachother(denotedbyi (cid:54)= j (cid:54)= k).
We represent a two-gene ancestral state by (ij), and a three-gene ancestral state by (ijk), where i,j,k ∈ Ω
are not necessary different from each other. In addition we introduce one-gene ancestral coefficient φo(i), the
a
probability that the maternally (o = m) or paternally (o = p) derived gene in individual a carries FGL i ∈ Ω.
The identity coefficients can be obtained by summing the corresponding ancestral coefficients. For example,
φaob1o2(11) = ∑i∈Ωφaob1o2(ii). Similar relationships hold between the expected identity junction densities and the
expectedancestraljunctiondensities,forexample, Jaob1o2(1122) = ∑i,j∈Ω,i(cid:54)=j Jaob1o2(iijj)and Roa = ∑i,j∈Ω,i(cid:54)=jRoa(ij).
Similarly, the initial distribution π of the Markov chain can be constructed from the two-gene ancestral
coefficients, and the rate matrix Q can be constructed from the expected ancestral junction densities, without
assumingtheexchangeabilityofFGLs.
Ancestralcoefficients
Baumanetal.(2008)andZhouetal.(2012)derivedtherecurrencerelationsoftheone-andtwo-geneancestral
coefficientsforautosomesbytracingtwodistinctgenessimultaneouslyintheirparents. Wederiveequivalent
8 | ChaozhiZheng etal.
recurrence relations for both autosomes and sex chromosomes by tracing one gene once in its parent, instead
ofsimultaneouslytracingtwogeneswithinanindividual. Inaddition,wederivetherecurrentrelationsofthe
three-geneancestralcoefficientsthatarerequiredintherecurrencerelationsoftheexpectedancestraljunction
densities.
The relations of the ancestral coefficients are very similar to those of the identity coefficients, and they are
based on the same rules of Mendelian inheritance. For example for the one-gene ancestral coefficient φo(i) of
a
non-founder a,wehave
1 (cid:104) (cid:105)
φm(i) = φm(i)+φp(i) ,
a 2
(cid:26)1 (cid:104)♀ ♀ (cid:105) (cid:27)
φp(i) = φm(i)+φp (i) ,φm(i),φp (i) .
a
2
♂ ♂ ♂ ♂
SeeAppendixAfortherecurrencerelationsofthetwo-geneancestralcoefficientφo1o2(ij)andthree-geneancestral
abc
coefficientφo1o2o3(ijk).
abc
TheboundaryconditionsoftherecurrencerelationsoftheancestralcoefficientsaregivenbytheFGLsassigned
tothefoundersinapedigree.
Expectedancestraljunctiondensities
Theancestralmapexpansionscanbeexpressedintermsoftheexpectedancestraljunctiondensitiesasfollows
Rm(ij) =Jmp(iijj)+ Jmp(iiji)+ Jmp(ijjj)+ ∑ Jmp(ikjk),
a aa aa aa aa
k∈Ω,i(cid:54)=j(cid:54)=k
Rp(ij) =Jmp(iijj)+ Jmp(iiij)+ Jmp(jijj)+ ∑ Jmp(kikj).
a aa aa aa aa
k∈Ω,i(cid:54)=j(cid:54)=k
Here Ro(ij),o ∈ {m,p} refers to the junctions on the single chromosome of a with parental origin o, while
a
the terms on the right-hand side refer to the junctions on the two homologous chromosomes of a. We have
Jmp(iiji) = Jmp(jiii) (cid:54)= Jmp(ijjj) and Jmp(iiij) = Jmp(ijii) (cid:54)= Jmp(jijj), where the equal signs are based on the
aa aa aa aa aa aa
reversibilityofchromosomedirection,andtheunequalsignsrefertothenon-exchangeabilityofFGLsiand j.
Therecurrencerelationsfortheancestralmapexpansionsaregivenby
1 (cid:104) (cid:105) 1 (cid:104) (cid:105)
Rm(ij) = Rm(ij)+Rp(ij) + φmp(ij)+φpm(ij) 1 ,
a 2 2 i(cid:54)=j
(cid:26)1 (cid:104)♀ ♀ (cid:105) 1♀(cid:104)♀ ♀♀ (cid:105) (cid:27)
Rp(ij) = Rm (ij)+Rp (ij) + φmp (ij)+φpm (ij) 1 ,Rm (ij),Rp (ij) ,
a i(cid:54)=j
2 2
♂ ♂ ♂♂ ♂♂ ♂ ♂
VolumeX August2018 | Modelingancestraloriginsinpedigrees | 9
where the contributions of the two-gene ancestral coefficients account for the asymmetry of FGLs i and j, for
example,φmp(ij) = φpm(ji) (cid:54)= φpm(ij). Anewancestraljunctionisformedwheneverarecombinationeventoccurs
♀♀ ♀♀ ♀♀
betweentwohomologouschromosomesthathavetheunorderedFGLsiand jatthelocationofacrossover.
Therecurrencerelationsfor Jo1o2(iijj)arethesameasthosefor Jo1o2(1122),exceptthatidentitystatesarereplaced
ab ab
byancestralstates. Wehave
Joo(iijj) =Ro(ij),
aa a
1 (cid:104) (cid:105)
Jmp(iijj) =Jpm(iijj) = Jpm(iijj)+ Jpp(iijj) ,
aa aa 2 a a
1 (cid:104) ♀ (cid:105) ♀
Jmo(iijj) = Jmo(iijj)+ Jpo(iijj) ,a > b,
ab 2 b b
(cid:26)1♀(cid:104) ♀ (cid:105) (cid:27)
Jpo(iijj) = Jmo(iijj)+ Jpo (iijj) ,Jmo(iijj),Jpo (iijj) ,a > b,
ab 2 b b b b
♂ ♂ ♂ ♂
Jo1o2(iijj) =Jo2o1(jjii),a < b,
ab ba
for o,o ,o ∈ {m,p}, where the last equation is obtained by reversing the ordering of the two haplotypes. See
1 2
AppendixBfortherecurrencerelationsof Jo1o2(ijkj), Jo1o2(ijik) Jo1o2(ijjj),and Jo1o2(ijii).
ab ab ab ab
Asbefore,thewithin-andbetween-founderancestraljunctiondensitiesare0ifasingleFGLisassignedtothe
wholehaploidgenomeofeachfounder.
Dataavailability
AncestralinferenceusingthetworecursivealgorithmsisimplementedinthepreviousdevelopedRABBITsoftware,
wherethefunctionmagicReconstructisextendedtoincorporatepedigreeinformation. RABBITisavailableat
https://github.com/chaozhi/RABBIT.git, and it is offered under the GNU Affero general public license, version 3
(AGPL-3.0). Examplescriptsforusingtherecursivealgorithms,simulatinggenotypicdata,andancestralinference
areincluded. TherealCCdatahavebeendescribedbyDurrantetal.(2011).
APPLICATION TO MULTIPARENTAL POPULATIONS
Simulationevaluation
BeforeapplyingthetworecursivealgorithmsEXCHandNON-EXCHtomultiparentalpopulations,weevaluate
theiraccuracybyforwardsimulationsontheclassicalpedigreeofNativeAmericans(Figure3)(Chapmanand
Jacquard1971). Thepedigreehasbeenpreviouslyusedforevaluatingrecursivealgorithms(Jacquard1974;Karigl
1981;Garcia-Cortes2015). Wesimulatetwolinkagegroups: onepairofhomologousautosomesandonepairof
10 | ChaozhiZheng etal.
Description:aims of this paper are to develop the theory of gene flow in a fixed pedigree probability for two distinct genes within an individual is the inbreeding . In the derivations of recurrence relations of a quantity, we will focus on a .. Similarly, the initial distribution 7 of the Markov chain can be