Browsing the database
Genealogy of the French in North America
DNA and Genealogy
computed (by male descendance only) :
Haplogroup from SNP : E-L793>L117
DYS393=13; DYS390=25; DYS19=13;
The haplogroup can also be estimated from other tests (12 to
111 Y-STR markers). This kind of prediction is based on
statistics. In both cases, if Y-STR markers are known, they
are displayed after the haplogroup.
The nodal notation is used to check that two different
mutations are compatible. For instance, the mutation
E-L117 is E1b1b1 in nodal notation while E-L793
corresponds to E1b1b1b2a1d. You can see that they match:
E1b1b1 is the beginning of E1b1b1b2a1d, the first being
contained in the other. This nodal notation is not stable,
however. Indeed, the hierarchy of mutations is revised from
time to time and the nodal notation may change. It is thus
best to use the other format, i.e. E-L793 in this example.
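The containment check described above can be sketched in a few lines of Python. This is only an illustration, not something taken from the GFNA database: the function names `nodal_tokens` and `nodal_compatible` are hypothetical, and the labels are split into alternating letters and numbers so that A1 is not mistaken for the beginning of A10.

```python
import re


def nodal_tokens(label):
    """Split a nodal label such as 'E1b1b1' into letters and whole numbers."""
    return re.findall(r"\d+|[A-Za-z]", label)


def nodal_compatible(a, b):
    """Return True when one nodal label is contained at the start of the other."""
    ta, tb = nodal_tokens(a), nodal_tokens(b)
    shorter, longer = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    return longer[: len(shorter)] == shorter


print(nodal_compatible("E1b1b1", "E1b1b1b2a1d"))  # True
print(nodal_compatible("A1", "A10"))              # False
```

Because the nodal notation is revised from time to time, such a comparison is only meaningful when both labels come from the same version of the tree.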
If the nodal notation is short (like R1b or J2), this usually means the haplogroup was estimated from the STR markers, while a longer label often indicates that SNPs were tested.
computed (by male descendance only) :
Haplogroup from Y-STR : R-M269
DYS393=13; DYS390=24; DYS19=14; DYS391=10;
To be sure no adoption occurred, a signature must be triangulated, that is, it must be computed from 2 descendants in the male line (or the female line for mtDNA). These lines are integrated into the GFNA database. The persons along the path from the common ancestor to the tested persons have a "computed" Y-DNA (green background) while the others have a "predicted" DNA (yellow background), with an error rate around 10% for 10 generations because of a possible adoption.
predicted (by female descendance only) :
Haplogroup : A10
Signature : A73G A235G A263G 315.1C C522- A523- C544T C16223T A16227C C16290T T16311C G16319A T16519C
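The error rate of about 10% over 10 generations quoted above for predicted signatures can be read as a risk that accumulates at each generation. The sketch below assumes a constant and independent risk of about 1% per generation; that per-generation value is an assumption chosen to reproduce the quoted figure, not a number taken from the database, and the function name is hypothetical.

```python
def cumulative_break_risk(per_generation_risk, generations):
    """Probability that at least one adoption or other break occurs in the line."""
    return 1.0 - (1.0 - per_generation_risk) ** generations


# Assumed risk of 1% per generation (illustrative value only).
for n in (1, 5, 10, 15):
    print(n, round(cumulative_break_risk(0.01, n), 3))
```

With this assumption, the risk after 10 generations is a little under 10%, and it keeps growing with longer lines, which is why predicted signatures far from a tested line deserve more caution.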
Also, starting with release 2016a, possible signatures were added to encourage more people to send their lines to the French Heritage project for validation, or to get tested. Only the haplogroup is shown in these cases. Reliability is based on the number of tested persons. In many cases, the pioneer can't be identified and the signature is not used.
presumed by convergence (by male descendance only) :
Haplogroup from SNP : R-M269>P312>Z220
Reliability : 90%
If the signature is obtained by triangulation, it
comes from 2 tested persons with a common ancestor (each
line was verified and is entirely male, or entirely female
except for the tested person in the case of mtDNA). In some
cases, the common ancestor is unknown but the lines share
the same family name and the same area of origin. The risk
of an error is almost zero (the pioneer could have adopted
2 children having the same father, for example).
If the signature is obtained by convergence, it
comes from 1 or more tested persons for whom the line
wasn't validated back to the pioneer. There is always a
risk of error, particularly if the descendant made an error
during the research or if there was an adoption or another
non-paternal event (NPE).
Let's see that again, with more technical terms.
Markers fall into two categories, SNPs (single-nucleotide
polymorphisms, or changes at a specific point) and STRs
(short tandem repeats, or the number of repetitions of
a short DNA sequence).
The information obtained from the SNPs is summarized as
the haplogroup. Its name begins with a letter. These
groups are divided by alternately adding letters and
numbers to form subclades (nodal notation). To simplify
the presentation, we will talk about haplogroups only, but
do not be surprised by other terms such as subclade
found in the literature. As more and more DNA is analyzed,
some haplogroups have grown larger and therefore, for
purposes of presentation in charts, some haplogroups are
identified by a letter (A, B, C, etc.) and others by a
group of characters (R1a, R1b, etc.). These groups are then
divided using mutations in the genes. A mutation may be
rare (or slow) or common (or fast). Rare mutations are
used for the first divisions of groups, then more frequent
ones for further divisions. In addition, a haplogroup is
usually a subdivision of another haplogroup.
For Y-DNA (male), the haplogroup is often displayed without
detailing the SNP markers.
Y haplogroups follow two conventions: by successive mutations (the most recent SNP) and by categories (nodal notation).
The Jarret dit Beauregard of Quebec have the haplogroup R1b1a2 (nodal notation). Over the centuries, there have been changes that passed through haplogroups R1b, R1b1, R1b1a, etc. This chain of mutations shows that a group is derived from another, but the chain can become quite long. A panel of experts then turns these strings into categories. Our R1b1a2 is also called R-M269 (haplogroup R and mutation M269), and the signatures in the database contain both forms.
The P group (Y-DNA) produced subgroups Q and R. R formed R1 and R2, and R1 gave R1a and R1b. So, there is also a hierarchy of haplogroups.
With mtDNA (women), the SNP markers are displayed, but
there are also two conventions, RSRS and CRS (or rCRS), the
CRS form being used in the database. On the FTDNA web
site, the RSRS form looks like A73G while the CRS form looks
like 73G, but actually the prefix is optional, and the
absence of a prefix doesn't mean the CRS form is in use. DNA
is formed of 4 bases, denoted by the letters A, T, C and G,
which pair as A with T and C with G. The prefix is the
default value for that position. The difference between
A73A and A73G is that A73A is the default value (never shown
anyway) while A73G has a change from A to G at position 73.
If you want to compare, you must then be sure all data
follows the same convention.
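Since the prefix is optional, two mtDNA signatures can look different while describing the same changes. The Python sketch below reads each mutation with or without its prefix so that the lists can be compared. The function names and the pattern are assumptions made for this illustration. It only removes the difference in spelling: it does not convert between RSRS and CRS, so both lists must already use the same reference.

```python
import re

# Optional reference base, position, optional insertion index (as in 315.1C),
# then the observed base, or "-" for a deletion (as in C522-).
_MUTATION = re.compile(r"^([ACGT])?(\d+)(?:\.(\d+))?([ACGT-])$")


def parse_mutation(token):
    """Return (position, insertion index, observed value), ignoring the prefix."""
    match = _MUTATION.match(token.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation: {token!r}")
    _prefix, position, insertion, observed = match.groups()
    return (int(position), int(insertion) if insertion else 0, observed)


def normalize(signature):
    """Turn a space-separated signature into a set of comparable mutations."""
    return {parse_mutation(token) for token in signature.split()}


with_prefix = "A73G A235G A263G 315.1C C522-"
without_prefix = "73G 235G 263G 315.1C 522-"
print(normalize(with_prefix) == normalize(without_prefix))  # True
```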
On the other hand, it is not necessary to check all the
SNP markers to find the haplogroup. For example, the Geno2
test is based on about 3000 mt markers (they are not
identified) and the result is limited to markers that are
different from the reference values. If the reference
value is A73A (or 73A, the prefix is an option) and the
measured value is A, you won't see that value in the
results file. But if the measure is A73G, the file will
include the line 73 G G. Since there are 22 pairs of
chromosomes, there are 2 fields for the letter. For X and
Y chromosomes of men, the field is repeated, like for the
The timeline of haplogroups is computed from recent tests
and knowledge of history and human migrations. Dating is
thus approximate. Moreover, the sequence in which
mutations occurred is not always obvious. Y-DNA subclades
are often identified by many possible mutations. As an
example, let's look at the chart of group R on http://www.isogg.org/tree/ISOGG_HapgrpR.html.
The line containing the mutation P310 is copied below:
R1b1a2a1a - L151/PF6542, L52/PF6541, P310/PF6546/S129, P311/PF6545/S128
Haplogroup R1b1a2a1a can be identified by finding
any one of the following mutations: L151, PF6542, L52,
PF6541, P310, PF6546, S129, P311, PF6545, S128. It can be
summarized as R-L151, R-P310, etc. Note: the nodal
notation is not fixed. It can change when mutations are
moved from one line to another. In this project, it is
used to compare tests. On the other hand, nodal notation
is no longer used in the GFNA database.
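To make the reading of such a line concrete, here is a small Python sketch that splits it into the nodal label and the list of equivalent mutation names, then builds the short forms such as R-P310. The layout it expects (label, " - ", names separated by commas and slashes) is taken from the single line quoted above and is an assumption about other lines of the chart; the function names are hypothetical.

```python
def parse_tree_line(line):
    """Split 'R1b1a2a1a - L151/PF6542, ...' into the label and mutation names."""
    nodal, _, snps = line.partition(" - ")
    names = [name.strip() for group in snps.split(",") for name in group.split("/")]
    return nodal.strip(), [name for name in names if name]


def short_forms(line):
    """Build labels such as 'R-L151' from the first letter and each mutation."""
    nodal, names = parse_tree_line(line)
    return [f"{nodal[0]}-{name}" for name in names]


line = "R1b1a2a1a - L151/PF6542, L52/PF6541, P310/PF6546/S129, P311/PF6545/S128"
nodal, names = parse_tree_line(line)
print(nodal, len(names))              # R1b1a2a1a 10
print("R-P310" in short_forms(line))  # True
```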
STR data is relatively independent from SNP data. These are two different measurements, and it is theoretically possible for two people to have the same STR markers but different (or actually incompatible) SNP data (or haplogroups). That is why both pieces of information are important when you want to compare signatures. STR markers (or Y-STR) can be found for the Y-DNA. Here is a sample result.
predicted (by male descendance only) :
Haplogroup : R1b1a2/R-M269
DYS393=14; DYS390=24; DYS19=14; DYS391=11; DYS385=11-15; DYS426=12; DYS388=12; DYS439=13; DYS389i=13; DYS392=13; DYS389ii=30;
Here are the data from 12 markers ("DYS385=11-15" means
DYS385a = 11 and DYS385b = 15, hence 12 items). The number
of markers and their choice depend on the lab. The
FamilyTreeDNA site offers tests with 12, 25, 37, 67 and 111
markers. AncestryDNA provided 33 and 46 markers (but this
product is discontinued) while Genebase has 20, 44, 67 and
91 markers. The ISOGG web site compares some laboratories.
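The way the 12 items are counted can be checked with a short Python sketch that reads a line of Y-STR values in the format shown in the sample. The function names are hypothetical, and the parsing rules (items separated by ";", multi-copy values separated by "-") are inferred from that sample only.

```python
def parse_y_str(line):
    """Read 'DYS393=14; DYS385=11-15; ...' into {marker: tuple of values}."""
    markers = {}
    for item in line.split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the final ";"
        name, _, value = item.partition("=")
        markers[name.strip()] = tuple(int(v) for v in value.split("-"))
    return markers


def marker_count(markers):
    """Count values, so that a two-copy marker such as DYS385 counts twice."""
    return sum(len(values) for values in markers.values())


sample = ("DYS393=14; DYS390=24; DYS19=14; DYS391=11; DYS385=11-15; DYS426=12; "
          "DYS388=12; DYS439=13; DYS389i=13; DYS392=13; DYS389ii=30;")
markers = parse_y_str(sample)
print(markers["DYS385"])      # (11, 15)
print(marker_count(markers))  # 12
```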
Sometimes, one can deduce the haplogroup from STR markers, but an error is possible. Actually, some laboratories don't measure any SNP and derive their results only from STR markers. There are also tests that check a given SNP directly.
Because there are adoptions and other NPEs, it is always
possible that the markers found are not those of the most
distant known ancestor. This is why we try to triangulate
lineages, that is, to find two or even three descendants of
the common ancestor who have the same markers. The Most
Recent Common Ancestor (MRCA) is the first common male
ancestor (in a male line), and he has the same Y-DNA
signature as the two or three tested people if their markers
are identical. The same applies to mtDNA and female lineages.
Signatures are considered reliable for the tested men, the MRCA and the lineages between the MRCA and the tested men. For all other persons in the family (brothers, cousins, uncles, etc.), there is a risk that someone was adopted. Reliability increases if there are more signatures and more lineages from sons of the MRCA. This also applies to female lineages and mtDNA.
In this database, the signatures are identified according to the triangulation.
The signatures in family sheets between the MRCA and those tested are identified as Y-DNA computed or mtDNA computed. The signatures are propagated to other families, where the same signature is predicted provided there is no adoption or other NPE. They are then known as Y-DNA predicted or mtDNA predicted. Warning: this database has family sheets. The Y-DNA is that of the father (and his sons) and the mtDNA is that of the mother (and her children, but it is only transmitted by females to the next generation).
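The distinction between computed and predicted signatures can be illustrated with a small Python sketch for the male line. It is not the code behind the database: the family, the names and the functions are hypothetical, and "predicted" is limited here to the other male-line descendants of the MRCA found in the paper trail.

```python
def male_line(person, father_of):
    """Return [person, father, grandfather, ...] according to the paper trail."""
    chain = [person]
    while chain[-1] in father_of:
        chain.append(father_of[chain[-1]])
    return chain


def classify(tested, father_of):
    """Return (MRCA, persons with computed Y-DNA, persons with predicted Y-DNA)."""
    if len(tested) < 2:
        raise ValueError("triangulation needs at least 2 tested persons")
    lines = [male_line(person, father_of) for person in tested]
    common = set(lines[0]).intersection(*lines[1:])
    # Going up the first line, the first shared ancestor is the MRCA.
    mrca = next((p for p in lines[0] if p in common), None)
    if mrca is None:
        raise ValueError("no common ancestor in the male line")
    computed = set()
    for line in lines:
        computed.update(line[: line.index(mrca) + 1])
    everyone = set(father_of) | set(father_of.values())
    predicted = {p for p in everyone - computed if mrca in male_line(p, father_of)}
    return mrca, computed, predicted


# Hypothetical family: Jean had three sons; two lines were tested.
father_of = {"Louis": "Jean", "Michel": "Jean", "Paul": "Jean",
             "Tested1": "Louis", "Tested2": "Michel", "Cousin": "Paul"}
mrca, computed, predicted = classify(["Tested1", "Tested2"], father_of)
print(mrca)               # Jean
print(sorted(computed))   # ['Jean', 'Louis', 'Michel', 'Tested1', 'Tested2']
print(sorted(predicted))  # ['Cousin', 'Paul']
```

Paul and his descendants only receive a predicted signature because nobody in that branch was tested; an adoption there would go unnoticed.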
Most immigrants arrived alone in New France, Acadia or Louisiana. The MRCA is this immigrant if he had several married sons (Y-DNA), or, for mtDNA, the female immigrant with several married daughters. If the immigrant had only one married child, you must go down to the next generation, or the one that follows, until there are enough children to triangulate. Conversely, if brothers or sisters immigrated together, you can have the signature of the previous generation, although in some cases the name of that person may be unknown. For example, the descendants of the Langlois sisters have the same mtDNA signature, but we do not know the name of the mother of Marguerite Langlois, wife of Abraham Martin, nor of the mother (of course it is the same person) of her sister Françoise Langlois, who married Pierre Desportes.
The haplogroup is often used to estimate the region of
origin of the lineage in question (as it was 5,000 or
25,000 years ago). To understand what this means, we must
know how these haplogroups are created. First of all, the
Y-DNA haplogroups and the mtDNA haplogroups are independent.
This map shows the Y-DNA haplogroups (at the top) and
the mtDNA haplogroups (at the bottom) in the year 1500
(before the arrival of Europeans in America). It shows the
composition of the population in some areas. For example,
the Y-DNA in the Cheyenne population (CY), in the center of
the United States, is approximately 75% A, 8% B and 17% C.
In Central Asia, the SL group contains 65% A, 17% R1a, etc.
In France, it is 55% R1b, 25% I, 10% R1a, 8% J and 6%
E3b. So we can see that, on the one hand, a haplogroup is
found in several places in varying proportions and, on the
other hand, the local population at each location is not
uniform. Moreover, these figures are from 2005 and are
estimated from graphs. Current figures may be different.
Also, the tests were conducted recently while these are
estimates for populations living 500 years ago. Thus, this
is a projection and not a direct measurement.
Concentrations are different for mtDNA. The Eastern
Native Americans have 40% A, 28% X, 25% C, etc. The same
haplogroup A is also found in smaller proportions among the
Uzbeks of Central Asia. The human tendency being to
simplify, many confuse the facts. Haplogroup A is indeed
common among Native Americans, but it is also found in Asia.
In this regard, let's talk about the controversy about
the mtDNA of the King's Daughter Catherine Pillat, wife of
Pierre Charon (GFNA family sheet 812). Her haplogroup is
A10. We have seen that A is typical of Amerindian
populations. However, A is also found in Asia and this group
includes divisions derived from rare mutations. The A2
subgroup is typically Native American, while A10 is common
in some populations of Central Asia, including some tribes
that invaded Europe more than a millennium ago.
Catherine descends from these tribes and is not Native
American. On the FamilyTreeDNA site, a search for A10
shows only the descendants of Catherine, while A2 displays
several Amerindians (French_Heritage_DNA project), and the
other groups, A1 and A3 to A9, show no data. The mutations
associated with groups A10 and A2 are different (see
this table). From our research, Native American
haplogroups include, for Y-DNA, P-M45, C-P39
and Q1a, and for mtDNA, A2,
B2, C1 and D1 (X is uncertain and
some researchers believe it is European).
The same markers are used to group families, but they do
not say where the link (common ancestor) is located. See
the following chart. Let's consider three people as an
example: Albert (A), Basile (B) and Charles (C). A
and B have a common ancestor (Abraham, born about 1600) 10
generations back (Abraham had two sons, Adam and
Bertrand), and similar markers (red lines). C has the same
markers (green line) but his genealogical line is
different. Odds are good that the C line includes an
adoption (or a child born out of wedlock), the father
being from the same family as A and B. But it is not
possible to know who the biological father is or when the
adoption occurred. It is even possible that the common
ancestor is a cousin who lived before these families
migrated to New France!
The generation can be approximated from some clues. We can
test Daniel (D), a cousin of C who has the same markers as A
and B (green line), to find the generation where the
adoption occurred, and Ernest (E), a descendant of the
adoptive father (Étienne) who has different markers (violet
line). At point F, we conclude that the ancestor of A
(André), the ancestor of B (Bertrand) or their cousin
(Henri) would be the biological father of the ancestor of C
and D (Cédric), while at point G, documents say that
Étienne, the ancestor of E, is the father of Cédric, the
ancestor of C and D.
To summarize, documents (the paper trail) tell us that:
Abraham was born around 1600 and had 2 sons, Adam and
Bertrand. Adam is the ancestor of Albert, and Bertrand the
ancestor of Basile. Abraham is also the ancestor of Henri.
Then, Étienne, born about 1650, arrived in the colony. He
would be the father of Cédric. Cédric is the ancestor of
Charles and Daniel. Étienne is the ancestor of Ernest.
DNA teaches us that, actually, Étienne adopted Cédric or that Henri, a traveler, fathered a child with Étienne's wife (it is usually not possible to know what happened). However, Étienne is the biological ancestor of Ernest (provided another descendant of Étienne was tested), while Abraham is the biological ancestor of Albert, Basile, Charles and Daniel. DNA won't tell who Henri is, or whether he is in line A, line B or another line.
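The comparison between the paper trail and the DNA in this example can be sketched in Python. The signature labels "sig-1" and "sig-2" are placeholders standing for the two sets of markers (they are not real data), and the function name is hypothetical. The sketch lists the pairs of tested men for which the documents and the markers disagree.

```python
# Earliest ancestor of each tested man according to the documents.
paper_ancestor = {"Albert": "Abraham", "Basile": "Abraham",
                  "Charles": "Étienne", "Daniel": "Étienne", "Ernest": "Étienne"}

# Placeholder labels for the markers found by the tests.
markers = {"Albert": "sig-1", "Basile": "sig-1", "Charles": "sig-1",
           "Daniel": "sig-1", "Ernest": "sig-2"}


def disagreements(paper_ancestor, markers):
    """Return pairs whose markers and documented ancestor do not tell the same story."""
    names = sorted(markers)
    found = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            same_dna = markers[first] == markers[second]
            same_paper = paper_ancestor[first] == paper_ancestor[second]
            if same_dna != same_paper:
                found.append((first, second))
    return found


for pair in disagreements(paper_ancestor, markers):
    print(pair)
```

Every pair printed involves Charles or Daniel, which points to a break somewhere in their line, but the list does not say at which generation the break happened nor who the biological father was.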
DNA tests are very popular among Americans. Documentary
evidence is less available than in Quebec, which helps
to explain some of this popularity in the USA. In addition,
the anglicization of names after immigration makes it more
difficult to find the ancestral line. The new name can be
very different from the old one. It is easy to see the
link between Fairfield and Deschamps (champ means
field) or Ashley and Dufresne (frêne means ash),
for example, but Deshan descends from Deschênes in Maine
and Sharray were Charest and Jarret before moving to
In other cases, a DNA test helps to find an error in the
documentary evidence. For example, while working on the
Catherine Pillat case, another pioneer was found with the
same unexpected haplogroup A10. After an extended study, it
was found that the actual ancestor was also the same
Most signatures come from the French Heritage DNA project,
which has been accumulating DNA signatures for the last 10
years to achieve triangulations and validate the signatures
included in the database of the Genealogy of the French in
North America. Others are from the projects ADNy Québec and
ADNmt Québec, associated with the FAFQ, from the Mothers
of Acadia project, and from other projects. The author would
appreciate getting access to other projects about the French
in America so as to help solve dead ends and improve the
quality of data in the GFNA database. In some cases, data
are not from a project, but from matches to someone in a
project, when the participant has not forwarded his/her male
or female line. When validating your ancestry in the light
of the predictions made about your DNA based on GFNA data,
consider sending me a copy of your line at
DNA@francogene.com!