800: Allow declaring that multiple Entities should be created/updated#818
800: Allow declaring that multiple Entities should be created/updated#818lindsay-stevens wants to merge 46 commits intoXLSForm:masterfrom
Conversation
- survey_element.py: add "type" (if available) to the repr for debugging - entity_declaration.py: add missing "type" field which appears in the dict prepared by entity_parsing.py, and in general set that type argument by default - survey.py: similar to the above add type field to init signature, and remove unicode/repr in favour of survey_element implementation
- works the same but could allow builder.py to be compatible with Survey/SurveyElement tree as well as the workbook_to_json output dict
- allowed to put entities in repeats (v1) - allowed to declare more than one entity (v2) - pyxform places the entity instead of user-specified (v2)
- previous equality check would only emit for a direct repeat parent - also nest under `if` to avoid potentially adding `None` to children
c28cb5f to
2348363
Compare
- NAMES_011 is about period in the name - NAMES_012 is about reserved words name/label
- test plan not fully implemented yet but the idea is to do a chunk of it to check that the approach seems sensible and usable in practice
- old: test_entities_columns__multiple_unexpected - new: test_unexpected_column__multiple__error
- old: test_entities_columns__one_unexpected - new: test_unexpected_column__single__error
- old: test_dataset_with_reserved_prefix__errors - new: test_dataset_name__reserved_prefix__error
- old: test_dataset_with_period_in_name__errors - new: test_dataset_name__period__error
- old: test_dataset_with_invalid_xml_name__errors - new: test_dataset_name__xml_identifier__error
- old: test_worksheet_name_close_to_entities__produces_warning - new: test_sheet_name_misspelling__warning
- old: test_saveto_without_entities_sheet__errors - new: test_no_entity_declarations__error
- old: test_reference_name_not_found__error - new: test_unresolved_variable_reference__error
- needed to refactor/move workbook_to_json survey type parsing because previous saveto check would raise a false positive error if the type contained "group" or "repeat" but that can occur for selects etc, so control type parsing is moved earlier so that result can be used, and so it's possible to catch a save_to in an end_group / end_repeat.
- converted `{dataset_name: list[ReferenceSource]}` to an object:
- used in many functions so it's now clearer what that object is
- allows it can carry the row_number for error messages
- may make sense to move some function(s) to it as class methods
- test cases are somewhat light since the intention is to cover the
solver behaviour more thoroughly in the topology test suite
- would be caught by names validation later anyway, but that names error is not particularly clear for this problem (which seems plausibly common) because the name check happens after the delimiter split.
- not explicitly tested before but existing tests alternate usages - swapped tests in test_entities.py over to list_name since that's the current/documented alias
- also add some requirements tags to match the xpath assertions
- the docstring usage of google-style "attributes" is required in this case since a namedtuple defines properties at class level not init. - linting
- variable references can float between containers in a scope boundary
but save_tos must be in the nearest container
- combined get_container_scopes and get_allocation_requests into
EntityReferences.get_allocation_request since these are two parts of
the same process and operate on single EntityReferences instance.
- also makes it a bit clearer what allocate_entities_to_containers is
doing to group the allocation requests by scope boundary
- updated test assertions using ENTITY_011/012 since this new func
will flag the first/highest reference as the error rather than the
lowest. The higher node is perhaps more incorrect since looking
down the tree results in multiple nodes. But then again users tend
to write forms top to bottom so either could be preferable.
- changed ReferenceSource.get_scope_boundary default to the /survey
since that is always the default scope boundary (rather than empty).
- deleted ENTITY_010 error since it's now covered by ENTITY_011 and
the latter more clearly shows both scopes.
- previously it was constrained to the nearest container, but the spec says "nearest ancestor container that has an Entity declaration" - added tests for topology error scenarios involving save_to and/or vars - changed requests.sort to sorted() since the sort result is only needed for the allocation loop.
- fold allocate_entities_to_paths into allocate_entities_to_containers
since it doesn't get used elsewhere and relies on the external
all_allocations dict anyway
- also replace somewhat confusing `while` loop that just is based on
ints with hopefully clearer range loop over requested_path
- replace get_allocation_request repeated max/min calls with equivalent
single loop over the references, clearer and avoids repeated loops
|
This is feeling really magical, I must say! I think users will find it "just works." I'm starting to get myself familiar with the code and coming up with more interesting forms to see how they match up with your tests. Let me know if there's anything I should pay particular attention to. |
|
@ln thanks! I think something that would be good to review initially is the traceability test suite plan at the top of Another thing that would be good to review early is the |
- as described in XLSForm#819 if an entity_id is defined then the default setvalue node which targets the entity_id should not be emitted. - included cases that show behaviour when a dynamic default for the referenced entity_id is included i.e. the setvalue for the default targets the question rather than the entity_id attribute - as discussed on forum thread 55163, in Collect this duplicate setvalue/bind sees the bind "win" whereas in Webforms the setvalue "wins" so it is a functionality problem as well as being redundant. - added / moved test cases for implicit update/upsert modes - many of the tests that are still in the old modules include assertions for save_to which is not part of test_entities yet - added requirement and assertion for the instance csv since while that instance is essential for updating, it's not currently intended to be emitted by default - added requirements for not emitting namespace/version since these were covered in previous test cases
- L560: the allocations are derived from the entity declarations so
the dataset_name should always be found
- L567: it seems like the has_repeat_ancestor flag is sufficient
- L568: the id_attr is added by default on L286 and the actions attr
defaults to an empty list so it's safe to assume both exist
- to help keep the default function in sync though, the action value
for the repeat setvalue is copied from the top-level action
- the top-level action is only added if there is no entity_id L234.
- L424: the container path was being calculated from the stack for each
row, but it is a property of each stack entry which can be calculated
in advance.
- added a ContainerPath class to represent the collection of
container nodes used in many places, and used it to calculate
the container path for each stack entry when adding to the stack
- also the path can assume the previous stack entry has a container
path as well, so that value is used instead of re-iterating
- L696: deleted branch since it's not reachable - if there are multiple entity declarations, then the code would go through L690 -> L691
| | | end_group | | | | | ||
| | | end_repeat | | | | | ||
|
|
||
| | entities | |
There was a problem hiding this comment.
🤯 You really went for the most diabolical cases!
| - dataset_2: xpath - where the meta/entity block should be for requestor_2 | ||
|
|
||
|
|
||
| *Stress test* |
There was a problem hiding this comment.
Wow, nice one. That's a very big survey, if things are working for that, we're in great shape.
There was a problem hiding this comment.
(I think) I'm most of the way through the traceability suite, but this stress test is part of the solver suite, so I'm not sure it passes yet, but then again the expectation for the stress test is pretty loose.
|
Thanks for the guidance on how to approach early review. I've looked at tests and messages Edit: these do work!! I've updated them with some minor unrelated fixes that made conversion failed. I must have forgotten to install. |
It may not have worked earlier but using the latest commit 6b525bb, conceptually this is OK. There's a test |
- add a test for the error that odk_validate raises when the instance referenced in update/upsert mode isn't there (EB019) - update other such tests to use csv-external so that the instance is present - not a core part of the test as such, but required for update/upsert to work without an odk_validate error
- depth 1: survey/group - depth 2: repeat/group - depth 3: repeat/group + var outside repeat
|
That’s amazing to read, thank you! I wonder whether I forgot to rebuild? I’ll try to get to the bottom of it and keep working through the test cases. 🚀 |
- previously, not adding any references to or from an entity would
result in them being silently ignored, but now such entities are
considered as targeting the survey
- allocation currently only considers ancestors so only one such
unreferenced entity can exist in the form, i.e. even if there are
spare valid groups the entities won't be pushed down into them
- removed EB011 since it is covered by EB010 in that the row number is
what makes it stable by breaking ties using a distinct value
- locally the @baseVersion seemed to reliably be the one in the error message, but on GH Actions it seems to prefer @branchId, so the test assertion uses generic message fragments, which should still be selective enough for this error and not match others
fe4457b to
6eb21a2
Compare
|
Confirmed those three work and updated the source above to fix the issues you mentioned. So good! I'm continuing my review looking specifically at risk to existing functionality we may not have explicitly tested and common patterns we expect. I think we should merge a version sooner than later so QA and the broader community can start pushing on it. |
- combinations of: - create, update, multiple properties, multiple entities - survey, group, repeat, repeat/group, group/repeat - delete tests covered by these newer ones
- previously did not handle a blank list_name / dataset where other columns had a value, but now there is an error for that - the dataset_name checks assumed only one entity is possible so had hardcoded row=2 in all errors, but now it uses the actual row number
- should help keep what is parsed in sync with current entity cols - still need outer get_entity_variable_references to index on question_name to avoid having to do nested iteration on every row
- fixes odk_validate failures
|
Maybe it's from looking at it for so long, but this PR seems to be in decent enough shape for merging. The remaining work is mainly the solver suite which is searching for edge cases. |
Closes #800
Closes #819
Why is this the best possible solution? Were any other approaches considered?
Initial implementation that passes existing relevant tests and a handful of new ones but still needs:
As noted in below comments, the index to this PR is the docstring at the top of
test_entities.pywhich assigns a short code to each requirement from the XForm spec, for validations, and for general behaviour. These short codes are then used to tag the relevant test cases.What are the regression risks?
New feature but main regression risk is breaking forms for the old entity specs
Does this change require updates to documentation? If so, please file an issue here and include the link below.
Yes, there's a spec doc update already but XLSForm side not documented yet
Before submitting this PR, please make sure you have:
testspython -m unittestand verified all tests passruff format pyxform testsandruff check pyxform teststo lint code